Class PdfContentStreamHandler

java.lang.Object
org.openpdf.text.pdf.parser.PdfContentStreamHandler
Direct Known Subclasses:
PdfContentTextExtractor, PdfContentTextLocator

public abstract class PdfContentStreamHandler extends Object
  • Field Details

    • textFragmentStreams

      protected final Stack<List<TextAssemblyBuffer>> textFragmentStreams
    • contextNames

      protected final Stack<String> contextNames
    • renderListener

      protected final TextAssembler renderListener
      Detail parser for text within a marked section. Used by TextAssembler.
    • operators

      protected Map<String, ContentOperator> operators
      A map of all supported operators (PDF syntax). Protected so that subclasses can override installDefaultOperators() and register additional operators.
    • gsStack

      protected Stack<GraphicsState> gsStack
      Stack keeping track of the graphics state.
    • textMatrix

      protected Matrix textMatrix
      Text matrix.
    • textLineMatrix

      protected Matrix textLineMatrix
      Text line matrix.
    • textFragments

      protected List<TextAssemblyBuffer> textFragments
  • Constructor Details

    • PdfContentStreamHandler

      public PdfContentStreamHandler(TextAssembler renderListener)
  • Method Details

    • getMatrix

      private static Matrix getMatrix(List<PdfObject> operands)
    • registerContentOperator

      public void registerContentOperator(ContentOperator operator)
      Registers a content operator that is called when its operator string is encountered during content processing. Each operator string may be registered only once; registering multiple operators with the same operatorString is not permitted.
      Parameters:
      operator - the operator to notify when its operator string is encountered
      Since:
      2.1.7
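
      The once-only rule can be illustrated with a minimal, self-contained sketch. SketchOperator and OperatorRegistrySketch are hypothetical stand-ins, not part of this API, and rejecting a duplicate with an IllegalArgumentException is an assumption; this page does not state how the real method reacts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ContentOperator; only a name accessor is modeled.
interface SketchOperator {
    String getOperatorName();
}

// Hypothetical registry, keyed by operator string.
class OperatorRegistrySketch {
    private final Map<String, SketchOperator> operators = new HashMap<>();

    // Registers an operator under its name.
    // Assumption: a second registration for the same name is rejected.
    void register(SketchOperator operator) {
        String name = operator.getOperatorName();
        if (operators.containsKey(name)) {
            throw new IllegalArgumentException("Operator " + name + " already registered");
        }
        operators.put(name, operator);
    }

    int size() {
        return operators.size();
    }

    public static void main(String[] args) {
        OperatorRegistrySketch registry = new OperatorRegistrySketch();
        registry.register(() -> "Tj");
        try {
            registry.register(() -> "Tj"); // same operator string again
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```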
    • installDefaultOperators

      protected void installDefaultOperators()
      Loads all the supported graphics and text state operators in a map. Subclasses can override this method to register additional operators. When overriding, subclasses should call super.installDefaultOperators() first.
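
      The override pattern described above (defaults first, then additions) might look like the following sketch. BaseHandlerSketch and CustomHandlerSketch are hypothetical classes, and the map values are plain description strings rather than real ContentOperator instances; only the call order is the point.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical base class standing in for PdfContentStreamHandler.
class BaseHandlerSketch {
    // Insertion-ordered so the registration order is visible when printed.
    protected final Map<String, String> operators = new LinkedHashMap<>();

    protected void installDefaultOperators() {
        operators.put("Tj", "show text");
        operators.put("TJ", "show text with adjustments");
    }
}

// Hypothetical subclass that adds one operator on top of the defaults.
class CustomHandlerSketch extends BaseHandlerSketch {
    @Override
    protected void installDefaultOperators() {
        super.installDefaultOperators();               // defaults first
        operators.put("sh", "hypothetical addition");  // then the extra operator
    }

    public static void main(String[] args) {
        CustomHandlerSketch handler = new CustomHandlerSketch();
        handler.installDefaultOperators();
        System.out.println(handler.operators.keySet());
    }
}
```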
    • lookupOperator

      protected Optional<ContentOperator> lookupOperator(String operatorName)
      Gets the operator that processes a command with the given name.
      Parameters:
      operatorName - name of the operator that we might need to call
      Returns:
      an Optional containing the operator, or an empty Optional if none is registered
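
      Because the signature returns an Optional, a caller can handle an unknown operator name without a null check. The sketch below is a simplified model, assuming a map-backed lookup; LookupSketch is hypothetical and stores description strings in place of ContentOperator instances.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of a map-backed operator lookup.
class LookupSketch {
    private final Map<String, String> operators = new HashMap<>();

    LookupSketch() {
        operators.put("Tj", "show text");
    }

    // Wraps the map result so a missing entry becomes an empty Optional.
    Optional<String> lookupOperator(String operatorName) {
        return Optional.ofNullable(operators.get(operatorName));
    }

    // Example caller: uses the operator if present, otherwise falls back.
    String describe(String operatorName) {
        return lookupOperator(operatorName)
                .map(description -> operatorName + ": " + description)
                .orElse(operatorName + ": not registered");
    }

    public static void main(String[] args) {
        LookupSketch sketch = new LookupSketch();
        System.out.println(sketch.describe("Tj"));
        System.out.println(sketch.describe("xx"));
    }
}
```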
    • invokeOperator

      public void invokeOperator(PdfLiteral operator, List<PdfObject> operands, PdfDictionary resources)
      Invokes an operator.
      Parameters:
      operator - the PDF syntax of the operator
      operands - a list of operands
      resources - PDF resources found in the file containing the stream
    • popContext

      abstract void popContext()
    • pushContext

      abstract void pushContext(String newContextName)
    • graphicsState

      GraphicsState graphicsState()
      Returns the current graphics state.
      Returns:
      the graphics state
    • reset

      public abstract void reset()
    • getCurrentTextMatrix

      protected Matrix getCurrentTextMatrix()
      Returns the current text matrix.
      Returns:
      the text matrix
      Since:
      2.1.5
    • getCurrentTextLineMatrix

      protected Matrix getCurrentTextLineMatrix()
      Returns the current text line matrix.
      Returns:
      the text line matrix
      Since:
      2.1.5
    • applyTextAdjust

      void applyTextAdjust(float tj)
      Adjusts the text matrix by the specified adjustment value (see the TJ operator in the PDF specification).
      Parameters:
      tj - the text adjustment
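
      As a rough illustration of the arithmetic, the PDF specification expresses a TJ adjustment in thousandths of a unit of text space and subtracts it from the current horizontal position. The sketch below assumes this method follows that rule; TextAdjustSketch and its fontSize and horizontalScaling parameters are hypothetical, with horizontalScaling given as a fraction (1.0 for 100%).

```java
// Hypothetical helper that computes the horizontal shift for a TJ adjustment.
class TextAdjustSketch {

    // Displacement in text space: the adjustment is in thousandths of a unit,
    // scaled by font size and horizontal scaling, and applied with a negative sign.
    static float displacement(float tj, float fontSize, float horizontalScaling) {
        return -tj / 1000f * fontSize * horizontalScaling;
    }

    public static void main(String[] args) {
        // A negative adjustment moves the next glyph forward (to the right).
        System.out.println(displacement(-250f, 12f, 1f)); // prints 3.0
        // A positive adjustment moves it back, tightening the spacing.
        System.out.println(displacement(500f, 10f, 1f));  // prints -5.0
    }
}
```

      In a full handler, this displacement would be applied as a horizontal translation to the text matrix.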
    • getCurrentFont

      public CMapAwareDocumentFont getCurrentFont()
      Returns:
      the current font in the processing state
    • displayPdfString

      abstract void displayPdfString(PdfString string)
      Displays text.
      Parameters:
      string - the text to display
    • getResultantText

      public abstract String getResultantText()
      Returns:
      the resulting text
    • processContent

      protected void processContent(byte[] contentBytes, PdfDictionary resources)
      Processes PDF content stream bytes.
      Parameters:
      contentBytes - the bytes of a content stream
      resources - the resources that come with the content stream
    • getContentBytesFromPdfObject

      protected byte[] getContentBytesFromPdfObject(PdfObject object) throws IOException
      Gets the content bytes from a PdfObject, which may be a reference, a stream or an array. This is a utility method that can be used by subclasses and other classes in this package.
      Parameters:
      object - the object to read bytes from
      Returns:
      the content bytes
      Throws:
      IOException - if there's an error reading the content
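
      The three accepted shapes (reference, stream, array) suggest a recursive resolution, sketched below with in-memory stand-ins. All types here are hypothetical and do not use the real PdfObject hierarchy; joining array elements with a single space is also an assumption, chosen so that tokens from adjacent streams cannot run together.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the three object shapes.
interface SketchObject {}

class SketchStream implements SketchObject {
    final byte[] bytes;
    SketchStream(byte[] bytes) { this.bytes = bytes; }
}

class SketchReference implements SketchObject {
    final SketchObject target;
    SketchReference(SketchObject target) { this.target = target; }
}

class SketchArray implements SketchObject {
    final List<SketchObject> items;
    SketchArray(List<SketchObject> items) { this.items = items; }
}

class ContentBytesSketch {
    // Resolves references, returns stream bytes, and concatenates array items.
    static byte[] contentBytes(SketchObject object) {
        if (object instanceof SketchReference) {
            return contentBytes(((SketchReference) object).target);
        }
        if (object instanceof SketchStream) {
            return ((SketchStream) object).bytes;
        }
        if (object instanceof SketchArray) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (SketchObject item : ((SketchArray) object).items) {
                byte[] part = contentBytes(item);
                out.write(part, 0, part.length);
                out.write(' '); // assumed separator between parts
            }
            return out.toByteArray();
        }
        throw new IllegalStateException("Unsupported object: " + object);
    }

    public static void main(String[] args) {
        SketchObject first = new SketchStream("BT".getBytes(StandardCharsets.US_ASCII));
        SketchObject second = new SketchStream("ET".getBytes(StandardCharsets.US_ASCII));
        SketchObject array = new SketchArray(Arrays.asList(new SketchReference(first), second));
        System.out.println(new String(contentBytes(array), StandardCharsets.US_ASCII));
    }
}
```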
    • getContentBytesFromPdfObjectStatic

      static byte[] getContentBytesFromPdfObjectStatic(PdfObject object) throws IOException
      Gets the content bytes from a PdfObject, which may be a reference, a stream or an array. This is a static utility method that can be used by any class in this package.
      Parameters:
      object - the object to read bytes from
      Returns:
      the content bytes
      Throws:
      IOException - if there's an error reading the content