Class ParsedText

    • Field Detail

      • textToUserSpaceTransformMatrix

        private final Matrix textToUserSpaceTransformMatrix
      • pdfText

        private PdfString pdfText
        Retains the original PdfString, because we need to distinguish between the code points it contains and the standard Java (Unicode) strings that actually represent the content of this text.
    • Constructor Detail

      • ParsedText

        ParsedText​(PdfString text,
                   GraphicsState graphicsState,
                   Matrix textMatrix)
        This constructor should only be called when the origin for text display is at (0,0) and the graphical state reflects all transformations of the baseline. This is in text space units.
        Parameters:
        text - string
        graphicsState - graphical state
        textMatrix - transform from text space to graphics (drawing) space
      • ParsedText

        private ParsedText​(PdfString text,
                           GraphicsState graphicsState,
                           Matrix textMatrix,
                           float unscaledWidth)
        Internal constructor for a parsed text item. The constructors that call it gather some information from the graphical state first.
        Parameters:
        text - a PdfString containing code points for the current font, not actual characters. If the font has multi-byte glyphs (Identity-H encoding), we reparse the string so that the code points don't get split into multiple characters.
        graphicsState - graphical state
        textMatrix - transform from text space to graphics (drawing) space
        unscaledWidth - width of the space character in the font.
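        The reparsing described for the text parameter can be pictured with a small sketch. It assumes a font whose codes are always two bytes wide, big-endian, and packs each pair into one Java char so that a code point is never split. The class and method names are hypothetical and are not part of this API.

```java
// Hypothetical illustration; not the library's implementation.
class TwoByteCodesSketch {
    /** Packs consecutive big-endian byte pairs into single chars (one char per code). */
    static String combine(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        // A trailing odd byte, if any, is ignored in this sketch.
        for (int i = 0; i + 1 < bytes.length; i += 2) {
            int code = ((bytes[i] & 0xFF) << 8) | (bytes[i + 1] & 0xFF);
            sb.append((char) code);
        }
        return sb.toString();
    }
}
```

        With this sketch, the four bytes 00 41 01 02 yield a two-character string holding the codes 0x0041 and 0x0102, rather than four single-byte characters.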
    • Method Detail

      • pointToUserSpace

        private static Vector pointToUserSpace​(float xOffset,
                                               float yOffset,
                                               Matrix textToUserSpaceTransformMatrix)
        Parameters:
        xOffset - offset in x direction
        yOffset - offset in y direction
        textToUserSpaceTransformMatrix - transform from text space to graphics (drawing) space
        Returns:
        the point (xOffset, yOffset) expressed in user space, computed as the cross product of the offset vector and the textToUserSpaceTransformMatrix
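        As a rough illustration of what transforming an offset by a matrix means here, the sketch below multiplies the row vector (x, y, 1) by a 3x3 matrix stored in row-major order. The float arrays stand in for the Vector and Matrix types, and the class name is hypothetical; the storage layout is an assumption, not a statement about the library's internals.

```java
// Hypothetical illustration using plain arrays instead of Vector/Matrix.
class PointTransformSketch {
    /** Multiplies the row vector (x, y, 1) by the 3x3 row-major matrix m. */
    static float[] pointToUserSpace(float x, float y, float[] m) {
        float[] v = {x, y, 1f};
        float[] out = new float[3];
        for (int col = 0; col < 3; col++) {
            out[col] = v[0] * m[col] + v[1] * m[3 + col] + v[2] * m[6 + col];
        }
        return out;
    }

    public static void main(String[] args) {
        // Scale by 2 in both directions, then translate by (10, 20).
        float[] m = {2, 0, 0,   0, 2, 0,   10, 20, 1};
        float[] p = pointToUserSpace(3, 4, m);
        System.out.println(p[0] + ", " + p[1]);
    }
}
```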
      • getUnscaledFontSpaceWidth

        private static float getUnscaledFontSpaceWidth​(GraphicsState graphicsState)
        Calculates the width of a space character. If the font does not define a width for a standard space character (U+0020), we also attempt to use the width of U+00A0 (a non-breaking space in many fonts).
        Parameters:
        graphicsState - graphics state, including the current transformation from text measurements to page coordinates
        Returns:
        the width of a single space character in text space units
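        The fallback described above can be sketched as follows. The map of character widths is a stand-in for whatever the font in the graphics state reports, and treating a width of zero as "not defined" is an assumption of this sketch. The class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical illustration; the real method reads widths from the current font.
class SpaceWidthSketch {
    /** Returns the width of ' ' or, if that is missing or zero, the width of U+00A0. */
    static float spaceWidth(Map<Character, Float> widths) {
        float w = widths.getOrDefault(' ', 0f);
        if (w == 0f) {
            // Fall back to the non-breaking space.
            w = widths.getOrDefault('\u00A0', 0f);
        }
        return w;
    }
}
```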
      • getStringWidth

        private static float getStringWidth​(java.lang.String string,
                                            GraphicsState graphicsState)
        Gets the width of a String in text space units
        Parameters:
        string - the string that needs measuring
        graphicsState - graphics state, including the current transformation from text measurements to page coordinates
        Returns:
        the width of a String in text space units
      • convertWidthToUser

        private static float convertWidthToUser​(float width,
                                                Matrix textToUserSpaceTransformMatrix)
        Parameters:
        width - the width to convert to user space
        textToUserSpaceTransformMatrix - transform from text space to graphics (drawing) space
        Returns:
        the distance between the start and end positions in user space
      • distance

        private static float distance​(Vector startPos,
                                      Vector endPos)
        Parameters:
        startPos - start position of the vector
        endPos - end position of the vector
        Returns:
        (endPos - startPos).length
      • convertHeightToUser

        private static float convertHeightToUser​(float height,
                                                 Matrix textToUserSpaceTransformMatrix)
        Parameters:
        height - the height to convert to user space
        textToUserSpaceTransformMatrix - transform from text space to graphics (drawing) space
        Returns:
        the distance between the start and end positions in user space
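        One plausible reading of the three helpers above (convertWidthToUser, distance, convertHeightToUser) is: transform the origin and a point offset along one axis, then measure the distance between the two results. The sketch below follows that reading with plain arrays and a row-major 3x3 matrix. All names are hypothetical, and the approach is an assumption rather than a description of the actual code.

```java
// Hypothetical illustration using plain arrays instead of Vector/Matrix.
class UserSpaceDistanceSketch {
    /** Transforms (x, y) by the 3x3 row-major matrix m, returning {x', y'}. */
    static float[] transform(float x, float y, float[] m) {
        return new float[] {
            x * m[0] + y * m[3] + m[6],
            x * m[1] + y * m[4] + m[7]
        };
    }

    /** Length of (end - start). */
    static float distance(float[] start, float[] end) {
        float dx = end[0] - start[0];
        float dy = end[1] - start[1];
        return (float) Math.sqrt(dx * dx + dy * dy);
    }

    static float widthToUser(float width, float[] m) {
        return distance(transform(0, 0, m), transform(width, 0, m));
    }

    static float heightToUser(float height, float[] m) {
        return distance(transform(0, 0, m), transform(0, height, m));
    }
}
```

        Measuring a distance rather than reading off a single coordinate keeps the result meaningful when the matrix includes a rotation.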
      • decode

        protected java.lang.String decode​(java.lang.String in)
        Decodes a Java String containing glyph ids encoded in the font's encoding, and determines the Unicode equivalent.
        Parameters:
        in - the String that needs to be decoded
        Returns:
        the decoded String
      • decode

        protected java.lang.String decode​(PdfString pdfString)
        Decodes a PdfString (which will contain glyph ids encoded in the font's encoding) based on the active font, and determines the Unicode equivalent.

        Parameters:
        pdfString - the PdfString that needs to be decoded
        Returns:
        the decoded String
        Since:
        2.1.7
      • getAsPartialWords

        public java.util.List<Word> getAsPartialWords()
        Breaks this string at any spaces within it and marks the resulting Words appropriately for later assembly.

        We are guaranteed that every space (internal word break) in this parsed text object will create a new word in the result of this method. We are not guaranteed that these Word objects are actually words until they have been assembled.

        The word following any space preserves that space in its string value, so that the assembler will not erroneously merge words that should be separate, regardless of the spacing.

        Returns:
        list of Word objects.
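        The splitting rule described above, where each space starts a new fragment and stays attached to the fragment that follows it, can be sketched on plain strings. This sketch ignores offsets, baselines and the Word type entirely; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration on plain strings; the real method builds Word objects.
class PartialWordsSketch {
    /** Splits text at spaces, keeping each space at the start of the following fragment. */
    static List<String> split(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == ' ' && current.length() > 0) {
                // A space closes the fragment collected so far.
                words.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }
}
```

        For the input "ab cd e" this sketch produces the fragments "ab", " cd" and " e", so a later assembly step can still see where the word boundaries were.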
      • preprocessString

        private boolean preprocessString​(char[] chars,
                                         boolean[] hasSpace)
        Calculates whether individual character positions (after font decoding from code to character) contain spaces and therefore break words, and whether the resulting words should be treated as complete (i.e. whether any spaces were found).
        Parameters:
        chars - the characters to check
        hasSpace - array same length as chars, each position representing whether it breaks a word
        Returns:
        true if any spaces were found.
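        A minimal sketch of the flagging step follows. It skips the font decoding mentioned above and tests the characters directly, and it uses Character.isSpaceChar as the space test, which is an assumption of this sketch. The class and method names are hypothetical.

```java
// Hypothetical illustration; font decoding of each code is omitted.
class SpaceFlagsSketch {
    /** Fills hasSpace (same length as chars) and reports whether any space was seen. */
    static boolean markSpaces(char[] chars, boolean[] hasSpace) {
        boolean anySpace = false;
        for (int i = 0; i < chars.length; i++) {
            hasSpace[i] = Character.isSpaceChar(chars[i]);
            anySpace |= hasSpace[i];
        }
        return anySpace;
    }
}
```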
      • createWord

        private Word createWord​(java.lang.StringBuffer wordAccum,
                                float wordStartOffset,
                                float wordEndOffset,
                                Vector baseline,
                                boolean wordsAreComplete,
                                boolean currentBreakBefore)
        Creates a word to represent a substring broken at a space. As spaces have zero "word length", make sure that they also have a baseline to check.
        Parameters:
        wordAccum - buffer of characters
        wordStartOffset - initial x-offset
        wordEndOffset - ending x-offset
        baseline - baseline of this word, so direction of progress can be measured in line ending determination.
        wordsAreComplete - true means characters in this word won't be split apart graphically
        currentBreakBefore - true if this word fragment represents a word boundary, and any preceding fragment is complete.
        Returns:
        the new word
      • getUnscaledTextWidth

        public float getUnscaledTextWidth​(GraphicsState gs)
        Parameters:
        gs - graphics state, including the current transformation from text measurements to page coordinates
        Returns:
        the unscaled (i.e. in Text space) width of our text
      • accumulate

        public void accumulate​(TextAssembler textAssembler,
                               java.lang.String contextName)
        We pass ourselves to the assembler, which is a visitor, so that it can accumulate information on this text depending on its type. The result is calculated by a final "assembly" phase, after accumulation is done. This is because we may have non-contiguous items in a PDF text stream.
        Parameters:
        textAssembler - the assembler that is visiting us.
        contextName - Name of the surrounding markup element/"context" if we're generating tagged output.
        See Also:
        TextAssemblyBuffer.accumulate(com.lowagie.text.pdf.parser.TextAssembler, String)
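        The visitor arrangement described above, accumulation first and a final assembly step afterwards, can be sketched in a few lines. The types below (Assembler, TextItem, CollectingAssembler) are hypothetical stand-ins and deliberately much simpler than TextAssembler and the text buffer classes; in particular, the contextName is accepted but unused here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the accumulate-then-assemble visitor idea.
class VisitorSketch {
    interface Assembler {
        void process(TextItem item, String contextName);
    }

    static class TextItem {
        final String text;

        TextItem(String text) {
            this.text = text;
        }

        /** The item hands itself to the visiting assembler. */
        void accumulate(Assembler assembler, String contextName) {
            assembler.process(this, contextName);
        }
    }

    static class CollectingAssembler implements Assembler {
        private final List<String> parts = new ArrayList<>();

        @Override
        public void process(TextItem item, String contextName) {
            parts.add(item.text);
        }

        /** Final phase: the result is only computed after all items were accumulated. */
        String assemble() {
            return String.join("", parts);
        }
    }
}
```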
      • getFontCodes

        public java.lang.String getFontCodes()
        Returns:
        a string whose characters represent code points in a possibly two-byte font
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
        See Also:
        Object.toString()