Class PdfTextLocator
java.lang.Object
org.openpdf.text.pdf.parser.PdfTextLocator
Locates text patterns inside a PDF file and reports their coordinates.
- Since: 2.1.4
Field Summary
Fields (Modifier and Type, Field, Description):
- private final PdfReader reader: The PdfReader that holds the PDF file.
- private final TextAssembler renderListener: The TextAssembler that will receive render notifications and provide the resultant text.
Constructor Summary
Constructors (Constructor, Description):
- PdfTextLocator(PdfReader reader): Creates a new Text Locator object, using a TextAssembler as the render listener.
- PdfTextLocator(PdfReader reader, boolean usePdfMarkupElements): Creates a new Text Locator object, using a TextAssembler as the render listener.
- PdfTextLocator(PdfReader reader, TextAssembler renderListener): Creates a new Text Locator object.
Method Summary
Methods (Modifier and Type, Method, Description):
- private byte[] getContentBytesForPage(int pageNum): Gets the content bytes of a page.
- private byte[] getContentBytesFromContentObject(PdfObject contentObject): Gets the content bytes from a content object, which may be a reference, a stream, or an array.
- void processContent(byte[] contentBytes, PdfDictionary resources, PdfContentTextLocator handler): Processes PDF syntax.
- ArrayList searchFile(float[] coordinates): Locates text within a bounding box inside a PDF.
- ArrayList searchFile(String pattern): Locates a text pattern inside a PDF.
- ArrayList searchPage(int page, float[] coordinates): Locates text within a bounding box inside a page.
- ArrayList searchPage(int page, String pattern): Locates a text pattern inside a page.
Field Details
reader
  The PdfReader that holds the PDF file.
renderListener
  The TextAssembler that will receive render notifications and provide the resultant text.
Constructor Details
PdfTextLocator(PdfReader reader)
  Creates a new Text Locator object, using a TextAssembler as the render listener.
  - Parameters:
    reader - the reader with the PDF
PdfTextLocator(PdfReader reader, boolean usePdfMarkupElements)
  Creates a new Text Locator object, using a TextAssembler as the render listener.
  - Parameters:
    reader - the reader with the PDF
    usePdfMarkupElements - whether to use higher-level tags for PDF markup entities
PdfTextLocator(PdfReader reader, TextAssembler renderListener)
  Creates a new Text Locator object.
  - Parameters:
    reader - the reader with the PDF
    renderListener - the render listener that will be used to analyze renderText operations and provide the resultant text
Method Details
getContentBytesForPage(int pageNum)
  Gets the content bytes of a page.
  - Parameters:
    pageNum - the 1-based number of the page whose content stream you want
  - Returns:
    a byte array with the effective content stream of the page
  - Throws:
    IOException
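The page lookup can be illustrated without OpenPDF. In this minimal sketch, `PageModel` and `contentBytesForPage` are hypothetical stand-ins, not the library's types: the 1-based page number is converted to a list index, and a page with no /Contents entry (which the PDF specification treats as an empty page) yields an empty array. Whether the real method returns an empty array in that case is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not OpenPDF types.
final class PageContentLookup {

    /** A page whose /Contents entry may be absent, modeled as null. */
    record PageModel(byte[] contents) {}

    /** Returns the content bytes of the page with the given 1-based number. */
    static byte[] contentBytesForPage(List<PageModel> pages, int pageNum) {
        if (pageNum < 1 || pageNum > pages.size()) {
            throw new IllegalArgumentException(
                    "page " + pageNum + " is outside 1.." + pages.size());
        }
        // Page numbers are 1-based; list indices are 0-based.
        byte[] contents = pages.get(pageNum - 1).contents();
        // Assumption: a page without /Contents yields an empty byte array.
        return contents == null ? new byte[0] : contents;
    }

    public static void main(String[] args) {
        List<PageModel> pages = List.of(
                new PageModel("BT (Hi) Tj ET".getBytes(StandardCharsets.ISO_8859_1)),
                new PageModel(null));
        System.out.println(new String(contentBytesForPage(pages, 1), StandardCharsets.ISO_8859_1));
        System.out.println(contentBytesForPage(pages, 2).length);
    }
}
```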
getContentBytesFromContentObject(PdfObject contentObject)
  Gets the content bytes from a content object, which may be a reference, a stream, or an array.
  - Parameters:
    contentObject - the object to read bytes from
  - Returns:
    the content bytes
  - Throws:
    IOException
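Because a content object may be a reference, a stream, or an array, it resolves naturally by recursion. The sketch below is a simplified model assuming hypothetical `Reference`, `ContentStream` and `ContentArray` types and a map standing in for the cross-reference table; it is not how OpenPDF represents PDF objects. The PDF specification treats the streams of a /Contents array as one concatenated stream, so the sketch writes a space after each array element to keep tokens at stream boundaries from merging.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of PDF content objects; not OpenPDF's PdfObject hierarchy.
final class ContentObjectResolver {

    sealed interface ContentObject permits Reference, ContentStream, ContentArray {}

    /** An indirect reference, resolved through the object table. */
    record Reference(int objectNumber) implements ContentObject {}

    /** A stream whose data has already been decoded (filters applied). */
    record ContentStream(byte[] decodedBytes) implements ContentObject {}

    /** An array of content objects, typically references to streams. */
    record ContentArray(List<ContentObject> items) implements ContentObject {}

    static byte[] contentBytes(ContentObject obj, Map<Integer, ContentObject> objects) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        append(obj, objects, out);
        return out.toByteArray();
    }

    private static void append(ContentObject obj, Map<Integer, ContentObject> objects,
                               ByteArrayOutputStream out) {
        if (obj instanceof Reference ref) {
            ContentObject target = objects.get(ref.objectNumber());
            if (target == null) {
                throw new IllegalArgumentException("unresolved reference " + ref.objectNumber());
            }
            append(target, objects, out);
        } else if (obj instanceof ContentStream stream) {
            out.writeBytes(stream.decodedBytes());
        } else if (obj instanceof ContentArray array) {
            for (ContentObject item : array.items()) {
                append(item, objects, out);
                // Separate the parts so tokens at stream boundaries cannot merge.
                out.write(' ');
            }
        }
    }

    public static void main(String[] args) {
        Map<Integer, ContentObject> objects = Map.of(
                5, new ContentStream("BT".getBytes(StandardCharsets.ISO_8859_1)),
                6, new ContentStream("(Hi) Tj ET".getBytes(StandardCharsets.ISO_8859_1)));
        ContentObject contents = new ContentArray(List.of(new Reference(5), new Reference(6)));
        System.out.println(new String(contentBytes(contents, objects), StandardCharsets.ISO_8859_1));
    }
}
```

With objects 5 and 6 holding `BT` and `(Hi) Tj ET`, the array resolves to `BT (Hi) Tj ET ` (note the trailing space from the separator).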
searchPage(int page, String pattern)
  Locates a text pattern inside a page.
  - Parameters:
    page - the number of the page to search
    pattern - the text to match
  - Returns:
    ArrayList - a list of the matched text patterns with their coordinates
  - Throws:
    IOException - on error
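One way to turn a text match into coordinates is to keep a box for every rendered character, search the concatenated text, and merge the boxes of the matched characters. This is a hypothetical sketch: `Glyph`, `Match` and `findPattern` are illustrative names, the (llx, lly, urx, ury) box layout is an assumption, and matches are non-overlapping. The element type that searchPage actually returns is not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; OpenPDF's actual result type is not shown here.
final class PatternLocatorSketch {

    /** One rendered character with its box in user space (assumed llx, lly, urx, ury). */
    record Glyph(char ch, float llx, float lly, float urx, float ury) {}

    /** One match: the matched text and the union of its glyph boxes. */
    record Match(String text, float llx, float lly, float urx, float ury) {}

    /** Finds non-overlapping occurrences of pattern among a page's glyphs. */
    static List<Match> findPattern(List<Glyph> glyphs, String pattern) {
        List<Match> matches = new ArrayList<>();
        if (pattern.isEmpty()) {
            return matches;
        }
        StringBuilder text = new StringBuilder(glyphs.size());
        for (Glyph g : glyphs) {
            text.append(g.ch());
        }
        int from = 0;
        int at;
        while ((at = text.indexOf(pattern, from)) >= 0) {
            int end = at + pattern.length();
            float llx = Float.POSITIVE_INFINITY, lly = Float.POSITIVE_INFINITY;
            float urx = Float.NEGATIVE_INFINITY, ury = Float.NEGATIVE_INFINITY;
            // Merge the boxes of the matched glyphs into one bounding box.
            for (int i = at; i < end; i++) {
                Glyph g = glyphs.get(i);
                llx = Math.min(llx, g.llx());
                lly = Math.min(lly, g.lly());
                urx = Math.max(urx, g.urx());
                ury = Math.max(ury, g.ury());
            }
            matches.add(new Match(pattern, llx, lly, urx, ury));
            from = end;
        }
        return matches;
    }

    public static void main(String[] args) {
        String line = "a cat sat";
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            // Fixed-width 5-unit glyphs on one baseline, purely for the demo.
            glyphs.add(new Glyph(line.charAt(i), 10 + 5 * i, 100, 15 + 5 * i, 110));
        }
        System.out.println(findPattern(glyphs, "at"));
    }
}
```

A real text assembler also has to cope with glyphs that are not in reading order and with text split across lines; the sketch assumes glyphs already arrive in reading order.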
searchFile(String pattern)
  Locates a text pattern inside a PDF.
  - Parameters:
    pattern - the text to match
  - Returns:
    ArrayList - a list of the matched text patterns with their coordinates
  - Throws:
    IOException - on error
searchPage(int page, float[] coordinates)
  Locates text within a bounding box inside a page.
  - Parameters:
    page - the number of the page to search
    coordinates - the bounding box to extract text from
  - Returns:
    ArrayList - a list of the matched text patterns with their coordinates
  - Throws:
    IOException - on error
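A bounding-box search can be sketched as a filter over positioned text. This page does not document the layout of the `coordinates` array, so the sketch assumes `{llx, lly, urx, ury}` in PDF user space and normalizes the corners; it also assumes a chunk must lie entirely inside the box, although partial overlap would be an equally reasonable rule. `TextChunk` and `chunksInside` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types; the float[] layout {llx, lly, urx, ury} is an assumption.
final class BoundingBoxFilterSketch {

    /** A run of text with its box in PDF user space. */
    record TextChunk(String text, float llx, float lly, float urx, float ury) {}

    /** Keeps the chunks whose boxes lie entirely inside the query box. */
    static List<TextChunk> chunksInside(List<TextChunk> chunks, float[] box) {
        if (box.length != 4) {
            throw new IllegalArgumentException("expected {llx, lly, urx, ury}");
        }
        // Normalize so the two corners may be given in either order.
        float llx = Math.min(box[0], box[2]);
        float urx = Math.max(box[0], box[2]);
        float lly = Math.min(box[1], box[3]);
        float ury = Math.max(box[1], box[3]);
        List<TextChunk> inside = new ArrayList<>();
        for (TextChunk c : chunks) {
            if (c.llx() >= llx && c.urx() <= urx && c.lly() >= lly && c.ury() <= ury) {
                inside.add(c);
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        List<TextChunk> chunks = List.of(
                new TextChunk("Invoice", 50, 750, 200, 765),
                new TextChunk("Total: 42.00", 400, 100, 480, 112));
        // Lower part of a US Letter page (612 x 792 points).
        System.out.println(chunksInside(chunks, new float[] {0, 0, 612, 400}));
    }
}
```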
searchFile(float[] coordinates)
  Locates text within a bounding box inside a PDF.
  - Parameters:
    coordinates - the bounding box to extract text from
  - Returns:
    ArrayList - a list of the matched text patterns with their coordinates
  - Throws:
    IOException - on error
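Both searchFile overloads plausibly reduce to running the matching searchPage overload on every page in turn; that is an assumption, not documented behavior. The hypothetical driver below shows the idea for any per-page search: pages are visited as 1..pageCount, and each result is tagged with its page number. `PageSearch`, `PageHit` and `searchAllPages` are illustrative names, not OpenPDF API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver; PageSearch and PageHit are illustrative names, not OpenPDF API.
final class FileSearchSketch {

    /** A per-page search, e.g. by pattern or by bounding box. */
    @FunctionalInterface
    interface PageSearch<T> {
        List<T> search(int pageNum) throws IOException;
    }

    /** A result tagged with the 1-based page it came from. */
    record PageHit<T>(int pageNum, T hit) {}

    static <T> List<PageHit<T>> searchAllPages(int pageCount, PageSearch<T> perPage)
            throws IOException {
        List<PageHit<T>> all = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            for (T hit : perPage.search(page)) {
                all.add(new PageHit<>(page, hit));
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for per-page results: the words found on each of three pages.
        List<List<String>> pages = List.of(List.of("alpha", "beta"), List.of(), List.of("beta"));
        List<PageHit<String>> hits = searchAllPages(pages.size(),
                page -> pages.get(page - 1).stream().filter("beta"::equals).toList());
        System.out.println(hits);
    }
}
```

Tagging hits with the page number matters at file level because a bounding box or coordinate pair is only meaningful relative to the page it was found on.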
processContent
  public void processContent(byte[] contentBytes, PdfDictionary resources, PdfContentTextLocator handler)
  Processes PDF syntax.
  - Parameters:
    contentBytes - the bytes of a content stream
    resources - the resources that come with the content stream
    handler - interprets the events raised when operations in a content stream are recognized
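Content-stream processing follows a postfix pattern: operands are collected until an operator token arrives, and the operator is then dispatched together with those operands. The sketch below is a deliberately simplified, hypothetical reader (`ContentStreamSketch` and `OperatorHandler` are illustrative names, not OpenPDF's parser or the `PdfContentTextLocator` interface). It handles numbers, names and literal strings only; the delimiters of arrays, dictionaries, hex strings and comments are skipped, and keywords such as `true` or `null` are not distinguished from operators.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Simplified, hypothetical content-stream reader; not OpenPDF's parser.
final class ContentStreamSketch {

    /** Receives each operator together with the operands that preceded it. */
    interface OperatorHandler {
        void invoke(String operator, List<Object> operands);
    }

    private static final Pattern NUMBER = Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)");

    static void process(byte[] contentBytes, OperatorHandler handler) {
        String s = new String(contentBytes, StandardCharsets.ISO_8859_1);
        List<Object> operands = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                // Literal string with balanced parentheses. Simplification: an escaped
                // character is copied as-is, so "\n" yields 'n' rather than a newline.
                StringBuilder str = new StringBuilder();
                int depth = 1;
                i++;
                while (i < s.length() && depth > 0) {
                    char d = s.charAt(i++);
                    if (d == '\\' && i < s.length()) {
                        str.append(s.charAt(i++));
                    } else if (d == '(') {
                        depth++;
                        str.append(d);
                    } else if (d == ')') {
                        if (--depth > 0) {
                            str.append(d);
                        }
                    } else {
                        str.append(d);
                    }
                }
                operands.add(str.toString());
            } else if (c == '/') {
                int start = ++i;
                while (i < s.length() && isRegular(s.charAt(i))) {
                    i++;
                }
                operands.add("/" + s.substring(start, i));
            } else if (!isRegular(c)) {
                // Delimiters of arrays, dictionaries, hex strings and comments are skipped.
                i++;
            } else {
                int start = i;
                while (i < s.length() && isRegular(s.charAt(i))) {
                    i++;
                }
                String token = s.substring(start, i);
                if (NUMBER.matcher(token).matches()) {
                    operands.add(Float.parseFloat(token));
                } else {
                    // Any other bare token is treated as an operator and dispatched.
                    handler.invoke(token, List.copyOf(operands));
                    operands.clear();
                }
            }
        }
    }

    private static boolean isRegular(char c) {
        return !Character.isWhitespace(c) && "()<>[]{}/%".indexOf(c) < 0;
    }

    public static void main(String[] args) {
        byte[] content = "BT /F1 12 Tf 72 700 Td (Hello) Tj ET".getBytes(StandardCharsets.ISO_8859_1);
        process(content, (op, operands) -> System.out.println(op + " " + operands));
    }
}
```

In a text locator, the handler would react to operators such as `Tf`, `Td` and `Tj` to track the font, text position and shown strings; the sketch only demonstrates the operand/operator dispatch itself.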