class BioInterchange::TextMining::PDFxXMLReader
Public Instance Methods
deserialize(inputstream)
Reads the input stream and returns the associated BioInterchange::TextMining::Document model.
Presently I assume a single document per XML file and that <section> tags cannot nest. I also assume that a Content::DOCUMENT type covers everything between the <article> tags.
inputstream - Input to deserialize, either an IO stream or a String holding the XML
# File lib/biointerchange/textmining/pdfx_xml_reader.rb, line 37
def deserialize(inputstream)
  raise BioInterchange::Exceptions::ImplementationReaderError, 'InputStream not of type IO, cannot read.' unless inputstream.kind_of?(IO) or inputstream.kind_of?(String)
  @input = inputstream
  pdfx
end
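To illustrate the input check in deserialize, here is a minimal standalone sketch. The names SketchReaderError, accepts_input? and parse_with_guard are hypothetical; SketchReaderError only stands in for BioInterchange::Exceptions::ImplementationReaderError so the code runs without the gem. REXML::Document.parse_stream reads from either a String or an IO, which is presumably why both types pass the guard.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'stringio'

# Hypothetical stand-in for BioInterchange::Exceptions::ImplementationReaderError,
# defined here only so the sketch runs without the BioInterchange gem.
class SketchReaderError < StandardError; end

# Same test as the guard in deserialize: an IO or a String passes.
def accepts_input?(input)
  input.kind_of?(IO) || input.kind_of?(String)
end

# Hypothetical helper: check the input, then stream it through REXML,
# whose stream parser reads from either a String or an IO.
def parse_with_guard(input, listener)
  unless accepts_input?(input)
    raise SketchReaderError, 'Input is neither an IO nor a String, cannot read.'
  end
  REXML::Document.parse_stream(input, listener)
  listener
end

accepts_input?('<article/>')               # => true
accepts_input?(StringIO.new('<article/>')) # => false: StringIO is not an IO subclass
```

One consequence of a kind_of?(IO) check is that a StringIO is rejected even though it behaves like a stream, so a caller holding one would pass its contents (for example via StringIO#string) instead.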
Private Instance Methods
pdfx()
# File lib/biointerchange/textmining/pdfx_xml_reader.rb, line 47
def pdfx
  list = MyListener.new
  REXML::Document.parse_stream(@input, list)
  return list.document
end
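pdfx() hands the input to REXML::Document.parse_stream together with a MyListener, whose code is not shown on this page. The following is a minimal sketch of what a stream listener built on REXML::StreamListener could look like under the assumptions stated for deserialize (one <article> per file, non-nesting <section> tags). ArticleSectionListener and the sample XML are made up for illustration; this is not the BioInterchange implementation and it does not build a Document model.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener, NOT the actual MyListener from pdfx_xml_reader.rb.
# Assumes one <article> per file and that <section> elements do not nest.
class ArticleSectionListener
  include REXML::StreamListener

  attr_reader :document_text, :sections

  def initialize
    @document_text = +''   # all text between <article> and </article>
    @sections = []         # text of each completed <section>
    @in_article = false
    @current_section = nil # buffer for the section being read, if any
  end

  def tag_start(name, _attrs)
    case name
    when 'article' then @in_article = true
    when 'section' then @current_section = +'' # no nesting, so just start fresh
    end
  end

  def tag_end(name)
    case name
    when 'article' then @in_article = false
    when 'section'
      @sections << @current_section if @current_section
      @current_section = nil
    end
  end

  # Called for character data; only text inside <article> is kept.
  def text(content)
    return unless @in_article
    @document_text << content
    @current_section << content if @current_section
  end
end

listener = ArticleSectionListener.new
xml = '<pdfx><article><section>Intro.</section><section>Methods.</section></article></pdfx>'
REXML::Document.parse_stream(xml, listener)
listener.document_text # => "Intro.Methods."
listener.sections      # => ["Intro.", "Methods."]
```

Because sections are assumed not to nest, a single @current_section buffer is enough; nested sections would need a stack of open buffers instead.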