class BioInterchange::TextMining::PDFxXMLReader

Public Instance Methods

deserialize(inputstream)

Reads the input stream and returns the associated BioInterchange::TextMining::Document model.

The reader currently assumes a single document per XML file and that <section> tags do not nest. It also treats everything between the <article> tags as a Content::DOCUMENT.

inputstream

Input IO stream (or String) to deserialize

# File lib/biointerchange/textmining/pdfx_xml_reader.rb, line 37
def deserialize(inputstream)
  # Accept either an IO object or a String; reject anything else.
  raise BioInterchange::Exceptions::ImplementationReaderError, 'InputStream not of type IO or String, cannot read.' unless inputstream.kind_of?(IO) || inputstream.kind_of?(String)
  
  @input = inputstream
  
  pdfx
end
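
The type guard in deserialize can be illustrated in isolation. The following is a minimal sketch, not part of the library: the helper name readable_input? is hypothetical and only mirrors the kind_of? check shown above. It shows one consequence worth knowing: a StringIO is neither an IO nor a String in Ruby, so under this check it would be rejected.

```ruby
require 'stringio'

# Hypothetical helper (not part of BioInterchange) that mirrors the
# kind_of? guard used by deserialize.
def readable_input?(input)
  input.kind_of?(IO) || input.kind_of?(String)
end

string_ok   = readable_input?('<pdfx/>')               # a plain String passes
io_ok       = readable_input?($stdin)                  # $stdin is an IO
stringio_ok = readable_input?(StringIO.new('<pdfx/>')) # StringIO is not an IO subclass
```

Under this assumption, callers holding a StringIO would need to pass its contents (for example via its string method) rather than the object itself.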

Private Instance Methods

pdfx()
# File lib/biointerchange/textmining/pdfx_xml_reader.rb, line 47
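def pdfx
  # Stream-parse @input with REXML; the listener receives the parse
  # events and exposes the resulting model through #document.
  list = MyListener.new
  REXML::Document.parse_stream(@input, list)
  list.document
end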
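
To make the stream-parsing approach concrete, here is a rough sketch of a REXML stream listener that follows the assumptions stated for deserialize (one <article> per file, non-nesting <section> tags). The SectionListener class, its sections accessor, and the sample XML are all hypothetical; this is not the library's MyListener, and it collects plain strings rather than building a BioInterchange::TextMining::Document.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener for illustration only: gathers the text of each
# <section> found inside <article>, assuming sections never nest.
class SectionListener
  include REXML::StreamListener

  attr_reader :sections

  def initialize
    @sections = []
    @in_article = false
    @current = nil
  end

  # Called for every opening tag.
  def tag_start(name, _attributes)
    @in_article = true if name == 'article'
    @current = +'' if name == 'section' && @in_article
  end

  # Called for character data; only kept while inside a <section>.
  def text(data)
    @current << data if @current
  end

  # Called for every closing tag.
  def tag_end(name)
    case name
    when 'section'
      @sections << @current.strip if @current
      @current = nil
    when 'article'
      @in_article = false
    end
  end
end

# Made-up sample input, not real PDFx output.
xml = '<pdfx><article><section>Intro text</section>' \
      '<section>Methods text</section></article></pdfx>'

listener = SectionListener.new
REXML::Document.parse_stream(xml, listener)
sections = listener.sections
```

Because the parser pushes events to the listener as it reads, the whole XML tree is never held in memory, which is the usual reason to prefer parse_stream over building a full REXML::Document.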