class EDI::A::StreamingParser
Introduction
Turning a whole EDI interchange into an EDI::A::Interchange object with method parse is convenient, but it consumes a lot of memory. Sometimes interchanges are simply too large to keep completely in memory. Large XML documents pose the same problem, and there is a common solution for them: the SAX/SAX2 API, a streaming approach. This class implements the same idea for EDI data.
Use StreamingParser instances to parse ANSI X12 data sequentially. Sequential parsing saves main memory and is applicable to arbitrarily large interchanges.
At its core lies method go. It scans the input stream and invokes the on_* callbacks, which implement most of the parser tasks.
Syntax check
If you do not customize the callbacks, this parser just scans through the data. Only the callback on_error() contains code: it raises an exception telling you the location and kind of the syntax error encountered.
Example: Syntax check
    require 'edi4r'
    require 'edi4r/ansi_x12'

    parser = EDI::A::StreamingParser.new
    File.open( 'damaged_file.x12' ) { |f| parser.go( f ) }
    # --> raises e.g. EDI::EDISyntaxError: offset = 1234, last chars = SEt
Callbacks
Most callbacks provided here are just empty shells. They usually receive a string of interest (the segment content, i.e. everything from the segment tag up to but excluding the segment terminator) and, where the tag can vary, the segment tag as a separate string.
Override them to adapt the parser to your needs!
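Since the callbacks receive raw segment content, a typical first step inside an overridden callback is to split that string into data elements. The following is a minimal standalone sketch, not part of edi4r: the helpers data_elements and components are hypothetical, the CLM sample is made-up illustration data, and the defaults '*' and ':' are only an assumption, because each interchange declares its actual separators in its ISA segment.

```ruby
# Hypothetical helpers for use inside callbacks such as on_segment( s, tag ).
# ASSUMPTION: '*' and ':' are merely common choices; the real separators
# are declared in the ISA segment of each interchange.
def data_elements(s, de_sep = '*')
  # A limit of -1 keeps trailing empty elements instead of dropping them
  s.split(de_sep, -1)
end

def components(element, ce_sep = ':')
  element.split(ce_sep, -1)
end

seg = 'CLM*A37YH556*500***11:B:1*Y*A*Y*I'  # segment content, no terminator
tag, *elements = data_elements(seg)
puts tag                    # CLM
puts elements[0]            # A37YH556
p components(elements[4])   # ["11", "B", "1"]
```

Note that String#split with a string argument treats it literally, so separators such as '|' need no escaping.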
Example: Counting segments
    require 'edi4r'
    require 'edi4r/ansi_x12'

    class MyParser < EDI::A::StreamingParser
      attr_reader :counters

      def initialize
        @counters = Hash.new(0)
        super
      end

      # Not called for ISA, IEA, GS, GE, ST and SE, which have their own callbacks
      def on_segment( s, tag )
        @counters[tag] += 1
      end
    end

    parser = MyParser.new
    File.open( 'myfile.x12' ) { |f| parser.go( f ) }
    puts "Segment tag statistics:"
    parser.counters.keys.sort.each do |tag|
      print "%03s: %4d\n" % [ tag, parser.counters[tag] ]
    end
Want to save time? Throw :done when already done!
Most callbacks may terminate further parsing by throwing the symbol :done (the exceptions are on_interchange_start and on_interchange_end). This can save a lot of time, e.g. if you have already found what you were looking for. Otherwise, parsing continues until getc hits EOF or an error occurs.
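The early exit relies on Ruby's catch/throw: go wraps its getc loop in catch(:done), so a throw :done from any callback invoked inside that loop unwinds straight out of it. The plain-Ruby sketch below mimics that structure without edi4r; scan_stream and its block are hypothetical stand-ins for go and a callback, and StringIO serves as input because it supports getc.

```ruby
require 'stringio'

# Hypothetical stand-in for go: read one character at a time with getc,
# hand it to a callback, and let the callback end the loop with throw :done.
def scan_stream(io, &on_char)
  count = 0
  catch(:done) do
    loop do
      c = io.getc
      break if c.nil?     # regular exit at EOF
      count += 1
      on_char.call(c)     # the callback may throw :done
    end
  end
  count                   # characters read, including the one that triggered :done
end

input = StringIO.new('ST*850*0001~BEG*00*SA*123~SE*3*0001~')
# Stop at the first segment terminator, as a search callback might
read = scan_stream(input) { |c| throw :done if c == '~' }
puts read         # 12
puts input.read   # BEG*00*SA*123~SE*3*0001~
```

The rest of the stream is simply never read, which is where the time savings come from.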
Example: A simple search
    require 'edi4r'
    require 'edi4r/ansi_x12'

    parser = EDI::A::StreamingParser.new
    def parser.on_segment( s, tag ) # singleton method
      if tag == 'CLM'
        puts "Interchange contains at least one segment CLM!"
        puts "Here are its contents: #{s}"
        throw :done # Skip further parsing
      end
    end
    File.open( 'myfile.x12' ) { |f| parser.go( f ) }
Public Class Methods
new()
    # File lib/edi4r/ansi_x12.rb, line 1357
    def initialize
      @path = 'input stream'
    end
Public Instance Methods
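go( hnd )
The one-pass reader and dispatcher of segments, SAX-style.
It reads sequentially through the given stream of octets and calls the on_... callbacks as it goes. It returns the character offset reached when parsing stopped.
Parameter hnd may be any object supporting method getc, such as a File or a StringIO. If hnd also responds to path, its path is recorded and returned by method path.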
    # File lib/edi4r/ansi_x12.rb, line 1443
    def go( hnd )
      state, offset, item, tag = :outside, 0, '', ''
      seg_term, de_sep, ce_sep, rep_sep = nil, nil, nil, nil
      isa_count = nil
      @path = hnd.path if hnd.respond_to? :path

      self.on_interchange_start

      catch(:done) do
        loop do
          c = hnd.getc

          case state # State machine

          # Characters outside of a segment context
          when :outside
            case c
            when nil
              break # Regular exit at EOF
            when (?A..?Z)
              unless item.empty? # Flush
                self.on_other( item )
                item = ''
              end
              item << c; tag << c
              state = :tag1
            else
              item << c
            end

          # Found first tag char, now expecting second
          when :tag1
            case c
            when (?A..?Z),(?0..?9)
              item << c; tag << c
              state = :tag2
            else # including 'nil'
              self.on_error(EDISyntaxError, offset, item, c)
            end

          # Found second tag char, now expecting optional last
          when :tag2
            case c
            when (?A..?Z),(?0..?9)
              item << c; tag << c
              if tag=='ISA'
                state = :in_isa
                isa_count = 0
              else
                state = :in_segment
              end
            when de_sep
              item << c
              state = :in_segment
            else # including 'nil'
              self.on_error(EDISyntaxError, offset, item, c)
            end

          when :in_isa
            self.on_error(EDISyntaxError, offset, item) if c.nil?
            item << c; isa_count += 1
            case isa_count
            when 1; de_sep = c
            when 80; rep_sep = c # FIXME: Version 5.x only
            when 102; ce_sep = c
            when 103
              seg_term = c
              dispatch_item( item , tag, [ce_sep, de_sep, rep_sep||' ', seg_term] )
              item, tag = '', ''
              state = :outside
            end
            if isa_count > 103 # Should never occur
              EDI::logger.warn "isa_count = #{isa_count}"
              self.on_error(EDISyntaxError, offset, item, c)
            end

          when :in_segment
            case c
            when nil
              self.on_error(EDISyntaxError, offset, item)
            when seg_term
              dispatch_item( item , tag )
              item, tag = '', ''
              state = :outside
            else
              item << c
            end

          else # Should never occur...
            raise ArgumentError, "unexpected state: #{state}"
          end
          offset += 1
        end # loop
        # self.on_error(EDISyntaxError, offset, item) unless state==:outside
      end # catch(:done)
      self.on_interchange_end
      offset
    end
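The :in_isa branch relies on ISA being a fixed-length segment: counting from the character after the tag, positions 1, 80, 102 and 103 hold the data element separator, ISA11 (which go treats as the repetition separator), the component element separator and the segment terminator. The sketch below reads the same positions by string index (3, 82, 104, 105). It is an illustration only: isa_separators is a hypothetical helper, not part of edi4r, and the sample header values are made up.

```ruby
# Hypothetical helper: read the separators from a complete ISA segment
# (106 characters including its terminator) at the positions go uses.
ISA_LENGTH = 106

def isa_separators(isa)
  raise ArgumentError, 'not an ISA segment' unless isa.start_with?('ISA')
  raise ArgumentError, "ISA too short: #{isa.length} chars" if isa.length < ISA_LENGTH

  {
    element:    isa[3],    # isa_count 1: first character after the tag
    repetition: isa[82],   # isa_count 80: ISA11 (go's FIXME: version 5.x only)
    component:  isa[104],  # isa_count 102: ISA16
    segment:    isa[105]   # isa_count 103: segment terminator
  }
end

# Made-up sample header; fields are padded to their fixed widths
fields = ['ISA', '00', ' ' * 10, '00', ' ' * 10, 'ZZ', 'SENDERID'.ljust(15),
          'ZZ', 'RECEIVERID'.ljust(15), '030101', '1253', '^', '00501',
          '000000905', '1', 'T', ':']
isa = fields.join('*') + '~'
puts isa.length                       # 106
puts isa_separators(isa)[:component]  # :
```

Building the sample from padded fields, rather than typing it out, keeps the fixed offsets correct.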
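on_error( err, offset, fragment, c=nil )
Called upon syntax errors; parsing should be aborted at this point. The default implementation raises err with a message stating the offset and the last characters read (ending in <EOF> if the input ended prematurely).
    # File lib/edi4r/ansi_x12.rb, line 1431
    def on_error(err, offset, fragment, c=nil)
      raise err, "offset = %d, last chars = %s%s" %
        [offset, fragment, c.nil? ? '<EOF>' : c.chr]
    end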
on_ge( s )
Called when a GE segment is encountered.
    # File lib/edi4r/ansi_x12.rb, line 1396
    def on_ge( s )
    end
on_gs( s )
Called when a GS segment is encountered.
    # File lib/edi4r/ansi_x12.rb, line 1391
    def on_gs( s )
    end
on_iea( s, tag )
Called when an IEA segment is encountered.
    # File lib/edi4r/ansi_x12.rb, line 1386
    def on_iea( s, tag )
    end
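on_interchange_end
Called when parsing ends, i.e. at EOF or after a callback threw :done (but not when on_error raises). Override it for your cleanup purposes. Note: it must not throw :done, because go calls it after leaving its catch(:done) block!
    # File lib/edi4r/ansi_x12.rb, line 1376
    def on_interchange_end
    end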
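on_interchange_start
Called at the start of reading, before the first character is read. Override it for your initialization purposes. Note: it must not throw :done, because go calls it before entering its catch(:done) block!
    # File lib/edi4r/ansi_x12.rb, line 1370
    def on_interchange_start
    end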
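The reason for the two notes above is visible in go: on_interchange_start runs before catch(:done) is entered and on_interchange_end runs after it has been left, so a throw :done from either has no matching catch. This plain-Ruby sketch reproduces that layout with a hypothetical go_like method and lambdas standing in for the callbacks. Since Ruby 2.2 an unmatched throw raises UncaughtThrowError, a subclass of ArgumentError.

```ruby
# Hypothetical miniature of go's layout: only the middle callback runs
# inside catch(:done); the first and last run outside of it.
def go_like(callbacks)
  callbacks[:start].call      # like on_interchange_start: before catch(:done)
  catch(:done) do
    callbacks[:segment].call  # like on_segment: a throw :done here is caught
  end
  callbacks[:finish].call     # like on_interchange_end: after catch(:done)
  :finished
end

noop = -> {}

# Throwing from the "segment" callback just ends parsing early
p go_like(start: noop, segment: -> { throw :done }, finish: noop)  # :finished

# Throwing from the "start" callback has no matching catch
begin
  go_like(start: -> { throw :done }, segment: noop, finish: noop)
rescue UncaughtThrowError => e
  puts "#{e.class}: tag #{e.tag.inspect}"
end
```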
on_isa( s, tag )
Called when an ISA segment is encountered.
    # File lib/edi4r/ansi_x12.rb, line 1381
    def on_isa( s, tag )
    end
on_other( s )
This callback is usually kept empty. It is called when the parser finds strings between segments, or in front of or trailing an interchange. (Characters after the very last segment of the input are not reported: go stops at EOF without flushing them.)
Strictly speaking, such strings are not permitted by the ANSI X12 syntax rules. However, it is quite common to put a line break between segments for better readability. The default implementation therefore ignores such occurrences.
If you need strict conformance checking, feel free to put some code into this callback method; otherwise just ignore it.
    # File lib/edi4r/ansi_x12.rb, line 1426
    def on_other( s )
    end
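For strict checking, one option is to tolerate line breaks but reject everything else. Below is a minimal sketch of such a check. The module StrictGaps is hypothetical, and it raises ArgumentError only so that it runs without edi4r loaded; in a real override you might prefer EDI::EDISyntaxError. Treating any whitespace other than CR and LF as an error is an assumption you may want to relax.

```ruby
# Hypothetical helper for a strict on_other override: CR and LF between
# segments are tolerated, any other stray characters are rejected.
module StrictGaps
  TOLERATED = /\A[\r\n]*\z/

  def self.check!(s)
    unless TOLERATED.match?(s)
      raise ArgumentError, "unexpected characters between segments: #{s.inspect}"
    end
    s
  end
end

StrictGaps.check!("\r\n")   # passes and returns the string unchanged
begin
  StrictGaps.check!("garbage\n")
rescue ArgumentError => e
  puts e.message
end
```

In a subclass, the override could then be as short as def on_other( s ) StrictGaps.check!( s ) end.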
on_se( s, tag )
Called when an SE segment is encountered.
    # File lib/edi4r/ansi_x12.rb, line 1406
    def on_se( s, tag )
    end
on_segment( s, tag )
Called when any other segment (i.e. not ISA, IEA, GS, GE, ST or SE) is encountered.
    # File lib/edi4r/ansi_x12.rb, line 1411
    def on_segment( s, tag )
    end
on_st( s, tag )
Called when an ST segment is encountered.
    # File lib/edi4r/ansi_x12.rb, line 1401
    def on_st( s, tag )
    end
path
Convenience method. Returns the path of the object passed to method go if that object responds to path (e.g. a File), otherwise just the string 'input stream'.
    # File lib/edi4r/ansi_x12.rb, line 1363
    def path
      @path
    end