class EDI::A::StreamingParser

Class StreamingParser

Introduction

Turning a whole EDI interchange into an EDI::A::Interchange object with method parse is convenient but memory-intensive. Some interchanges are simply too large to hold completely in memory. The same problem arises with large XML documents, where a common solution is a streaming approach: the SAX/SAX2 API. This class implements the same idea for EDI data.

Use StreamingParser instances to parse ANSI X12 data sequentially. Sequential parsing saves main memory and is applicable to arbitrarily large interchanges.

At its core lies method go. It scans the input stream and invokes the on_* callbacks, which implement most of the parser's tasks.

Syntax check

Unless you customize the callbacks, this parser just scans through the data. Only callback on_error() contains code: it raises an exception reporting the location and kind of syntax error encountered.

Example: Syntax check

parser = EDI::A::StreamingParser.new
parser.go( File.open 'damaged_file.x12' )
--> EDI::EDISyntaxError at offset 1234, last chars = UNt+1+0

Callbacks

Most callbacks provided here are just empty shells. They usually receive the string of interest (the segment content, i.e. everything from the segment tag up to but excluding the segment terminator). Where the tag could differ between calls, they also receive the segment tag as a separate string.

Override them to adapt the parser to your needs!
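
To make the shape of a callback argument concrete, here is a minimal sketch of how a segment content string might be taken apart inside a callback. The helper split_segment is hypothetical and not part of edi4r, and the '*' separator is an assumption; a real interchange declares its own separator in the ISA segment.

```ruby
# Hypothetical helper, not part of edi4r: split the string a callback
# receives into its tag and the remaining data elements.
# Assumes '*' as the data element separator.
def split_segment(s, de_sep = '*')
  tag, *elements = s.split(de_sep, -1) # -1 keeps trailing empty elements
  [tag, elements]
end

# The segment terminator is not part of the string handed to a callback.
tag, elements = split_segment('CLM*A37YH556*500**')
puts "tag=#{tag} elements=#{elements.inspect}"
```

Passing -1 as the limit matters here: without it, String#split silently drops trailing empty elements.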

Example: Counting segments

class MyParser < EDI::A::StreamingParser
  attr_reader :counters

  def initialize
    @counters = Hash.new(0)
    super
  end

  def on_segment( s, tag )
    @counters[tag] += 1
  end
end

parser = MyParser.new
parser.go( File.open 'myfile.x12' )
puts "Segment tag statistics:"
parser.counters.keys.sort.each do |tag|
  print "%03s: %4d\n" % [ tag, parser.counters[tag] ]
end

Want to save time? Throw :done when already done!

Most callbacks may terminate further parsing by throwing symbol :done. This can save a lot of time, e.g. if you have already found what you were looking for. Otherwise, parsing continues until getc hits EOF or an error occurs.
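
The early exit relies on Ruby's catch/throw. The following self-contained sketch shows the mechanism in isolation; scan_until is a hypothetical stand-in for the parser's read loop, not the library's code.

```ruby
require 'stringio'

# Hypothetical stand-in for the read loop: count characters read from io,
# leaving the loop early once the wanted character has been seen.
def scan_until(io, wanted)
  seen = 0
  catch(:done) do
    while (c = io.getc)            # getc returns nil at EOF
      seen += 1
      throw :done if c == wanted   # unwinds straight to the catch block
    end
  end
  seen
end

early = scan_until(StringIO.new('ST*837~CLM*1~SE*3~'), '~')
full  = scan_until(StringIO.new('ST*837'), '~')
puts "stopped after #{early} chars, read all #{full} chars"
```

A throw with no enclosing catch for the same symbol raises UncaughtThrowError, so :done must only be thrown from code that runs inside the catch block.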

Example: A simple search

parser = EDI::A::StreamingParser.new
def parser.on_segment( s, tag ) # singleton
  if tag == 'CLM'
    puts "Interchange contains at least one segment CLM !"
    puts "Here is its contents: #{s}"
    throw :done   # Skip further parsing
  end
end
parser.go( File.open 'myfile.x12' )

Public Class Methods

new()
# File lib/edi4r/ansi_x12.rb, line 1357
def initialize
  @path = 'input stream'
end

Public Instance Methods

go( hnd )

The one-pass reader & dispatcher of segments, SAX-style.

It reads sequentially through the given stream of octets and generates calls to the on_* callbacks. Parameter hnd may be any object supporting method getc.
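
Since only getc is required, the input does not have to be a File. The sketch below shows two such sources being read the same way; ArraySource and drain are hypothetical names used for illustration only.

```ruby
require 'stringio'

# Hypothetical minimal source: serves characters from a list of chunks.
# All it offers is #getc, which returns one character or nil at the end.
class ArraySource
  def initialize(chunks)
    @chars = chunks.join.chars
  end

  def getc
    @chars.shift
  end
end

# Read any getc-capable object to its end, as a streaming reader would.
def drain(hnd)
  out = String.new
  while (c = hnd.getc)
    out << c
  end
  out
end

from_string = drain(StringIO.new('ST*837*0001~'))
from_chunks = drain(ArraySource.new(['ST*83', '7*0001~']))
puts from_string == from_chunks
```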

# File lib/edi4r/ansi_x12.rb, line 1443
    def go( hnd )
      state, offset, item, tag = :outside, 0, '', ''
      seg_term, de_sep, ce_sep, rep_sep = nil, nil, nil, nil
      isa_count = nil

      @path = hnd.path if hnd.respond_to? :path

      self.on_interchange_start

      catch(:done) do
        loop do
          c = hnd.getc

          case state # State machine

            # Characters outside of a segment context
          when :outside
            case c

            when nil
              break # Regular exit at EOF

            when (?A..?Z)
              unless item.empty? # Flush
                self.on_other( item )
                item = ''
              end
              item << c; tag << c
              state = :tag1

            else
              item << c
            end

            # Found first tag char, now expecting second
          when :tag1
            case c

            when (?A..?Z),(?0..?9)
              item << c; tag << c
              state = :tag2

            else # including 'nil'
              self.on_error(EDISyntaxError, offset, item, c)
            end

            # Found second tag char, now expecting optional last
          when :tag2
            case c
            when (?A..?Z),(?0..?9)
              item << c; tag << c
              if tag=='ISA'
                state = :in_isa
                isa_count = 0
              else
                state = :in_segment
              end
            when de_sep
                item << c
                state = :in_segment
            else # including 'nil'
              self.on_error(EDISyntaxError, offset, item, c)
            end

          when :in_isa
            self.on_error(EDISyntaxError, offset, item) if c.nil?
            item << c; isa_count += 1
            case isa_count
            when 1;    de_sep = c
            when 80;   rep_sep = c # FIXME: Version 5.x only
            when 102;  ce_sep = c
            when 103
                seg_term = c
                dispatch_item( item , tag,
                                [ce_sep, de_sep, rep_sep||' ', seg_term] )
                item, tag = '', ''
                state = :outside
            end
            if isa_count > 103 # Should never occur
                EDI::logger.warn "isa_count = #{isa_count}"
                self.on_error(EDISyntaxError, offset, item, c)
            end

          when :in_segment
            case c
            when nil
              self.on_error(EDISyntaxError, offset, item)
            when seg_term
              dispatch_item( item , tag )
              item, tag = '', ''
              state = :outside
            else
              item << c
            end

          else # Should never occur...
            raise ArgumentError, "unexpected state: #{state}"
          end  
          offset += 1
        end # loop
#        self.on_error(EDISyntaxError, offset, item) unless state==:outside
      end # catch(:done)
      self.on_interchange_end
      offset
    end
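
The :in_isa state above picks the delimiters out of the fixed-width ISA segment by counting characters after the tag: the 1st is the data element separator, the 80th the repetition separator, the 102nd the component element separator and the 103rd the segment terminator. A minimal sketch of the same arithmetic on a complete string follows. The helper isa_delimiters and the sample field values are hypothetical, and, as the FIXME in the source notes, position 80 holds a repetition separator only in version 5.x interchanges.

```ruby
# Hypothetical helper: read the delimiters from a complete 106-character
# ISA segment. String index = character count after the tag + 2.
def isa_delimiters(isa)
  unless isa.length == 106 && isa.start_with?('ISA')
    raise ArgumentError, 'expected a 106-character ISA segment'
  end
  { de_sep: isa[3], rep_sep: isa[82], ce_sep: isa[104], seg_term: isa[105] }
end

# Build a sample ISA from the fixed element widths ISA01..ISA16.
# The field values are made up for illustration.
widths = [2, 10, 2, 10, 2, 15, 2, 15, 6, 4, 1, 5, 9, 1, 1, 1]
values = ['00', '', '00', '', 'ZZ', 'SENDER', 'ZZ', 'RECEIVER',
          '240101', '1200', '^', '00501', '000000001', '0', 'P', ':']
isa = 'ISA' + values.zip(widths).map { |v, w| '*' + v.ljust(w) }.join + '~'

delims = isa_delimiters(isa)
puts delims.inspect
```

Unlike this helper, the parser works on one character at a time and therefore never needs the whole segment in hand before it learns the separators.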

on_error(err, offset, fragment, c=nil)

Called upon syntax errors. Parsing should be aborted now.

# File lib/edi4r/ansi_x12.rb, line 1431
def on_error(err, offset, fragment, c=nil)
  raise err, "offset = %d, last chars = %s%s" % 
    [offset, fragment, c.nil? ? '<EOF>' : c.chr]
end

on_ge( s )

Called when a GE segment is encountered.

# File lib/edi4r/ansi_x12.rb, line 1396
def on_ge( s )
end

on_gs( s )

Called when a GS segment is encountered.

# File lib/edi4r/ansi_x12.rb, line 1391
def on_gs( s )
end

on_iea( s, tag )

Called when an IEA segment is encountered.

# File lib/edi4r/ansi_x12.rb, line 1386
def on_iea( s, tag )
end

on_interchange_end()

Called when reading ends, either at EOF or after a callback has thrown :done. Override it for your cleanup purposes. Note: It must not throw :done, because it runs outside the catch(:done) block in go.

# File lib/edi4r/ansi_x12.rb, line 1376
def on_interchange_end
end

on_interchange_start()

Called at the start of reading. Override it for your init purposes. Note: It must not throw :done, because it runs outside the catch(:done) block in go.

# File lib/edi4r/ansi_x12.rb, line 1370
def on_interchange_start
end

on_isa( s, tag )

Called when an ISA segment is encountered.

# File lib/edi4r/ansi_x12.rb, line 1381
def on_isa( s, tag )
end

on_other( s )

This callback is usually kept empty. It is called when the parser finds strings between segments, or before or after an interchange.

Strictly speaking, such strings are not permitted by the ANSI X12 syntax rules. However, it is quite common to put a line break between segments for better readability. The default settings thus ignore such occurrences.

If you need strict conformance checking, feel free to put some code into this callback method, otherwise just ignore it.
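
As one possible starting point for such a check, the sketch below separates harmless line breaks from real stray content. The helper only_line_breaks? is hypothetical, and treating only CR and LF as tolerable is an assumption; a strict on_other override could, for example, raise whenever the helper returns false.

```ruby
# Hypothetical helper: true if the string found between segments consists
# of nothing but carriage returns and line feeds.
def only_line_breaks?(s)
  s.delete("\r\n").empty?
end

samples = ["\n", "\r\n", " stray text\n"]
samples.each do |s|
  verdict = only_line_breaks?(s) ? 'tolerated' : 'rejected'
  puts "#{s.inspect}: #{verdict}"
end
```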

# File lib/edi4r/ansi_x12.rb, line 1426
def on_other( s )
end

on_se( s, tag )

Called when an SE segment is encountered.

# File lib/edi4r/ansi_x12.rb, line 1406
def on_se( s, tag )
end

on_segment( s, tag )

Called when any other segment is encountered, i.e. one without a dedicated callback.

# File lib/edi4r/ansi_x12.rb, line 1411
def on_segment( s, tag )
end

on_st( s, tag )

Called when an ST segment is encountered.

# File lib/edi4r/ansi_x12.rb, line 1401
def on_st( s, tag )
end

path()

Convenience method. Returns the path of the File object passed to method go, or just the string ‘input stream’ if the input object has no path.

# File lib/edi4r/ansi_x12.rb, line 1363
def path
  @path
end