Tutorial

Getting started

Installation

sudo gem install edi4r
sudo gem install edi4r-tdid

Require statements

require "edi4r"         # Try require_gem if this fails on your site
require "edi4r/edifact"
require "edi4r-tdid"    # Try require_gem if this fails on your site

Creating an (outbound) UN/EDIFACT interchange

An empty interchange

ic = EDI::E::Interchange.new

creates an empty interchange object with syntax version 3 and charset UNOB. You can make this a bit more explicit by passing parameters as hash components:

ic = EDI::E::Interchange.new( :version => 3, :charset => 'UNOB' )

See the source for more parameters.

An empty message

msg = ic.new_message

creates an empty message in the context of the given interchange, i.e. the syntax version, charset, UNA settings, interactive or batch EDI.

By default, the message type is ORDERS D.96A. Select any message from any UN/TDID by passing the corresponding parameters as hash components:

msg = ic.new_message( :msg_type=>'ORDERS', :version=>'D', :release=>'96A',
                      :resp_agency=>'UN' )

Hash components which you do not specify are taken from a set of defaults.

Filling an interchange

You may add messages to the interchange any time by calling method add():

ic.add( msg )

When adding new messages to an interchange, they get appended to the current interchange content. There is no method to insert a message at any other location. If you need to do that, hold your messages in an array, sort them any way you like, and finally add them to the interchange in the desired sequence.

Note that each message gets validated by default when you add it to the interchange. If your message needs to be completed only later, you may disable validation by calling:

ic.add( msg, false )

Filling a message

A freshly created message is empty, aside from its header and trailer which we shall discuss later. Simply create the segments you want to add, fill them, and add them to the message:

seg = msg.new_segment( 'BGM' )

Here, we derived a BGM segment from the current context, i.e. an UN/TDID like D.96A which we specified when creating the message given.

Note that new_segment() accepts all segment tags available in the whole TDID’s segment directory - not just those usable within this message type.

Add content to the segment (see below) and add it to the message:

msg.add( seg )

Like with messages added to an interchange, it is your responsibility to assure the proper sequence of segments. You will need the UN/EDIFACT message structure, a subset description, or a message implementation guideline (MIG) handy in order to comply.

It is possible to add empty or partially filled segments to a message. Just keep a reference to them and fill in their required data elements later.

Accessing Composites and Data Elements

Background

While interchanges and messages are basically empty when created, segments are not: They come equipped with the composites (CDE, composite data elements ) and data elements (DE) they comprise in their current context. Likewise, CDEs come equipped with the sequence of DEs which they contain according to the underlying TDID.

Segments and CDEs are basically sequences (arrays) of their component elements. This sequence depends on the TDID of their context. E.g., a BGM segment from D.96A looks different from a BGM in D.01B. These sequences are fixed and cannot/should not be altered after creation of a segment or CDE.

Getters for CDE

There is a getter method for each CDE of a segment. Its name is simply prefix ‘c’ and the CDE name.

In order to access, say, C002 in segment BGM, the getter is named ‘c’ + ‘C002’:

cde = seg.cC002

(See below how to handle arrays)

In most cases you’ll need the CDE object only to access its component data elements. Let’s see how that works:

Getters for DE values

During mapping tasks, we normally do not deal with the internal organization of DE objects. All we need is access to their values.

Similarly to CDE getters, we build a DE getter by prepending a ‘d’ to the DE name. The result is a method that returns the current value of this DE (nil if unassigned), not the DE object itself:

order_number = seg.d1004  if cde.d1001 == 220

The example shows that this concept is very convenient and works both with component DEs in a CDE and plain DEs in a segment.

Setters for DE values

We use the same approach for setters - a DE setter actually changes its value, not the DE object itself:

bgm = msg.new_segment( 'BGM' )
bgm.d1004 = '123456ABC'
bgm.cC002.d1001 = 220

Well, that’s both easy and readable! But what about the integer assigned to DE 1004? Don’t worry - its string value is still what we later want to see in the interchange file.

Setters for CDE and segments?

There is no such thing! A CDE should always be derived from its proper context and must not be changed thereafter. Likewise, a segment’s content is nothing a user should interfere with.

Does that sound like dictatorship? Well, there are ways to manipulate CDEs and segments, but that’s an advanced and rarely used topic well out of scope of this tutorial …

DE and CDE arrays

Sometimes, a DE occurs multiple times within a segment or CDE, and a CDE may occur multiple times within a segment.

Before syntax version 4, EDIFACT does not really employ the concept of an array. Instead, there are multiple occurrences of a particular DE or CDE listed in a row.

In such a case, the corresponding getters and setters won’t work. Actually, they raise a TypeError to make sure that you don’t accidentally overlook that there is more than one instance of the given ©DE.

We obtain a whole array of all matching ©DE instead and indicate this with a prefix ‘a’:

seg = msg.new_segment('PIA')
cde_list = seg.aC212

cde_list[0].d7140 = '54321'
cde_list[0].d7143 = 'SA'
cde_list[0].d3055 = 91

cde_list[1].d7140 = ... # etc

seg = msg.new_segment('NAD')
seg.cC080.a3036[0].value = 'E. X. Ample'
seg.cC080.a3036[1].value = 'Sales dept.'

Note that a3036 returns an array of DE objects, not their values! We thus use DE setter value to actually assign new values to those DE objects.

Sometimes, a CDE or segment contains the same DE more than once even if both instances are separated by a different element, like DE 3055 and 1131 in C088 of segment FII, which you may find in invoices. In that case the same concept holds: cde.a1131 would fetch all instances, no matter if some other elements occur in between or not.

Building it all together

OK, so we keep generating segments, filling their data elements with content, and adding them to the message that we are about to build - fine.

Likewise, we add messages to the interchange. Fine - but how do we eventually get the output, how do we make sure that we have not forgotten anything, and how do we deal with the service segments (e.g. to define sender and recipient IDs) ?

Headers and trailers

Interchanges and messages are objects which may come with a header and trailer segment. In UN/EDIFACT, these are UNB/UNZ and UNH/UNT, respectively.

In order to let us focus on the content, edi4r keeps these service segments away from us and tries its best to treat them automatically. For example, we do not have to count segments or messages - edi4r takes care of that and updates the corresponding DE in UNT and UNZ, respectively.

If we really need to access data there, that’s possible anytime through getters header and trailer. From then on, we just use the usual DE and CDE getters and setters. E.g., setting the UNB sender ID works like this:

ic.header.cS002.d0004 = '1234567'

Setting the test indicator may look like this:

ic.header.d0035 = 1

UNA handling

The pseudo segment UNA commonly introduces an UN/EDIFACT interchange. It is shown there by default, which is a good idea in most cases. If you really have to switch it off, use:

ic.show_una = false

It can be easily viewed e.g. by:

puts ic.una

The six special characters it containes are both readable and modifiable. Please note that these characters are represented as their ASCII integer codes, not as one-character strings. Here is the list of corresponding getter methods:

ce_sep()       # Component data element separator, default ?:
de_sep()       # Data element separator, default: ?+
decimal_sign() # Both ?. and ?, are eligible
esc_char()     # Escape character, default: ??
rep_sep()      # Repetition occurrence indicator, default:
               #  ?  (SV1-3), ?* (syntax version 4)
seg_term()     # Segment terminator, default: ?'

Corresponding setters allow you to change all of them. Remember to pass ASCII values, not strings. Example:

pri.cC509.d5118 = 30.1
pri.to_s      --> "PRI+AAA:30.1::LIU"
ic.una.decimal_sign = ?,
pri.to_s      --> "PRI+AAA:30,1::LIU"
ic.una.ce_sep = ?/
pri.to_s      --> "PRI+AAA/30,1//LIU"

Validation

Edi4r comes with a set of built-in validation rules. In order to validate your interchange, just call

ic.validate

This method will return the number of (recoverable) issues found. Error messages and warnings are written to $stderr. There are a few non-recoverable errors which raise exceptions.

Actually, you may apply method validate() to almost all of the EDI objects mentioned so far. However, doing this once for the whole interchance will validate everything.

Printing and saving

Once our data is validated, we want to send it to our business partner. Actually, that’s very simple: Output it e.g. to stdout or to a file by just printing it! This works nicely, because our EDI objects are all equipped with reasonable methods “to_s()”.

Processing (inbound) interchanges

Building the interchange object

We build a whole interchange with a single call by passing its character representation as a String or IO object to class method parse():

ic = nil
File.open("inbound_01.edi") {|hnd| ic = EDI::E::Interchange.parse( hnd )}

That’s it - from now on we may access all its contents in much the same way as we did during generation.

Iterating over messages

Typically we’d like to loop through the messages contained:

ic.each { |msg|  map_message( msg ) }

with some suitably created mapping procedure map_message(). In this context, the interchange is treated as a container of messages. We therefore use the standard iterator each().

Actually, index access is also available, as are following array-like methods:

[](int), index(obj), each(&b), find_all(&b), size(),
length(), first(), last().

Examples:

second_msg = ic[1]
last_msg = ic.last

Awaiting segments of a message

Similarly, we iterate through the segments of a message. The following construction lets you select segments in their proper context (which is the segment group):

def map_message( msg )
  # do your initialization here, then

  msg.each do |seg|
    seg_name = seg.name
    seg_name += ' ' + seg.sg_name if seg.sg_name
    case seg_name
    when "BGM"
      # do this ...
    when "DTM"
      # do that ...
    when 'NAD SG2'
      # react only if NAD occurs in segment group 2

    # ... etc., finally:
    default
      raise "Segment #{seg_name}: Not accounted for!"
  end
end

If you need to obtain all segments with a given tag, pass the tag as a string to the index operator:

d = msg['DTM']  # Array of all DTM segments, any segment group

Actually, a message behaves like a container just as an interchange does, so the array-like methods listed in the previous section also apply here:

d = msg.find_all {|seg| seg.name == 'DTM' && seg.sg_name == 'SG20'}

Selecting segment group instances

Iterating over all segments of a message sequentially tends to produce cluttered code when messages grow complex. Wouldn’t it be nice if we could delegate e.g. the mapping of a whole NAD group to a specialized procedure? Still better - could we delegate mapping of a whole item group to a specialized routine?

Actually that’s quite easy when we recall that segment groups are side branches of the main trunk (or a higher-level branch) of a message diagram, with their trigger segments being the T-shaped “joints”. Thus, segments of a segment group are mere descendants of their trigger segment in the branching diagram.

Here are some helpful methods to select segments of a group:

# ...
when 'NAD SG2'
  map_nad_sg2( seg.children_and_self ) # skip segment COM...
# ...
when 'LIN SG28'
  map_item( seg.descendants_and_self )
# ...

Methods

descendants(), descendants_and_self(), children(), and
children_and_self()

are inspired by XPath axes. They return an array of segments which depend on the given (trigger) segment as their common ancestor.

Using these selectors, writing modular mapping code now becomes an easy task.

Peeking into interchanges

Background

Sometimes we only need to extract some header information out of EDI files, e.g. in order to find out whether the content is UN/EDIFACT, who sent it to whom, or just to see if the UNB test indicator is set.

We could of course apply EDI::E::Interchange.parse() and access the header of the resulting interchange object when we know that a given file contains UN/EDIFACT data. However, that would be a big waste of resources, especially for large interchanges.

Method “peek()” for UN/EDIFACT data

Edi4r instead offers method “peek()”. It reads just enought bytes from the file to determine its contents and to decode the header. Like “parse()” it returns an Interchange object, but that one is empty except for the header (and a dummy trailer) segment.

You can then extract any header element you need through the usual getters. Example: Find out if the test indicator is set.

def is_testdata?( hnd )
  ic = EDI::E::Interchange.peek( hnd )
  ic.d0035 == '1' || ic.d0035 == 1
end

Auto-detection and implicit decompression

Regular EDI users need to archive their business data. In simple cases, moving interchange files into proper folders after successfully processing them already does the job.

You can save a lot of space though by compressing them. Applying “gzip” to EDIFACT data easily shrinks them to 10 % of their original volume.

So far, so good. Later though, you may need to extract a specific file from the archive, e.g. the interchange with control reference “ref3456” from a customer with sender id “xyz”. Well, you do not need to maintain a separate index or decompress all files in the archive in order to find it. There is a generic class method “Interchange.peek()” that does all this for you. Consider the following code fragment that assumes that ARGV contains the list of files to search:

require 'zlib'

found = ARGV.find do |fname|
          ic = EDI::Interchange.peek(File.open(fname))
          h = ic.header
          ic.syntax=='E' && h.cS002.d0004=='xyz' && h.d0020=='ref1234'
        end
ic = EDI::Interchange.parse(File.open(found))
# ...

Note that this code will work with both zipped and unzipped data, and with UN/EDIFACT as well as other content.

XML representation

Background

EDI interchanges may be regarded as abstract objects which need some representation when stored or exchanged. E.g., UN/EDIFACT interchanges may be expressed (represented) by syntax version 1-4 and a choice of separator and termination characters without changing their identity. Likewise, interchanges may be represented by some suitable XML document type.

There was a time when classical EDI representations were considered outdated and to be replaced by XML documents. We know by now that a mere change in representation does not help to resolve the real issues of e-business. Nonetheless, XML-based technology has become much more wide-spread than EDI, so chances are high that one has to integrate classical EDI data into a XML-driven architecture.

There are (too) many ways to do this, and attempts have been made to standardize them. In particular, DIN 16557-4 (www.beuth.de/cmd?workflowname=CSVList&websource=&artid=43768898) describes a way how to represent UN/EDIFACT interchanges as XML documents and how to describe them with DTDs.

The DIN approach however focuses only on UN/EDIFACT and does not represent the logical structure of documents. It merely encodes segments as XML elements and data elements and composites as their attributes. The value of DTDs is limited, as they need to be generated for any particular interchange and lack information about mandatory data elements and CDEs, let alone code lists.

This library therefore supports a generic approach that allows us to represent any interchange object as an XML document, be it a UN/EDIFACT interchange, a file of SAP IDocs, or an ANSI X.12 or other interchange when such a module becomes available. Native and XML representation are fully interchangeable, and the XML representation reflects information from the branching diagram, thus supporting considerably XPath-based processing (e.g. you could easily select an instance of a line item group). Formal validation through a single generic DTD is available, while in-depth validation remains available through this library at the abstract level.

Generating an XML representation of an interchange

The current implementation of XML features is built on Ruby’s REXML module (alternative implementations are conceivable). Simply load the additional methods after loading all other optional EDI4R modules, then use method “to_xml()” like this:

# Other require statements, finally:
require "edi4r/rexml"

# Generate your interchange "ic", then:
xdoc = REXML::Document.new   # Empty REXML document
ic.to_xml( xdoc )            # Fill it

# The rest is standard REXML handling. Here, we write the xdoc to a file.
# (See REXML::Document.write() for details on indenting)
xdoc.write( File.open( 'mydata.xml','w'), 0 )

Building an interchange from its XML representation

No matter what EDI standard the interchange represents, its corresponding EDI4R object can be re-generated easily. Just make sure that you have loaded the corresponding module(s):

# Other require statements, finally:
require "edi4r/rexml"

ic = EDI::Interchange.parse( File.open('mydata.xml') )

Yes, that’s right: It’s the same statement that would also load UN/EDIFACT data!

If you know already what to expect, you might bypass EDI4R’s auto-detection and directly call one of the parse_xml() methods:

xdoc = REXML::Document.new( File.open('mydata.xml') )
ic = EDI::E::Interchange.parse_xml( xdoc )

Utilities: edi2xml.rb, xml2edi.rb

These two scripts are included to further simplify your transition between traditional EDI representation and XML representation. They are command-line tools that simply wrap the library calls mentioned above. Example:

$ edi2xml.rb foo.edi > foo.xml
$ xml2edi.rb foo.xml > bar.edi
$ diff foo.edi bar.edi   # There should be no differences

Tools

editool.rb

Use this command-line tool e.g. when you want to

Called without option, it just builds an internal memory model of the passed file(s) and raises an exception upon parsing errors. Thorough validation can be requested optionally. Example:

$ editool.rb -l foo.edi bar.edi
$ editool.rb -p *.edi *.xml

Further reading

A pair of full-blown mappings (inhouse-to-EANCOM, EANCOM-to-inhouse) shows in much more detail how to do outbound and inbound mapping. See the source codes in the “test” folder!

Misc topics

To be supplied later; currently just a room to collect keywords to cover.

Debugging and viewing

:linebreak, :indented
inspect()
Exception classes
Segments: T-nodes, ordinal number, index, occurrence, max. occurrence
empty?, required?

More advanced features

Validation: warnings and exceptions (logging?)
Add-ons to class Time
Consistency of references common to headers and trailers
Inheriting settings from parent: UNH from UNG, UNG from UNB
Low-level access to Collection contents

Enjoy,

-- Heinz