module IOP
IOP is intended for constructing data processing pipelines in the manner of UNIX command-line pipes.
There are three principal types of pipe nodes that can be composed:
- Feed node.
  This is the starting point of the pipe. It has no upstream node and may have a downstream node. Its purpose is to generate blocks of data and send them downstream in sequence. A typical feed class is implemented by including the {Feed} module and defining the #process! method, which calls the {Feed#process} method to send the data. An example of a feed node is a file reader ({FileReader}), which reads a file and sends its contents in blocks.
- Sink node.
  This is the end point of the pipe. It has an upstream node and no downstream node. Its purpose is to consume the received data. A typical sink class is implemented by including the {Sink} module and defining the #process method. An example of a sink node is a file writer ({FileWriter}), which receives the data in blocks and writes it to a file.
- Filter node.
  A filter is a pass-through node which sits between a feed and a sink and therefore has both upstream and downstream nodes. The simplest way to create a filter class is to include both {Feed} and {Sink}, which together provide the mandatory #process! and #process methods. Such a filter is a no-op, that is, it does nothing apart from passing the received data downstream. An example of a filter node is the digest computer ({DigestComputer}), which computes the hash sum of the data passing through it. To perform the intended processing of the data, a filter class overrides the {Feed#process} method.
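The three node kinds described above can be sketched as follows. To keep the example self-contained, MiniPipe::Feed and MiniPipe::Sink are hypothetical stand-ins that only mimic the documented behavior; they are not the library's code, and ArrayFeed, Passthrough and Collector are made-up class names.

```ruby
# Hypothetical stand-ins that mimic the documented behavior of Feed and Sink.
# They are NOT the library's implementation.
module MiniPipe
  module Feed
    attr_reader :downstream

    # Links node as the downstream node and returns it, so a | b | c chains.
    def |(node)
      node.upstream = self
      @downstream = node
    end

    # Default: pass the data downstream unchanged.
    def process(data = nil)
      downstream.process(data) unless downstream.nil?
    end
  end

  module Sink
    attr_accessor :upstream

    # Default: hand control to the upstream node.
    def process!
      upstream.process!
    end
  end
end

# Feed node: generates blocks and sends them downstream.
class ArrayFeed
  include MiniPipe::Feed

  def initialize(blocks)
    @blocks = blocks
  end

  def process!
    @blocks.each { |block| process(block) }
    process(nil) # end-of-data
  end
end

# Filter node: including both modules gives a pass-through no-op.
class Passthrough
  include MiniPipe::Feed
  include MiniPipe::Sink
end

# Sink node: consumes the received data.
class Collector
  include MiniPipe::Sink
  attr_reader :received

  def initialize
    @received = []
  end

  def process(data = nil)
    @received << data unless data.nil?
  end
end

collector = Collector.new
(ArrayFeed.new(%w[ab cd]) | Passthrough.new | collector).process!
puts collector.received.inspect # => ["ab", "cd"]
```

Passthrough needs no method bodies of its own in this sketch: it inherits #process from the feed side and #process! from the sink side, which is what makes it a no-op filter.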
The basic control flow for an {IOP}-aware pipe is as follows:
- The pipe is constructed from one or more {IOP}-aware class instances. Two or more objects are linked together
  with the | operator, implemented as the {Feed#|} method by default.
- The actual processing is then triggered by the {Sink#process!} method of the very last object in the pipe.
  By default, this method calls the same method of the upstream node, thus forming a stack of nested calls for all objects in the pipe.
- Once the call chain reaches the very first object in the pipe, the feed (which by definition has no upstream node),
  that object starts sending blocks of data downstream with the {Feed#process} method. The #process implementations of all objects except the last one in the pipe are expected to push either this data or transformed data further downstream.
- After all data has been processed, the finalizing call +#process(nil)+ signals end-of-data, after which
  no more data should be sent.
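The order of calls described in the steps above can be traced with a small sketch. MiniPipe is again a hypothetical stand-in written for this example rather than the library's code, and Source, Relay and Drain are made-up names; each node only records when its methods are entered.

```ruby
# Hypothetical stand-ins mimicking the documented Feed and Sink behavior;
# they are NOT the library's implementation.
module MiniPipe
  module Feed
    attr_reader :downstream

    def |(node)
      node.upstream = self
      @downstream = node
    end

    def process(data = nil)
      downstream.process(data) unless downstream.nil?
    end
  end

  module Sink
    attr_accessor :upstream

    def process!
      upstream.process!
    end
  end
end

LOG = [] # records the order in which methods are entered

class Source
  include MiniPipe::Feed

  def process!
    LOG << 'Source#process!'
    %w[one two].each { |block| process(block) }
    process(nil) # finalizing end-of-data call
  end
end

class Relay
  include MiniPipe::Feed
  include MiniPipe::Sink

  def process!
    LOG << 'Relay#process!'
    super # continue upstream
  end

  def process(data = nil)
    LOG << "Relay#process(#{data.inspect})"
    super # continue downstream with the same data
  end
end

class Drain
  include MiniPipe::Sink

  def process!
    LOG << 'Drain#process!'
    super # continue upstream
  end

  def process(data = nil)
    LOG << "Drain#process(#{data.inspect})"
  end
end

(Source.new | Relay.new | Drain.new).process!
puts LOG
```

In this sketch the log starts with the three #process! entries in last-to-first order, followed by each block travelling from Relay to Drain, and ends with the two +#process(nil)+ entries.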
When the {Sink#process!} method is overridden in a concrete class, it is normally organized as follows:

  def process!
    # ...initialization code...
    super
  ensure
    # ...finalization code...
  end

to perform specific setup/cleanup actions, including exception handling, and to pass the control flow upstream with the super call.
Note that when an exception is caught and processed in an overridden #process! method, it must be re-raised so that the other objects in the pipe have a chance to react to it as well.
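A minimal sketch of this pattern, assuming a sink whose upstream node fails. MiniSink is a hypothetical stand-in for the documented {Sink} behavior, and FailingSource and LoggingSink are made-up classes used only for illustration.

```ruby
# Hypothetical stand-in for the documented Sink behavior (not the library's code).
module MiniSink
  attr_accessor :upstream

  def process!
    upstream.process!
  end
end

# Made-up upstream node whose processing fails.
class FailingSource
  def process!
    raise IOError, 'read failed'
  end
end

class LoggingSink
  include MiniSink
  attr_reader :events

  def initialize
    @events = []
  end

  def process!
    @events << :setup    # initialization code
    super                # pass control upstream
  rescue IOError
    @events << :rescued  # local reaction to the failure
    raise                # re-raise so the callers can react as well
  ensure
    @events << :cleanup  # finalization code, runs on success and on failure
  end

  def process(data = nil)
    # The data is ignored here; only the control flow matters in this sketch.
  end
end

sink = LoggingSink.new
sink.upstream = FailingSource.new

propagated = nil
begin
  sink.process!
rescue IOError => e
  propagated = e.message
end

puts sink.events.inspect # => [:setup, :rescued, :cleanup]
puts propagated          # => read failed
```

A bare raise inside the rescue clause re-raises the exception currently being handled, so the original error and its backtrace reach the caller unchanged.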
When {Feed#process} is overridden in a concrete class, it is organized as follows:

  def process(data = nil)
    # ... do something with data, convert data to new_data...
    super(new_data)
  end

The data being sent is expected to be a String of arbitrary size. It is, however, advisable to detect and omit zero-sized strings.
Note that the data passed to this method may be a reusable buffer owned by an upstream object; it should therefore be duplicated (or cloned) if it is stored between method invocations.
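The sketch below illustrates these points under stated assumptions: MiniPipe is a hypothetical stand-in for the documented {Feed} and {Sink} behavior (not the library's code), and ReusingFeed, Upcaser and Keeper are made-up classes. The filter transforms the data and omits zero-sized strings, while the sink shows why stored data should be duplicated when the feed reuses its buffer.

```ruby
# Hypothetical stand-ins mimicking the documented Feed and Sink behavior;
# they are NOT the library's implementation.
module MiniPipe
  module Feed
    attr_reader :downstream

    def |(node)
      node.upstream = self
      @downstream = node
    end

    def process(data = nil)
      downstream.process(data) unless downstream.nil?
    end
  end

  module Sink
    attr_accessor :upstream

    def process!
      upstream.process!
    end
  end
end

# Made-up feed that reuses a single String as its send buffer.
class ReusingFeed
  include MiniPipe::Feed

  def initialize(chunks)
    @chunks = chunks
  end

  def process!
    buffer = String.new
    @chunks.each do |chunk|
      buffer.replace(chunk) # same object, new contents
      process(buffer)
    end
    process(nil) # end-of-data
  end
end

# Filter: transforms the data and omits zero-sized strings.
class Upcaser
  include MiniPipe::Feed
  include MiniPipe::Sink

  def process(data = nil)
    return super(nil) if data.nil? # forward end-of-data as is
    return if data.empty?          # omit zero-sized strings
    super(data.upcase)             # send the transformed data downstream
  end
end

# Sink that keeps both raw references and duplicated copies for comparison.
class Keeper
  include MiniPipe::Sink
  attr_reader :refs, :copies

  def initialize
    @refs = []
    @copies = []
  end

  def process(data = nil)
    return if data.nil?
    @refs << data       # unsafe: may alias the upstream buffer
    @copies << data.dup # safe: independent snapshot
  end
end

filtered = Keeper.new
(ReusingFeed.new(['ab', '', 'cd']) | Upcaser.new | filtered).process!
puts filtered.copies.inspect # => ["AB", "CD"]

direct = Keeper.new
(ReusingFeed.new(%w[ab cd]) | direct).process!
puts direct.refs.inspect   # => ["cd", "cd"]
puts direct.copies.inspect # => ["ab", "cd"]
```

In the second pipe both stored references point at the feed's single buffer, so they show only its last contents; the duplicated copies keep each block as it was received.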
Constants
- DEFAULT_BLOCK_SIZE
Default read block size in bytes for adapters which don't have this parameter externally imposed.
- DEFAULT_OPENSSL_CIPHER
Default cipher ID for OpenSSL adapters.
- EXTRA_DATA
@private
- INSUFFICIENT_DATA
@private
- VERSION
Public Class Methods
@private
  # File lib/iop.rb, line 86
  def self.allocate_string(size)
    String.new(capacity: size)
  end
@private Finds the minimum of the two values.
  # File lib/iop.rb, line 107
  def self.min(a, b)
    a < b ? a : b
  end