Application Programming Interface

BioInterchange offers Application Programming Interfaces (APIs) for multiple programming languages, but only the Ruby API supports the full range of functionality that BioInterchange provides. For Python and Java, a subset of the Ruby API is available: vocabulary wrappers for convenient access to ontologies used in BioInterchange.

Ruby API

The Ruby API, including the vocabulary wrapper classes, is available via RubyGems:

sudo gem install biointerchange

Vocabulary Wrappers

Ruby classes are provided for the ontologies that are used for serializing RDF. Each ontology is represented by its own Ruby class. The classes provide access to the ontology terms and additional methods for resolving OWL classes, datatype properties and object properties.

Usage example (see also vocabulary.rb):

require 'rubygems'
require 'biointerchange'

include BioInterchange

def print_resource(resource)
  puts "    #{resource}"
  puts "        Ontology class:             #{GFF3O.is_class?(resource)}"
  puts "        Ontology object property:   #{GFF3O.is_object_property?(resource)}"
  puts "        Ontology datatype property: #{GFF3O.is_datatype_property?(resource)}"
end

# Get the URI of an ontology term by label:
puts "'seqid' property:"
print_resource(GFF3O.seqid())

# Ambiguous labels will return an array of URIs:
# "start" can refer to a sub-property of "feature_properties" or "target_properties"
puts "'start' properties:"
GFF3O.start().each { |start_synonym|
  print_resource(start_synonym)
}
# "feature_properties" can be either a datatype or object property
puts "'feature_properties' properties:"
GFF3O.feature_properties().each { |feature_properties_synonym|
  print_resource(feature_properties_synonym)
}

# Use built-in method "is_datatype_property?" to resolve ambiguity:
# (Note: there is exactly one item in the result set, so the selection of the first item is acceptable.)
feature_properties = GFF3O.feature_properties().select { |uri| GFF3O.is_datatype_property?(uri) }
puts "'feature_properties' properties, which are a datatype property:"
feature_properties.each { |feature_property|
  print_resource(feature_property)
}

# Use built-in method "with_parent" to pick properties based on their context:
puts "'start' property with parent datatype property 'feature_properties':"
GFF3O.with_parent(GFF3O.start(), feature_properties[0]).each { |feature_property|
  print_resource(feature_property)
}

With the BioInterchange gem installed, the example can be executed on the command line via:

git clone https://github.com/BioInterchange/BioInterchange.git
cd BioInterchange
git checkout v1.0.0
ruby examples/vocabulary.rb
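
The calling convention used in the example above can be made concrete with a small, self-contained sketch. It is not the GFF3O class that ships with the gem: the ToyVocabulary module, the example.org URIs and the lookup tables are all made up for illustration, and the assumption that with_parent compares direct parents only is mine. The sketch mirrors just the behaviour shown above: an unambiguous label returns a single URI, an ambiguous label returns an array, and with_parent narrows an array down by parent property.

```ruby
# Toy stand-in for a vocabulary wrapper. Everything here is hypothetical:
# the module name, the example.org URIs and the lookup tables are made up.
module ToyVocabulary
  NS = 'http://example.org/toy#'

  # Child URI => parent URI.
  PARENTS = {
    "#{NS}feature_start" => "#{NS}feature_properties_datatype",
    "#{NS}target_start"  => "#{NS}target_properties"
  }.freeze

  DATATYPE_PROPERTIES = [
    "#{NS}seqid",
    "#{NS}feature_properties_datatype",
    "#{NS}feature_start",
    "#{NS}target_start"
  ].freeze

  # An unambiguous label maps to exactly one URI.
  def self.seqid
    "#{NS}seqid"
  end

  # An ambiguous label maps to several URIs, so an array is returned.
  def self.start
    ["#{NS}feature_start", "#{NS}target_start"]
  end

  def self.feature_properties
    ["#{NS}feature_properties_datatype", "#{NS}feature_properties_object"]
  end

  def self.is_datatype_property?(uri)
    DATATYPE_PROPERTIES.include?(uri)
  end

  # Keeps only those URIs whose direct parent is the given URI (assumption).
  def self.with_parent(uris, parent)
    uris.select { |uri| PARENTS[uri] == parent }
  end
end

# Resolve the ambiguous "feature_properties" label first, then use it as context for "start":
datatype_parent = ToyVocabulary.feature_properties.find { |uri| ToyVocabulary.is_datatype_property?(uri) }
puts ToyVocabulary.with_parent(ToyVocabulary.start, datatype_parent)
# prints http://example.org/toy#feature_start
```

The two-step pattern (filter by property type, then filter by parent) is the same one the GFF3O example uses; only the data behind it is invented here.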

RDFization Framework

Usage example (see also rdfization.rb):

require 'rubygems'
require 'biointerchange'

include BioInterchange::Phylogenetics

# Create a reader that reads phylogenetic trees in Newick format:
reader = NewickReader.new()

# Create a model from a single example tree:
# (Note: the `deserialize` method also takes streams as parameter -- not just strings.)
model = reader.deserialize('((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;')

# Serialize the model as RDF N-Triples to STDOUT:
CDAORDFWriter.new(STDOUT).serialize(model)

Implementing New Readers, Models and Writers

New readers, models and writers are best adapted from or built upon the existing implementations. The phylogenetic trinity of Newick file format reader, BioRuby based tree model, and CDAO RDF writer is used here as a guideline due to its simplicity.

Reader: Creating an Object Model

A minimal Newick tree reader is shown below. Its class is placed in a Ruby module that encapsulates all phylogenetics-related source code. The NewickReader class inherits from the BioInterchange framework class Reader, which provides method stubs that need to be overridden. Using the central registry BioInterchange::Registry, the reader informs the framework of the following: its unique identifier (phylotastic.newick), its Ruby class (NewickReader), the command line parameters that it accepts (date, which becomes --annotate_date), whether the reader can operate without reading the complete input all at once (true), a descriptive name of the reader (Newick Tree [...]), and an array with descriptions for each parameter stated above.

Deserialization of Newick trees is done using the deserialize method, which must accept both strings and input streams as valid arguments. If this constraint is not satisfied, then an ImplementationReaderError is raised, which is caught by the framework and handled appropriately.

Finally, the postponed? method keeps track of deferred input processing. If the batch size was reached and the model was passed on for serialization to a writer, then this method will have to return true.

require 'bio'
require 'stringio'

module BioInterchange::Phylogenetics

class NewickReader < BioInterchange::Reader

  # Register reader:
  BioInterchange::Registry.register_reader(
    'phylotastic.newick',
    NewickReader,
    [ 'date' ],
    true,
    'Newick Tree File Format reader',
    [
      [ 'date ', 'date when the Newick file was created (optional)' ]
    ]
  )

  # Creates a new instance of a Newick file format reader.
  #
  # The reader supports batch processing.
  #
  # +date+:: Optional date of when the Newick file was produced, annotated, etc.
  # +batch_size+:: Optional integer that determines the number of features that
  # should be processed in one go.
  def initialize(date = nil, batch_size = nil)
    @date = date
    @batch_size = batch_size
  end

  # Reads a Newick file from the input stream and returns an associated model.
  #
  # If this method is called when +postponed?+ returns true, then the reading will
  # continue from where it has been interrupted beforehand.
  #
  # +inputstream+:: an instance of class IO or String that holds the contents of a Newick file
  def deserialize(inputstream)
    if inputstream.kind_of?(IO)
      create_model(inputstream)
    elsif inputstream.kind_of?(String) then
      create_model(StringIO.new(inputstream))
    else
      raise BioInterchange::Exceptions::ImplementationReaderError, 'The provided input stream needs to be either of type IO or String.'
    end
  end

  # Returns true if the reading of the input was postponed due to a full batch.
  def postponed?
    @postponed
  end

protected

# ...concrete implementation omitted.

end

end
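
The interplay between deserialize and postponed? can be illustrated with a small, self-contained sketch. It is not the omitted NewickReader implementation: the ToyBatchReader class, its one-tree-per-line input, and the use of a plain array of lines as the "model" are assumptions made for illustration. Unlike the NewickReader above, the sketch also accepts StringIO objects directly.

```ruby
require 'stringio'

# Hypothetical line-based reader; it only illustrates the postponed? contract.
class ToyBatchReader
  def initialize(batch_size = nil)
    @batch_size = batch_size
    @postponed = false
    @stream = nil
  end

  # Accepts an IO, a StringIO or a String and returns up to batch_size lines.
  # When called while postponed? is true, reading continues on the stream that
  # is already open and the argument is ignored.
  def deserialize(input)
    @stream = open_stream(input) unless @postponed
    batch = []
    while (line = @stream.gets)
      batch << line.chomp
      break if @batch_size && batch.size >= @batch_size
    end
    # Input is left over only if the batch filled up before the end of the stream.
    @postponed = !@stream.eof?
    batch
  end

  # Returns true if the reading of the input was postponed due to a full batch.
  def postponed?
    @postponed
  end

  private

  def open_stream(input)
    case input
    when IO, StringIO then input
    when String then StringIO.new(input)
    else raise ArgumentError, 'The provided input needs to be an IO, StringIO or String.'
    end
  end
end

reader = ToyBatchReader.new(2)
input = "(A,B);\n(C,D);\n(E,F);\n"
batches = [reader.deserialize(input)]
batches << reader.deserialize(input) while reader.postponed?
puts batches.inspect
# prints [["(A,B);", "(C,D);"], ["(E,F);"]]
```

The caller keeps invoking deserialize for as long as postponed? returns true, which is the loop a framework would run between handing each partial model to a writer.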

Tree Model

A model is created by a reader and is subsequently consumed by a writer. The phylogenetic tree model inherits from BioInterchange::Model, which defines the prune method. If batch operation is in place, i.e. the input is not completely read into memory, then the prune method will be called to instruct the model to drop all information that no longer needs to be kept in memory. In a sense, this can be seen as a form of garbage collection, where data that has been serialized is purged from memory.

module BioInterchange::Phylogenetics

# A phylogenetic tree set that can contain multiple phylogenetic trees.
class TreeSet < BioInterchange::Model

  # Create a new instance of a tree set. A tree set can contain multiple phylogenetic trees.
  def initialize
    # Trees are stored as the keys of a hash map to increase performance:
    @set = {}
  end

  # ...omitted internal data structure handling.

  # Removes all features from the set, but keeps additional data (e.g., the date).
  def prune
    @set.clear
  end

end

end
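
The effect of prune can be seen in a minimal, self-contained sketch. ToyTreeSet is a hypothetical class that does not inherit from BioInterchange::Model; its add method and date attribute are assumptions for illustration, while contents and prune follow the names used in this document. Trees are stored as hash keys, as in the TreeSet above, so adding the same tree twice stores it once.

```ruby
# Hypothetical model; the class name, +add+ and +date+ are illustrative only.
class ToyTreeSet
  attr_reader :date

  def initialize(date = nil)
    @date = date
    # Trees are stored as the keys of a hash, which also removes duplicates:
    @set = {}
  end

  # Adds a tree to the set.
  def add(tree)
    @set[tree] = true
  end

  # Returns the trees currently held in memory.
  def contents
    @set.keys
  end

  # Removes all trees from the set, but keeps additional data (here: the date).
  def prune
    @set.clear
  end
end

model = ToyTreeSet.new('2012-10-01')
model.add('(A,B);')
model.add('(A,B);')
model.add('(C,D);')
puts model.contents.size    # prints 2
model.prune
puts model.contents.empty?  # prints true
puts model.date             # prints 2012-10-01
```

After prune the model can be filled with the next batch, and metadata such as the date is still available to the writer.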

Writer: From Object Model to RDF

The writer takes an object model and serializes it via the BioInterchange::Writer derived serialize method. A writer uses BioInterchange::Registry to make itself known to the BioInterchange framework, where it signs up using the following arguments: a unique identifier (rdf.phylotastic.newick), its implementing class (CDAORDFWriter), a list of readers that it is compatible with (phylotastic.newick), whether the writer supports batch processing where only parts of the input need to be kept in memory (true), and a descriptive name for the writer.

require 'rdf'
require 'rdf/ntriples'

module BioInterchange::Phylogenetics

# Serializes phylogenetic tree models based on BioRuby's phylogenetic tree implementation.
class CDAORDFWriter < BioInterchange::Writer

  # Register writers:
  BioInterchange::Registry.register_writer(
    'rdf.phylotastic.newick',
    CDAORDFWriter,
    [ 'phylotastic.newick' ],
    true,
    'Comparative Data Analysis Ontology (CDAO) based RDFization'
  )

  # Creates a new instance of a CDAORDFWriter that will use the provided output stream to serialize RDF.
  #
  # +ostream+:: instance of an IO class or derivative that is used for RDF serialization
  def initialize(ostream)
    @ostream = ostream
  end

  # Serialize a model as RDF.
  #
  # +model+:: a generic representation of input data that is an instance of BioInterchange::Phylogenetics::TreeSet
  def serialize(model)
    model.contents.each { |tree|
      serialize_model(model, tree)
    }
  end

protected

# ...omitted actual serialization implementation.

end

end
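
To show what the registration calls of the reader and the writer make possible, here is a minimal sketch of a registry that records both and matches writers to readers. It is not the BioInterchange::Registry implementation: ToyRegistry, its writers_for and batch_capable? methods, and the rule that batching requires support on both sides are assumptions for illustration. Only the argument order of register_reader and register_writer follows the calls shown in this document.

```ruby
# Hypothetical registry; it mimics the argument order of the registration
# calls in this document, but nothing else about the real registry.
class ToyRegistry
  @readers = {}
  @writers = {}

  class << self
    def register_reader(id, klass, parameters, batch_capable, name, parameter_descriptions)
      @readers[id] = { class: klass, parameters: parameters, batch: batch_capable,
                       name: name, descriptions: parameter_descriptions }
    end

    def register_writer(id, klass, reader_ids, batch_capable, name)
      @writers[id] = { class: klass, readers: reader_ids, batch: batch_capable, name: name }
    end

    # Identifiers of all writers that declared compatibility with the given reader.
    def writers_for(reader_id)
      @writers.select { |_id, entry| entry[:readers].include?(reader_id) }.keys
    end

    # Assumption: a reader/writer pair can be batched only if both support it.
    def batch_capable?(reader_id, writer_id)
      @readers.fetch(reader_id)[:batch] && @writers.fetch(writer_id)[:batch]
    end
  end
end

# Placeholder classes standing in for a real reader and writer:
class ToyReader; end
class ToyWriter; end

ToyRegistry.register_reader('phylotastic.newick', ToyReader, ['date'], true,
                            'Newick reader (toy)', [['date', 'creation date (optional)']])
ToyRegistry.register_writer('rdf.phylotastic.newick', ToyWriter, ['phylotastic.newick'], true,
                            'CDAO RDFization (toy)')

puts ToyRegistry.writers_for('phylotastic.newick').inspect
# prints ["rdf.phylotastic.newick"]
puts ToyRegistry.batch_capable?('phylotastic.newick', 'rdf.phylotastic.newick')
# prints true
```

Because each writer names the readers it is compatible with, a framework can validate a reader/writer combination from identifiers alone, before any input is read.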

Python API

Vocabulary wrappers in Python are available as an egg that can be installed via easy_install:

sudo easy_install rdflib
sudo easy_install http://www.biointerchange.org/eggs/biointerchange-1.0.0-py2.7.egg

Usage example (see also example.py):

import biointerchange
from biointerchange import *
from rdflib.namespace import Namespace

def print_resource(resource):
    print "    " + resource
    print "        Ontology class:             " + str(GFF3O.is_class(resource))
    print "        Ontology object property:   " + str(GFF3O.is_object_property(resource))
    print "        Ontology datatype property: " + str(GFF3O.is_datatype_property(resource))

# Get the URI of an ontology term by label:
print "'seqid' property:"
print_resource(GFF3O.seqid())

# Ambiguous labels will return an array of URIs:
# "start" can refer to a sub-property of "feature_properties" or "target_properties"
print "'start' properties:"
for start_synonym in GFF3O.start():
    print_resource(start_synonym)

# "feature_properties" can be either a datatype or object property
print "'feature_properties' properties:"
for feature_properties_synonym in GFF3O.feature_properties():
    print_resource(feature_properties_synonym)

# Use built-in method "is_datatype_property" to resolve ambiguity:
# (Note: there is exactly one item in the result set, so the selection of the first item is acceptable.)
feature_properties = filter(lambda uri: GFF3O.is_datatype_property(uri), GFF3O.feature_properties())
print "'feature_properties' properties, which are a datatype property:"
for feature_property in feature_properties:
    print_resource(feature_property)

# Use built-in method "with_parent" to pick properties based on their context:
print "'start' property with parent datatype property 'feature_properties':"
for feature_property in GFF3O.with_parent(GFF3O.start(), feature_properties[0]):
    print_resource(feature_property)

The example can be executed on the command line via:

git clone https://github.com/BioInterchange/BioInterchange.git
cd BioInterchange
git checkout v1.0.0
cd supplemental/python
python example.py

Java API

Vocabulary wrappers in Java are available as a Maven artifact. Add the following repository and dependency settings to your Project Object Model (POM) file:

<repositories>
  <repository>
    <id>biointerchange</id>
    <name>BioInterchange</name>
    <url>http://www.biointerchange.org/artifacts</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.biointerchange</groupId>
    <artifactId>vocabularies</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>

Usage example (see also App.java):

package org.biointerchange;

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.*;
import org.apache.commons.collections.CollectionUtils;
import org.apache.commons.collections.Predicate;

import java.util.Set;

import org.biointerchange.vocabulary.*;

/**
 * Demo on how to make use of BioInterchange's vocabulary classes.
 *
 * @author Joachim Baran
 */
public class App 
{
    public static void main(String[] args) {
        Resource seqid = GFF3O.seqid();
        System.out.println("'seqid' property:");
        printResource(seqid);

        System.out.println("'start' properties:");
        Set<Resource> start = GFF3O.start();
        for (Resource startSynonym : start)
            printResource(startSynonym);

        System.out.println("'feature_properties' properties:");
        Set<Resource> featureProperties = GFF3O.feature_properties();
        for (Resource featurePropertiesSynonym : featureProperties)
            printResource(featurePropertiesSynonym);

        System.out.println("'feature_properties' properties, which are a datatype property:");
        CollectionUtils.filter(featureProperties, new Predicate() {
            public boolean evaluate(Object o) {
                return GFF3O.isDatatypeProperty((Resource)o);
            }
        });
        for (Resource featurePropertiesSynonym : featureProperties)
            printResource(featurePropertiesSynonym);

        System.out.println("'start' property with parent datatype property 'feature_properties':");
        Set<Resource> startUnderDatatypeFeatureProperties = GFF3O.withParent(start, featureProperties.iterator().next());
        for (Resource startSynonym : startUnderDatatypeFeatureProperties)
            printResource(startSynonym);
    }

    private static void printResource(Resource resource) {
        System.out.println("    " + resource.toString());
        System.out.println("        Namespace:                            " + resource.getNameSpace());
        System.out.println("        Local name:                           " + resource.getLocalName());
        System.out.println("        Jena Property (rather than Resource): " + (resource instanceof Property));
        System.out.println("        Ontology class:                       " + GFF3O.isClass(resource));
        System.out.println("        Ontology object property:             " + GFF3O.isObjectProperty(resource));
        System.out.println("        Ontology datatype property:           " + GFF3O.isDatatypeProperty(resource));
    }
}

Another example that uses SIO instead of GFF3O is provided as AppSIO.java.

The examples can be executed through Maven:

git clone https://github.com/BioInterchange/BioInterchange.git
cd BioInterchange
git checkout v1.0.0
cd supplemental/java/biointerchange
mvn compile
mvn exec:java -Dexec.mainClass="org.biointerchange.App"
mvn exec:java -Dexec.mainClass="org.biointerchange.AppSIO"