BioInterchange offers Application Programming Interfaces (APIs) for multiple programming languages, but only the Ruby API supports the full range of functionality that BioInterchange provides. For Python and Java, a subset of the Ruby API is available: vocabulary wrappers for convenient access to the ontologies used in BioInterchange.
The Ruby API, including the vocabulary wrapper classes, is available via RubyGems:
sudo gem install biointerchange
Ruby classes are provided for the ontologies that are used for serializing RDF. Each ontology is represented by its own Ruby class. The classes provide access to the ontology's terms as well as additional methods for resolving OWL classes, datatype properties and object properties.
Usage example (see also vocabulary.rb):
require 'rubygems'
require 'biointerchange'

include BioInterchange

def print_resource(resource)
  puts "    #{resource}"
  puts "        Ontology class:             #{GFF3O.is_class?(resource)}"
  puts "        Ontology object property:   #{GFF3O.is_object_property?(resource)}"
  puts "        Ontology datatype property: #{GFF3O.is_datatype_property?(resource)}"
end

# Get the URI of an ontology term by label:
puts "'seqid' property:"
print_resource(GFF3O.seqid())

# Ambiguous labels will return an array of URIs:
# "start" can refer to a sub-property of "feature_properties" or "target_properties"
puts "'start' properties:"
GFF3O.start().each { |start_synonym|
  print_resource(start_synonym)
}

# "feature_properties" can be either a datatype or object property
puts "'feature_properties' properties:"
GFF3O.feature_properties().each { |feature_properties_synonym|
  print_resource(feature_properties_synonym)
}

# Use the built-in method "is_datatype_property?" to resolve the ambiguity:
# (Note: there is exactly one item in the result set, so the selection of the first item is acceptable.)
feature_properties = GFF3O.feature_properties().select { |uri|
  GFF3O.is_datatype_property?(uri)
}
puts "'feature_properties' properties, which are a datatype property:"
feature_properties.each { |feature_property|
  print_resource(feature_property)
}

# Use the built-in method "with_parent" to pick properties based on their context:
puts "'start' property with parent datatype property 'feature_properties':"
GFF3O.with_parent(GFF3O.start(), feature_properties[0]).each { |feature_property|
  print_resource(feature_property)
}
With the BioInterchange gem installed, the example can be executed on the command line via:
git clone git://github.com/BioInterchange/BioInterchange.git
cd BioInterchange
git checkout v1.0.0
ruby examples/vocabulary.rb
Usage example (see also rdfization.rb):
require 'rubygems'
require 'biointerchange'

include BioInterchange::Phylogenetics

# Create a reader that reads phylogenetic trees in Newick format:
reader = NewickReader.new()

# Create a model from a single example tree:
# (Note: the `deserialize` method also takes streams as parameter -- not just strings.)
model = reader.deserialize('((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;')

# Serialize the model as RDF N-Triples to STDOUT:
CDAORDFWriter.new(STDOUT).serialize(model)
New readers, models and writers are best adapted from, or built upon, the existing implementations. The phylogenetic trinity of a Newick file format reader, a BioRuby-based tree model, and a CDAO RDF writer is used here as a guideline due to its simplicity.
The quintessential Newick tree reader is depicted below. Its class is placed in a Ruby module that encapsulates all phylogenetics-related source code. The NewickReader class inherits from the BioInterchange framework class Reader, which provides method stubs that need to be overwritten. Using the central registry BioInterchange::Registry, the reader informs the framework of its: unique identifier (phylotastic.newick), Ruby class (NewickReader), command line parameters that it accepts (date, which becomes --annotate_date), whether the reader can operate without reading the complete input all at once (true), a descriptive name of the reader (Newick Tree [...]), and an array with descriptions for each parameter stated above.
Deserialization of Newick trees is done using the deserialize method, which must take both strings and input streams as valid arguments. If this constraint is not satisfied, an ImplementationReaderError is raised, which is caught by the framework and handled appropriately.
Finally, the postponed? method keeps track of deferred input processing. If the batch size was reached and the model was passed on for serialization to a writer, then this method has to return true. A sketch of this batching logic follows the reader class below.
require 'bio'

module BioInterchange::Phylogenetics

class NewickReader < BioInterchange::Reader

  # Register reader:
  BioInterchange::Registry.register_reader(
    'phylotastic.newick',
    NewickReader,
    [ 'date' ],
    true,
    'Newick Tree File Format reader',
    [
      [ 'date', 'date when the Newick file was created (optional)' ]
    ]
  )

  # Creates a new instance of a Newick file format reader.
  #
  # The reader supports batch processing.
  #
  # +date+:: Optional date of when the Newick file was produced, annotated, etc.
  # +batch_size+:: Optional integer that determines the number of features that
  #                should be processed in one go.
  def initialize(date = nil, batch_size = nil)
    @date = date
    @batch_size = batch_size
  end

  # Reads a Newick file from the input stream and returns an associated model.
  #
  # If this method is called when +postponed?+ returns true, then the reading will
  # continue from where it has been interrupted beforehand.
  #
  # +inputstream+:: an instance of class IO or String that holds the contents of a Newick file
  def deserialize(inputstream)
    if inputstream.kind_of?(IO)
      create_model(inputstream)
    elsif inputstream.kind_of?(String) then
      create_model(StringIO.new(inputstream))
    else
      raise BioInterchange::Exceptions::ImplementationReaderError, 'The provided input stream needs to be either of type IO or String.'
    end
  end

  # Returns true if the reading of the input was postponed due to a full batch.
  def postponed?
    @postponed
  end

protected

  # ...concrete implementation omitted.
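The concrete implementation of the batching logic is omitted above. For illustration, a minimal sketch of what a create_model helper (the method called by deserialize) could look like is given below. It assumes one serialized Newick tree per input line and an add_tree method on the model; both are assumptions for this sketch, not necessarily the actual implementation.

# Hypothetical sketch -- not the actual implementation.
def create_model(inputstream)
  model = TreeSet.new
  trees_read = 0
  @postponed = false
  # Assumption: one serialized Newick tree per line of input.
  inputstream.each_line { |line|
    model.add_tree(Bio::Newick.new(line).tree)
    trees_read += 1
    if @batch_size and trees_read >= @batch_size then
      # Stop reading; the IO object remains positioned after the lines
      # consumed so far, so a later deserialize call continues from here.
      @postponed = true
      break
    end
  }
  model
end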
A model is created by a reader and subsequently consumed by a writer. The phylogenetic tree model inherits from BioInterchange::Model, which defines the prune method. If batch operation is in place, i.e. the input is not completely read into memory, then the prune method will be called to instruct the model to drop all information that no longer needs to be kept in memory. In a sense, this can be seen as a form of garbage collection, where data that has already been serialized is purged from memory.
module BioInterchange::Phylogenetics

# A phylogenetic tree set that can contain multiple phylogenetic trees.
class TreeSet < BioInterchange::Model

  # Create a new instance of a tree set. A tree set can contain multiple phylogenetic trees.
  def initialize
    # Trees are stored as the keys of a hash map to increase performance:
    @set = {}
  end

  # ...omitted internal data structure handling.

  # Removes all features from the set, but keeps additional data (e.g., the date).
  def prune
    @set.clear
  end

end

end
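The omitted internal data structure handling could, for illustration, amount to the two methods sketched below. The method names add_tree and contents are assumptions for this sketch (contents matches what the writer shown next expects); the actual implementation may differ.

# Hypothetical sketch -- assumed method names, not necessarily the actual code.
# Adds a phylogenetic tree to the set; hash keys give constant-time insertion.
def add_tree(tree)
  @set[tree] = true
end

# Returns all trees currently held by the model (consumed by writers):
def contents
  @set.keys
end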
The writer takes an object model and serializes it via the serialize method derived from BioInterchange::Writer. A writer uses BioInterchange::Registry to make itself known to the BioInterchange framework, where it signs up using the following arguments: a unique identifier (rdf.phylotastic.newick), its implementing class (CDAORDFWriter), a list of readers that it is compatible with (phylotastic.newick), whether the writer supports batch processing where only parts of the input need to be kept in memory (true), and a descriptive name for the writer.
require 'rdf'
require 'rdf/ntriples'

module BioInterchange::Phylogenetics

# Serializes phylogenetic tree models based on BioRuby's phylogenetic tree implementation.
class CDAORDFWriter < BioInterchange::Writer

  # Register writers:
  BioInterchange::Registry.register_writer(
    'rdf.phylotastic.newick',
    CDAORDFWriter,
    [ 'phylotastic.newick' ],
    true,
    'Comparative Data Analysis Ontology (CDAO) based RDFization'
  )

  # Creates a new instance of a CDAORDFWriter that will use the provided output stream to serialize RDF.
  #
  # +ostream+:: instance of an IO class or derivative that is used for RDF serialization
  def initialize(ostream)
    @ostream = ostream
  end

  # Serialize a model as RDF.
  #
  # +model+:: a generic representation of input data that is an instance of BioInterchange::Phylogenetics::TreeSet
  def serialize(model)
    model.contents.each { |tree|
      serialize_model(model, tree)
    }
  end

protected

  # ...omitted actual serialization implementation.
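To give an idea of what the omitted serialization step involves, a deliberately minimal serialize_model could type each tree node as a CDAO node and write the resulting graph as N-Triples. In this sketch the node URI scheme is a placeholder and the CDAO term is an assumption; the actual writer emits far richer CDAO output.

# Hypothetical, heavily simplified sketch -- not the actual CDAORDFWriter code.
def serialize_model(model, tree)
  graph = RDF::Graph.new
  tree.each_node { |node|
    # Placeholder URI scheme; the real writer mints proper node URIs.
    subject = RDF::URI.new("http://www.biointerchange.org/example##{node.object_id}")
    # Assumption: CDAO_0000140 denotes a node in a phylogenetic tree.
    graph << RDF::Statement.new(subject, RDF.type, RDF::URI.new('http://purl.obolibrary.org/obo/CDAO_0000140'))
  }
  RDF::NTriples::Writer.new(@ostream) { |writer|
    graph.each_statement { |statement| writer << statement }
  }
end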
Vocabulary wrappers in Python are available as an egg that can be installed via easy_install:
sudo easy_install rdflib
sudo easy_install http://www.biointerchange.org/eggs/biointerchange-1.0.0-py2.7.egg
Usage example (see also example.py):
import biointerchange
from biointerchange import *
from rdflib.namespace import Namespace

def print_resource(resource):
    print "    " + resource
    print "        Ontology class:             " + str(GFF3O.is_class(resource))
    print "        Ontology object property:   " + str(GFF3O.is_object_property(resource))
    print "        Ontology datatype property: " + str(GFF3O.is_datatype_property(resource))

# Get the URI of an ontology term by label:
print "'seqid' property:"
print_resource(GFF3O.seqid())

# Ambiguous labels will return an array of URIs:
# "start" can refer to a sub-property of "feature_properties" or "target_properties"
print "'start' properties:"
for start_synonym in GFF3O.start():
    print_resource(start_synonym)

# "feature_properties" can be either a datatype or object property
print "'feature_properties' properties:"
for feature_properties_synonym in GFF3O.feature_properties():
    print_resource(feature_properties_synonym)

# Use the built-in method "is_datatype_property" to resolve the ambiguity:
# (Note: there is exactly one item in the result set, so the selection of the first item is acceptable.)
feature_properties = filter(lambda uri: GFF3O.is_datatype_property(uri), GFF3O.feature_properties())
print "'feature_properties' properties, which are a datatype property:"
for feature_property in feature_properties:
    print_resource(feature_property)

# Use the built-in method "with_parent" to pick properties based on their context:
print "'start' property with parent datatype property 'feature_properties':"
for feature_property in GFF3O.with_parent(GFF3O.start(), feature_properties[0]):
    print_resource(feature_property)
The example can be executed on the command line via:
git clone git://github.com/BioInterchange/BioInterchange.git
cd BioInterchange
git checkout v1.0.0
cd supplemental/python
python example.py
Vocabulary wrappers in Java are available as a Maven artifact. Add the following repository and dependency setting to your Project Object Model (POM) file:
<repositories>
  <repository>
    <id>biointerchange</id>
    <name>BioInterchange</name>
    <url>http://www.biointerchange.org/artifacts</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.biointerchange</groupId>
    <artifactId>vocabularies</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
Usage example (see also App.java):
package org.biointerchange;

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.*;
import org.apache.commons.collections.CollectionUtils;
import org.apache.commons.collections.Predicate;

import java.util.Set;

import org.biointerchange.vocabulary.*;

/**
 * Demo on how to make use of BioInterchange's vocabulary classes.
 *
 * @author Joachim Baran
 */
public class App {
    public static void main(String[] args) {
        Resource seqid = GFF3O.seqid();
        System.out.println("'seqid' property:");
        printResource(seqid);

        System.out.println("'start' properties:");
        Set<Resource> start = GFF3O.start();
        for (Resource startSynonym : start)
            printResource(startSynonym);

        System.out.println("'feature_properties' properties:");
        Set<Resource> featureProperties = GFF3O.feature_properties();
        for (Resource featurePropertiesSynonym : featureProperties)
            printResource(featurePropertiesSynonym);

        System.out.println("'feature_properties' properties, which are a datatype property:");
        CollectionUtils.filter(featureProperties, new Predicate() {
            public boolean evaluate(Object o) {
                return GFF3O.isDatatypeProperty((Resource)o);
            }
        });
        for (Resource featurePropertiesSynonym : featureProperties)
            printResource(featurePropertiesSynonym);

        System.out.println("'start' property with parent datatype property 'feature_properties':");
        Set<Resource> startUnderDatatypeFeatureProperties = GFF3O.withParent(start, featureProperties.iterator().next());
        for (Resource startSynonym : startUnderDatatypeFeatureProperties)
            printResource(startSynonym);
    }

    private static void printResource(Resource resource) {
        System.out.println("    " + resource.toString());
        System.out.println("        Namespace:                            " + resource.getNameSpace());
        System.out.println("        Local name:                           " + resource.getLocalName());
        System.out.println("        Jena Property (rather than Resource): " + (resource instanceof Property));
        System.out.println("        Ontology class:                       " + GFF3O.isClass(resource));
        System.out.println("        Ontology object property:             " + GFF3O.isObjectProperty(resource));
        System.out.println("        Ontology datatype property:           " + GFF3O.isDatatypeProperty(resource));
    }
}
Another example that uses SIO instead of GFF3O is provided as AppSIO.java.
The examples can be executed through Maven:
git clone git://github.com/BioInterchange/BioInterchange.git
cd BioInterchange
git checkout v1.0.0
cd supplemental/java/biointerchange
mvn compile
mvn exec:java -Dexec.mainClass="org.biointerchange.App"
mvn exec:java -Dexec.mainClass="org.biointerchange.AppSIO"