module Ensembl

bio-ensembl.rb

Copyright

Copyright (C) 2009 Jan Aerts <jandot.myopenid.com> Francesco Strozzi <francesco.strozzi@gmail.com>

License

The Ruby License

@author Jan Aerts @author Francesco Strozzi

What is it?

The Ensembl module provides an API to the Ensembl databases stored at ensembldb.ensembl.org. This is the same information that is available from www.ensembl.org.

The Ensembl::Core module mainly covers sequences and annotations. The Ensembl::Variation module covers variations (e.g. SNPs). The Ensembl::Compara module covers comparative mappings between species.

ActiveRecord

The Ensembl API provides a Ruby interface to the Ensembl MySQL databases at ensembldb.ensembl.org. Most of the API relies on ActiveRecord to retrieve data from those databases. In general, each table is described by a class whose name is the CamelCase form of the table name: the coord_system table is covered by the CoordSystem class, the seq_region table is covered by the SeqRegion class, etc. As a result, accessors are available for all columns in each table. For example, the seq_region table has the following columns: seq_region_id, name, coord_system_id and length. Through ActiveRecord, these column names become available as attributes of SeqRegion objects:

puts my_seq_region.seq_region_id
puts my_seq_region.name
puts my_seq_region.coord_system_id
puts my_seq_region.length.to_s
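
The table-to-class naming convention described above can be illustrated with a small sketch. The helper table_to_class_name is hypothetical and not part of the Ensembl API; it only mimics the convention and assumes plain snake_case table names.

```ruby
# Hypothetical helper (not part of the Ensembl API): converts a
# snake_case table name into the CamelCase class name that covers it.
def table_to_class_name(table_name)
  table_name.split('_').map(&:capitalize).join
end

# The two tables named in the text above.
%w[coord_system seq_region].each do |table|
  puts "#{table} -> #{table_to_class_name(table)}"
end
```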

ActiveRecord makes it easy to extract data from those tables using the collection of find methods. There are three types of find methods (e.g. for the CoordSystem class):

  1. find based on primary key in table:

my_coord_system = CoordSystem.find(5)

  2. find_by_sql (returns an array of matching records):

my_coord_systems = CoordSystem.find_by_sql("SELECT * FROM coord_system WHERE name = 'chromosome'")

  3. find_by_<insert_your_column_name_here>:

my_coord_system1 = CoordSystem.find_by_name('chromosome')
my_coord_system2 = CoordSystem.find_by_rank(3)

To find out which find_by_<column> methods are available, you can list the column names using the column_names class method:

puts Ensembl::Core::CoordSystem.column_names.join("\t")
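
As a rough illustration of how column names translate into finder names, the sketch below builds the find_by_<column> names for the seq_region columns listed earlier. The helper finder_names is hypothetical; it only constructs strings and does not query any database.

```ruby
# Column names of the seq_region table, as listed earlier in this document.
SEQ_REGION_COLUMNS = %w[seq_region_id name coord_system_id length].freeze

# Hypothetical helper (not part of the Ensembl API): builds the
# dynamic finder name that corresponds to each column name.
def finder_names(columns)
  columns.map { |column| "find_by_#{column}" }
end

puts finder_names(SEQ_REGION_COLUMNS)
```

With a live connection, the same list could presumably be obtained by passing the result of column_names to such a helper.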

For more information on the find methods, see ar.rubyonrails.org/classes/ActiveRecord/Base.html#M000344

The relationships between different tables are accessible through the classes as well. For example, to loop over all seq_regions belonging to a coord_system (a coord_system “has many” seq_regions):

chr_coord_system = CoordSystem.find_by_name('chromosome')
chr_coord_system.seq_regions.each do |seq_region|
  puts seq_region.name
end

Of course, you can go the other way as well (a seq_region “belongs to” a coord_system):

chr4 = SeqRegion.find_by_name('4')
puts chr4.coord_system.name  #--> 'chromosome'
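
The two directions of this relationship can be pictured without a database connection. The sketch below uses plain Ruby Structs as toy stand-ins; ToyCoordSystem and ToySeqRegion are hypothetical names, the ids and rows are invented for illustration, and none of this is ActiveRecord or the real Ensembl classes. It only shows that both lookups follow the coord_system_id foreign key.

```ruby
# Toy stand-ins for the real classes: plain Structs, not ActiveRecord models.
ToyCoordSystem = Struct.new(:coord_system_id, :name) do
  # "has many": every seq_region whose foreign key points at this row.
  def seq_regions(all_seq_regions)
    all_seq_regions.select { |sr| sr.coord_system_id == coord_system_id }
  end
end

ToySeqRegion = Struct.new(:seq_region_id, :name, :coord_system_id) do
  # "belongs to": the single coord_system referenced by the foreign key.
  def coord_system(all_coord_systems)
    all_coord_systems.find { |cs| cs.coord_system_id == coord_system_id }
  end
end

# Invented example rows.
coord_systems = [ToyCoordSystem.new(1, 'chromosome'), ToyCoordSystem.new(2, 'contig')]
seq_regions   = [ToySeqRegion.new(10, '4', 1), ToySeqRegion.new(11, 'X', 1),
                 ToySeqRegion.new(12, 'c1', 2)]

chromosome = coord_systems.first
puts chromosome.seq_regions(seq_regions).map(&:name).join(', ')
puts seq_regions.first.coord_system(coord_systems).name
```

In the real API, ActiveRecord performs these lookups itself, which is why seq_regions and coord_system take no arguments there.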

To find out which relationships exist for a given class, you can use the reflect_on_all_associations class method:

puts SeqRegion.reflect_on_all_associations(:has_many).collect{|a| a.name.to_s}.join("\n")
puts SeqRegion.reflect_on_all_associations(:has_one).collect{|a| a.name.to_s}.join("\n")
puts SeqRegion.reflect_on_all_associations(:belongs_to).collect{|a| a.name.to_s}.join("\n")

ensembl/core/collection.rb

Copyright

Copyright (C) 2009 Francesco Strozzi <francesco.strozzi@gmail.com>

License

The Ruby License

@author Francesco Strozzi

ensembl/core/project.rb - project calculations for Ensembl Slice

Copyright

Copyright (C) 2009 Jan Aerts <jandot.myopenid.com> Francesco Strozzi <francesco.strozzi@gmail.com>

License

The Ruby License

@author Jan Aerts @author Francesco Strozzi

ensembl/variation/variation.rb - Extension of ActiveRecord classes for Ensembl variation features

Copyright

Copyright (C) 2008 Francesco Strozzi <francesco.strozzi@gmail.com>

License

The Ruby License

@author Francesco Strozzi

Constants

DB_ADAPTER
DB_HOST
DB_PASSWORD
DB_USERNAME
EG_HOST
EG_PORT
ENSEMBL_RELEASE
SESSION