class Ensembl::Core::Transcript
The Transcript class provides an interface to the transcript table. This table contains mappings of transcripts for a Gene to a SeqRegion.
This class uses ActiveRecord to access data in the Ensembl database. See the general documentation of the Ensembl module for more information on what this means and what methods are available.
This class includes the mixin Sliceable, which means that it is mapped to a SeqRegion object and a Slice can be created for objects of this class. See Sliceable and Slice for more information.
@example
#TODO
Public Class Methods
The Transcript#find_all_by_stable_id class method returns an array of transcripts with the given stable_id. If none were found, an empty array is returned.
# File lib/bio-ensembl/core/transcript.rb, line 142
def self.find_all_by_stable_id(stable_id)
  answer = Array.new
  transcript_stable_id_objects = Ensembl::Core::TranscriptStableId.find_all_by_stable_id(stable_id)
  transcript_stable_id_objects.each do |transcript_stable_id_object|
    answer.push(Ensembl::Core::Transcript.find(transcript_stable_id_object.transcript_id))
  end
  return answer
end
The Transcript#find_by_stable_id class method returns the first transcript with the given stable_id. If none was found, nil is returned.
# File lib/bio-ensembl/core/transcript.rb, line 154
def self.find_by_stable_id(stable_id)
  all = self.find_all_by_stable_id(stable_id)
  if all.length == 0
    return nil
  else
    return all[0]
  end
end
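The difference between the two finders is only in what they return when nothing matches. The following is a minimal sketch of that contract, not the library code: StableIdRow, ROWS, the toy_* methods and the IDs are hypothetical stand-ins that replace the database lookup with an in-memory array.

```ruby
# Hypothetical in-memory stand-in for rows of the transcript stable ID table.
StableIdRow = Struct.new(:stable_id, :transcript_id)

# Made-up rows; the IDs are illustrative only.
ROWS = [
  StableIdRow.new("ENST00000000001", 11),
  StableIdRow.new("ENST00000000002", 12)
].freeze

# Returns every matching transcript id, or an empty array.
def toy_find_all_by_stable_id(rows, stable_id)
  rows.select { |row| row.stable_id == stable_id }.map(&:transcript_id)
end

# Returns the first match, or nil when there is none.
def toy_find_by_stable_id(rows, stable_id)
  toy_find_all_by_stable_id(rows, stable_id).first
end

toy_find_all_by_stable_id(ROWS, "ENST00000000001") # => [11]
toy_find_all_by_stable_id(ROWS, "ENST_missing")    # => []
toy_find_by_stable_id(ROWS, "ENST_missing")        # => nil
```

Callers of the single-result finder therefore need a nil check, while callers of the array finder can iterate directly.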
Public Instance Methods
The Transcript#cdna2genomic method converts cDNA coordinates to genomic coordinates for this transcript.
@param [Integer] pos Position on the cDNA
@return [Integer] Position on the genomic DNA
# File lib/bio-ensembl/core/transcript.rb, line 318
def cdna2genomic(pos)
  #FIXME: Still have to check for when pos is outside of scope of cDNA.
  # Identify the exon we're looking at.
  exon_with_target = self.exon_for_cdna_position(pos)

  accumulated_position = 0
  ex = self.exons.sort_by {|e| e.seq_region_start}
  ex.reverse! if self.strand == -1
  ex.each do |exon|
    if exon == exon_with_target
      length_to_be_taken_from_exon = pos - (accumulated_position + 1)
      if self.strand == -1
        return exon.seq_region_end - length_to_be_taken_from_exon
      else
        return exon.seq_region_start + length_to_be_taken_from_exon
      end
    else
      accumulated_position += exon.length
    end
  end
end
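The coordinate walk can be tried out without a database. The sketch below assumes 1-based cDNA positions and uses a hypothetical ToyExon struct and toy_cdna2genomic method in place of the ActiveRecord objects. It also assigns the last base of an exon to that exon, which is a choice made for this sketch rather than a statement about the library.

```ruby
# Hypothetical stand-in for an exon; not the ActiveRecord Exon class.
ToyExon = Struct.new(:seq_region_start, :seq_region_end) do
  def length
    seq_region_end - seq_region_start + 1
  end
end

# Map a 1-based cDNA position to a genomic position.
# exons: array of ToyExon; strand: 1 or -1.
def toy_cdna2genomic(exons, strand, pos)
  raise ArgumentError, "Position outside of cDNA scope" if pos < 1

  ordered = exons.sort_by(&:seq_region_start)
  ordered.reverse! if strand == -1
  consumed = 0 # cDNA bases covered by the exons already passed
  ordered.each do |exon|
    if pos <= consumed + exon.length
      offset = pos - consumed - 1
      return strand == -1 ? exon.seq_region_end - offset : exon.seq_region_start + offset
    end
    consumed += exon.length
  end
  raise ArgumentError, "Position outside of cDNA scope"
end

# Two made-up exons: 10 bases and 5 bases.
EXONS = [ToyExon.new(101, 110), ToyExon.new(201, 205)].freeze

toy_cdna2genomic(EXONS, 1, 11)  # => 201 (first base of the second exon)
toy_cdna2genomic(EXONS, -1, 1)  # => 205 (reverse strand starts at the far end)
```

On the reverse strand the exons are visited from the highest genomic coordinate downwards, so increasing cDNA positions map to decreasing genomic positions.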
The Transcript#cds2genomic method converts CDS coordinates to genomic coordinates for this transcript.
@param [Integer] pos Position on the CDS
@return [Integer] Position on the genomic DNA
# File lib/bio-ensembl/core/transcript.rb, line 345
def cds2genomic(pos)
  return self.cdna2genomic(pos + self.coding_region_cdna_start)
end
The Transcript#cds_seq method returns the coding sequence of the transcript, i.e. the concatenated sequence of all exons minus the UTRs.
# File lib/bio-ensembl/core/transcript.rb, line 189
def cds_seq
  cds_length = self.coding_region_cdna_end - self.coding_region_cdna_start + 1
  return self.seq[(self.coding_region_cdna_start - 1), cds_length]
end
The Transcript#coding_region_cdna_end method returns the stop position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_end, the CDS stop position is always at the border of the 3'UTR. So for transcripts on the reverse strand, this position corresponds to the ''left'' end of the coding region in genomic coordinates.
# File lib/bio-ensembl/core/transcript.rb, line 270
def coding_region_cdna_end
  answer = 0

  self.exons.each do |exon|
    if exon == self.translation.end_exon
      answer += self.translation.seq_end
      return answer
    else
      answer += exon.length
    end
  end
end
The Transcript#coding_region_cdna_start method returns the start position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_start, the CDS start position is always at the border of the 5'UTR. So for transcripts on the reverse strand, this position corresponds to the ''right'' end of the coding region in genomic coordinates.
# File lib/bio-ensembl/core/transcript.rb, line 250
def coding_region_cdna_start
  answer = 0

  self.exons.each do |exon|
    if exon == self.translation.start_exon
      answer += self.translation.seq_start
      return answer
    else
      answer += exon.length
    end
  end
end
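Both cDNA boundaries follow the same arithmetic: add up the lengths of the exons that precede the exon holding the boundary, then add the 1-based offset inside that exon. A small worked example with made-up numbers (the cdna_position helper and all values are hypothetical):

```ruby
# Exon lengths in transcript (5' to 3') order; values are made up.
exon_lengths = [100, 50, 80]

# Assume translation starts at base 21 of exon 0 and ends at base 30 of exon 2.
start_exon_index, seq_start = 0, 21
end_exon_index, seq_end = 2, 30

# Sum the lengths of all exons before exon_index, then add the offset.
def cdna_position(exon_lengths, exon_index, offset_in_exon)
  exon_lengths.first(exon_index).sum + offset_in_exon
end

cdna_start = cdna_position(exon_lengths, start_exon_index, seq_start) # => 21
cdna_end   = cdna_position(exon_lengths, end_exon_index, seq_end)     # => 180
cds_length = cdna_end - cdna_start + 1                                # => 160
```

With these numbers the 5'UTR spans cDNA positions 1 to 20 and the 3'UTR starts at position 181.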
The Transcript#coding_region_genomic_end method returns the stop position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_end, the CDS stop position is always ''right'' of the start position. So for transcripts on the reverse strand, the CDS stop position is at the border of the 5'UTR instead of the 3'UTR.
# File lib/bio-ensembl/core/transcript.rb, line 235
def coding_region_genomic_end
  strand = self.translation.start_exon.seq_region_strand
  if strand == 1
    return self.translation.end_exon.seq_region_start + ( self.translation.seq_end - 1 )
  else
    return self.translation.start_exon.seq_region_end - ( self.translation.seq_start - 1 )
  end
end
The Transcript#coding_region_genomic_start method returns the start position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_start, the CDS start position is always ''left'' of the end position. So for transcripts on the reverse strand, the CDS start position is at the border of the 3'UTR instead of the 5'UTR.
# File lib/bio-ensembl/core/transcript.rb, line 220
def coding_region_genomic_start
  strand = self.translation.start_exon.seq_region_strand
  if strand == 1
    return self.translation.start_exon.seq_region_start + ( self.translation.seq_start - 1 )
  else
    return self.translation.end_exon.seq_region_end - ( self.translation.seq_end - 1 )
  end
end
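The strand handling of the two genomic boundaries can be checked with plain numbers. In this sketch GenomicExon and toy_coding_region_genomic are hypothetical names, and seq_start and seq_end are assumed to be 1-based offsets into the start and end exon, counted in the direction of transcription.

```ruby
# Hypothetical stand-in for an exon with genomic coordinates.
GenomicExon = Struct.new(:seq_region_start, :seq_region_end)

# Returns [genomic_start, genomic_end] of the coding region.
def toy_coding_region_genomic(start_exon, seq_start, end_exon, seq_end, strand)
  if strand == 1
    [start_exon.seq_region_start + (seq_start - 1),
     end_exon.seq_region_start + (seq_end - 1)]
  else
    # Reverse strand: the translation start lies at the higher coordinate,
    # so it becomes the genomic end, and the stop becomes the genomic start.
    [end_exon.seq_region_end - (seq_end - 1),
     start_exon.seq_region_end - (seq_start - 1)]
  end
end

low_exon  = GenomicExon.new(101, 200)
high_exon = GenomicExon.new(401, 480)

toy_coding_region_genomic(low_exon, 21, high_exon, 30, 1)  # => [121, 430]
toy_coding_region_genomic(high_exon, 21, low_exon, 30, -1) # => [171, 460]
```

In both cases the first value is smaller than the second, which is the ''left''/''right'' guarantee described above.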
The Transcript#display_label method returns the default name of the transcript.
# File lib/bio-ensembl/core/transcript.rb, line 132
def display_label
  return Xref.find(self.display_xref_id).display_label
end
The Transcript#exon_for_cdna_position method identifies the exon that covers a given position of the cDNA. It raises a RuntimeError if the position lies beyond the end of the cDNA.
# File lib/bio-ensembl/core/transcript.rb, line 300
def exon_for_cdna_position(pos)
  # FIXME: Still have to check for when pos is outside of scope of cDNA.
  accumulated_exon_length = 0

  self.exons.each do |exon|
    accumulated_exon_length += exon.length
    if accumulated_exon_length > pos
      return exon
    end
  end
  raise RuntimeError, "Position outside of cDNA scope"
end
The Transcript#exon_for_genomic_position method identifies the exon that covers a given genomic position. It returns the exon object, or nil if the position lies in an intron, and raises a RuntimeError if the position is outside the transcript.
# File lib/bio-ensembl/core/transcript.rb, line 286
def exon_for_genomic_position(pos)
  if pos < self.seq_region_start or pos > self.seq_region_end
    raise RuntimeError, "Position has to be within transcript"
  end
  self.exons.each do |exon|
    if exon.start <= pos and exon.stop >= pos
      return exon
    end
  end
  return nil
end
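The three possible outcomes (an exon, nil for an intron, an error outside the transcript) can be illustrated with a range check. SpanExon and toy_exon_for_genomic_position are hypothetical names, and the transcript bounds are assumed to be the outermost exon coordinates.

```ruby
# Hypothetical stand-in for an exon with start/stop genomic coordinates.
SpanExon = Struct.new(:start, :stop)

def toy_exon_for_genomic_position(exons, pos)
  transcript_start = exons.map(&:start).min
  transcript_stop  = exons.map(&:stop).max
  if pos < transcript_start || pos > transcript_stop
    raise ArgumentError, "Position has to be within transcript"
  end
  # find returns nil when no exon covers pos, i.e. pos is in an intron.
  exons.find { |exon| (exon.start..exon.stop).cover?(pos) }
end

SPAN_EXONS = [SpanExon.new(101, 110), SpanExon.new(201, 205)].freeze

toy_exon_for_genomic_position(SPAN_EXONS, 105) # => the first exon
toy_exon_for_genomic_position(SPAN_EXONS, 150) # => nil (intron)
```

Because nil is a legitimate return value, callers that go on to use the exon should check for it first.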
The Transcript#five_prime_utr_seq method returns the sequence of the 5'UTR of the transcript.
# File lib/bio-ensembl/core/transcript.rb, line 197
def five_prime_utr_seq
  return self.seq[0, self.coding_region_cdna_start - 1]
end
The Transcript#genomic2cdna method converts genomic coordinates to cDNA coordinates for this transcript.
@param [Integer] pos Position on the genomic DNA
@return [Integer] Position on the cDNA
# File lib/bio-ensembl/core/transcript.rb, line 363
def genomic2cdna(pos)
  #FIXME: Still have to check for when pos is outside of scope of cDNA.
  # Identify the exon we're looking at.
  exon_with_target = self.exon_for_genomic_position(pos)

  accumulated_position = 0
  ex = self.exons.sort_by {|e| e.seq_region_start}
  ex.reverse! if self.strand == -1
  ex.each do |exon|
    if exon.stable_id == exon_with_target.stable_id
      if self.strand == 1
        accumulated_position += ( pos - exon.start) +1
      else
        accumulated_position += ( exon.stop - pos ) +1
      end
      return accumulated_position
    else
      accumulated_position += exon.length
    end
  end
  # Raise (rather than return) the error when no exon matched.
  raise RuntimeError, "Position outside of cDNA scope"
end
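The reverse mapping can be sketched the same way as the forward one. MappedExon and toy_genomic2cdna are hypothetical names, cDNA positions are assumed to be 1-based, and a position inside an intron is treated as an error in this sketch.

```ruby
# Hypothetical stand-in for an exon with start/stop genomic coordinates.
MappedExon = Struct.new(:start, :stop) do
  def length
    stop - start + 1
  end
end

# Map a genomic position to a 1-based cDNA position.
def toy_genomic2cdna(exons, strand, pos)
  ordered = exons.sort_by(&:start)
  ordered.reverse! if strand == -1
  consumed = 0 # cDNA bases covered by the exons already passed
  ordered.each do |exon|
    if pos.between?(exon.start, exon.stop)
      within = strand == 1 ? pos - exon.start : exon.stop - pos
      return consumed + within + 1
    end
    consumed += exon.length
  end
  raise ArgumentError, "Position is not inside any exon"
end

MAPPED_EXONS = [MappedExon.new(101, 110), MappedExon.new(201, 205)].freeze

toy_genomic2cdna(MAPPED_EXONS, 1, 201)  # => 11
toy_genomic2cdna(MAPPED_EXONS, -1, 110) # => 6
```

On the forward strand the first base of the second exon follows the 10 bases of the first exon; on the reverse strand the 5-base exon at 201..205 is transcribed first, so genomic position 110 becomes cDNA position 6.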
The Transcript#genomic2cds method converts genomic coordinates to CDS coordinates for this transcript.
@param [Integer] pos Position on the genomic DNA
@return [Integer] Position on the CDS
# File lib/bio-ensembl/core/transcript.rb, line 391
def genomic2cds(pos)
  return self.genomic2cdna(pos) - self.coding_region_cdna_start
end
The Transcript#genomic2pep method converts genomic coordinates to peptide coordinates for this transcript. It is not implemented yet and raises NotImplementedError.
@param [Integer] pos Base position on the genomic DNA (required)
@return [Integer] Amino acid position in the protein
# File lib/bio-ensembl/core/transcript.rb, line 403
def genomic2pep(pos)
  raise NotImplementedError
end
The Transcript#introns method returns the introns for this transcript.
@return [Array<Intron>] Sorted array of Intron objects
# File lib/bio-ensembl/core/transcript.rb, line 111
def introns
  if @introns.nil?
    @introns = Array.new
    if self.exons.length > 1
      self.exons.each_with_index do |exon, index|
        next if index == 0
        @introns.push(Intron.new(self.exons[index - 1], exon))
      end
    end
  end
  return @introns
end
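Each intron is defined by a pair of neighbouring exons, so a transcript with n exons has n - 1 introns. A compact way to express that pairing is Enumerable#each_cons. The sketch below uses hypothetical NamedExon and ToyIntron structs rather than the library's Exon and Intron classes.

```ruby
# Hypothetical stand-ins; the real Intron is built from two Exon objects.
NamedExon = Struct.new(:name)
ToyIntron = Struct.new(:previous_exon, :next_exon)

# Pair every exon with its successor; one intron per pair.
def toy_introns(exons)
  exons.each_cons(2).map { |upstream, downstream| ToyIntron.new(upstream, downstream) }
end

exons = [NamedExon.new("e1"), NamedExon.new("e2"), NamedExon.new("e3")]
introns = toy_introns(exons)

introns.length                   # => 2
introns.first.previous_exon.name # => "e1"
toy_introns(exons.first(1))      # => [] (a single exon has no introns)
```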
The Transcript#pep2genomic method converts peptide coordinates to genomic coordinates for this transcript. It is not implemented yet and raises NotImplementedError.
@param [Integer] pos Amino acid position on the protein
@return [Integer] Position on the genomic DNA
# File lib/bio-ensembl/core/transcript.rb, line 354
def pep2genomic(pos)
  raise NotImplementedError
end
The Transcript#protein_seq method returns the sequence of the protein of the transcript.
# File lib/bio-ensembl/core/transcript.rb, line 209
def protein_seq
  return Bio::Sequence::NA.new(self.cds_seq).translate.seq
end
The Transcript#seq method returns the full sequence of all concatenated exons.
# File lib/bio-ensembl/core/transcript.rb, line 177
def seq
  if @seq.nil?
    @seq = ''
    self.exons.each do |exon|
      @seq += exon.seq
    end
  end
  return @seq
end
The Transcript#stable_id method returns the stable ID of the transcript.
@return [String] Ensembl stable ID of the transcript.
# File lib/bio-ensembl/core/transcript.rb, line 127
def stable_id
  return self.transcript_stable_id.stable_id
end
The Transcript#three_prime_utr_seq method returns the sequence of the 3'UTR of the transcript.
# File lib/bio-ensembl/core/transcript.rb, line 203
def three_prime_utr_seq
  return self.seq[self.coding_region_cdna_end..-1]
end
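The three sequence accessors (five_prime_utr_seq, cds_seq and three_prime_utr_seq) cut the same cDNA string at the 1-based CDS boundaries. A small example on a made-up sequence shows how the slices fit together; the sequence and the boundary values are assumptions chosen for illustration.

```ruby
# Made-up cDNA: 5'UTR "GGG", CDS "ATGAAATAG", 3'UTR "CC".
cdna = "GGGATGAAATAGCC"
cdna_start = 4  # 1-based position of the first CDS base
cdna_end   = 12 # 1-based position of the last CDS base

# Ruby strings are 0-based, hence the "- 1" on the start index.
five_prime_utr  = cdna[0, cdna_start - 1]                         # => "GGG"
cds             = cdna[cdna_start - 1, cdna_end - cdna_start + 1] # => "ATGAAATAG"
three_prime_utr = cdna[cdna_end..-1]                              # => "CC"
```

Concatenating the three slices gives back the full cDNA, which is a quick sanity check that no base is lost or counted twice at the boundaries.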