class Ensembl::Core::Transcript

The Transcript class provides an interface to the transcript table. This table contains mappings of transcripts for a Gene to a SeqRegion.

This class uses ActiveRecord to access data in the Ensembl database. See the general documentation of the Ensembl module for more information on what this means and what methods are available.

This class includes the mixin Sliceable, which means that it is mapped to a SeqRegion object and a Slice can be created for objects of this class. See Sliceable and Slice for more information.

@example

#TODO

Public Class Methods

find_all_by_stable_id(stable_id)

The Transcript.find_all_by_stable_id class method returns an array of transcripts with the given stable_id. If none are found, an empty array is returned.

# File lib/bio-ensembl/core/transcript.rb, line 142
def self.find_all_by_stable_id(stable_id)
  answer = Array.new
  transcript_stable_id_objects = Ensembl::Core::TranscriptStableId.find_all_by_stable_id(stable_id)
  transcript_stable_id_objects.each do |transcript_stable_id_object|
    answer.push(Ensembl::Core::Transcript.find(transcript_stable_id_object.transcript_id))
  end

  return answer
end
find_by_stable_id(stable_id)

The Transcript.find_by_stable_id class method returns the first transcript with the given stable_id. If none is found, nil is returned.

# File lib/bio-ensembl/core/transcript.rb, line 154
def self.find_by_stable_id(stable_id)
  all = self.find_all_by_stable_id(stable_id)
  if all.length == 0
    return nil
  else
    return all[0]
  end
end
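
The difference between the two finders can be sketched without a database connection. The following is only an illustration: FakeTranscript, FakeTranscriptStore and the stable IDs are made up for this example and are not part of the library.

```ruby
# Hypothetical in-memory stand-in; the real class queries the Ensembl database.
FakeTranscript = Struct.new(:id, :stable_id)

class FakeTranscriptStore
  def initialize(records)
    @records = records
  end

  # Mirrors find_all_by_stable_id: always an array, possibly empty.
  def find_all_by_stable_id(stable_id)
    @records.select { |transcript| transcript.stable_id == stable_id }
  end

  # Mirrors find_by_stable_id: the first match, or nil when there is none.
  def find_by_stable_id(stable_id)
    find_all_by_stable_id(stable_id).first
  end
end

# The stable IDs below are invented for the example.
store = FakeTranscriptStore.new([
  FakeTranscript.new(1, 'ENST00000000001'),
  FakeTranscript.new(2, 'ENST00000000002')
])

hit        = store.find_by_stable_id('ENST00000000002')
miss       = store.find_by_stable_id('ENST99999999999')
all_misses = store.find_all_by_stable_id('ENST99999999999')
```

Callers of find_by_stable_id therefore need a nil check, while callers of find_all_by_stable_id can iterate over the result directly.
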

Public Instance Methods

cdna2genomic(pos)

The Transcript#cdna2genomic method converts cDNA coordinates to genomic coordinates for this transcript.

@param [Integer] pos Position on the cDNA
@return [Integer] Position on the genomic DNA

# File lib/bio-ensembl/core/transcript.rb, line 318
def cdna2genomic(pos)
  #FIXME: Still have to check for when pos is outside of scope of cDNA.
  # Identify the exon we're looking at.
  exon_with_target = self.exon_for_cdna_position(pos)
  
  accumulated_position = 0
  ex = self.exons.sort_by {|e| e.seq_region_start}
  ex.reverse! if self.strand == -1
  ex.each do |exon|  
    if exon == exon_with_target
      length_to_be_taken_from_exon = pos - (accumulated_position + 1)
      if self.strand == -1
        return exon.seq_region_end - length_to_be_taken_from_exon
      else
        return exon.seq_region_start + length_to_be_taken_from_exon
      end
    else
      accumulated_position += exon.length 
    end
  end
end
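
The conversion can be followed on a small worked example. This is a minimal sketch rather than the library code: the Exon struct, the cdna_to_genomic helper and all coordinates are assumptions made for illustration, and cDNA positions are taken to be 1-based.

```ruby
# Hypothetical stand-in for an exon: only the genomic start and end are modelled.
Exon = Struct.new(:seq_region_start, :seq_region_end) do
  def length
    seq_region_end - seq_region_start + 1
  end
end

# Walk the exons in transcription order and map a 1-based cDNA position
# to a genomic coordinate.
def cdna_to_genomic(exons, strand, pos)
  ordered = exons.sort_by(&:seq_region_start)
  ordered.reverse! if strand == -1
  consumed = 0 # number of cDNA bases covered by the exons already passed
  ordered.each do |exon|
    if pos <= consumed + exon.length
      offset = pos - consumed - 1
      return strand == -1 ? exon.seq_region_end - offset : exon.seq_region_start + offset
    end
    consumed += exon.length
  end
  raise ArgumentError, 'Position outside of cDNA scope'
end

# Two invented exons: 101..110 (10 bases) and 201..205 (5 bases).
exons = [Exon.new(101, 110), Exon.new(201, 205)]

forward_last_of_first_exon   = cdna_to_genomic(exons, 1, 10)
forward_first_of_second_exon = cdna_to_genomic(exons, 1, 11)
reverse_first_base           = cdna_to_genomic(exons, -1, 1)
reverse_last_base            = cdna_to_genomic(exons, -1, 15)
```

On the forward strand the cDNA runs left to right across the exons; on the reverse strand it starts at the right end of the rightmost exon and runs leftwards.
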
cds2genomic(pos)

The Transcript#cds2genomic method converts CDS coordinates to genomic coordinates for this transcript.

@param [Integer] pos Position on the CDS
@return [Integer] Position on the genomic DNA

# File lib/bio-ensembl/core/transcript.rb, line 345
def cds2genomic(pos)
  return self.cdna2genomic(pos + self.coding_region_cdna_start)
end
cds_seq()

The Transcript#cds_seq method returns the coding sequence of the transcript, i.e. the concatenated sequence of all exons minus the UTRs.

# File lib/bio-ensembl/core/transcript.rb, line 189
def cds_seq
  cds_length = self.coding_region_cdna_end - self.coding_region_cdna_start + 1
  
  return self.seq[(self.coding_region_cdna_start - 1), cds_length]
end
coding_region_cdna_end()

The Transcript#coding_region_cdna_end method returns the stop position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_end, the CDS stop position is always at the border of the 3'UTR. So for transcripts on the reverse strand, the CDS stop position in cDNA coordinates corresponds to the position returned by Transcript#coding_region_genomic_start, not Transcript#coding_region_genomic_end.

# File lib/bio-ensembl/core/transcript.rb, line 270
def coding_region_cdna_end
  answer = 0
  
  self.exons.each do |exon|
    if exon == self.translation.end_exon
      answer += self.translation.seq_end
      return answer
    else
      answer += exon.length
    end
  end
end
coding_region_cdna_start()

The Transcript#coding_region_cdna_start method returns the start position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_start, the CDS start position is always at the border of the 5'UTR. So for transcripts on the reverse strand, the CDS start position in cDNA coordinates corresponds to the position returned by Transcript#coding_region_genomic_end, not Transcript#coding_region_genomic_start.

# File lib/bio-ensembl/core/transcript.rb, line 250
def coding_region_cdna_start
  answer = 0
  
  self.exons.each do |exon|
    if exon == self.translation.start_exon
      answer += self.translation.seq_start
      return answer
    else
      answer += exon.length
    end
  end
  
end
coding_region_genomic_end()

The Transcript#coding_region_genomic_end method returns the stop position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_end, the CDS stop position is always to the "right" of the start position. So for transcripts on the reverse strand, the CDS stop position is at the border of the 5'UTR instead of the 3'UTR.

# File lib/bio-ensembl/core/transcript.rb, line 235
def coding_region_genomic_end
  strand = self.translation.start_exon.seq_region_strand
  if strand == 1
    return self.translation.end_exon.seq_region_start + ( self.translation.seq_end - 1 )
  else
    return self.translation.start_exon.seq_region_end - ( self.translation.seq_start - 1 )
  end
end
coding_region_genomic_start()

The Transcript#coding_region_genomic_start method returns the start position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_start, the CDS start position is always to the "left" of the end position. So for transcripts on the reverse strand, the CDS start position is at the border of the 3'UTR instead of the 5'UTR.

# File lib/bio-ensembl/core/transcript.rb, line 220
def coding_region_genomic_start
  strand = self.translation.start_exon.seq_region_strand
  if strand == 1
    return self.translation.start_exon.seq_region_start + ( self.translation.seq_start - 1 )
  else
    return self.translation.end_exon.seq_region_end - ( self.translation.seq_end - 1 )
  end
end
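
How the genomic CDS boundaries swap roles on the reverse strand is easiest to see with numbers. The sketch below follows the same arithmetic as coding_region_genomic_start and coding_region_genomic_end, but the Exon and Translation structs, the genomic_cds_bounds helper and all coordinates are assumptions made for illustration.

```ruby
# Hypothetical stand-ins carrying only the attributes used by the calculation.
Exon = Struct.new(:seq_region_start, :seq_region_end, :seq_region_strand)
Translation = Struct.new(:start_exon, :seq_start, :end_exon, :seq_end)

# Returns [leftmost, rightmost] genomic position of the CDS.
def genomic_cds_bounds(translation)
  if translation.start_exon.seq_region_strand == 1
    left  = translation.start_exon.seq_region_start + (translation.seq_start - 1)
    right = translation.end_exon.seq_region_start + (translation.seq_end - 1)
  else
    left  = translation.end_exon.seq_region_end - (translation.seq_end - 1)
    right = translation.start_exon.seq_region_end - (translation.seq_start - 1)
  end
  [left, right]
end

# Forward strand: translation starts at base 4 of exon 101..110
# and stops at base 2 of exon 201..205.
forward = Translation.new(Exon.new(101, 110, 1), 4, Exon.new(201, 205, 1), 2)

# Reverse strand: the first transcribed exon is the rightmost one (201..205).
# Translation starts at base 2 of that exon and stops at base 4 of exon 101..110.
reverse = Translation.new(Exon.new(201, 205, -1), 2, Exon.new(101, 110, -1), 4)

forward_bounds = genomic_cds_bounds(forward)
reverse_bounds = genomic_cds_bounds(reverse)
```

In the reverse-strand case the left bound is the position where translation stops and the right bound is the position where it starts.
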
display_label()

The Transcript#display_label method returns the default name of the transcript.

# File lib/bio-ensembl/core/transcript.rb, line 132
def display_label
  return Xref.find(self.display_xref_id).display_label
end
Also aliased as: display_name, label, name
display_name()
Alias for: display_label
exon_for_cdna_position(pos)

The Transcript#exon_for_cdna_position method identifies the exon that covers a given position of the cDNA. A RuntimeError is raised if the position lies beyond the end of the cDNA.

# File lib/bio-ensembl/core/transcript.rb, line 300
def exon_for_cdna_position(pos)
  # FIXME: Still have to check for when pos is outside of scope of cDNA.
  accumulated_exon_length = 0
  
  self.exons.each do |exon|
    accumulated_exon_length += exon.length
    if accumulated_exon_length >= pos
      return exon
    end
  end
  raise RuntimeError, "Position outside of cDNA scope"
end
exon_for_genomic_position(pos)

The Transcript#exon_for_genomic_position method identifies the exon that covers a given genomic position. It returns the exon object, or nil if the position falls in an intron. A RuntimeError is raised if the position lies outside the transcript.

# File lib/bio-ensembl/core/transcript.rb, line 286
def exon_for_genomic_position(pos)
  if pos < self.seq_region_start or pos > self.seq_region_end
    raise RuntimeError, "Position has to be within transcript"
  end
  self.exons.each do |exon|
    if exon.start <= pos and exon.stop >= pos
      return exon
    end
  end
  return nil
end
five_prime_utr_seq()

The Transcript#five_prime_utr_seq method returns the sequence of the 5'UTR of the transcript.

# File lib/bio-ensembl/core/transcript.rb, line 197
def five_prime_utr_seq
  return self.seq[0, self.coding_region_cdna_start - 1]
end
genomic2cdna(pos)

The Transcript#genomic2cdna method converts genomic coordinates to cDNA coordinates for this transcript.

@param [Integer] pos Position on the genomic DNA
@return [Integer] Position on the cDNA

# File lib/bio-ensembl/core/transcript.rb, line 363
def genomic2cdna(pos)
  #FIXME: Still have to check for when pos is outside of scope of cDNA.
  # Identify the exon we're looking at.
  exon_with_target = self.exon_for_genomic_position(pos)
  
  accumulated_position = 0
  ex = self.exons.sort_by {|e| e.seq_region_start}
  ex.reverse! if self.strand == -1
  ex.each do |exon|
    if exon.stable_id == exon_with_target.stable_id
      if self.strand == 1
        accumulated_position += ( pos - exon.start) +1
      else
        accumulated_position += ( exon.stop - pos ) +1
      end  
      return accumulated_position
    else
        accumulated_position += exon.length 
    end
  end
  raise RuntimeError, "Position outside of cDNA scope"
end
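
The reverse mapping can be sketched in the same spirit. The Exon struct, the genomic_to_cdna helper and the coordinates are assumptions made for illustration; in particular, returning nil for intronic positions is a choice of this sketch, not documented behaviour of the library.

```ruby
# Hypothetical stand-in for an exon: only the genomic start and end are modelled.
Exon = Struct.new(:seq_region_start, :seq_region_end) do
  def length
    seq_region_end - seq_region_start + 1
  end

  def cover?(pos)
    pos.between?(seq_region_start, seq_region_end)
  end
end

# Map a genomic position to a 1-based cDNA position by summing the lengths
# of the exons transcribed before the one that contains the position.
def genomic_to_cdna(exons, strand, pos)
  ordered = exons.sort_by(&:seq_region_start)
  ordered.reverse! if strand == -1
  consumed = 0
  ordered.each do |exon|
    if exon.cover?(pos)
      within = strand == -1 ? exon.seq_region_end - pos : pos - exon.seq_region_start
      return consumed + within + 1
    end
    consumed += exon.length
  end
  nil # intronic or outside the transcript (sketch-specific behaviour)
end

# Two invented exons: 101..110 (10 bases) and 201..205 (5 bases).
exons = [Exon.new(101, 110), Exon.new(201, 205)]

forward_position = genomic_to_cdna(exons, 1, 201)
reverse_position = genomic_to_cdna(exons, -1, 110)
intronic         = genomic_to_cdna(exons, 1, 150)
```
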
genomic2cds(pos)

The Transcript#genomic2cds method converts genomic coordinates to CDS coordinates for this transcript.

@param [Integer] pos Position on the genomic DNA
@return [Integer] Position on the CDS

# File lib/bio-ensembl/core/transcript.rb, line 391
def genomic2cds(pos)
  return self.genomic2cdna(pos) - self.coding_region_cdna_start
end
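
Reading the bodies of genomic2cds and cds2genomic together suggests that CDS positions are offsets from the first coding base, so that base gets CDS position 0, whereas cDNA positions appear to be 1-based (cds_seq subtracts 1 before indexing). The sketch below reproduces only the offset arithmetic, leaving out the genomic step; the value of coding_region_cdna_start is invented.

```ruby
# Hypothetical value: the CDS starts at cDNA position 4 (three bases of 5'UTR).
coding_region_cdna_start = 4

# Same arithmetic as genomic2cds and cds2genomic, without the genomic conversion.
cdna_to_cds = ->(cdna_pos) { cdna_pos - coding_region_cdna_start }
cds_to_cdna = ->(cds_pos)  { cds_pos + coding_region_cdna_start }

first_coding_base = cdna_to_cds.call(4)
round_trip        = cds_to_cdna.call(cdna_to_cds.call(9))
```

Because the two conversions add and subtract the same offset, they are inverses of each other, whatever convention the caller assumes for the first coding base.
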
genomic2pep(pos)

The Transcript#genomic2pep method converts genomic coordinates to peptide coordinates for this transcript. It is not implemented yet and raises NotImplementedError.

@param [Integer] pos Base position on the genomic DNA (required)
@return [Integer] Amino acid position in the protein

# File lib/bio-ensembl/core/transcript.rb, line 403
def genomic2pep(pos)
  raise NotImplementedError
end
introns()

The Transcript#introns method returns the introns for this transcript. Each intron is built from a pair of adjacent exons, and the result is cached after the first call.

@return [Array<Intron>] Sorted array of Intron objects

# File lib/bio-ensembl/core/transcript.rb, line 111
def introns
  if @introns.nil?
    @introns = Array.new
    if self.exons.length > 1
      self.exons.each_with_index do |exon, index|
        next if index == 0
        @introns.push(Intron.new(self.exons[index - 1], exon))
      end
    end
  end
  return @introns
end
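
The pairing of adjacent exons can also be written with Enumerable#each_cons. This is a sketch of the idea only: the Exon and Intron structs and the introns_for helper are hypothetical stand-ins, not the library classes.

```ruby
# Hypothetical stand-ins; the real Intron class is constructed from two exons as well.
Exon = Struct.new(:seq_region_start, :seq_region_end)
Intron = Struct.new(:previous_exon, :next_exon)

# One intron per pair of neighbouring exons; no pairs means no introns.
def introns_for(exons)
  exons.each_cons(2).map do |previous_exon, next_exon|
    Intron.new(previous_exon, next_exon)
  end
end

exons = [Exon.new(101, 110), Exon.new(201, 205), Exon.new(301, 320)]

introns       = introns_for(exons)
single_exon   = introns_for([Exon.new(101, 110)])
first_flanks  = [introns.first.previous_exon, introns.first.next_exon]
```

A transcript with n exons yields n - 1 introns, and a single-exon transcript yields an empty array, which matches the guard on exons.length in the method above.
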
label()
Alias for: display_label
name()
Alias for: display_label
pep2genomic(pos)

The Transcript#pep2genomic method converts peptide coordinates to genomic coordinates for this transcript. It is not implemented yet and raises NotImplementedError.

@param [Integer] pos Amino acid position on the protein
@return [Integer] Position on the genomic DNA

# File lib/bio-ensembl/core/transcript.rb, line 354
def pep2genomic(pos)
  raise NotImplementedError
end
protein_seq()

The Transcript#protein_seq method returns the sequence of the protein of the transcript.

# File lib/bio-ensembl/core/transcript.rb, line 209
def protein_seq
  return Bio::Sequence::NA.new(self.cds_seq).translate.seq
end
seq()

The Transcript#seq method returns the full sequence of all concatenated exons.

# File lib/bio-ensembl/core/transcript.rb, line 177
def seq
  if @seq.nil?
    @seq = ''
    self.exons.each do |exon|
      @seq += exon.seq
    end
  end
  return @seq
end
stable_id()

The Transcript#stable_id method returns the stable ID of the transcript.

@return [String] Ensembl stable ID of the transcript.

# File lib/bio-ensembl/core/transcript.rb, line 127
def stable_id
  return self.transcript_stable_id.stable_id
end
three_prime_utr_seq()

The Transcript#three_prime_utr_seq method returns the sequence of the 3'UTR of the transcript.

# File lib/bio-ensembl/core/transcript.rb, line 203
def three_prime_utr_seq
  return self.seq[self.coding_region_cdna_end..-1]
end
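
The slicing used by five_prime_utr_seq, cds_seq and three_prime_utr_seq can be checked on a plain string. The sequence and the two boundary positions below are invented for illustration; the index expressions are the ones used in the three methods.

```ruby
# Hypothetical cDNA: 3 nt of 5'UTR, 9 nt of CDS, 2 nt of 3'UTR.
seq = 'GGGATGAAATAGCC'
coding_region_cdna_start = 4  # 1-based position of the first coding base
coding_region_cdna_end   = 12 # 1-based position of the last coding base

five_prime_utr  = seq[0, coding_region_cdna_start - 1]
cds             = seq[coding_region_cdna_start - 1,
                      coding_region_cdna_end - coding_region_cdna_start + 1]
three_prime_utr = seq[coding_region_cdna_end..-1]

reassembled = five_prime_utr + cds + three_prime_utr
```

The three slices do not overlap and together reproduce the full transcript sequence, which is a useful sanity check when working with the 1-based boundary positions.
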