class Bio::Locations

Description

The Bio::Locations class is a container for Bio::Location objects: creating a Bio::Locations object (based on a GenBank style position string) will spawn an array of Bio::Location objects.

Usage

 locations = Bio::Locations.new('join(complement(500..550), 600..625)')
 locations.each do |loc|
   puts "class = " + loc.class.to_s
   puts "range = #{loc.from}..#{loc.to} (strand = #{loc.strand})"
 end
 # Output would be:
 #   class = Bio::Location
 #   range = 500..550 (strand = -1)
 #   class = Bio::Location
 #   range = 600..625 (strand = 1)

 # For the following three location strings, print the span and range
 ['one-of(898,900)..983',
  'one-of(5971..6308,5971..6309)',
  '8050..one-of(10731,10758,10905,11242)'].each do |loc|
   location = Bio::Locations.new(loc)
   puts location.span
   puts location.range
 end

GenBank location descriptor classification

Definition of the position notation of the GenBank location format

According to the GenBank manual 'gbrel.txt', position notations were classified into 10 patterns - (A) to (J).

3.4.12.2 Feature Location

  The second column of the feature descriptor line designates the
location of the feature in the sequence. The location descriptor
begins at position 22. Several conventions are used to indicate
sequence location.

  Base numbers in location descriptors refer to numbering in the entry,
which is not necessarily the same as the numbering scheme used in the
published report. The first base in the presented sequence is numbered
base 1. Sequences are presented in the 5' to 3' direction.

Location descriptors can be one of the following:

(A) 1. A single base;

(B) 2. A contiguous span of bases;

(C) 3. A site between two bases;

(D) 4. A single base chosen from a range of bases;

(E) 5. A single base chosen from among two or more specified bases;

(F) 6. A joining of sequence spans;

(G) 7. A reference to an entry other than the one to which the feature
     belongs (i.e., a remote entry), followed by a location descriptor
     referring to the remote sequence;

(H) 8. A literal sequence (a string of bases enclosed in quotation marks).

Description commented with pattern IDs.

(C)   A site between two residues, such as an endonuclease cleavage site, is
    indicated by listing the two bases separated by a carat (e.g., 23^24).

(D)   A single residue chosen from a range of residues is indicated by the
    number of the first and last bases in the range separated by a single
    period (e.g., 23.79). The symbols < and > indicate that the end point
(I) of the range is beyond the specified base number.

(B)   A contiguous span of bases is indicated by the number of the first and
    last bases in the range separated by two periods (e.g., 23..79). The
(I) symbols < and > indicate that the end point of the range is beyond the
    specified base number. Starting and ending positions can be indicated
    by base number or by one of the operators described below.

      Operators are prefixes that specify what must be done to the indicated
    sequence to locate the feature. The following are the operators
    available, along with their most common format and a description.

(J) complement (location): The feature is complementary to the location
    indicated. Complementary strands are read 5' to 3'.

(F) join (location, location, .. location): The indicated elements should
    be placed end to end to form one contiguous sequence.

(F) order (location, location, .. location): The elements are found in the
    specified order in the 5' to 3' direction, but nothing is implied about
    the rationality of joining them.

(F) group (location, location, .. location): The elements are related and
    should be grouped together, but no order is implied.

(E) one-of (location, location, .. location): The element can be any one,
  but only one, of the items listed.
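
As a rough illustration of how the operator prefixes map onto the pattern IDs, the following sketch looks only at the outermost operator of a location string. The table `OPERATOR_PATTERNS` and the method `outer_operator` are hypothetical names introduced here, not part of BioRuby, and the check assumes a well-formed string (it does not verify that parentheses balance).

```ruby
# Hypothetical lookup table: operator name => pattern ID from the list above.
OPERATOR_PATTERNS = {
  'complement' => :J,
  'join'       => :F,
  'order'      => :F,
  'group'      => :F,
  'one-of'     => :E
}.freeze

# Returns [operator, pattern_id] for the outermost operator, or nil when
# the whole string is not wrapped in one of the known operators.
def outer_operator(location)
  compact = location.gsub(/\s+/, '')   # location strings may contain spaces
  m = compact.match(/\A(complement|join|order|group|one-of)\(.*\)\z/)
  m && [m[1], OPERATOR_PATTERNS[m[1]]]
end

p outer_operator('join(complement(500..550), 600..625)')  # ["join", :F]
p outer_operator('complement(12838..13533)')              # ["complement", :J]
p outer_operator('23..79')                                # nil
```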

Reduction strategy of the position notations

Attributes

locations[RW]

(Array) An Array of Bio::Location objects

operator[RW]

(Symbol or nil) Operator. nil (means :join), :order, or :group (obsolete).

Public Class Methods

new(position)

Parses a GenBank style position string and returns a Bio::Locations object, which contains a list of Bio::Location objects.

locations = Bio::Locations.new('join(complement(500..550), 600..625)')

Arguments:

  • (required) position: GenBank style position string, or an Array of Bio::Location objects

Returns

Bio::Locations object

    # File lib/bio/location.rb
    def initialize(position)
      @operator = nil
      if position.is_a? Array
        @locations = position
      else
        position   = gbl_cleanup(position)        # preprocessing
        @locations = gbl_pos2loc(position)        # create an Array of Bio::Location objects
      end
    end

Public Instance Methods

==(other)

Returns true if other is equal to self, that is, if it is the same object or an instance of the same class with equal locations and operator. Otherwise returns false.


Arguments:

  • (required) other: any object

Returns

true or false

Calls superclass method

    # File lib/bio/location.rb
    def ==(other)
      return true if super(other)
      return false unless other.instance_of?(self.class)
      if self.locations == other.locations and
          self.operator == other.operator then
        true
      else
        false
      end
    end

[](n)

Returns nth Bio::Location object.

    # File lib/bio/location.rb
    def [](n)
      @locations[n]
    end

absolute(n, type = nil)

Converts a relative position in the locus to a position in the whole of the DNA sequence.

This method can, for example, be used to relate positions in RNA to those in the DNA sequence. With the optional :aa flag, n is interpreted as an amino-acid position, and the position of the first nucleotide of the corresponding codon is returned.

loc = Bio::Locations.new('complement(12838..13533)')
puts loc.absolute(10)          # => 13524
puts loc.absolute(10, :aa)     # => 13506

Arguments:

  • (required) n: nucleotide position within the locus

  • :aa: flag to be used if n is an amino-acid position rather than a nucleotide position

Returns

position within the whole of the sequence

    # File lib/bio/location.rb
    def absolute(n, type = nil)
      case type
      when :location
        ;
      when :aa
        n = (n - 1) * 3 + 1
        rel2abs(n)
      else
        rel2abs(n)
      end
    end

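The relative-to-absolute walk can be tried out without BioRuby. The sketch below is a stand-alone approximation: `Segment`, `rel_to_abs` and `codon_start` are hypothetical names, `Segment` stands in for Bio::Location with only `from`, `to` and `strand`, and replaced sequences (the `x.sequence` branch in the library code) are ignored.

```ruby
# Stand-in for Bio::Location with only the fields this sketch needs.
Segment = Struct.new(:from, :to, :strand)

# Maps a 1-based position within the spliced feature to a coordinate on
# the full sequence; returns nil when n is out of range.
def rel_to_abs(segments, n)
  return nil unless n > 0
  cursor = 0                                  # bases consumed by earlier segments
  segments.each do |seg|
    len = seg.to - seg.from + 1
    if n > cursor + len
      cursor += len                           # n lies beyond this segment
    elsif seg.strand < 0
      return seg.to - (n - cursor - 1)        # count backwards on the minus strand
    else
      return seg.from + (n - cursor - 1)
    end
  end
  nil
end

# First nucleotide (relative coordinate) of the codon for a residue.
def codon_start(aa_pos)
  (aa_pos - 1) * 3 + 1
end

segs = [Segment.new(12838, 13533, -1)]
puts rel_to_abs(segs, 10)                # 13524
puts rel_to_abs(segs, codon_start(10))   # 13506
```

The two printed values match the `absolute(10)` and `absolute(10, :aa)` results quoted above.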
each() { |x| ... }

Iterates on each Bio::Location object.

    # File lib/bio/location.rb
    def each
      @locations.each do |x|
        yield(x)
      end
    end

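equals?(other)

Evaluates equality of Bio::Locations objects. Returns nil if other is not a Bio::Locations object; otherwise compares the sorted locations, so the order of the locations does not matter.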

    # File lib/bio/location.rb
    def equals?(other)
      if ! other.kind_of?(Bio::Locations)
        return nil
      end
      if self.sort == other.sort
        return true
      else
        return false
      end
    end

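The practical difference between an order-sensitive comparison and a comparison of sorted lists can be seen with plain arrays. This is only a sketch: `Loc` is a hypothetical stand-in for Bio::Location, and ordering it by `[from, to]` is an assumption made for the example.

```ruby
# Stand-in for Bio::Location; ordering by [from, to] is an assumption.
Loc = Struct.new(:from, :to) do
  def <=>(other)
    [from, to] <=> [other.from, other.to]
  end
end

a = [Loc.new(500, 550), Loc.new(600, 625)]
b = [Loc.new(600, 625), Loc.new(500, 550)]

puts a == b             # false: same elements, different order
puts a.sort == b.sort   # true: order no longer matters after sorting
```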
first()

Returns first Bio::Location object.

    # File lib/bio/location.rb
    def first
      @locations.first
    end

last()

Returns last Bio::Location object.

    # File lib/bio/location.rb
    def last
      @locations.last
    end

length()

Returns the length of the spliced RNA.

    # File lib/bio/location.rb
    def length
      len = 0
      @locations.each do |x|
        if x.sequence
          len += x.sequence.size
        else
          len += (x.to - x.from + 1)
        end
      end
      len
    end

Also aliased as: size
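
As a small worked example of the arithmetic, the sketch below sums the segment lengths for the two ranges from the usage section. `Exon` is a hypothetical stand-in for Bio::Location, and segments carrying a replacement sequence are not modelled.

```ruby
# Stand-in for Bio::Location with only from/to.
Exon = Struct.new(:from, :to)

exons = [Exon.new(500, 550), Exon.new(600, 625)]

# Each segment contributes (to - from + 1) bases: 51 + 26.
spliced_length = exons.sum { |e| e.to - e.from + 1 }
puts spliced_length   # 77
```

The gap between the two segments (551..599) does not count towards the length.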
range()

Similar to span, but returns a Range object min..max

    # File lib/bio/location.rb
    def range
      min, max = span
      min..max
    end

relative(n, type = nil)

Converts an absolute position in the whole of the DNA sequence to a relative position in the locus.

This method can, for example, be used to relate positions in a DNA sequence to those in RNA. With the optional :aa flag, the position of the associated amino acid is returned rather than that of the nucleotide.

loc = Bio::Locations.new('complement(12838..13533)')
puts loc.relative(13524)        # => 10
puts loc.relative(13506, :aa)   # => 10

Arguments:

  • (required) n: nucleotide position within the whole of the sequence

  • :aa: flag that makes the method return the position in amino-acid coordinates

Returns

position within the location

    # File lib/bio/location.rb
    def relative(n, type = nil)
      case type
      when :location
        ;
      when :aa
        if n = abs2rel(n)
          (n - 1) / 3 + 1
        else
          nil
        end
      else
        abs2rel(n)
      end
    end

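The :aa branch is plain integer arithmetic on the relative nucleotide position. The sketch below isolates it; `residue_index` and `codon_start` are hypothetical helper names, not BioRuby methods.

```ruby
# Residue number for a 1-based nucleotide position within the locus.
# Ruby's integer division rounds down, so bases 1..3 map to residue 1.
def residue_index(nt_pos)
  (nt_pos - 1) / 3 + 1
end

# Inverse direction: first nucleotide of the codon for a residue.
def codon_start(aa_pos)
  (aa_pos - 1) * 3 + 1
end

puts residue_index(1)                # 1
puts residue_index(3)                # 1
puts residue_index(4)                # 2
puts residue_index(28)               # 10
puts codon_start(residue_index(30))  # 28
```

Relative position 28 is the first base of codon 10, which is why position 13506 in the example above corresponds to residue 10.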
size()
Alias for: length
span()

Returns an Array containing overall min and max position [min, max] of this Bio::Locations object.

    # File lib/bio/location.rb
    def span
      span_min = @locations.min { |a,b| a.from <=> b.from }
      span_max = @locations.max { |a,b| a.to   <=> b.to   }
      return span_min.from, span_max.to
    end

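To illustrate what the overall span covers, the sketch below computes it for two unordered segments. `Piece` is a hypothetical stand-in for Bio::Location, and `min_by`/`max_by` are used here in place of the block form of `min`/`max` in the library code.

```ruby
# Stand-in for Bio::Location with only from/to.
Piece = Struct.new(:from, :to)

pieces = [Piece.new(600, 625), Piece.new(500, 550)]

span_min = pieces.min_by(&:from).from   # smallest start
span_max = pieces.max_by(&:to).to       # largest end
span     = [span_min, span_max]
range    = span_min..span_max

p span                     # [500, 625]
puts range.include?(575)   # true
```

Position 575 falls in the gap between the two segments, yet it is inside the range: span and range describe the outer bounds only, not the individual locations.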
to_s()

String representation.

Note: In some cases, it cannot tell whether the original notation was “complement(join(…))” or “join(complement(…))”, or whether it was “complement(order(…))” or “order(complement(…))”.


Returns

String

    # File lib/bio/location.rb
    def to_s
      return '' if @locations.empty?
      complement_join = false
      locs = @locations
      if locs.size >= 2 and locs.inject(true) do |flag, loc|
          # check if each location is complement
          (flag && (loc.strand == -1) && !loc.xref_id)
        end and locs.inject(locs[0].from) do |pos, loc|
          if pos then
            (pos >= loc.from) ? loc.from : false
          else
            false
          end
        end then
        locs = locs.reverse
        complement_join = true
      end
      locs = locs.collect do |loc|
        lt = loc.lt ? '<' : ''
        gt = loc.gt ? '>' : ''
        str = if loc.from == loc.to then
                "#{lt}#{gt}#{loc.from.to_i}"
              elsif loc.carat then
                "#{lt}#{loc.from.to_i}^#{gt}#{loc.to.to_i}"
              else
                "#{lt}#{loc.from.to_i}..#{gt}#{loc.to.to_i}"
              end
        if loc.xref_id and !loc.xref_id.empty? then
          str = "#{loc.xref_id}:#{str}"
        end
        if loc.strand == -1 and !complement_join then
          str = "complement(#{str})"
        end
        if loc.sequence then
          str = "replace(#{str},\"#{loc.sequence}\")"
        end
        str
      end
      if locs.size >= 2 then
        op = (self.operator || 'join').to_s
        result = "#{op}(#{locs.join(',')})"
      else
        result = locs[0]
      end
      if complement_join then
        result = "complement(#{result})"
      end
      result
    end

Private Instance Methods

abs2rel(n)

Convert the absolute position to the relative position

    # File lib/bio/location.rb
    def abs2rel(n)
      return nil unless n > 0                     # out of range

      cursor = 0
      @locations.each do |x|
        if x.sequence
          len = x.sequence.size
        else
          len = x.to - x.from + 1
        end
        if n < x.from or n > x.to then
          cursor += len
        else
          if x.strand < 0 then
            return x.to - (n - cursor - 1)
          else
            return n + cursor + 1 - x.from
          end
        end
      end
      return nil                                  # out of range
    end

gbl_cleanup(position)

Preprocessing to clean up the position notation.

    # File lib/bio/location.rb
    def gbl_cleanup(position)
      # sometimes position contains white spaces...
      position = position.gsub(/\s+/, '')

      # select one base                                   # (D) n.m
      #               ..         n          m           :
      #     <match>   $1       ( $2         $3       not   )
      position.gsub!(/(\.{2})?\(?([<>\d]+)\.([<>\d]+)(?!:)\)?/) do |match|
        if $1
          $1 + $3                                         # ..(n.m)  => ..m
        else
          $2                                              # (?n.m)?  => n
        end
      end

      # select the 1st location                           # (E) one-of()
      #     <match>   ..      one-of ($2     ,$3      )
      position.gsub!(/(\.{2})?one-of\(([^,]+),([^)]+)\)/) do |match|
        if $1
          $1 + $3.gsub(/.*,(.*)/, '\1')                   # ..one-of(n,m)  => ..m
        else
          $2                                              # one-of(n,m)    => n
        end
      end

      ## substitute order(), group() by join()            # (F) group(), order()
      #position.gsub!(/(order|group)/, 'join')

      return position
    end

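The one-of() reduction can be exercised in isolation on the three location strings from the usage section. The sketch below reuses the regular expression from the listing above inside a hypothetical stand-alone method, `reduce_one_of`, and leaves out the (D) n.m step.

```ruby
# Reduces one-of(...) to a single position: the first alternative when it
# opens a location, the last alternative when it follows "..".
def reduce_one_of(position)
  compact = position.gsub(/\s+/, '')
  compact.gsub(/(\.{2})?one-of\(([^,]+),([^)]+)\)/) do
    dots  = Regexp.last_match(1)   # ".." when one-of() is an end point
    first = Regexp.last_match(2)   # first alternative
    rest  = Regexp.last_match(3)   # remaining alternatives, comma separated
    dots ? dots + rest.sub(/.*,/, '') : first
  end
end

puts reduce_one_of('one-of(898,900)..983')                   # 898..983
puts reduce_one_of('one-of(5971..6308,5971..6309)')          # 5971..6308
puts reduce_one_of('8050..one-of(10731,10758,10905,11242)')  # 8050..11242
```

Picking the first start and the last end keeps the widest of the listed alternatives in the second and third cases.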
gbl_pos2loc(position)

Parse position notation and create Location objects.

    # File lib/bio/location.rb
    def gbl_pos2loc(position)
      ary = []

      case position

      when /^(join|order|group)\((.*)\)$/                         # (F) join()
        if $1 != "join" then
          @operator = $1.intern
        end
        position = $2

        join_list = []            # sub positions to join
        bracket   = []            # position with bracket
        s_count   = 0             # stack counter

        position.split(',').each do |sub_pos|
          case sub_pos
          when /\(.*\)/
            join_list << sub_pos
          when /\(/
            s_count += 1
            bracket << sub_pos
          when /\)/
            s_count -= 1
            bracket << sub_pos
            if s_count == 0
              join_list << bracket.join(',')
            end
          else
            if s_count == 0
              join_list << sub_pos
            else
              bracket << sub_pos
            end
          end
        end

        join_list.each do |pos|
          ary << gbl_pos2loc(pos)
        end

      when /^complement\((.*)\)$/                         # (J) complement()
        position =       $1
        gbl_pos2loc(position).reverse_each do |location|
          ary << location.complement
        end

      when /^replace\(([^,]+),"?([^"]*)"?\)/              # (K) replace()
        position =    $1
        sequence =              $2
        ary << gbl_pos2loc(position).first.replace(sequence)

      else                                                # (A, B, C, G, H, I)
        ary << Location.new(position)

      end

      return ary.flatten
    end

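The core problem in the join() branch is splitting the argument list on commas without breaking apart nested operators. The sketch below shows one way to do that with a character-by-character depth counter. This is a different technique from the split-and-regroup approach in the listing above, and `split_top_level` is a hypothetical name, not a BioRuby method.

```ruby
# Splits a comma-separated argument list, treating commas inside
# parentheses as part of the current element.
def split_top_level(list)
  parts   = []
  current = +''          # unary plus gives a mutable string
  depth   = 0            # current parenthesis nesting level
  list.each_char do |ch|
    depth += 1 if ch == '('
    depth -= 1 if ch == ')'
    if ch == ',' && depth.zero?
      parts << current   # a top-level comma ends the element
      current = +''
    else
      current << ch
    end
  end
  parts << current unless current.empty?
  parts
end

p split_top_level('complement(join(1..5,10..20)),30..40,complement(50..60)')
# ["complement(join(1..5,10..20))", "30..40", "complement(50..60)"]
```

Each returned element could then be parsed recursively, in the same spirit as the recursive `gbl_pos2loc(pos)` call.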
rel2abs(n)

Convert the relative position to the absolute position

    # File lib/bio/location.rb
    def rel2abs(n)
      return nil unless n > 0                     # out of range

      cursor = 0
      @locations.each do |x|
        if x.sequence
          len = x.sequence.size
        else
          len = x.to - x.from + 1
        end
        if n > cursor + len
          cursor += len
        else
          if x.strand < 0
            return x.to - (n - cursor - 1)
          else
            return x.from + (n - cursor - 1)
          end
        end
      end
      return nil                                  # out of range
    end