class Bio::Nexus

DESCRIPTION

Bio::Nexus is a parser for nexus formatted data. It contains classes and constants enabling the representation and processing of nexus data.

USAGE

# Parsing a nexus formatted string nexus_str:
nexus = Bio::Nexus.new( nexus_str )

# Obtaining all nexus blocks as an Array of GenericBlock objects
# (or instances of its subclasses, such as DistancesBlock):
blocks = nexus.get_blocks

# Getting all blocks with a given (lower-case) name:
my_blocks = nexus.get_blocks_by_name( "my_block" )

# Getting distances blocks:
distances_blocks = nexus.get_distances_blocks

# Getting trees blocks:
trees_blocks = nexus.get_trees_blocks

# Getting data blocks:
data_blocks = nexus.get_data_blocks

# Getting characters blocks:
character_blocks = nexus.get_characters_blocks

# Getting taxa blocks:
taxa_blocks = nexus.get_taxa_blocks

Constants

BEGIN_BLOCK
BEGIN_COMMENT
BEGIN_NEXUS
CHARACTERS
CHARACTERS_BLOCK
DATA
DATATYPE
DATA_BLOCK
DELIMITER
DIMENSIONS
DISTANCES
DISTANCES_BLOCK
DOUBLE_QUOTE
END_BLOCK
END_COMMENT
END_OF_LINE
FORMAT
INDENTENTION
MATRIX
NCHAR
NTAX
SINGLE_QUOTE
TAXA
TAXA_BLOCK
TAXLABELS
TREES
TREES_BLOCK

Public Class Methods

new( nexus_str )

Creates a new nexus parser for 'nexus_str'.


Arguments:

  • (required) nexus_str: String - nexus formatted data

    # File lib/bio/db/nexus.rb
    def initialize( nexus_str )
      @blocks             = Array.new
      @current_cmd        = nil
      @current_subcmd     = nil
      @current_block_name = nil
      @current_block      = nil
      parse( nexus_str )
    end

Public Instance Methods

get_blocks()

Returns an Array of all blocks found in the String 'nexus_str' set via Bio::Nexus.new( nexus_str ).


Returns

Array of GenericBlock objects (or instances of its subclasses)

    # File lib/bio/db/nexus.rb
    def get_blocks
      @blocks
    end

get_blocks_by_name( name )

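A convenience method that returns an Array of all nexus blocks whose name equals 'name', found in the String 'nexus_str' set via Bio::Nexus.new( nexus_str ). The comparison is case-sensitive, and parse stores block names in lower case, so 'name' should be given in lower case.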


Arguments:

  • (required) name: String

Returns

Array of GenericBlock objects (or instances of its subclasses)

    # File lib/bio/db/nexus.rb
    def get_blocks_by_name( name )
      found_blocks = Array.new
      @blocks.each do | block |
        if ( name == block.get_name )
          found_blocks.push( block )
        end
      end
      found_blocks
    end
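
The following is a minimal sketch of the same lookup written with Enumerable#select. It also shows why the name must be lower case. FakeBlock and blocks_by_name are hypothetical stand-ins for GenericBlock and get_blocks_by_name, not part of the BioRuby API.

```ruby
# Hypothetical stand-in for Bio::Nexus::GenericBlock (illustration only).
FakeBlock = Struct.new(:name) do
  def get_name
    name
  end
end

# Same filtering as get_blocks_by_name, expressed with Enumerable#select.
def blocks_by_name(blocks, name)
  blocks.select { |block| block.get_name == name }
end

blocks = [FakeBlock.new("taxa"), FakeBlock.new("my_block"), FakeBlock.new("taxa")]

p blocks_by_name(blocks, "taxa").length  # → 2
p blocks_by_name(blocks, "Taxa").length  # → 0 (== is case-sensitive)
```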

get_characters_blocks()

A convenience method that returns an Array of all characters blocks.


Returns

Array of CharactersBlocks

    # File lib/bio/db/nexus.rb
    def get_characters_blocks
      get_blocks_by_name( CHARACTERS_BLOCK.chomp( ";").downcase )
    end

get_data_blocks()

A convenience method that returns an Array of all data blocks.


Returns

Array of DataBlocks

    # File lib/bio/db/nexus.rb
    def get_data_blocks
      get_blocks_by_name( DATA_BLOCK.chomp( ";").downcase )
    end

get_distances_blocks()

A convenience method that returns an Array of all distances blocks.


Returns

Array of DistancesBlocks

    # File lib/bio/db/nexus.rb
    def get_distances_blocks
      get_blocks_by_name( DISTANCES_BLOCK.chomp( ";").downcase )
    end

get_taxa_blocks()

A convenience method that returns an Array of all taxa blocks.


Returns

Array of TaxaBlocks

    # File lib/bio/db/nexus.rb
    def get_taxa_blocks
      get_blocks_by_name( TAXA_BLOCK.chomp( ";").downcase )
    end

get_trees_blocks()

A convenience method that returns an Array of all trees blocks.


Returns

Array of TreesBlocks

    # File lib/bio/db/nexus.rb
    def get_trees_blocks
      get_blocks_by_name( TREES_BLOCK.chomp( ";").downcase )
    end

to_s()

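Returns a String summarizing how many blocks were parsed in total and how many of each known type (characters, data, distances, taxa, trees). Returns "empty" if no blocks were found.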


Returns

String

    # File lib/bio/db/nexus.rb
    def to_s
      str = String.new
      if get_blocks.length < 1
        str << "empty"
      else
        str << "number of blocks: " << get_blocks.length.to_s
        if get_characters_blocks.length > 0
          str << " [characters blocks: " << get_characters_blocks.length.to_s << "] "
        end
        if get_data_blocks.length > 0
          str << " [data blocks: " << get_data_blocks.length.to_s << "] "
        end
        if get_distances_blocks.length > 0
          str << " [distances blocks: " << get_distances_blocks.length.to_s << "] "
        end
        if get_taxa_blocks.length > 0
          str << " [taxa blocks: " << get_taxa_blocks.length.to_s << "] "
        end
        if get_trees_blocks.length > 0
          str << " [trees blocks: " << get_trees_blocks.length.to_s << "] "
        end
      end
      str
    end
Also aliased as: to_str
to_str()
Alias for: to_s

Private Instance Methods

add_token_to_matrix( token, scan_token, matrix, row, col )

Helper method for make_matrix.


Arguments:

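  • (required) token: String

  • (required) scan_token: true or false - if true, the token is split into single characters, each stored in its own column; if false, the whole token is stored in one column

  • (required) matrix: NexusMatrix - the matrix to which to add token

  • (required) row: Integer - the row for matrix

  • (required) col: Integer - the last column already filled; new values start at col + 1

Returns

Integer - the last column filled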

    # File lib/bio/db/nexus.rb
    def add_token_to_matrix( token, scan_token, matrix, row, col )
      if ( scan_token )
        token.scan(/./) { |w|
          col += 1
          matrix.set_value( row, col, w )
        }
      else
        col += 1
        matrix.set_value( row, col, token )
      end
      col
    end
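
The sketch below shows the two modes of add_token_to_matrix side by side. SimpleMatrix is a hypothetical Hash-backed stand-in for NexusMatrix, and add_token is an illustrative copy of the method. It uses String#each_char, which for tokens without newlines visits the same characters as scan(/./).

```ruby
# Hypothetical minimal stand-in for Bio::Nexus::NexusMatrix (illustration only).
class SimpleMatrix
  def initialize
    @cells = {}
  end

  def set_value(row, col, value)
    @cells[[row, col]] = value
  end

  def get_value(row, col)
    @cells[[row, col]]
  end
end

# Mirrors add_token_to_matrix: returns the last column that was written.
def add_token(token, scan_token, matrix, row, col)
  if scan_token
    # One cell per character, as used for sequence data.
    token.each_char do |ch|
      col += 1
      matrix.set_value(row, col, ch)
    end
  else
    # The whole token occupies one cell, as used for distance values.
    col += 1
    matrix.set_value(row, col, token)
  end
  col
end

m = SimpleMatrix.new
p add_token("ACG", true, m, 0, 0)   # → 3 (columns 1..3 filled)
p m.get_value(0, 2)                 # → "C"
p add_token("0.5", false, m, 1, 0)  # → 1 (a single cell)
```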

begin_block()

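Performs the operations required when the beginning of a block is encountered. It raises NexusParseError if a block is already open (blocks cannot be nested), and otherwise resets the command state.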


    # File lib/bio/db/nexus.rb
    def begin_block()
      if @current_block_name != nil
        raise NexusParseError, "Cannot have nested nexus blocks (\"end;\" might be missing)"
      end
      reset_command_state()
    end

cmds_equal_to?( command, subcommand )

Returns true if @current_cmd == command and @current_subcmd == subcommand, and false otherwise.


Arguments:

  • (required) command: String

  • (required) subcommand: String

Returns

true or false

    # File lib/bio/db/nexus.rb
    def cmds_equal_to?( command, subcommand )
      return ( @current_cmd == command && @current_subcmd == subcommand )
    end

create_block()

Creates a GenericBlock (or an instance of one of its subclasses). The concrete class is chosen from @current_block_name, and GenericBlock is used for any name without a specific block class.


Returns

GenericBlock (or any of its subclasses) object

    # File lib/bio/db/nexus.rb
    def create_block()
      case @current_block_name
        when TAXA_BLOCK.downcase
          return Bio::Nexus::TaxaBlock.new( @current_block_name )
        when CHARACTERS_BLOCK.downcase
          return Bio::Nexus::CharactersBlock.new( @current_block_name )
        when DATA_BLOCK.downcase
          return Bio::Nexus::DataBlock.new( @current_block_name )
        when DISTANCES_BLOCK.downcase
          return Bio::Nexus::DistancesBlock.new( @current_block_name )
        when TREES_BLOCK.downcase
          return Bio::Nexus::TreesBlock.new( @current_block_name )
        else
          return Bio::Nexus::GenericBlock.new( @current_block_name )
      end
    end

end_block()

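Performs the operations required when the end of a block is encountered. It raises NexusParseError if no block is open (for example, when two "end;" tokens occur in sequence), and otherwise clears @current_block_name.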


    # File lib/bio/db/nexus.rb
    def end_block()
      if @current_block_name == nil
        raise NexusParseError, "Cannot have two or more \"end;\" tokens in sequence"
      end
      @current_block_name = nil
    end
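
Together, begin_block and end_block form a small open/closed state machine that rejects nested blocks and repeated "end;" tokens. The sketch below condenses it into one hypothetical BlockState class, with a hypothetical SketchParseError standing in for NexusParseError. For brevity it stores the block name inside begin_block, whereas the real parse assigns @current_block_name itself after calling begin_block.

```ruby
# Hypothetical error class standing in for Bio::Nexus::NexusParseError.
class SketchParseError < StandardError; end

# Open/closed block tracking in the spirit of begin_block/end_block.
class BlockState
  attr_reader :current_block_name

  def initialize
    @current_block_name = nil
  end

  def begin_block(name)
    unless @current_block_name.nil?
      raise SketchParseError, 'Cannot have nested nexus blocks ("end;" might be missing)'
    end
    @current_block_name = name.downcase
  end

  def end_block
    if @current_block_name.nil?
      raise SketchParseError, 'Cannot have two or more "end;" tokens in sequence'
    end
    @current_block_name = nil
  end
end

state = BlockState.new
state.begin_block("Taxa;")
begin
  state.begin_block("Trees;")  # a second Begin without an End
rescue SketchParseError => e
  puts e.message
end
state.end_block
```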

equal?( str1, str2 )

Returns true if the Strings str1 and str2 are equal ignoring case. Returns false if they differ or if either one is nil.


Arguments:

  • (required) str1: String

  • (required) str2: String

Returns

true or false

    # File lib/bio/db/nexus.rb
    def equal?( str1, str2 )
      if ( str1 == nil || str2 == nil )
        return false
      else
        return ( str1.downcase == str2.downcase )
      end
    end
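
The same nil-safe, case-insensitive check can be written with String#casecmp?, which is available since Ruby 2.4. The helper name same_ignoring_case? is hypothetical. A distinct name also avoids reusing equal?, which Ruby's Object#equal? reserves for object identity. For ASCII keywords such as the NEXUS commands, the result matches the downcase comparison above. casecmp? applies Unicode case folding, however, so it can differ for a few non-ASCII characters.

```ruby
# Nil-safe, case-insensitive comparison with the same contract as equal?.
# casecmp? returns nil for incompatible encodings, hence the == true.
def same_ignoring_case?(str1, str2)
  return false if str1.nil? || str2.nil?
  str1.casecmp?(str2) == true
end

p same_ignoring_case?("MATRIX", "Matrix")  # → true
p same_ignoring_case?("Matrix", nil)       # → false
```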

make_matrix( token, ary, size, scan_token = false )

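Builds a NexusMatrix from 'token' and the tokens that follow it in the Array 'ary', stopping at the first token that contains the delimiter (;). It is used by the process_token_for_<name>_block methods for blocks that contain data in matrix form. Column 0 of each row holds the name. Note that this method shifts (removes) tokens from 'ary'.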


Arguments:

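  • (required) token: String - the first token of the matrix (the name in row 0)

  • (required) ary: Array - the remaining tokens

  • (required) size: Integer - the number of value columns per row (converted with to_i)

  • (optional) scan_token: true or false - if true, each token is split into single characters (default: false)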

Returns

NexusMatrix

    # File lib/bio/db/nexus.rb
    def make_matrix( token, ary, size, scan_token = false )
      matrix = NexusMatrix.new
      col = -1
      row = 0
      done = false
      while ( !done )
        if ( col == -1 )
          # name
          col = 0
          matrix.set_value( row, col, token ) # name is in col 0
        else
          # values
          col = add_token_to_matrix( token, scan_token, matrix, row, col )
          if ( col == size.to_i )
            col = -1
            row += 1
          end
        end
        token = ary.shift
        if ( token.index( DELIMITER ) != nil )
          col = add_token_to_matrix( token.chomp( ";" ), scan_token, matrix, row, col )
          done = true
        end
      end # while
      matrix
    end
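
A simplified sketch of the same row-building idea follows, using plain Arrays instead of NexusMatrix. Each row is [name, value, value, ...] and the token containing ";" ends the matrix. The helper names build_rows and append_values are hypothetical. This is not a line-for-line copy. It takes size as an Integer, closes a row once it holds at least size values, and stops if the tokens run out before a ";" instead of raising.

```ruby
# Appends one token to a row: one cell per character when scan_token is true,
# otherwise the whole token as a single cell (like add_token_to_matrix).
def append_values(row, token, scan_token)
  if scan_token
    row.concat(token.chars)
  else
    row << token
  end
end

# Builds [name, value, ...] rows; shifts tokens from `ary` until a token with ";".
def build_rows(first_token, ary, size, scan_token = false)
  rows = []
  current = nil
  token = first_token
  until token.nil?                      # also stop if the tokens run out
    last = token.include?(";")
    token = token.chomp(";")
    if current.nil?
      current = [token]                 # column 0 holds the name
      rows << current
    else
      append_values(current, token, scan_token)
      current = nil if current.length - 1 >= size  # row is complete
    end
    break if last
    token = ary.shift
  end
  rows
end

tokens = ["ACATA", "GAGGG", "frog", "ACTTA", "GAGGC;"]
rows = build_rows("fish", tokens, 10, true)
rows.each { |r| puts "#{r[0]}: #{r[1..-1].join}" }
# fish: ACATAGAGGG
# frog: ACTTAGAGGC
```

As in make_matrix, the caller's token Array is consumed: tokens is empty afterwards.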

parse( str )

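The master method for parsing. It splits 'str' into tokens, skips comments, handles "Begin" and "End;" tokens, and passes every other token inside a block to process_token. The resulting blocks are stored in the Array @blocks.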


Arguments:

  • (required) str: String - the String to be parsed

    # File lib/bio/db/nexus.rb
    def parse( str )
      str = str.chop if str[-1..-1] == ';'
      ary = str.split(/[\s+=]/)
      ary.collect! { |x| x.strip!; x.empty? ? nil : x }
      ary.compact!
      #in_comment = false
      comment_level = 0

      # Main loop
      while token = ary.shift
        # Quotes:
        if ( token.index( SINGLE_QUOTE ) == 0 ||
             token.index( DOUBLE_QUOTE ) == 0 )
          token << "_" << ary.shift
          token = token.chop if token[-1..-1] == ';'
          token = token.slice( 1, token.length - 2 )
        end
        # Comments:
        open = token.count( BEGIN_COMMENT )
        close = token.count( END_COMMENT )
        comment = comment_level > 0
        comment_level = comment_level + open - close
        if ( open > 0 && open == close )
          next
        elsif comment_level > 0 || comment
          next
        elsif equal?( token, END_BLOCK )
          end_block()
        elsif equal?( token, BEGIN_BLOCK )
          begin_block()
          @current_block_name = token = ary.shift
          @current_block_name.downcase!
          @current_block = create_block()
          @blocks.push( @current_block )
        elsif ( @current_block_name != nil )
          process_token( token.chomp( DELIMITER ), ary )
        end
      end # main loop
      @blocks.compact!
    end
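
The tokenizing and comment-skipping steps of parse can be tried on their own with the sketch below. tokenize and strip_comments are hypothetical helper names. The sketch assumes BEGIN_COMMENT and END_COMMENT are "[" and "]", the NEXUS comment brackets. Note that inside the character class [\s+=] the "+" is literal, so parse also splits on "+"; the sketch keeps that behaviour.

```ruby
# Tokenizes like the first lines of parse: drop one trailing ";",
# split on whitespace, "=" and "+", then discard empty strings.
def tokenize(str)
  str = str.chop if str.end_with?(";")
  str.split(/[\s+=]/).map(&:strip).reject(&:empty?)
end

# Drops comment tokens using the same bracket counting as parse.
def strip_comments(tokens)
  level = 0
  tokens.each_with_object([]) do |token, kept|
    opening = token.count("[")
    closing = token.count("]")
    was_in_comment = level > 0
    level += opening - closing
    next if opening > 0 && opening == closing  # one-token comment such as "[note]"
    next if level > 0 || was_in_comment        # inside or closing a longer comment
    kept << token
  end
end

input  = "Begin Taxa; Dimensions NTax=2; TaxLabels fish [a comment] frog; End;"
tokens = strip_comments(tokenize(input))
p tokens
```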

process_token( token, ary )

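Passes the token to the matching process_token_for_<name>_block method, depending on the state of @current_block_name. Blocks without a specific parser are handled by process_token_for_generic_block.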


Arguments:

  • (required) token: String

  • (required) ary: Array

    # File lib/bio/db/nexus.rb
    def process_token( token, ary )
      case @current_block_name
        when TAXA_BLOCK.downcase
          process_token_for_taxa_block( token )
        when CHARACTERS_BLOCK.downcase
          process_token_for_character_block( token, ary )
        when DATA_BLOCK.downcase
          process_token_for_data_block( token, ary )
        when DISTANCES_BLOCK.downcase
          process_token_for_distances_block( token, ary )
        when TREES_BLOCK.downcase
          process_token_for_trees_block( token, ary )
        else
          process_token_for_generic_block( token )
      end
    end
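
One alternative to the case statement is a lookup table with a fallback, sketched below. The class BlockDispatcher and its handler methods are hypothetical. The keys are written in the form parse stores: the lowercased token after "Begin", for example "taxa;" from "Begin Taxa;". With a table, adding a block type means adding one entry and one method, rather than editing the parallel case statements in create_block and process_token.

```ruby
# Illustrative table-driven dispatch; not the BioRuby implementation.
class BlockDispatcher
  HANDLERS = {
    "taxa;"  => :handle_taxa,
    "trees;" => :handle_trees
  }.freeze

  def initialize(block_name)
    @block_name = block_name
  end

  # Unknown names fall back to the generic handler, like the else branch.
  def process(token)
    send(HANDLERS.fetch(@block_name, :handle_generic), token)
  end

  private

  def handle_taxa(token)
    "taxa: #{token}"
  end

  def handle_trees(token)
    "trees: #{token}"
  end

  def handle_generic(token)
    "generic: #{token}"
  end
end

puts BlockDispatcher.new("taxa;").process("fish")        # → taxa: fish
puts BlockDispatcher.new("my_block;").process("token1")  # → generic: token1
```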

process_token_for_character_block( token, ary )

This processes the tokens (between Begin Characters; and End;) for a characters block. Example of a currently parseable characters block:

    Begin Characters;
     Dimensions NChar=20
                NTax=4;
     Format DataType=DNA Missing=x Gap=- MatchChar=.;
     Matrix
      fish  ACATA GAGGG TACCT CTAAG
      frog  ACTTA GAGGC TACCT CTAGC
      snake ACTCA CTGGG TACCT TTGCG
      mouse ACTCA GACGG TACCT TTGCG;
    End;


Arguments:

  • (required) token: String

  • (required) ary: Array

    # File lib/bio/db/nexus.rb
    def process_token_for_character_block( token, ary )
      if ( equal?( token, DIMENSIONS ) )
        @current_cmd    = DIMENSIONS
        @current_subcmd = nil
      elsif ( equal?( token, FORMAT ) )
        @current_cmd    = FORMAT
        @current_subcmd = nil
      elsif ( equal?( token, MATRIX ) )
        @current_cmd    = MATRIX
        @current_subcmd = nil
      elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
        @current_subcmd = NTAX
      elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
        @current_subcmd = NCHAR
      elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
        @current_subcmd = DATATYPE
      elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MISSING ) )
        @current_subcmd = CharactersBlock::MISSING
      elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::GAP ) )
        @current_subcmd = CharactersBlock::GAP
      elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MATCHCHAR ) )
        @current_subcmd = CharactersBlock::MATCHCHAR
      elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
        @current_block.set_number_of_taxa( token )
      elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
        @current_block.set_number_of_characters( token )
      elsif ( cmds_equal_to?( FORMAT, DATATYPE ) )
        @current_block.set_datatype( token )
      elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MISSING ) )
        @current_block.set_missing( token )
      elsif ( cmds_equal_to?( FORMAT, CharactersBlock::GAP ) )
        @current_block.set_gap_character( token )
      elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MATCHCHAR ) )
        @current_block.set_match_character( token )
      elsif ( cmds_equal_to?( MATRIX, nil ) )
        @current_block.set_matrix( make_matrix( token, ary,
                                   @current_block.get_number_of_characters, true ) )
      end
    end
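
The command/subcommand state machine above can be reduced to a small table-driven sketch. It collects the Dimensions and Format settings into a Hash. The names COMMANDS and read_header are hypothetical, and the keyword set is taken from the example block above. The input tokens are assumed to be already split on whitespace and "=", with the trailing ";" removed, as process_token receives them. Matrix rows are not handled here, because the parser delegates them to make_matrix.

```ruby
# Command keywords and the subcommands each accepts (lowercased, illustrative).
COMMANDS = {
  "dimensions" => %w[ntax nchar],
  "format"     => %w[datatype missing gap matchchar],
  "matrix"     => []
}.freeze

# A command keyword resets the subcommand; a subcommand keyword selects
# which setting the next non-keyword token fills.
def read_header(tokens)
  settings = {}
  cmd = nil
  subcmd = nil
  tokens.each do |token|
    key = token.downcase
    if COMMANDS.key?(key)
      cmd = key
      subcmd = nil
    elsif cmd && COMMANDS[cmd].include?(key)
      subcmd = key
    elsif cmd && subcmd
      settings[subcmd] = token
    end
  end
  settings
end

tokens = %w[Dimensions NChar 20 NTax 4 Format DataType DNA Missing x Gap - MatchChar .]
p read_header(tokens)
```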

process_token_for_data_block( token, ary )

This processes the tokens (between Begin Data; and End;) for a data block. Example of a currently parseable data block:

    Begin Data;
     Dimensions ntax=5 nchar=14;
     Format Datatype=RNA gap=# MISSING=x MatchChar=^;
     TaxLabels ciona cow [comment] ape 'purple urchin' "green lizard";
     Matrix
      taxon_1 A- CCGTCGA-GTTA
      taxon_2 T- CCG-CGA-GATA
      taxon_3 A- C-GTCGA-GATA
      taxon_4 A- CCTCGA--GTTA
      taxon_5 T- CGGTCGT-CTTA;
    End;


Arguments:

  • (required) token: String

  • (required) ary: Array

    # File lib/bio/db/nexus.rb
    def process_token_for_data_block( token, ary )
      if ( equal?( token, DIMENSIONS ) )
        @current_cmd    = DIMENSIONS
        @current_subcmd = nil
      elsif ( equal?( token, FORMAT ) )
        @current_cmd    = FORMAT
        @current_subcmd = nil
      elsif ( equal?( token, TAXLABELS ) )
        @current_cmd    = TAXLABELS
        @current_subcmd = nil
      elsif ( equal?( token, MATRIX ) )
        @current_cmd    = MATRIX
        @current_subcmd = nil
      elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
        @current_subcmd = NTAX
      elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
        @current_subcmd = NCHAR
      elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
        @current_subcmd = DATATYPE
      elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MISSING ) )
        @current_subcmd = CharactersBlock::MISSING
      elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::GAP ) )
        @current_subcmd = CharactersBlock::GAP
      elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MATCHCHAR ) )
        @current_subcmd = CharactersBlock::MATCHCHAR
      elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
        @current_block.set_number_of_taxa( token )
      elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
        @current_block.set_number_of_characters( token )
      elsif ( cmds_equal_to?( FORMAT, DATATYPE ) )
        @current_block.set_datatype( token )
      elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MISSING ) )
        @current_block.set_missing( token )
      elsif ( cmds_equal_to?( FORMAT, CharactersBlock::GAP ) )
        @current_block.set_gap_character( token )
      elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MATCHCHAR ) )
        @current_block.set_match_character( token )
      elsif ( cmds_equal_to?( TAXLABELS, nil ) )
        @current_block.add_taxon( token )
      elsif ( cmds_equal_to?( MATRIX, nil ) )
        @current_block.set_matrix( make_matrix( token, ary,
                                   @current_block.get_number_of_characters, true ) )
      end
    end

process_token_for_distances_block( token, ary )

This processes the tokens (between Begin Distances; and End;) for a distances block. Example of a currently parseable distances block:

    Begin Distances;
     Dimensions nchar=20 ntax=5;
     Format Triangle=Upper;
     Matrix
      taxon_1 0.0 1.0 2.0 4.0 7.0
      taxon_2 1.0 0.0 3.0 5.0 8.0
      taxon_3 3.0 4.0 0.0 6.0 9.0
      taxon_4 7.0 3.0 1.0 0.0 9.5
      taxon_5 1.2 1.3 1.4 1.5 0.0;
    End;


Arguments:

  • (required) token: String

  • (required) ary: Array

    # File lib/bio/db/nexus.rb
    def process_token_for_distances_block( token, ary )
      if ( equal?( token, DIMENSIONS ) )
        @current_cmd    = DIMENSIONS
        @current_subcmd = nil
      elsif ( equal?( token, FORMAT ) )
        @current_cmd    = FORMAT
        @current_subcmd = nil
      elsif ( equal?( token, MATRIX ) )
        @current_cmd    = MATRIX
        @current_subcmd = nil
      elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
        @current_subcmd = NTAX
      elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
        @current_subcmd = NCHAR
      elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
        @current_subcmd = DATATYPE
      elsif ( @current_cmd == FORMAT && equal?( token, DistancesBlock::TRIANGLE ) )
        @current_subcmd = DistancesBlock::TRIANGLE
      elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
        @current_block.set_number_of_taxa( token )
      elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
        @current_block.set_number_of_characters( token )
      elsif ( cmds_equal_to?( FORMAT, DistancesBlock::TRIANGLE ) )
        @current_block.set_triangle( token )
      elsif ( cmds_equal_to?( MATRIX, nil ) )
        @current_block.set_matrix( make_matrix( token, ary,
                                   @current_block.get_number_of_taxa, false ) )
      end
    end
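
Because the distances matrix is built with scan_token set to false, each distance in the code shown is passed to set_value as its raw token String (for example "0.0"). Any numeric conversion is left to the caller. The sketch below shows one way to do that conversion on plain Arrays that mirror the rows of the example block. The helper to_distance_table is hypothetical. Kernel#Float is used because it raises ArgumentError on malformed numbers instead of silently returning 0.0.

```ruby
# Converts [name, "d1", "d2", ...] rows into a Hash of Float arrays.
def to_distance_table(rows)
  rows.map { |name, *values| [name, values.map { |v| Float(v) }] }.to_h
end

rows = [
  ["taxon_1", "0.0", "1.0", "2.0", "4.0", "7.0"],
  ["taxon_2", "1.0", "0.0", "3.0", "5.0", "8.0"],
  ["taxon_3", "3.0", "4.0", "0.0", "6.0", "9.0"],
  ["taxon_4", "7.0", "3.0", "1.0", "0.0", "9.5"],
  ["taxon_5", "1.2", "1.3", "1.4", "1.5", "0.0"]
]
table = to_distance_table(rows)
p table["taxon_3"][1]  # → 4.0

# Sanity check: every taxon is at distance 0.0 from itself.
p table.each_with_index.all? { |(_name, values), i| values[i] == 0.0 }  # → true
```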

process_token_for_generic_block( token )

This processes the tokens (between Begin <name>; and End;) for a block for which no specific parser is available; each token is stored with add_token. Example of a currently parseable generic block:

    Begin My_Block;
     token1 token2 token3 ...
    End;


Arguments:

  • (required) token: String

    # File lib/bio/db/nexus.rb
    def process_token_for_generic_block( token )
      @current_block.add_token( token )
    end

process_token_for_taxa_block( token )

This processes the tokens (between Begin Taxa; and End;) for a taxa block. Example of a currently parseable taxa block:

    Begin Taxa;
     Dimensions NTax=4;
     TaxLabels fish [comment] 'african frog' "rat snake" 'red mouse';
    End;


Arguments:

  • (required) token: String

    # File lib/bio/db/nexus.rb
    def process_token_for_taxa_block( token )
      if ( equal?( token, DIMENSIONS ) )
        @current_cmd    = DIMENSIONS
        @current_subcmd = nil
      elsif ( equal?( token, TAXLABELS ) )
        @current_cmd    = TAXLABELS
        @current_subcmd = nil
      elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
        @current_subcmd = NTAX
      elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
        @current_block.set_number_of_taxa( token )
      elsif ( cmds_equal_to?( TAXLABELS, nil ) )
        @current_block.add_taxon( token )
      end
    end
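
Quoted labels such as 'african frog' reach add_taxon only after parse has rejoined them. A token that starts with a quote is joined to the next token with "_", a trailing ";" is chopped, and the surrounding quotes are removed. As in parse, exactly one following token is joined, so this handling assumes a quoted label of exactly two words. The sketch below reproduces that step. join_quoted is a hypothetical helper, and the .to_s guard is an addition that the original does not have.

```ruby
# Rejoins a two-word quoted label the way parse does.
def join_quoted(token, ary)
  return token unless token.start_with?("'", '"')
  token = token + "_" + ary.shift.to_s   # joins exactly one following token
  token = token.chop if token.end_with?(";")
  token[1, token.length - 2]             # drop the surrounding quotes
end

# Tokens following "TaxLabels" in the example (comment already removed).
ary = ["fish", "'african", "frog'", "\"rat", "snake\"", "'red", "mouse';"]
labels = []
while (token = ary.shift)
  labels << join_quoted(token, ary).chomp(";")  # process_token also chomps ";"
end
p labels  # → ["fish", "african_frog", "rat_snake", "red_mouse"]
```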

process_token_for_trees_block( token, ary )

This processes the tokens (between Begin Trees; and End;) for a trees block. Example of a currently parseable trees block:

    Begin Trees;
     Tree best=(fish,(frog,(snake, mouse)));
     Tree other=(snake,(frog,( fish, mouse)));
    End;


Arguments:

  • (required) token: String

  • (required) ary: Array

    # File lib/bio/db/nexus.rb
    def process_token_for_trees_block( token, ary )
      if ( equal?( token, TreesBlock::TREE ) )
        @current_cmd    = TreesBlock::TREE
        @current_subcmd = nil
      elsif ( cmds_equal_to?( TreesBlock::TREE, nil ) )
        @current_block.add_tree_name( token )
        tree_string = ary.shift
        while ( tree_string.index( ";" ) == nil )
          tree_string << ary.shift
        end
        @current_block.add_tree( tree_string )
        @current_cmd    = nil
      end
    end
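
After the tree name, parse has already split the tree description on whitespace and "=". The method therefore concatenates tokens until one contains ";". As a result, spaces inside the description are dropped, and the string passed to add_tree still ends with ";". The sketch below shows that reassembly step. read_tree is a hypothetical helper, and unlike the original it raises ArgumentError if the tokens run out before a ";".

```ruby
# Reassembles a tree description from tokens until one contains ";".
def read_tree(ary)
  first = ary.shift
  raise ArgumentError, "missing tree description" if first.nil?
  tree = first.dup
  until tree.include?(";")
    piece = ary.shift
    raise ArgumentError, "tree not terminated by ';'" if piece.nil?
    tree << piece   # no separator: whitespace between tokens is lost
  end
  tree
end

# Tokens left after "Tree best" for "(fish,(frog,(snake, mouse)));"
tokens = ["(fish,(frog,(snake,", "mouse)));", "Tree", "other"]
p read_tree(tokens)  # → "(fish,(frog,(snake,mouse)));"
p tokens             # → ["Tree", "other"]
```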

reset_command_state()

Resets @current_cmd and @current_subcmd to nil.


    # File lib/bio/db/nexus.rb
    def reset_command_state()
      @current_cmd    = nil
      @current_subcmd = nil
    end