class Bio::Nexus
DESCRIPTION
Bio::Nexus is a parser for nexus-formatted data. It contains classes and constants for representing and processing nexus data.
USAGE
  # Parsing a nexus-formatted string nexus_str:
  nexus = Bio::Nexus.new( nexus_str )

  # Obtaining the nexus blocks as an array of GenericBlock or
  # any of its subclasses (such as DistancesBlock):
  blocks = nexus.get_blocks

  # Getting blocks by name:
  my_blocks = nexus.get_blocks_by_name( "my_block" )

  # Getting distances blocks:
  distances_blocks = nexus.get_distances_blocks

  # Getting trees blocks:
  trees_blocks = nexus.get_trees_blocks

  # Getting data blocks:
  data_blocks = nexus.get_data_blocks

  # Getting characters blocks:
  character_blocks = nexus.get_characters_blocks

  # Getting taxa blocks:
  taxa_blocks = nexus.get_taxa_blocks
Constants
- BEGIN_BLOCK
- BEGIN_COMMENT
- BEGIN_NEXUS
- CHARACTERS
- CHARACTERS_BLOCK
- DATA
- DATATYPE
- DATA_BLOCK
- DELIMITER
- DIMENSIONS
- DISTANCES
- DISTANCES_BLOCK
- DOUBLE_QUOTE
- END_BLOCK
- END_COMMENT
- END_OF_LINE
- FORMAT
- INDENTENTION
- MATRIX
- NCHAR
- NTAX
- SINGLE_QUOTE
- TAXA
- TAXA_BLOCK
- TAXLABELS
- TREES
- TREES_BLOCK
Public Class Methods
Creates a new nexus parser for 'nexus_str'.
Arguments:
- (required) nexus_str: String - nexus formatted data
  # File lib/bio/db/nexus.rb
  def initialize( nexus_str )
    @blocks = Array.new
    @current_cmd = nil
    @current_subcmd = nil
    @current_block_name = nil
    @current_block = nil
    parse( nexus_str )
  end
Public Instance Methods
Returns an Array of all blocks found in the String 'nexus_str' set via Bio::Nexus.new( nexus_str ).
Returns:
- Array of GenericBlocks or any of its subclasses
  # File lib/bio/db/nexus.rb
  def get_blocks
    @blocks
  end
A convenience method that returns an array of all nexus blocks whose name equals 'name', taken from the String 'nexus_str' set via Bio::Nexus.new( nexus_str ).
Arguments:
- (required) name: String
Returns:
- Array of GenericBlocks or any of its subclasses
  # File lib/bio/db/nexus.rb
  def get_blocks_by_name( name )
    found_blocks = Array.new
    @blocks.each do | block |
      if ( name == block.get_name )
        found_blocks.push( block )
      end
    end
    found_blocks
  end
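The lookup above is an exact, case-sensitive comparison against each block's name. A minimal sketch of the same filtering, assuming a hypothetical `Block` stand-in that only provides `get_name` (the real block classes are not reproduced here):

```ruby
# Hypothetical stand-in for a parsed block; only get_name matters here.
Block = Struct.new(:name) do
  def get_name
    name
  end
end

# Same filtering as the each/push loop, written with Array#select.
def blocks_by_name(blocks, name)
  blocks.select { |block| block.get_name == name }
end

blocks = [Block.new("taxa"), Block.new("trees"), Block.new("taxa")]
taxa_blocks = blocks_by_name(blocks, "taxa")
puts taxa_blocks.length
```

Because the comparison uses `==`, a query such as "Taxa" would not match a block named "taxa" in this sketch.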
A convenience method that returns an array of all characters blocks.
Returns:
- Array of CharactersBlocks
  # File lib/bio/db/nexus.rb
  def get_characters_blocks
    get_blocks_by_name( CHARACTERS_BLOCK.chomp( ";").downcase )
  end
A convenience method that returns an array of all data blocks.
Returns:
- Array of DataBlocks
  # File lib/bio/db/nexus.rb
  def get_data_blocks
    get_blocks_by_name( DATA_BLOCK.chomp( ";").downcase )
  end
A convenience method that returns an array of all distances blocks.
Returns:
- Array of DistancesBlocks
  # File lib/bio/db/nexus.rb
  def get_distances_blocks
    get_blocks_by_name( DISTANCES_BLOCK.chomp( ";").downcase )
  end
A convenience method that returns an array of all taxa blocks.
Returns:
- Array of TaxaBlocks
  # File lib/bio/db/nexus.rb
  def get_taxa_blocks
    get_blocks_by_name( TAXA_BLOCK.chomp( ";").downcase )
  end
A convenience method that returns an array of all trees blocks.
Returns:
- Array of TreesBlocks
  # File lib/bio/db/nexus.rb
  def get_trees_blocks
    get_blocks_by_name( TREES_BLOCK.chomp( ";").downcase )
  end
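Each of the five convenience getters derives its lookup name from a *_BLOCK constant by removing a trailing ";" and lowercasing the result. The constant values are not shown in this listing, so the sketch below assumes that TREES_BLOCK is "Trees;":

```ruby
# Assumed value; the constant's definition is not part of this listing.
TREES_BLOCK = "Trees;"

# Strip the trailing delimiter, then lowercase, as the getters do.
lookup_name = TREES_BLOCK.chomp(";").downcase
puts lookup_name
```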
Returns a String listing how many blocks of each type were parsed.
Returns:
- String
  # File lib/bio/db/nexus.rb
  def to_s
    str = String.new
    if get_blocks.length < 1
      str << "empty"
    else
      str << "number of blocks: " << get_blocks.length.to_s
      if get_characters_blocks.length > 0
        str << " [characters blocks: " << get_characters_blocks.length.to_s << "] "
      end
      if get_data_blocks.length > 0
        str << " [data blocks: " << get_data_blocks.length.to_s << "] "
      end
      if get_distances_blocks.length > 0
        str << " [distances blocks: " << get_distances_blocks.length.to_s << "] "
      end
      if get_taxa_blocks.length > 0
        str << " [taxa blocks: " << get_taxa_blocks.length.to_s << "] "
      end
      if get_trees_blocks.length > 0
        str << " [trees blocks: " << get_trees_blocks.length.to_s << "] "
      end
    end
    str
  end
Private Instance Methods
Helper method for make_matrix.
Arguments:
- (required) token: String
- (required) scan_token: true or false - scan the token into single characters (true) or add the whole token (false)
- (required) matrix: NexusMatrix - the matrix to which to add token
- (required) row: Integer - the row for matrix
- (required) col: Integer - the starting column
Returns:
- Integer - ending column
  # File lib/bio/db/nexus.rb
  def add_token_to_matrix( token, scan_token, matrix, row, col )
    if ( scan_token )
      token.scan(/./) { |w|
        col += 1
        matrix.set_value( row, col, w )
      }
    else
      col += 1
      matrix.set_value( row, col, token )
    end
    col
  end
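The difference between the two modes is easiest to see on a small input. The following is a minimal sketch, assuming a hypothetical `SketchMatrix` class in place of NexusMatrix (only `set_value` is taken from the listing; `get_value` is added here for inspection) and using `each_char` where the original uses `scan(/./)`:

```ruby
# Hypothetical stand-in for NexusMatrix: a Hash keyed by [row, col].
class SketchMatrix
  def initialize
    @cells = {}
  end

  def set_value(row, col, value)
    @cells[[row, col]] = value
  end

  def get_value(row, col)
    @cells[[row, col]]
  end
end

# Mirrors the helper above: returns the last column that was written.
def add_token(token, scan_token, matrix, row, col)
  if scan_token
    token.each_char do |ch|
      col += 1
      matrix.set_value(row, col, ch)
    end
  else
    col += 1
    matrix.set_value(row, col, token)
  end
  col
end

matrix = SketchMatrix.new
last_scanned = add_token("ACATA", true, matrix, 0, 0)  # one cell per character
last_whole   = add_token("0.25", false, matrix, 1, 0)  # one cell for the whole token
```

With scanning enabled, a five-character token advances the column by five; without it, any token advances the column by one.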
Operations required when the beginning of a block is encountered.
  # File lib/bio/db/nexus.rb
  def begin_block()
    if @current_block_name != nil
      raise NexusParseError, "Cannot have nested nexus blocks (\"end;\" might be missing)"
    end
    reset_command_state()
  end
Returns true if @current_cmd == command and @current_subcmd == subcommand, false otherwise.
Arguments:
- (required) command: String
- (required) subcommand: String
Returns:
- true or false
  # File lib/bio/db/nexus.rb
  def cmds_equal_to?( command, subcommand )
    return ( @current_cmd == command && @current_subcmd == subcommand )
  end
Creates a GenericBlock (or any of its subclasses), the type of which is determined by the state of @current_block_name.
Returns:
- GenericBlock (or any of its subclasses) object
  # File lib/bio/db/nexus.rb
  def create_block()
    case @current_block_name
      when TAXA_BLOCK.downcase
        return Bio::Nexus::TaxaBlock.new( @current_block_name )
      when CHARACTERS_BLOCK.downcase
        return Bio::Nexus::CharactersBlock.new( @current_block_name )
      when DATA_BLOCK.downcase
        return Bio::Nexus::DataBlock.new( @current_block_name )
      when DISTANCES_BLOCK.downcase
        return Bio::Nexus::DistancesBlock.new( @current_block_name )
      when TREES_BLOCK.downcase
        return Bio::Nexus::TreesBlock.new( @current_block_name )
      else
        return Bio::Nexus::GenericBlock.new( @current_block_name )
    end
  end
Operations required when the end of a block is encountered.
  # File lib/bio/db/nexus.rb
  def end_block()
    if @current_block_name == nil
      raise NexusParseError, "Cannot have two or more \"end;\" tokens in sequence"
    end
    @current_block_name = nil
  end
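Taken together, begin_block and end_block reject nested blocks and unmatched "end;" tokens by checking whether a block name is currently set. A minimal sketch of that rule, assuming a hypothetical `BlockTracker` class and a `ParseError` stand-in for NexusParseError; unlike the listing, the sketch stores the block name inside `begin_block` itself rather than in the parse loop:

```ruby
# Hypothetical stand-in for NexusParseError.
class ParseError < StandardError; end

class BlockTracker
  attr_reader :current

  def initialize
    @current = nil
  end

  # A block may only begin when no other block is open.
  def begin_block(name)
    raise ParseError, "nested block" unless @current.nil?
    @current = name.downcase
  end

  # A block may only end when one is open.
  def end_block
    raise ParseError, "unmatched end" if @current.nil?
    @current = nil
  end
end

tracker = BlockTracker.new
tracker.begin_block("Taxa;")
tracker.end_block

nested_rejected =
  begin
    tracker.begin_block("Taxa;")
    tracker.begin_block("Trees;")  # second begin without an end
    false
  rescue ParseError
    true
  end
```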
Returns true if Strings str1 and str2 are equal, ignoring case.
Arguments:
- (required) str1: String
- (required) str2: String
Returns:
- true or false
  # File lib/bio/db/nexus.rb
  def equal?( str1, str2 )
    if ( str1 == nil || str2 == nil )
      return false
    else
      return ( str1.downcase == str2.downcase )
    end
  end
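A minimal sketch of the same nil-safe, case-insensitive comparison. The method name `same_ignoring_case?` is hypothetical, chosen here so that the sketch does not redefine Ruby's built-in one-argument `equal?`:

```ruby
# Returns false when either argument is nil, otherwise compares
# the lowercased strings.
def same_ignoring_case?(str1, str2)
  return false if str1.nil? || str2.nil?
  str1.downcase == str2.downcase
end

puts same_ignoring_case?("End;", "end;")
puts same_ignoring_case?(nil, "end;")
```

This is why keywords such as "NTax", "ntax", and "NTAX" are all recognized by the block parsers.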
Makes a NexusMatrix out of token and the remaining tokens in Array ary. Used by the process_token_for_X_block methods for blocks that contain data in matrix form. Column 0 contains names. This will shift tokens from ary.
Arguments:
- (required) token: String
- (required) ary: Array
- (required) size: Integer
- (optional) scan_token: true or false
Returns:
- NexusMatrix
  # File lib/bio/db/nexus.rb
  def make_matrix( token, ary, size, scan_token = false )
    matrix = NexusMatrix.new
    col = -1
    row = 0
    done = false
    while ( !done )
      if ( col == -1 )
        # name
        col = 0
        matrix.set_value( row, col, token ) # name is in col 0
      else
        # values
        col = add_token_to_matrix( token, scan_token, matrix, row, col )
        if ( col == size.to_i )
          col = -1
          row += 1
        end
      end
      token = ary.shift
      if ( token.index( DELIMITER ) != nil )
        col = add_token_to_matrix( token.chomp( ";" ), scan_token, matrix, row, col )
        done = true
      end
    end # while
    matrix
  end
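The row and column bookkeeping can be followed on a two-taxon example. This is a simplified sketch, not the library code: a plain Hash keyed by [row, col] stands in for NexusMatrix, `DELIM` is assumed to match the DELIMITER constant, and the method names are hypothetical.

```ruby
DELIM = ";" # assumed to match the DELIMITER constant

# Writes either each character or the whole token, returning the last column.
def add_token(token, scan_token, matrix, row, col)
  pieces = scan_token ? token.chars : [token]
  pieces.each do |piece|
    col += 1
    matrix[[row, col]] = piece
  end
  col
end

def build_matrix(token, rest, size, scan_token = false)
  matrix = {} # Hash stand-in for NexusMatrix
  col = -1
  row = 0
  done = false
  until done
    if col == -1
      col = 0
      matrix[[row, col]] = token # column 0 holds the row name
    else
      col = add_token(token, scan_token, matrix, row, col)
      if col == size.to_i # row is full, so the next token is a name
        col = -1
        row += 1
      end
    end
    token = rest.shift
    if token.include?(DELIM) # last token of the Matrix command
      col = add_token(token.chomp(DELIM), scan_token, matrix, row, col)
      done = true
    end
  end
  matrix
end

tokens = %w[fish AC GT frog TT GA;]
first = tokens.shift
matrix = build_matrix(first, tokens, "4", true) # a String size works because of to_i
```

A row ends only when the column counter equals size exactly, so the declared size has to agree with the amount of data in each row.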
The master method for parsing. Stores the resulting blocks in the array @blocks.
Arguments:
- (required) str: String - the String to be parsed
  # File lib/bio/db/nexus.rb
  def parse( str )
    str = str.chop if str[-1..-1] == ';'
    ary = str.split(/[\s+=]/)
    ary.collect! { |x| x.strip!; x.empty? ? nil : x }
    ary.compact!
    #in_comment = false
    comment_level = 0

    # Main loop
    while token = ary.shift
      # Quotes:
      if ( token.index( SINGLE_QUOTE ) == 0 ||
           token.index( DOUBLE_QUOTE ) == 0 )
        token << "_" << ary.shift
        token = token.chop if token[-1..-1] == ';'
        token = token.slice( 1, token.length - 2 )
      end
      # Comments:
      open = token.count( BEGIN_COMMENT )
      close = token.count( END_COMMENT )
      comment = comment_level > 0
      comment_level = comment_level + open - close
      if ( open > 0 && open == close )
        next
      elsif comment_level > 0 || comment
        next
      elsif equal?( token, END_BLOCK )
        end_block()
      elsif equal?( token, BEGIN_BLOCK )
        begin_block()
        @current_block_name = token = ary.shift
        @current_block_name.downcase!
        @current_block = create_block()
        @blocks.push( @current_block )
      elsif ( @current_block_name != nil )
        process_token( token.chomp( DELIMITER ), ary )
      end
    end # main loop
    @blocks.compact!
  end
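Two steps of the parse loop are worth isolating: tokenization and comment skipping. The split pattern is a character class, so the input is cut at any whitespace character, at "=", and also at a literal "+". Comments are skipped by counting "[" and "]" per token. The sketch below covers only these two steps (no quote handling, no trailing-";" chop); the method names are hypothetical and "[" and "]" are assumed to be the values of BEGIN_COMMENT and END_COMMENT.

```ruby
# Split on whitespace, '+', or '=' and drop empty pieces.
def tokenize(str)
  str.split(/[\s+=]/).map(&:strip).reject(&:empty?)
end

# Drop tokens that open, continue, or close a [ ... ] comment.
def strip_comments(tokens)
  depth = 0
  tokens.reject do |token|
    opened = token.count("[")
    closed = token.count("]")
    was_inside = depth > 0
    depth += opened - closed
    (opened > 0 && opened == closed) || depth > 0 || was_inside
  end
end

tokens = tokenize("Begin Taxa; Dimensions NTax=2; TaxLabels fish [a note] frog; End;")
kept = strip_comments(tokens)
puts kept.join(" ")
```

Because whole tokens are dropped, a label written directly against a comment (for example "fish[note]") would be discarded along with the comment in this sketch.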
This calls one of the process_token_for_<name>_block methods depending on the state of @current_block_name.
Arguments:
- (required) token: String
- (required) ary: Array
  # File lib/bio/db/nexus.rb
  def process_token( token, ary )
    case @current_block_name
      when TAXA_BLOCK.downcase
        process_token_for_taxa_block( token )
      when CHARACTERS_BLOCK.downcase
        process_token_for_character_block( token, ary )
      when DATA_BLOCK.downcase
        process_token_for_data_block( token, ary )
      when DISTANCES_BLOCK.downcase
        process_token_for_distances_block( token, ary )
      when TREES_BLOCK.downcase
        process_token_for_trees_block( token, ary )
      else
        process_token_for_generic_block( token )
    end
  end
This processes the tokens (between Begin Characters; and End;) for a characters block.
Example of a currently parseable characters block:
  Begin Characters;
   Dimensions NChar=20
              NTax=4;
   Format DataType=DNA Missing=x Gap=- MatchChar=.;
   Matrix
    fish  ACATA GAGGG TACCT CTAAG
    frog  ACTTA GAGGC TACCT CTAGC
    snake ACTCA CTGGG TACCT TTGCG
    mouse ACTCA GACGG TACCT TTGCG;
  End;
Arguments:
- (required) token: String
- (required) ary: Array
  # File lib/bio/db/nexus.rb
  def process_token_for_character_block( token, ary )
    if ( equal?( token, DIMENSIONS ) )
      @current_cmd = DIMENSIONS
      @current_subcmd = nil
    elsif ( equal?( token, FORMAT ) )
      @current_cmd = FORMAT
      @current_subcmd = nil
    elsif ( equal?( token, MATRIX ) )
      @current_cmd = MATRIX
      @current_subcmd = nil
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
      @current_subcmd = NTAX
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
      @current_subcmd = NCHAR
    elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
      @current_subcmd = DATATYPE
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MISSING ) )
      @current_subcmd = CharactersBlock::MISSING
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::GAP ) )
      @current_subcmd = CharactersBlock::GAP
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MATCHCHAR ) )
      @current_subcmd = CharactersBlock::MATCHCHAR
    elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
      @current_block.set_number_of_taxa( token )
    elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
      @current_block.set_number_of_characters( token )
    elsif ( cmds_equal_to?( FORMAT, DATATYPE ) )
      @current_block.set_datatype( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MISSING ) )
      @current_block.set_missing( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::GAP ) )
      @current_block.set_gap_character( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MATCHCHAR ) )
      @current_block.set_match_character( token )
    elsif ( cmds_equal_to?( MATRIX, nil ) )
      @current_block.set_matrix( make_matrix( token, ary,
                                 @current_block.get_number_of_characters, true ) )
    end
  end
This processes the tokens (between Begin Data; and End;) for a data block.
Example of a currently parseable data block:
  Begin Data;
   Dimensions ntax=5 nchar=14;
   Format Datatype=RNA gap=# MISSING=x MatchChar=^;
   TaxLabels ciona cow [comment] ape 'purple urchin' "green lizard";
   Matrix
    taxon_1 A- CCGTCGA-GTTA
    taxon_2 T- CCG-CGA-GATA
    taxon_3 A- C-GTCGA-GATA
    taxon_4 A- CCTCGA--GTTA
    taxon_5 T- CGGTCGT-CTTA;
  End;
Arguments:
- (required) token: String
- (required) ary: Array
  # File lib/bio/db/nexus.rb
  def process_token_for_data_block( token, ary )
    if ( equal?( token, DIMENSIONS ) )
      @current_cmd = DIMENSIONS
      @current_subcmd = nil
    elsif ( equal?( token, FORMAT ) )
      @current_cmd = FORMAT
      @current_subcmd = nil
    elsif ( equal?( token, TAXLABELS ) )
      @current_cmd = TAXLABELS
      @current_subcmd = nil
    elsif ( equal?( token, MATRIX ) )
      @current_cmd = MATRIX
      @current_subcmd = nil
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
      @current_subcmd = NTAX
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
      @current_subcmd = NCHAR
    elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
      @current_subcmd = DATATYPE
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MISSING ) )
      @current_subcmd = CharactersBlock::MISSING
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::GAP ) )
      @current_subcmd = CharactersBlock::GAP
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MATCHCHAR ) )
      @current_subcmd = CharactersBlock::MATCHCHAR
    elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
      @current_block.set_number_of_taxa( token )
    elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
      @current_block.set_number_of_characters( token )
    elsif ( cmds_equal_to?( FORMAT, DATATYPE ) )
      @current_block.set_datatype( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MISSING ) )
      @current_block.set_missing( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::GAP ) )
      @current_block.set_gap_character( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MATCHCHAR ) )
      @current_block.set_match_character( token )
    elsif ( cmds_equal_to?( TAXLABELS, nil ) )
      @current_block.add_taxon( token )
    elsif ( cmds_equal_to?( MATRIX, nil ) )
      @current_block.set_matrix( make_matrix( token, ary,
                                 @current_block.get_number_of_characters, true ) )
    end
  end
This processes the tokens (between Begin Distances; and End;) for a distances block.
Example of a currently parseable distances block:
  Begin Distances;
   Dimensions nchar=20 ntax=5;
   Format Triangle=Upper;
   Matrix
    taxon_1 0.0 1.0 2.0 4.0 7.0
    taxon_2 1.0 0.0 3.0 5.0 8.0
    taxon_3 3.0 4.0 0.0 6.0 9.0
    taxon_4 7.0 3.0 1.0 0.0 9.5
    taxon_5 1.2 1.3 1.4 1.5 0.0;
  End;
Arguments:
- (required) token: String
- (required) ary: Array
  # File lib/bio/db/nexus.rb
  def process_token_for_distances_block( token, ary )
    if ( equal?( token, DIMENSIONS ) )
      @current_cmd = DIMENSIONS
      @current_subcmd = nil
    elsif ( equal?( token, FORMAT ) )
      @current_cmd = FORMAT
      @current_subcmd = nil
    elsif ( equal?( token, MATRIX ) )
      @current_cmd = MATRIX
      @current_subcmd = nil
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
      @current_subcmd = NTAX
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
      @current_subcmd = NCHAR
    elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
      @current_subcmd = DATATYPE
    elsif ( @current_cmd == FORMAT && equal?( token, DistancesBlock::TRIANGLE ) )
      @current_subcmd = DistancesBlock::TRIANGLE
    elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
      @current_block.set_number_of_taxa( token )
    elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
      @current_block.set_number_of_characters( token )
    elsif ( cmds_equal_to?( FORMAT, DistancesBlock::TRIANGLE ) )
      @current_block.set_triangle( token )
    elsif ( cmds_equal_to?( MATRIX, nil ) )
      @current_block.set_matrix( make_matrix( token, ary,
                                 @current_block.get_number_of_taxa, false ) )
    end
  end
This processes the tokens (between Begin Taxa; and End;) for a block for which a specific parser is not available.
Example of a currently parseable generic block:
  Begin Taxa;
   token1 token2 token3 ...
  End;
Arguments:
- (required) token: String
  # File lib/bio/db/nexus.rb
  def process_token_for_generic_block( token )
    @current_block.add_token( token )
  end
This processes the tokens (between Begin Taxa; and End;) for a taxa block.
Example of a currently parseable taxa block:
  Begin Taxa;
   Dimensions NTax=4;
   TaxLabels fish [comment] 'african frog' "rat snake" 'red mouse';
  End;
Arguments:
- (required) token: String
  # File lib/bio/db/nexus.rb
  def process_token_for_taxa_block( token )
    if ( equal?( token, DIMENSIONS ) )
      @current_cmd = DIMENSIONS
      @current_subcmd = nil
    elsif ( equal?( token, TAXLABELS ) )
      @current_cmd = TAXLABELS
      @current_subcmd = nil
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
      @current_subcmd = NTAX
    elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
      @current_block.set_number_of_taxa( token )
    elsif ( cmds_equal_to?( TAXLABELS, nil ) )
      @current_block.add_taxon( token )
    end
  end
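All block-specific methods follow the same pattern: a keyword token sets the current command, a sub-keyword sets the current subcommand, and any other token is interpreted as a value for whatever command and subcommand are active. A minimal sketch of that state machine for the taxa case, assuming a hypothetical `TaxaSketch` class and assumed keyword spellings (the constant values are not shown in this listing):

```ruby
# Assumed keyword spellings; matching is case-insensitive, as in equal?.
DIMENSIONS = "Dimensions"
TAXLABELS  = "TaxLabels"
NTAX       = "NTax"

class TaxaSketch
  attr_reader :ntax, :taxa

  def initialize
    @cmd = nil
    @subcmd = nil
    @ntax = nil
    @taxa = []
  end

  # Feed one token (already stripped of its trailing ";").
  def feed(token)
    if same?(token, DIMENSIONS)
      @cmd, @subcmd = DIMENSIONS, nil
    elsif same?(token, TAXLABELS)
      @cmd, @subcmd = TAXLABELS, nil
    elsif @cmd == DIMENSIONS && same?(token, NTAX)
      @subcmd = NTAX
    elsif @cmd == DIMENSIONS && @subcmd == NTAX
      @ntax = token        # value following NTax
    elsif @cmd == TAXLABELS && @subcmd.nil?
      @taxa << token       # every further token is a taxon label
    end
  end

  private

  def same?(a, b)
    a.downcase == b.downcase
  end
end

block = TaxaSketch.new
%w[Dimensions NTax 4 TaxLabels fish frog snake mouse].each { |t| block.feed(t) }
```

Note that the value is stored as the raw token, so `ntax` is the String "4" in this sketch.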
This processes the tokens (between Begin Trees; and End;) for a trees block.
Example of a currently parseable trees block:
  Begin Trees;
   Tree best=(fish,(frog,(snake, mouse)));
   Tree other=(snake,(frog,( fish, mouse)));
  End;
Arguments:
- (required) token: String
- (required) ary: Array
  # File lib/bio/db/nexus.rb
  def process_token_for_trees_block( token, ary )
    if ( equal?( token, TreesBlock::TREE ) )
      @current_cmd = TreesBlock::TREE
      @current_subcmd = nil
    elsif ( cmds_equal_to?( TreesBlock::TREE, nil ) )
      @current_block.add_tree_name( token )
      tree_string = ary.shift
      while ( tree_string.index( ";" ) == nil )
        tree_string << ary.shift
      end
      @current_block.add_tree( tree_string )
      @current_cmd = nil
    end
  end
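Because the input was split on whitespace, a tree description containing spaces arrives as several tokens. The method above glues them back together until a token containing ";" has been appended. A minimal sketch of that accumulation, with a hypothetical `read_tree` helper:

```ruby
# Collect the pieces of one tree definition until the closing ";" is seen.
# The name is the token after "Tree"; rest holds the remaining tokens.
def read_tree(name, rest)
  tree = rest.shift.dup
  tree << rest.shift until tree.include?(";")
  [name, tree]
end

tokens = ["best", "(fish,(frog,(snake,", "mouse)));", "Tree"]
name, tree = read_tree(tokens.shift, tokens)
puts "#{name}: #{tree}"
```

The pieces are concatenated without a separator, so whitespace inside the original tree string is not preserved.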
Resets @current_cmd and @current_subcmd to nil.
  # File lib/bio/db/nexus.rb
  def reset_command_state()
    @current_cmd = nil
    @current_subcmd = nil
  end