class Bio::Blast::Report
Parsed results of a BLAST run in tab-delimited or XML output format. According to the MEGABLAST document (README.mbl), each line of a tab-delimited report consists of the following fields:
Query id, Subject id, percent identity, alignment length, number of mismatches (not including gaps), number of gap openings, start of alignment in query, end of alignment in query, start of alignment in subject, end of alignment in subject, expected value, bit score.
For XML output, see the following DTDs:
* http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd
* http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.mod
* http://www.ncbi.nlm.nih.gov/dtd/NCBI_Entity.mod
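The twelve tab-delimited fields listed above can be split and typed along the following lines. This is a minimal standalone sketch, not BioRuby's own parser: the field symbols, the `parse_tab_line` helper, and the sample line are all hypothetical and chosen only for illustration.

```ruby
# Hypothetical names for the 12 columns, in the order given above.
TAB_FIELDS = %i[
  query_id subject_id percent_identity align_len mismatches gap_openings
  query_from query_to subject_from subject_to evalue bit_score
].freeze

INTEGER_FIELDS = %i[align_len mismatches gap_openings
                    query_from query_to subject_from subject_to].freeze
FLOAT_FIELDS = %i[percent_identity evalue bit_score].freeze

# Split one report line on tabs and convert the numeric columns.
def parse_tab_line(line)
  values = line.chomp.split("\t")
  row = TAB_FIELDS.zip(values).to_h
  INTEGER_FIELDS.each { |k| row[k] = row[k].to_i }
  FLOAT_FIELDS.each   { |k| row[k] = row[k].to_f }
  row
end

# A made-up line, not taken from a real BLAST run.
SAMPLE_TAB_LINE =
  "query1\tsubj1\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-100\t365.0\n".freeze

row = parse_tab_line(SAMPLE_TAB_LINE)
puts "#{row[:query_id]} vs #{row[:subject_id]}: #{row[:align_len]} columns aligned"
```

Keeping the column order in a single constant makes it easy to check the mapping against the field list in the report description.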
Constants
- DELIMITER
  for Bio::FlatFile support (only for XML data)
- FLATFILE_SPLITTER
  Flatfile splitter for NCBI BLAST XML format. It is used internally when reading BLAST XML. Normally, users do not need to use it directly.
Attributes
- db
  database name or title (String)
- iterations
  Returns an Array of Bio::Blast::Report::Iteration objects.
- parameters
  Returns a Hash containing execution parameters. Valid keys are: 'matrix', 'expect', 'include', 'sc-match', 'sc-mismatch', 'gap-open', 'gap-extend', 'filter'
- program
  program name (e.g. “blastp”) (String)
- query_def
  query definition line (String)
- query_id
  query ID (String)
- query_len
  query length (Integer)
- reference
  reference (String)
- reports
  When the report contains results for multiple query sequences, returns an Array of Bio::Blast::Report objects corresponding to the multiple queries. Otherwise, returns nil.
  Note for “No hits found”: When no hits are found for a query sequence, the result for that query is completely absent from the result XML, including the query ID and query definition. The only trace is that an iteration number is skipped. This means that if the no-hit query is the last query, it cannot be detected, because the result XML is identical to the XML that would be produced without that query.
- version
  BLAST version (e.g. “blastp 2.2.18 [Mar-02-2008]”) (String)
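The “No hits found” note can be illustrated with plain iteration numbers. The sketch below assumes iteration numbers start at 1 and uses a hypothetical helper, `skipped_iterations`; it is not part of BioRuby.

```ruby
# Return the iteration numbers that are missing from the observed ones.
# Only gaps before the largest observed number can be found.
def skipped_iterations(observed)
  return [] if observed.empty?
  (1..observed.max).to_a - observed
end

# The third query had no hits, so its number is skipped.
p skipped_iterations([1, 2, 4, 5])

# If a fourth query had no hits, nothing in this list would reveal it.
p skipped_iterations([1, 2, 3])
```

The second call shows the limitation described in the note: a no-hit query at the end leaves no gap, so it cannot be told apart from a run that never included it.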
Public Class Methods
Pass BLAST output from 'blastall -m 7' or '-m 8' as a String. The format is auto-detected unless a parser is specified.
# File lib/bio/appl/blast/report.rb
def initialize(data, parser = nil)
  @iterations = []
  @parameters = {}
  case parser
  when :xmlparser         # format 7
    if defined? xmlparser_parse
      xmlparser_parse(data)
    else
      raise NameError, "xmlparser_parse does not defined"
    end
    @reports = blastxml_split_reports
  when :rexml             # format 7
    rexml_parse(data)
    @reports = blastxml_split_reports
  when :tab               # format 8
    tab_parse(data)
  when false
    # do not parse, creates an empty object
  else
    auto_parse(data)
  end
end
Parses XML (-m 7) output using REXML.
# File lib/bio/appl/blast/report.rb
def self.rexml(data)
  self.new(data, :rexml)
end
Parses tab-delimited (-m 8) output using the tab-delimited parser.
# File lib/bio/appl/blast/report.rb
def self.tab(data)
  self.new(data, :tab)
end
Public Instance Methods
Length of BLAST db
# File lib/bio/appl/blast/report.rb
def db_len; statistics['db-len']; end
Number of sequences in BLAST db
# File lib/bio/appl/blast/report.rb
def db_num; statistics['db-num']; end
Iterates over each Bio::Blast::Report::Hit object of the last Iteration. Shortcut for the last iteration's hits (for blastall).
# File lib/bio/appl/blast/report.rb
def each_hit
  @iterations.last.each do |x|
    yield x
  end
end
Iterates over each Bio::Blast::Report::Iteration object (for blastpgp).
# File lib/bio/appl/blast/report.rb
def each_iteration
  @iterations.each do |x|
    yield x
  end
end
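The relationship between iterating over all iterations and the "last iteration" shortcuts can be sketched with stand-in classes. `MiniIteration` and `MiniReport` below are hypothetical and only mimic the shape of the methods documented here; in particular, this `each_hit` goes through `hits`, whereas the listing above calls `each` on the last iteration directly.

```ruby
# Hypothetical stand-in for an iteration: a number and its hits.
MiniIteration = Struct.new(:num, :hits)

# Hypothetical stand-in for a report holding several iterations.
class MiniReport
  attr_reader :iterations

  def initialize(iterations)
    @iterations = iterations
  end

  # Yield every iteration in order (useful for multi-round searches).
  def each_iteration(&block)
    @iterations.each(&block)
  end

  # Shortcut: the hits of the last iteration only.
  def hits
    @iterations.last.hits
  end

  # Shortcut: yield each hit of the last iteration.
  def each_hit(&block)
    hits.each(&block)
  end
end

report = MiniReport.new([
  MiniIteration.new(1, %w[hitA hitB]),
  MiniIteration.new(2, %w[hitA hitB hitC])
])
report.each_iteration { |it| puts "round #{it.num}: #{it.hits.size} hits" }
report.each_hit { |h| puts h }
```

For a single-round search the last iteration is the only one, so the shortcuts cover the whole result. Because both the sketch and the listings call a method on `@iterations.last`, an object with no iterations would raise NoMethodError on these shortcuts.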
Effective search space
# File lib/bio/appl/blast/report.rb
def eff_space; statistics['eff-space']; end
Limit of request to Entrez (shortcut for @parameters)
# File lib/bio/appl/blast/report.rb
def entrez_query; @parameters['entrez-query']; end
Karlin-Altschul parameter H
# File lib/bio/appl/blast/report.rb
def entropy; statistics['entropy']; end
Expectation threshold (-e) (shortcut for @parameters)
# File lib/bio/appl/blast/report.rb
def expect; @parameters['expect']; end
Filtering options (-F) (shortcut for @parameters)
# File lib/bio/appl/blast/report.rb
def filter; @parameters['filter']; end
Gap extension cost (-E) (shortcut for @parameters)
# File lib/bio/appl/blast/report.rb
def gap_extend; @parameters['gap-extend']; end
Gap opening cost (-G) (shortcut for @parameters)
# File lib/bio/appl/blast/report.rb
def gap_open; @parameters['gap-open']; end
Returns an Array of Bio::Blast::Report::Hit objects of the last iteration. Shortcut for the last iteration's hits.
# File lib/bio/appl/blast/report.rb
def hits
  @iterations.last.hits
end
Effective HSP length
# File lib/bio/appl/blast/report.rb
def hsp_len; statistics['hsp-len']; end
Inclusion threshold (-h) (shortcut for @parameters)
# File lib/bio/appl/blast/report.rb
def inclusion; @parameters['include']; end
Karlin-Altschul parameter K
# File lib/bio/appl/blast/report.rb
def kappa; statistics['kappa']; end
Karlin-Altschul parameter Lambda
# File lib/bio/appl/blast/report.rb
def lambda; statistics['lambda']; end
Matrix used (-M) (shortcut for @parameters)
# File lib/bio/appl/blast/report.rb
def matrix; @parameters['matrix']; end
Returns a String (or nil) containing the execution message of the last iteration (typically “CONVERGED”). Shortcut for the last iteration's message, useful for checking for 'CONVERGED'.
# File lib/bio/appl/blast/report.rb
def message
  @iterations.last.message
end
PHI-BLAST pattern (shortcut for @parameters)
# File lib/bio/appl/blast/report.rb
def pattern; @parameters['pattern']; end
Match score for NT (-r) (shortcut for @parameters)
# File lib/bio/appl/blast/report.rb
def sc_match; @parameters['sc-match']; end
Mismatch score for NT (-q) (shortcut for @parameters)
# File lib/bio/appl/blast/report.rb
def sc_mismatch; @parameters['sc-mismatch']; end
Returns a Hash containing execution statistics of the last iteration. Valid keys are: 'db-num', 'db-len', 'hsp-len', 'eff-space', 'kappa', 'lambda', 'entropy'. Shortcut for the last iteration's statistics.
# File lib/bio/appl/blast/report.rb
def statistics
  @iterations.last.statistics
end
Private Instance Methods
# File lib/bio/appl/blast/report.rb
def auto_parse(data)
  if /<?xml/.match(data[/.*/])
    if defined? xmlparser_parse
      xmlparser_parse(data)
      @reports = blastxml_split_reports
    else
      rexml_parse(data)
      @reports = blastxml_split_reports
    end
  else
    tab_parse(data)
  end
end
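The format check in the listing above looks only at the first line of the data. The following is a minimal sketch of that idea with a hypothetical helper, `guess_blast_format`, and hypothetical return symbols. Note one deliberate difference: the pattern in the listing, `/<?xml/`, leaves the `?` unescaped and therefore matches any first line containing "xml", whereas this sketch escapes it to require a literal `<?xml`.

```ruby
# Guess the report format from the first line of the data.
# Returns :xml or :tab (symbols chosen for this sketch only).
def guess_blast_format(data)
  first_line = data[/.*/] # everything up to the first newline
  first_line =~ /<\?xml/ ? :xml : :tab
end

puts guess_blast_format(%(<?xml version="1.0"?>\n<BlastOutput>\n))
puts guess_blast_format("query1\tsubj1\t98.50\t200\n")
```

Anything that does not look like XML falls through to the tab-delimited branch, as in the listing, so malformed input is not rejected at this stage.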
(private method) In new BLAST XML (blastall >= 2.2.14), results of multiple queries are stored in <Iteration> elements. This method splits the iterations into multiple Bio::Blast::Report objects and returns them as an Array. For the traditional BLAST XML format or a blastpgp result, it returns nil.
# File lib/bio/appl/blast/report.rb
def blastxml_split_reports
  unless self.iterations.find { |iter|
      iter.query_id || iter.query_def || iter.query_len
    } then
    # traditional BLAST XML format, or blastpgp result.
    return nil
  end

  # new BLAST XML format (blastall 2.2.14 or later)
  origin = self
  reports = []
  prev_iternum = 0
  firsttime = true

  orig_iters = self.iterations
  orig_iters.each do |iter|
    blast = self.class.new(nil, false)
    # When no hits found, the iteration is skipped in NCBI BLAST XML.
    # So, filled with empty report object.
    if prev_iternum + 1 < iter.num then
      ((prev_iternum + 1)...(iter.num)).each do |num|
        empty_i = Iteration.new
        empty_i.num = num
        empty_i.instance_eval {
          if firsttime then
            @query_id = origin.query_id
            @query_def = origin.query_def
            @query_len = origin.query_len
            firsttime = false
          end
        }
        empty = self.class.new(nil, false)
        empty.instance_eval {
          # query_* are copied from the empty_i
          @query_id = empty_i.query_id
          @query_def = empty_i.query_def
          @query_len = empty_i.query_len
          # others are copied from the origin
          @program = origin.program
          @version = origin.version
          @reference = origin.reference
          @db = origin.db
          @parameters.update(origin.parameters)
          # the empty_i is added to the iterations
          @iterations.push empty_i
        }
        reports.push empty
      end
    end

    blast.instance_eval {
      if firsttime then
        @query_id = origin.query_id
        @query_def = origin.query_def
        @query_len = origin.query_len
        firsttime = false
      end
      # query_* are copied from the iter
      @query_id = iter.query_id if iter.query_id
      @query_def = iter.query_def if iter.query_def
      @query_len = iter.query_len if iter.query_len
      # others are copied from the origin
      @program = origin.program
      @version = origin.version
      @reference = origin.reference
      @db = origin.db
      @parameters.update(origin.parameters)
      # rewrites hit's query_id, query_def, query_len
      iter.hits.each do |h|
        h.query_id = @query_id
        h.query_def = @query_def
        h.query_len = @query_len
      end
      # the iter is added to the iterations
      @iterations.push iter
    }

    prev_iternum = iter.num
    reports.push blast
  end #orig_iters.each

  # This object's iterations is set as first report's iterations
  @iterations.clear
  if rep = reports.first then
    @iterations = rep.iterations
  end

  return reports
end
# File lib/bio/appl/blast/rexml.rb
def rexml_parse(xml)
  dom = REXML::Document.new(xml)
  rexml_parse_program(dom)
  dom.elements.each("*//Iteration") do |e|
    @iterations.push(rexml_parse_iteration(e))
  end
end
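The traversal used by the listing above, which builds a REXML document and visits every `<Iteration>` element, can be tried on a small document. This sketch assumes the `rexml` library is available to the Ruby installation. The XML fragment is made up and far smaller than a real BLAST report, and `iteration_summaries` is a hypothetical helper, not part of BioRuby.

```ruby
require 'rexml/document'

# A tiny, made-up fragment shaped like BLAST XML output.
XML_SAMPLE = <<~XML
  <?xml version="1.0"?>
  <BlastOutput>
    <BlastOutput_program>blastp</BlastOutput_program>
    <BlastOutput_iterations>
      <Iteration>
        <Iteration_iter-num>1</Iteration_iter-num>
        <Iteration_query-def>query one</Iteration_query-def>
      </Iteration>
      <Iteration>
        <Iteration_iter-num>2</Iteration_iter-num>
        <Iteration_query-def>query two</Iteration_query-def>
      </Iteration>
    </BlastOutput_iterations>
  </BlastOutput>
XML

# Collect { num:, query_def: } for each <Iteration> element.
def iteration_summaries(xml)
  doc = REXML::Document.new(xml)
  summaries = []
  doc.elements.each('//Iteration') do |iter|
    summary = {}
    # Dispatch on the child element name, as the BioRuby listings do.
    iter.elements.each do |child|
      case child.name
      when 'Iteration_iter-num'  then summary[:num] = child.text.to_i
      when 'Iteration_query-def' then summary[:query_def] = child.text
      end
    end
    summaries << summary
  end
  summaries
end

p iteration_summaries(XML_SAMPLE)
```

Dispatching on `child.name` rather than issuing one XPath query per field means each iteration's children are walked only once.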
# File lib/bio/appl/blast/rexml.rb
def rexml_parse_hit(e)
  hit = Hit.new
  hash = {}
  hit.query_id = @query_id
  hit.query_def = @query_def
  hit.query_len = @query_len
  e.elements.each do |h|
    case h.name
    when 'Hit_hsps'
      h.elements.each("Hsp") do |s|
        hit.hsps.push(rexml_parse_hsp(s))
      end
    else
      hash[h.name] = h.text
    end
  end
  hit.num = hash['Hit_num'].to_i
  hit.hit_id = hash['Hit_id']
  hit.len = hash['Hit_len'].to_i
  hit.definition = hash['Hit_def']
  hit.accession = hash['Hit_accession']
  return hit
end
# File lib/bio/appl/blast/rexml.rb
def rexml_parse_hsp(e)
  hsp = Hsp.new
  hash = {}
  e.each_element_with_text do |h|
    hash[h.name] = h.text
  end
  hsp.num = hash['Hsp_num'].to_i
  hsp.bit_score = hash['Hsp_bit-score'].to_f
  hsp.score = hash['Hsp_score'].to_i
  hsp.evalue = hash['Hsp_evalue'].to_f
  hsp.query_from = hash['Hsp_query-from'].to_i
  hsp.query_to = hash['Hsp_query-to'].to_i
  hsp.hit_from = hash['Hsp_hit-from'].to_i
  hsp.hit_to = hash['Hsp_hit-to'].to_i
  hsp.pattern_from = hash['Hsp_pattern-from'].to_i
  hsp.pattern_to = hash['Hsp_pattern-to'].to_i
  hsp.query_frame = hash['Hsp_query-frame'].to_i
  hsp.hit_frame = hash['Hsp_hit-frame'].to_i
  hsp.identity = hash['Hsp_identity'].to_i
  hsp.positive = hash['Hsp_positive'].to_i
  hsp.gaps = hash['Hsp_gaps'].to_i
  hsp.align_len = hash['Hsp_align-len'].to_i
  hsp.density = hash['Hsp_density'].to_i
  hsp.qseq = hash['Hsp_qseq']
  hsp.hseq = hash['Hsp_hseq']
  hsp.midline = hash['Hsp_midline']
  return hsp
end
# File lib/bio/appl/blast/rexml.rb
def rexml_parse_iteration(e)
  iteration = Iteration.new
  e.elements.each do |i|
    case i.name
    when 'Iteration_iter-num'
      iteration.num = i.text.to_i
    when 'Iteration_hits'
      i.elements.each("Hit") do |h|
        iteration.hits.push(rexml_parse_hit(h))
      end
    when 'Iteration_message'
      iteration.message = i.text
    when 'Iteration_stat'
      i.elements["Statistics"].each_element_with_text do |s|
        k = s.name.sub(/Statistics_/, '')
        v = s.text =~ /\D/ ? s.text.to_f : s.text.to_i
        iteration.statistics[k] = v
      end

    # for new BLAST XML format
    when 'Iteration_query-ID'
      iteration.query_id = i.text
    when 'Iteration_query-def'
      iteration.query_def = i.text
    when 'Iteration_query-len'
      iteration.query_len = i.text.to_i
    end # case i.name
  end # e.elements.each

  return iteration
end
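The statistics values in the listing above are converted with a single rule: text containing any non-digit character becomes a Float, and everything else becomes an Integer. The sketch below isolates that rule in a hypothetical helper, `coerce_statistic`, with made-up sample values.

```ruby
# Convert a statistics string: any non-digit character (".", "e", "-")
# selects Float, otherwise Integer.
def coerce_statistic(text)
  text =~ /\D/ ? text.to_f : text.to_i
end

%w[4662 0.041 1e-05].each do |text|
  value = coerce_statistic(text)
  puts "#{text} -> #{value.inspect} (#{value.class})"
end
```

One consequence of the rule is that a string with a leading minus sign, such as "-5", would be returned as a Float, because "-" counts as a non-digit.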
# File lib/bio/appl/blast/rexml.rb
def rexml_parse_program(dom)
  hash = {}
  dom.root.each_element_with_text do |e|
    name, text = e.name, e.text
    case name
    when 'BlastOutput_param'
      e.elements["Parameters"].each_element_with_text do |p|
        xml_set_parameter(p.name, p.text)
      end
    else
      hash[name] = text if text.strip.size > 0
    end
  end
  @program = hash['BlastOutput_program']
  @version = hash['BlastOutput_version']
  @reference = hash['BlastOutput_reference']
  @db = hash['BlastOutput_db']
  @query_id = hash['BlastOutput_query-ID']
  @query_def = hash['BlastOutput_query-def']
  @query_len = hash['BlastOutput_query-len'].to_i
end
# File lib/bio/appl/blast/format8.rb
def tab_parse(data)
  iteration = Iteration.new
  @iterations.push(iteration)
  @query_id = @query_def = data[/\S+/]

  query_prev = ''
  target_prev = ''
  hit_num = 1
  hsp_num = 1
  hit = ''
  data.each_line do |line|
    ary = line.chomp.split("\t")
    query_id, target_id, hsp = tab_parse_hsp(ary)
    if query_prev != query_id or target_prev != target_id
      hit = Hit.new
      hit.num = hit_num
      hit_num += 1
      hit.query_id = hit.query_def = query_id
      hit.accession = hit.definition = target_id
      iteration.hits.push(hit)
      hsp_num = 1
    end
    hsp.num = hsp_num
    hsp_num += 1
    hit.hsps.push(hsp)
    query_prev = query_id
    target_prev = target_id
  end
end
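The grouping step in the listing above starts a new hit whenever the query or target differs from the previous line, and numbers HSPs within each hit. The sketch below reproduces that logic on plain hashes. The `group_into_hits` helper, the hash keys, and the sample rows are hypothetical.

```ruby
# Group rows into hits: a new hit begins whenever the (query, target)
# pair differs from the pair on the previous row.
def group_into_hits(rows)
  hits = []
  prev_pair = nil
  rows.each do |row|
    pair = [row[:query], row[:target]]
    if pair != prev_pair
      hits << { num: hits.size + 1, query: row[:query],
                target: row[:target], hsps: [] }
    end
    hit = hits.last
    # HSP numbering restarts at 1 for every hit.
    hit[:hsps] << { num: hit[:hsps].size + 1, bit_score: row[:bit_score] }
    prev_pair = pair
  end
  hits
end

SAMPLE_ROWS = [
  { query: 'q1', target: 's1', bit_score: 365.0 },
  { query: 'q1', target: 's1', bit_score: 80.5 },
  { query: 'q1', target: 's2', bit_score: 42.0 }
].freeze

group_into_hits(SAMPLE_ROWS).each do |hit|
  puts "hit #{hit[:num]}: #{hit[:query]} -> #{hit[:target]}, #{hit[:hsps].size} HSP(s)"
end
```

Because only the previous row is compared, rows for the same query and target that are not adjacent end up in separate hits, so this approach relies on the input being grouped already.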
# File lib/bio/appl/blast/format8.rb
def tab_parse_hsp(ary)
  query_id, target_id,
    percent_identity,
    align_len,
    mismatch_count,
    gaps,
    query_from,
    query_to,
    hit_from,
    hit_to,
    evalue,
    bit_score = *ary

  hsp = Hsp.new
  hsp.align_len = align_len.to_i
  hsp.gaps = gaps.to_i
  hsp.query_from = query_from.to_i
  hsp.query_to = query_to.to_i
  hsp.hit_from = hit_from.to_i
  hsp.hit_to = hit_to.to_i
  hsp.evalue = evalue.strip.to_f
  hsp.bit_score = bit_score.to_f

  hsp.percent_identity = percent_identity.to_f
  hsp.mismatch_count = mismatch_count.to_i

  return query_id, target_id, hsp
end
Sets the parameter named by key to val, after stripping the 'Parameters_' prefix from the key and converting the value to a Float or Integer where appropriate.
# File lib/bio/appl/blast/report.rb
def xml_set_parameter(key, val)
  #labels = {
  #  'matrix'       => 'Parameters_matrix',
  #  'expect'       => 'Parameters_expect',
  #  'include'      => 'Parameters_include',
  #  'sc-match'     => 'Parameters_sc-match',
  #  'sc-mismatch'  => 'Parameters_sc-mismatch',
  #  'gap-open'     => 'Parameters_gap-open',
  #  'gap-extend'   => 'Parameters_gap-extend',
  #  'filter'       => 'Parameters_filter',
  #  'pattern'      => 'Parameters_pattern',
  #  'entrez-query' => 'Parameters_entrez-query',
  #}
  k = key.sub(/\AParameters\_/, '')
  @parameters[k] =
    case k
    when 'expect', 'include'
      val.to_f
    when /\Agap\-/, /\Asc\-/
      val.to_i
    else
      val
    end
end
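The effect of the conversion above on individual XML parameter elements can be seen in isolation. The sketch below restates the same rules in a hypothetical helper, `normalize_parameter`, that returns the key and value instead of storing them. The sample values are made up.

```ruby
# Strip the "Parameters_" prefix and convert the value according to the key.
def normalize_parameter(key, val)
  k = key.sub(/\AParameters_/, '')
  v = case k
      when 'expect', 'include' then val.to_f # thresholds are Floats
      when /\Agap-/, /\Asc-/   then val.to_i # costs and scores are Integers
      else val                               # everything else stays a String
      end
  [k, v]
end

[
  %w[Parameters_expect 10],
  %w[Parameters_gap-open 11],
  %w[Parameters_sc-mismatch -3],
  %w[Parameters_matrix BLOSUM62]
].each do |key, val|
  p normalize_parameter(key, val)
end
```

Keys that match none of the rules, such as 'matrix' or 'filter', keep their String values.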