class Bio::Blast::Report

Parsed results of a BLAST run in tab-delimited or XML output format. According to the MEGABLAST document (README.mbl), each line of a tab-delimited report consists of the following fields, in order:

* query ID
* subject ID
* percent identity
* alignment length
* number of mismatches (not including gaps)
* number of gap openings
* start of alignment in query
* end of alignment in query
* start of alignment in subject
* end of alignment in subject
* expected value
* bit score

For XML output, see the following DTDs.

* http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd
* http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.mod
* http://www.ncbi.nlm.nih.gov/dtd/NCBI_Entity.mod
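The twelve tab-delimited fields listed above can be illustrated with a small standalone sketch. It does not use BioRuby; the field labels in `TAB_FIELDS`, the helper `parse_tab_line`, and the sample line are all assumptions made for illustration, not part of the library or real BLAST output.

```ruby
# Illustrative labels for the twelve columns, in the documented order.
TAB_FIELDS = %i[
  query_id subject_id percent_identity align_len mismatches gap_openings
  query_from query_to hit_from hit_to evalue bit_score
].freeze

# Split one tab-delimited line into a Hash keyed by field label.
def parse_tab_line(line)
  row = TAB_FIELDS.zip(line.chomp.split("\t")).to_h
  # Floating-point columns.
  %i[percent_identity evalue bit_score].each { |k| row[k] = row[k].to_f }
  # Integer columns; the two ID columns stay as Strings.
  %i[align_len mismatches gap_openings
     query_from query_to hit_from hit_to].each { |k| row[k] = row[k].to_i }
  row
end

# Made-up example line, not real BLAST output.
SAMPLE_LINE = "query1\tsubj1\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t370.0\n"

row = parse_tab_line(SAMPLE_LINE)
puts row[:subject_id]
puts row[:align_len]
```

A Hash keyed by label is used here only to make the column order explicit; the library itself stores these values on Hsp objects.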

Constants

DELIMITER

Delimiter for Bio::FlatFile support (XML data only).

FLATFILE_SPLITTER

Flatfile splitter for NCBI BLAST XML format. It is internally used when reading BLAST XML. Normally, users do not need to use it directly.

Attributes

db[R]

database name or title (String)

iterations[R]

Returns an Array of Bio::Blast::Report::Iteration objects.

parameters[R]

Returns a Hash containing execution parameters. Valid keys are: 'matrix', 'expect', 'include', 'sc-match', 'sc-mismatch', 'gap-open', 'gap-extend', 'filter'. The pattern and entrez_query methods additionally read the 'pattern' and 'entrez-query' keys.

program[R]

program name (e.g. “blastp”) (String)

query_def[R]

query definition line (String)

query_id[R]

query ID (String)

query_len[R]

query length (Integer)

reference[R]

reference (String)

reports[R]

When the report contains results for multiple query sequences, returns an Array of Bio::Blast::Report objects, one per query. Otherwise, returns nil.

Note on “No hits found”: when a query sequence has no hits, its result is entirely absent from the result XML, including the query ID and query definition. The only trace is a skipped iteration number. Consequently, if the no-hit query is the last query, it cannot be detected, because the result XML is identical to the XML that would be produced without that query.
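The skipped-iteration behaviour described in the note can be sketched in isolation. The helper below is hypothetical (it is not a BioRuby method) and assumes iteration numbers start at 1; it simply reports which numbers are missing from the ones that were seen.

```ruby
# Return the iteration numbers missing between 1 and the largest number seen.
# Hypothetical helper for illustration only.
def skipped_iteration_numbers(seen)
  return [] if seen.empty?
  (1..seen.max).to_a - seen
end

# Queries 1, 3 and 4 produced hits, so query 2 had none.
puts skipped_iteration_numbers([1, 3, 4]).inspect

# Three queries were submitted but the last one had no hits:
# nothing is missing from the sequence, so the gap is invisible.
puts skipped_iteration_numbers([1, 2]).inspect
```

The second call shows why a trailing no-hit query cannot be recovered from the numbering alone.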

version[R]

BLAST version (e.g. “blastp 2.2.18 [Mar-02-2008]”) (String)

Public Class Methods

new(data, parser = nil)

Pass BLAST output from 'blastall -m 7' or '-m 8' as a String. The format is auto-detected unless a parser is specified.

    # File lib/bio/appl/blast/report.rb
    def initialize(data, parser = nil)
      @iterations = []
      @parameters = {}
      case parser
      when :xmlparser     # format 7
        if defined? xmlparser_parse
          xmlparser_parse(data)
        else
          raise NameError, "xmlparser_parse does not defined"
        end
        @reports = blastxml_split_reports
      when :rexml         # format 7
        rexml_parse(data)
        @reports = blastxml_split_reports
      when :tab           # format 8
        tab_parse(data)
      when false
        # do not parse, creates an empty object
      else
        auto_parse(data)
      end
    end

rexml(data)

Parses XML (-m 7) output using REXML.

    # File lib/bio/appl/blast/report.rb
    def self.rexml(data)
      self.new(data, :rexml)
    end

tab(data)

Parses the data using the tab-delimited (-m 8) output parser.

    # File lib/bio/appl/blast/report.rb
    def self.tab(data)
      self.new(data, :tab)
    end

Public Instance Methods

db_len()

Length of BLAST db

    # File lib/bio/appl/blast/report.rb
    def db_len;    statistics['db-len'];    end

db_num()

Number of sequences in BLAST db

    # File lib/bio/appl/blast/report.rb
    def db_num;    statistics['db-num'];    end

each()
Alias for: each_hit
each_hit() { |x| ... }

Iterates over each Bio::Blast::Report::Hit object of the last iteration. Shortcut for the last iteration's hits (for blastall).

    # File lib/bio/appl/blast/report.rb
    def each_hit
      @iterations.last.each do |x|
        yield x
      end
    end

Also aliased as: each
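The "last iteration" shortcut can be sketched with stand-in classes. `SketchIteration` and `SketchReport` below are hypothetical substitutes for Bio::Blast::Report::Iteration and Bio::Blast::Report, written only to show the delegation pattern; they are not the library's classes.

```ruby
# Stand-ins for illustration only.
SketchIteration = Struct.new(:num, :hits)

class SketchReport
  attr_reader :iterations

  def initialize(iterations)
    @iterations = iterations
  end

  # Yield each hit of the last iteration only; earlier iterations are ignored.
  def each_hit(&block)
    @iterations.last.hits.each(&block)
  end
  alias each each_hit
end

report = SketchReport.new([
  SketchIteration.new(1, %w[hitA hitB]),
  SketchIteration.new(2, %w[hitC])
])
report.each_hit { |h| puts h }
```

As in the listing, this pattern assumes at least one iteration exists; with an empty iteration list, `last` returns nil and the call fails.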

each_iteration() { |x| ... }

Iterates on each Bio::Blast::Report::Iteration object. (for blastpgp)

    # File lib/bio/appl/blast/report.rb
    def each_iteration
      @iterations.each do |x|
        yield x
      end
    end

eff_space()

Effective search space

    # File lib/bio/appl/blast/report.rb
    def eff_space; statistics['eff-space']; end

entrez_query()

Limit of request to Entrez. Shortcut for @parameters['entrez-query'].

    # File lib/bio/appl/blast/report.rb
    def entrez_query; @parameters['entrez-query'];     end

entropy()

Karlin-Altschul parameter H

    # File lib/bio/appl/blast/report.rb
    def entropy;   statistics['entropy'];   end

expect()

Expectation threshold (-e). Shortcut for @parameters['expect'].

    # File lib/bio/appl/blast/report.rb
    def expect;       @parameters['expect'];           end

filter()

Filtering options (-F). Shortcut for @parameters['filter'].

    # File lib/bio/appl/blast/report.rb
    def filter;       @parameters['filter'];           end

gap_extend()

Gap extension cost (-E). Shortcut for @parameters['gap-extend'].

    # File lib/bio/appl/blast/report.rb
    def gap_extend;   @parameters['gap-extend'];       end

gap_open()

Gap opening cost (-G). Shortcut for @parameters['gap-open'].

    # File lib/bio/appl/blast/report.rb
    def gap_open;     @parameters['gap-open'];         end

hits()

Returns an Array of Bio::Blast::Report::Hit objects of the last iteration. Shortcut for the last iteration's hits.

    # File lib/bio/appl/blast/report.rb
    def hits
      @iterations.last.hits
    end

hsp_len()

Effective HSP length

    # File lib/bio/appl/blast/report.rb
    def hsp_len;   statistics['hsp-len'];   end

inclusion()

Inclusion threshold (-h). Shortcut for @parameters['include'].

    # File lib/bio/appl/blast/report.rb
    def inclusion;    @parameters['include'];          end

kappa()

Karlin-Altschul parameter K

    # File lib/bio/appl/blast/report.rb
    def kappa;     statistics['kappa'];     end

lambda()

Karlin-Altschul parameter Lambda

    # File lib/bio/appl/blast/report.rb
    def lambda;    statistics['lambda'];    end

matrix()

Matrix used (-M). Shortcut for @parameters['matrix'].

    # File lib/bio/appl/blast/report.rb
    def matrix;       @parameters['matrix'];           end

message()

Returns a String (or nil) containing the execution message of the last iteration (typically “CONVERGED”). Shortcut for the last iteration's message, useful for checking for 'CONVERGED'.

    # File lib/bio/appl/blast/report.rb
    def message
      @iterations.last.message
    end

pattern()

PHI-BLAST pattern. Shortcut for @parameters['pattern'].

    # File lib/bio/appl/blast/report.rb
    def pattern;      @parameters['pattern'];          end

sc_match()

Match score for NT (-r). Shortcut for @parameters['sc-match'].

    # File lib/bio/appl/blast/report.rb
    def sc_match;     @parameters['sc-match'];         end

sc_mismatch()

Mismatch score for NT (-q). Shortcut for @parameters['sc-mismatch'].

    # File lib/bio/appl/blast/report.rb
    def sc_mismatch;  @parameters['sc-mismatch'];      end

statistics()

Returns a Hash containing execution statistics of the last iteration. Valid keys are: 'db-num', 'db-len', 'hsp-len', 'eff-space', 'kappa', 'lambda', 'entropy'. Shortcut for the last iteration's statistics.

    # File lib/bio/appl/blast/report.rb
    def statistics
      @iterations.last.statistics
    end

Private Instance Methods

auto_parse(data)

    # File lib/bio/appl/blast/report.rb
    def auto_parse(data)
      if /<?xml/.match(data[/.*/])
        if defined? xmlparser_parse
          xmlparser_parse(data)
          @reports = blastxml_split_reports
        else
          rexml_parse(data)
          @reports = blastxml_split_reports
        end
      else
        tab_parse(data)
      end
    end
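
The format detection in auto_parse inspects only the first line of the data. The sketch below reproduces that idea in a standalone, hypothetical helper (`detect_blast_format` is not a BioRuby method). One deliberate difference: in a Ruby regular expression, `<?` makes the `<` optional, so the pattern in the listing matches any first line containing `xml`; the sketch checks for the literal string `<?xml` instead.

```ruby
# Guess the report format from the first line of the data.
# Returns :xml or :tab. Hypothetical helper for illustration only.
def detect_blast_format(data)
  first_line = data[/.*/].to_s          # "." does not match a newline
  first_line.include?('<?xml') ? :xml : :tab
end

puts detect_blast_format(%(<?xml version="1.0"?>\n<BlastOutput>))
puts detect_blast_format("query1\tsubj1\t98.50\n")
```

Whether the stricter check is preferable depends on the inputs expected; it only matters when the first line of tab-delimited data happens to contain the letters `xml`.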

blastxml_split_reports()

(private method) In the new BLAST XML format (blastall >= 2.2.14), results of multiple queries are stored in <Iteration> elements. This method splits the iterations into multiple Bio::Blast::Report objects and returns them as an Array. Returns nil for the traditional XML format or a blastpgp result.

    # File lib/bio/appl/blast/report.rb
    def blastxml_split_reports
      unless self.iterations.find { |iter|
          iter.query_id || iter.query_def || iter.query_len
        } then
        # traditional BLAST XML format, or blastpgp result.
        return nil
      end

      # new BLAST XML format (blastall 2.2.14 or later)
      origin = self
      reports = []
      prev_iternum = 0
      firsttime = true

      orig_iters = self.iterations
      orig_iters.each do |iter|
        blast = self.class.new(nil, false)
        # When no hits found, the iteration is skipped in NCBI BLAST XML.
        # So, filled with empty report object.
        if prev_iternum + 1 < iter.num then
          ((prev_iternum + 1)...(iter.num)).each do |num|
            empty_i = Iteration.new
            empty_i.num = num
            empty_i.instance_eval {
              if firsttime then
                @query_id  = origin.query_id
                @query_def = origin.query_def
                @query_len = origin.query_len
                firsttime = false
              end
            }
            empty = self.class.new(nil, false)
            empty.instance_eval {
              # query_* are copied from the empty_i
              @query_id  = empty_i.query_id
              @query_def = empty_i.query_def
              @query_len = empty_i.query_len
              # others are copied from the origin
              @program   = origin.program
              @version   = origin.version
              @reference = origin.reference
              @db        = origin.db
              @parameters.update(origin.parameters)
              # the empty_i is added to the iterations
              @iterations.push empty_i
            }
            reports.push empty
          end
        end

        blast.instance_eval {
          if firsttime then
            @query_id  = origin.query_id
            @query_def = origin.query_def
            @query_len = origin.query_len
            firsttime = false
          end
          # query_* are copied from the iter
          @query_id  = iter.query_id if iter.query_id
          @query_def = iter.query_def if iter.query_def
          @query_len = iter.query_len if iter.query_len
          # others are copied from the origin
          @program   = origin.program
          @version   = origin.version
          @reference = origin.reference
          @db        = origin.db
          @parameters.update(origin.parameters)
          # rewrites hit's query_id, query_def, query_len
          iter.hits.each do |h|
            h.query_id  = @query_id
            h.query_def = @query_def
            h.query_len = @query_len
          end
          # the iter is added to the iterations
          @iterations.push iter
        }

        prev_iternum = iter.num
        reports.push blast
      end #orig_iters.each

      # This object's iterations is set as first report's iterations
      @iterations.clear
      if rep = reports.first then
        @iterations = rep.iterations
      end

      return reports
    end

rexml_parse(xml)

    # File lib/bio/appl/blast/rexml.rb
    def rexml_parse(xml)
      dom = REXML::Document.new(xml)
      rexml_parse_program(dom)
      dom.elements.each("*//Iteration") do |e|
        @iterations.push(rexml_parse_iteration(e))
      end
    end

rexml_parse_hit(e)

    # File lib/bio/appl/blast/rexml.rb
    def rexml_parse_hit(e)
      hit = Hit.new
      hash = {}
      hit.query_id = @query_id
      hit.query_def = @query_def
      hit.query_len = @query_len
      e.elements.each do |h|
        case h.name
        when 'Hit_hsps'
          h.elements.each("Hsp") do |s|
            hit.hsps.push(rexml_parse_hsp(s))
          end
        else
          hash[h.name] = h.text
        end
      end
      hit.num         = hash['Hit_num'].to_i
      hit.hit_id      = hash['Hit_id']
      hit.len         = hash['Hit_len'].to_i
      hit.definition  = hash['Hit_def']
      hit.accession   = hash['Hit_accession']
      return hit
    end

rexml_parse_hsp(e)

    # File lib/bio/appl/blast/rexml.rb
    def rexml_parse_hsp(e)
      hsp = Hsp.new
      hash = {}
      e.each_element_with_text do |h|
        hash[h.name] = h.text
      end
      hsp.num                 = hash['Hsp_num'].to_i
      hsp.bit_score           = hash['Hsp_bit-score'].to_f
      hsp.score               = hash['Hsp_score'].to_i
      hsp.evalue              = hash['Hsp_evalue'].to_f
      hsp.query_from          = hash['Hsp_query-from'].to_i
      hsp.query_to            = hash['Hsp_query-to'].to_i
      hsp.hit_from            = hash['Hsp_hit-from'].to_i
      hsp.hit_to              = hash['Hsp_hit-to'].to_i
      hsp.pattern_from        = hash['Hsp_pattern-from'].to_i
      hsp.pattern_to          = hash['Hsp_pattern-to'].to_i
      hsp.query_frame         = hash['Hsp_query-frame'].to_i
      hsp.hit_frame           = hash['Hsp_hit-frame'].to_i
      hsp.identity            = hash['Hsp_identity'].to_i
      hsp.positive            = hash['Hsp_positive'].to_i
      hsp.gaps                = hash['Hsp_gaps'].to_i
      hsp.align_len           = hash['Hsp_align-len'].to_i
      hsp.density             = hash['Hsp_density'].to_i
      hsp.qseq                = hash['Hsp_qseq']
      hsp.hseq                = hash['Hsp_hseq']
      hsp.midline             = hash['Hsp_midline']
      return hsp
    end

rexml_parse_iteration(e)

    # File lib/bio/appl/blast/rexml.rb
    def rexml_parse_iteration(e)
      iteration = Iteration.new
      e.elements.each do |i|
        case i.name
        when 'Iteration_iter-num'
          iteration.num = i.text.to_i
        when 'Iteration_hits'
          i.elements.each("Hit") do |h|
            iteration.hits.push(rexml_parse_hit(h))
          end
        when 'Iteration_message'
          iteration.message = i.text
        when 'Iteration_stat'
          i.elements["Statistics"].each_element_with_text do |s|
            k = s.name.sub(/Statistics_/, '')
            v = s.text =~ /\D/ ? s.text.to_f : s.text.to_i
            iteration.statistics[k] = v
          end

        # for new BLAST XML format
        when 'Iteration_query-ID'
          iteration.query_id = i.text
        when 'Iteration_query-def'
          iteration.query_def = i.text
        when 'Iteration_query-len'
          iteration.query_len = i.text.to_i
        end # case i.name
      end # e.elements.each

      return iteration
    end
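
The statistics handling in rexml_parse_iteration strips the `Statistics_` prefix from each element name and stores the value as an Integer when the text contains only digits, otherwise as a Float. A minimal standalone sketch of that step follows. It assumes the rexml library is available; `parse_statistics` and the XML fragment are made up for illustration and are not part of BioRuby.

```ruby
require 'rexml/document'

# Made-up fragment in the shape of an <Iteration_stat> element.
STAT_XML = <<~XML
  <Iteration_stat>
    <Statistics>
      <Statistics_db-num>1000</Statistics_db-num>
      <Statistics_kappa>0.041</Statistics_kappa>
    </Statistics>
  </Iteration_stat>
XML

# Build a Hash of statistics with numeric values.
def parse_statistics(xml)
  doc = REXML::Document.new(xml)
  stats = {}
  doc.root.elements['Statistics'].each_element_with_text do |s|
    key = s.name.sub(/Statistics_/, '')
    # Any non-digit character (such as ".") means the value is a Float.
    stats[key] = s.text =~ /\D/ ? s.text.to_f : s.text.to_i
  end
  stats
end

puts parse_statistics(STAT_XML).inspect
```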

rexml_parse_program(dom)

    # File lib/bio/appl/blast/rexml.rb
    def rexml_parse_program(dom)
      hash = {}
      dom.root.each_element_with_text do |e|
        name, text = e.name, e.text
        case name
        when 'BlastOutput_param'
          e.elements["Parameters"].each_element_with_text do |p|
            xml_set_parameter(p.name, p.text)
          end
        else
          hash[name] = text if text.strip.size > 0
        end
      end
      @program        = hash['BlastOutput_program']
      @version        = hash['BlastOutput_version']
      @reference      = hash['BlastOutput_reference']
      @db             = hash['BlastOutput_db']
      @query_id       = hash['BlastOutput_query-ID']
      @query_def      = hash['BlastOutput_query-def']
      @query_len      = hash['BlastOutput_query-len'].to_i
    end

tab_parse(data)

    # File lib/bio/appl/blast/format8.rb
    def tab_parse(data)
      iteration = Iteration.new
      @iterations.push(iteration)
      @query_id = @query_def = data[/\S+/]

      query_prev = ''
      target_prev = ''
      hit_num = 1
      hsp_num = 1
      hit = ''
      data.each_line do |line|
        ary = line.chomp.split("\t")
        query_id, target_id, hsp = tab_parse_hsp(ary)
        if query_prev != query_id or target_prev != target_id
          hit = Hit.new
          hit.num = hit_num
          hit_num += 1
          hit.query_id = hit.query_def = query_id
          hit.accession = hit.definition = target_id
          iteration.hits.push(hit)
          hsp_num = 1
        end
        hsp.num = hsp_num
        hsp_num += 1
        hit.hsps.push(hsp)
        query_prev = query_id
        target_prev = target_id
      end
    end
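
The grouping rule in tab_parse starts a new hit whenever the query ID or target ID differs from the previous line, so only consecutive lines are merged into one hit. The sketch below isolates that rule using plain Arrays and Hashes; `group_rows_into_hits` and its row layout are assumptions for illustration, not BioRuby code.

```ruby
# Group consecutive rows that share the same (query, target) pair.
# Each row is [query_id, target_id, payload]; the payload stands in for an HSP.
def group_rows_into_hits(rows)
  hits = []
  prev_key = nil
  rows.each do |query_id, target_id, payload|
    key = [query_id, target_id]
    # A change in either ID opens a new hit.
    hits << { query_id: query_id, target_id: target_id, hsps: [] } if key != prev_key
    hits.last[:hsps] << payload
    prev_key = key
  end
  hits
end

rows = [
  ['q1', 'subjA', :hsp1],
  ['q1', 'subjA', :hsp2],
  ['q1', 'subjB', :hsp3],
  ['q1', 'subjA', :hsp4]  # not adjacent to the first subjA rows
]
group_rows_into_hits(rows).each { |h| puts "#{h[:target_id]}: #{h[:hsps].size}" }
```

Because only the previous line is compared, the final `subjA` row forms a separate hit rather than joining the first one.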

tab_parse_hsp(ary)

    # File lib/bio/appl/blast/format8.rb
    def tab_parse_hsp(ary)
      query_id, target_id,
        percent_identity,
        align_len,
        mismatch_count,
        gaps,
        query_from,
        query_to,
        hit_from,
        hit_to,
        evalue,
        bit_score = *ary

      hsp = Hsp.new
      hsp.align_len           = align_len.to_i
      hsp.gaps                = gaps.to_i
      hsp.query_from          = query_from.to_i
      hsp.query_to            = query_to.to_i
      hsp.hit_from            = hit_from.to_i
      hsp.hit_to              = hit_to.to_i
      hsp.evalue              = evalue.strip.to_f
      hsp.bit_score           = bit_score.to_f

      hsp.percent_identity    = percent_identity.to_f
      hsp.mismatch_count      = mismatch_count.to_i

      return query_id, target_id, hsp
    end

xml_set_parameter(key, val)

Sets the parameter named by key to val. The 'Parameters_' prefix is removed from the key; 'expect' and 'include' are stored as Float, 'gap-*' and 'sc-*' as Integer, and all other values are stored unchanged.

    # File lib/bio/appl/blast/report.rb
    def xml_set_parameter(key, val)
      #labels = {
      #  'matrix'       => 'Parameters_matrix',
      #  'expect'       => 'Parameters_expect',
      #  'include'      => 'Parameters_include',
      #  'sc-match'     => 'Parameters_sc-match',
      #  'sc-mismatch'  => 'Parameters_sc-mismatch',
      #  'gap-open'     => 'Parameters_gap-open',
      #  'gap-extend'   => 'Parameters_gap-extend',
      #  'filter'       => 'Parameters_filter',
      #  'pattern'      => 'Parameters_pattern',
      #  'entrez-query' => 'Parameters_entrez-query',
      #}
      k = key.sub(/\AParameters\_/, '')
      @parameters[k] =
        case k
        when 'expect', 'include'
          val.to_f
        when /\Agap\-/, /\Asc\-/
          val.to_i
        else
          val
        end
    end
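
The key normalization and type conversion performed by xml_set_parameter can be tried out in isolation. The function below is a hypothetical standalone version (`normalize_parameter` is not a BioRuby method) that returns the key and converted value as a pair instead of storing them in @parameters.

```ruby
# Strip the "Parameters_" prefix and convert the value by key.
# Returns [key, value]. Hypothetical helper for illustration only.
def normalize_parameter(key, val)
  k = key.sub(/\AParameters_/, '')
  v = case k
      when 'expect', 'include' then val.to_f   # thresholds are Floats
      when /\Agap-/, /\Asc-/   then val.to_i   # costs and scores are Integers
      else val                                 # everything else stays a String
      end
  [k, v]
end

puts normalize_parameter('Parameters_expect', '10').inspect
puts normalize_parameter('Parameters_gap-open', '11').inspect
puts normalize_parameter('Parameters_matrix', 'BLOSUM62').inspect
```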