class Bio::Blast

Description

The Bio::Blast class contains methods for running local or remote BLAST searches, as well as for parsing of the output of such BLASTs (i.e. the BLAST reports). For more information on similarity searches and the BLAST program, see www.ncbi.nlm.nih.gov/Education/BLASTinfo/similarity.html.

Usage

require 'bio'

# To run an actual BLAST analysis:
#   1. create a BLAST factory
remote_blast_factory = Bio::Blast.remote('blastp', 'swissprot',
                                         '-e 0.0001', 'genomenet')
# or:
local_blast_factory = Bio::Blast.local('blastn', '/path/to/db')

#   2. run the actual BLAST by querying the factory
#      (sequence_text: a FASTA-formatted String or a Bio::Sequence object)
report = remote_blast_factory.query(sequence_text)

# Then, to parse the report, see Bio::Blast::Report

Attributes

blastall[RW]

Full path for blastall. (default: 'blastall').

db[RW]

Database name (-d option for blastall)

filter[RW]

Filter option for blastall -F (T or F).

format[RW]

Output report format for blastall -m (an Integer):

0, pairwise; 1 to 6, query-anchored alignment views; 7, XML BLAST output; 8, tabular; 9, tabular with comment lines; 10, ASN.1 text; 11, ASN.1 binary.
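With format 8, each output line describes one hit as twelve tab-separated columns: query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value and bit score. The plain-Ruby sketch below splits one such line into a Struct. TabularHit and parse_tabular_line are hypothetical names, not part of BioRuby, which parses this format through Bio::Blast::Report (see Bio::Blast.reports); the input line is made up.

```ruby
# Hypothetical sketch, not BioRuby's parser: one line of
# blastall -m 8 (tabular) output -> a Struct with the 12 standard columns.
TabularHit = Struct.new(
  :query_id, :subject_id, :percent_identity, :alignment_length,
  :mismatches, :gap_opens, :query_start, :query_end,
  :subject_start, :subject_end, :evalue, :bit_score
)

def parse_tabular_line(line)
  f = line.chomp.split("\t")
  raise ArgumentError, "expected 12 columns, got #{f.size}" unless f.size == 12
  TabularHit.new(
    f[0], f[1], f[2].to_f, f[3].to_i,           # ids, % identity, length
    f[4].to_i, f[5].to_i,                       # mismatches, gap openings
    f[6].to_i, f[7].to_i, f[8].to_i, f[9].to_i, # query/subject coordinates
    f[10].to_f, f[11].to_f                      # e-value, bit score
  )
end

# A made-up example line.
hit = parse_tabular_line("query1\tsubject1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380.2\n")
puts hit.subject_id   # prints subject1
puts hit.evalue       # prints 1.0e-100
```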

matrix[RW]

Substitution matrix for blastall -M

options[R]

Options for blastall

output[R]

Returns a String containing the raw BLAST execution output, in the format given by Bio::Blast#format.

parser[W]

Parser used for the report, passed to Bio::Blast::Report.new (e.g. :xmlparser, :rexml or :tab).

program[RW]

Program name (-p option for blastall): blastp, blastn, blastx, tblastn or tblastx

server[R]

Server to submit the BLASTs to

Public Class Methods

local(program, db, options = '', blastall = nil)

This is a shortcut for Bio::Blast.new:

Bio::Blast.local(program, database, options)

is equivalent to

Bio::Blast.new(program, database, options, 'local')

Arguments:

  • program (required): 'blastn', 'blastp', 'blastx', 'tblastn' or 'tblastx'

  • db (required): name of the local database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • blastall: full path to blastall program (e.g. “/opt/bin/blastall”; DEFAULT: “blastall”)

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb
def self.local(program, db, options = '', blastall = nil)
  f = self.new(program, db, options, 'local')
  if blastall then
    f.blastall = blastall
  end
  f
end
new(program, db, opt = [], server = 'local')

Creates a Bio::Blast factory object.

To run any BLAST searches, a factory has to be created that describes a certain BLAST pipeline: the program to use, the database to search, any options and the server to use. E.g.

blast_factory = Bio::Blast.new('blastn','dbsts', '-e 0.0001 -r 4', 'genomenet')

Arguments:

  • program (required): 'blastn', 'blastp', 'blastx', 'tblastn' or 'tblastx'

  • db (required): name of the (local or remote) database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • server: server to use (e.g. 'genomenet'; DEFAULT = 'local')

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb
def initialize(program, db, opt = [], server = 'local')
  @program  = program
  @db       = db

  @blastall = 'blastall'
  @matrix   = nil
  @filter   = nil

  @output   = ''
  @parser   = nil
  @format   = nil

  @options = set_options(opt, program, db)
  self.server = server
end
remote(program, db, option = '', server = 'genomenet')

Bio::Blast.remote does exactly the same as Bio::Blast.new, but sets the remote server 'genomenet' as its default.


Arguments:

  • program (required): 'blastn', 'blastp', 'blastx', 'tblastn' or 'tblastx'

  • db (required): name of the remote database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • server: server to use (DEFAULT = 'genomenet')

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb
def self.remote(program, db, option = '', server = 'genomenet')
  self.new(program, db, option, server)
end
reports(input, parser = nil) { |e| ... }

Bio::Blast.reports parses the given data and returns an array of report (Bio::Blast::Report or Bio::Blast::Default::Report) objects, or yields each report object when a block is given.

Supported formats: NCBI default (-m 0), XML (-m 7), tabular (-m 8).


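Arguments:

  • input (required): a String, or an IO-like object that responds to gets, containing BLAST output

  • parser: :xmlparser or :rexml to force XML parsing, or :tab to force tabular parsing (DEFAULT: nil, i.e. autodetection)

Returns

Undefined when a block is given (the current implementation returns an empty Array). Otherwise, an Array containing report (Bio::Blast::Report or Bio::Blast::Default::Report) objects.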

# File lib/bio/appl/blast.rb
def self.reports(input, parser = nil)
  begin
    istr = input.to_str
  rescue NoMethodError
    istr = nil
  end
  if istr then
    input = StringIO.new(istr)
  end
  raise 'unsupported input data type' unless input.respond_to?(:gets)

  # if proper parser is given, emulates old behavior.
  case parser
  when :xmlparser, :rexml
    ff = Bio::FlatFile.new(Bio::Blast::Report, input)
    if block_given? then
      ff.each do |e|
        yield e
      end
      return []
    else
      return ff.to_a
    end
  when :tab
    istr = input.read unless istr
    rep = Report.new(istr, parser)
    if block_given? then
      yield rep
      return []
    else
      return [ rep ]
    end
  end

  # preparation of the new format autodetection rule if needed
  if !defined?(@@report_format_autodetection_rule) or
      !@@report_format_autodetection_rule then
    regrule = Bio::FlatFile::AutoDetect::RuleRegexp
    blastxml = regrule[ 'Bio::Blast::Report',
                        /\<\!DOCTYPE BlastOutput PUBLIC / ]
    blast    = regrule[ 'Bio::Blast::Default::Report',
                        /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tblast   = regrule[ 'Bio::Blast::Default::Report_TBlast',
                        /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tab      = regrule[ 'Bio::Blast::Report_tab',
                        /^([^\t]*\t){11}[^\t]*$/ ]
    auto = Bio::FlatFile::AutoDetect[ blastxml,
                                      blast,
                                      tblast,
                                      tab
                                    ]
    # sets priorities
    blastxml.is_prior_to blast
    blast.is_prior_to tblast
    tblast.is_prior_to tab
    # rehash
    auto.rehash
    @@report_format_autodetection_rule = auto
  end

  # Creates a FlatFile object with dummy class
  ff = Bio::FlatFile.new(Object, input)
  ff.dbclass = nil

  # file format autodetection
  3.times do
    break if ff.eof? or
      ff.autodetect(31, @@report_format_autodetection_rule)
  end
  # If format detection failed, assumed to be tabular (-m 8)
  ff.dbclass = Bio::Blast::Report_tab unless ff.dbclass

  if block_given? then
    ff.each do |entry|
      yield entry
    end
    ret = []
  else
    ret = ff.to_a
  end
  ret
end
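Stripped of the Bio::FlatFile machinery, the detection step amounts to trying the four patterns above in priority order (XML, default, TBLAST, tabular) against the beginning of the input and falling back to tabular. The sketch below reproduces that idea in plain Ruby. guess_blast_format is a hypothetical helper, and reading only the first 31 lines is a simplification of how ff.autodetect(31, ...) reads ahead.

```ruby
# Patterns copied from Bio::Blast.reports, in priority order.
BLAST_FORMAT_RULES = [
  [:xml,     /<!DOCTYPE BlastOutput PUBLIC /],
  [:default, /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/],
  [:tblast,  /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/],
  [:tab,     /^([^\t]*\t){11}[^\t]*$/]
].freeze

# Hypothetical helper: guess the report format from the first lines.
def guess_blast_format(text)
  head = text.lines.first(31).join        # look only at the beginning
  rule = BLAST_FORMAT_RULES.find { |_name, re| re.match?(head) }
  rule ? rule.first : :tab                # unknown input: assume -m 8
end

puts guess_blast_format("BLASTP 2.2.18 [Mar-02-2008]\n")   # prints default
```

Because the rules are tried in order, a TBLASTN header is only classified as :tblast after the :default pattern has failed, which mirrors the is_prior_to chain in the listing.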
reports_xml(input, parser = nil) { |r| ... }

Note that this is the old implementation of Bio::Blast.reports. The aim of this method is keeping compatibility for older BLAST XML documents which might not be parsed by the new Bio::Blast.reports nor Bio::FlatFile. (Though we are not sure whether such documents exist or not.)

Bio::Blast.reports_xml parses given data, and returns an array of Bio::Blast::Report objects, or yields each Bio::Blast::Report object when a block is given.

It can be used only for XML format. For default (-m 0) format, consider using Bio::FlatFile, or Bio::Blast.reports.


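Arguments:

  • input (required): a String or IO (anything that responds to each_line) containing one or more BLAST XML documents

  • parser: XML parser passed to Bio::Blast::Report.new (e.g. :xmlparser or :rexml)

Returns

Undefined when a block is given (the current implementation returns an empty Array). Otherwise, an Array containing Bio::Blast::Report objects.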

# File lib/bio/appl/blast.rb
def self.reports_xml(input, parser = nil)
  ary = []
  input.each_line("</BlastOutput>\n") do |xml|
    xml.sub!(/[^<]*(<?)/, '\1') # skip before <?xml> tag
    next if xml.empty?          # skip trailing no hits
    rep = Report.new(xml, parser)
    if rep.reports then
      if block_given?
        rep.reports.each { |r| yield r }
      else
        ary.concat rep.reports
      end
    else
      if block_given?
        yield rep
      else
        ary.push rep
      end
    end
  end
  return ary
end
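The splitting step of reports_xml can be tried on its own: each_line with "</BlastOutput>\n" as the separator yields one XML document at a time, and the sub call drops anything before the first '<'. In this sketch, split_blast_xml is a hypothetical helper that collects the cleaned chunks instead of handing them to Report.new.

```ruby
require 'stringio'

# Hypothetical helper mirroring the loop in reports_xml, minus parsing.
def split_blast_xml(input)
  docs = []
  input.each_line("</BlastOutput>\n") do |chunk|
    xml = chunk.sub(/[^<]*(<?)/, '\1')  # drop text before the first '<'
    next if xml.empty?                  # trailing text with no tag at all
    docs << xml
  end
  docs
end

two_docs = "junk\n<?xml version=\"1.0\"?>\n<BlastOutput>\n</BlastOutput>\n" \
           "<?xml version=\"1.0\"?>\n<BlastOutput>\n</BlastOutput>\n\n"
puts split_blast_xml(StringIO.new(two_docs)).size   # prints 2
```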

Public Instance Methods

option()

Returns the blastall options as a single command-line String (kept for backward compatibility).

# File lib/bio/appl/blast.rb
def option
  # backward compatibility
  Bio::Command.make_command_line(options)
end
option=(str)

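Sets blastall options from a String, which is split into an Array with Shellwords.shellwords (kept for backward compatibility).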

# File lib/bio/appl/blast.rb
def option=(str)
  # backward compatibility
  self.options = Shellwords.shellwords(str)
end
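option= and option convert between the String form and the Array form of the options. The sketch below shows the String-to-Array step with Shellwords.shellwords, which option= uses, and the reverse step with Shellwords.join from Ruby's standard library. Shellwords.join is only a stand-in for Bio::Command.make_command_line, whose quoting may differ, and the option values are arbitrary examples.

```ruby
require 'shellwords'

# Example option String, in the form accepted by Bio::Blast#option=.
opt_string = %q{-e 0.0001 -r 4 -F "m S"}

# String -> Array: the step option= performs with Shellwords.shellwords.
opt_array = Shellwords.shellwords(opt_string)
p opt_array   # prints ["-e", "0.0001", "-r", "4", "-F", "m S"]

# Array -> String: Shellwords.join stands in for
# Bio::Command.make_command_line here; its quoting may differ.
puts Shellwords.join(opt_array)
```

The quoted filter value "m S" survives as one Array element, which is why splitting with Shellwords is safer than a plain split(' ').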
options=(ary)

Sets blastall options from an Array of Strings (a String is also accepted and split with Shellwords.shellwords).

# File lib/bio/appl/blast.rb
def options=(ary)
  @options = set_options(ary)
end
query(query)

This method submits a sequence to a BLAST factory, which performs the actual BLAST.

# example 1
seq = Bio::Sequence::NA.new('agggcattgccccggaagatcaagtcgtgctcctg')
report = blast_factory.query(seq)

# example 2
str = <<END_OF_FASTA
>lcl|MySequence
MPPSAISKISNSTTPQVQSSSAPNLTMLEGKGISVEKSFRVYSEEENQNQHKAKDSLGF
KELEKDAIKNSKQDKKDHKNWLETLYDQAEQKWLQEPKKKLQDLIKNSGDNSRVILKDS
END_OF_FASTA
report = blast_factory.query(str)

Bug note: When multi-FASTA is given and the format is 7 (XML) or 8 (tab), it should return an array of Bio::Blast::Report objects, but it returns a single Bio::Blast::Report object. This is a known bug and should be fixed in the future.


Arguments:

  • query (required): single- or multiple-FASTA formatted sequence(s)

Returns

A Bio::Blast::Report (or Bio::Blast::Default::Report) object when a single query is given. When multiple sequences are given as the query, an Array of Bio::Blast::Report (or Bio::Blast::Default::Report) objects. If the result cannot be parsed, nil is returned.

# File lib/bio/appl/blast.rb
def query(query)
  case query
  when Bio::Sequence
    query = query.output(:fasta)
  when Bio::Sequence::NA, Bio::Sequence::AA, Bio::Sequence::Generic
    query = query.to_fasta('query', 70)
  else
    query = query.to_s
  end

  @output = self.__send__("exec_#{@server}", query)
  report = parse_result(@output)
  return report
end
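Before running BLAST, query converts Bio::Sequence::NA, AA and Generic objects with to_fasta('query', 70). The helper below is a hypothetical plain-Ruby sketch, not BioRuby's to_fasta, of the kind of FASTA text that step is meant to produce: a '>query' definition line followed by the sequence wrapped at a fixed width.

```ruby
# Hypothetical helper, not Bio::Sequence#to_fasta: wrap a raw sequence
# into FASTA text, +width+ residues per line.
def to_fasta_text(seq, definition = 'query', width = 70)
  residues = seq.gsub(/\s+/, '')         # drop spaces and newlines
  lines = residues.scan(/.{1,#{width}}/) # chunks of at most +width+
  ">#{definition}\n" + lines.map { |l| "#{l}\n" }.join
end

print to_fasta_text('agggcattgccccggaagatcaagtcgtgctcctg', 'query', 20)
```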
server=(str)

Sets the server to submit the BLASTs to. A matching exec_xxxx method must be defined either in Bio::Blast or in a Bio::Blast::Remote::Xxxx module, which is included into the class the first time that server is set.

# File lib/bio/appl/blast.rb
def server=(str)
  @server = str
  begin
    m = Bio::Blast::Remote.const_get(@server.capitalize)
  rescue NameError
    m = nil
  end
  if m and !(self.is_a?(m)) then
    # lazy include Bio::Blast::Remote::XXX module
    self.class.class_eval { include m }
  end
  return @server
end
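server= and query together form a small plug-in mechanism: server= looks up a module named after the server under Bio::Blast::Remote and mixes it in, and query then calls exec_<server> through __send__. The stand-alone sketch below imitates that pattern with hypothetical Remote, Remote::Genomenet and Factory names and canned results instead of real BLAST runs. It passes false as the second argument of const_get so that top-level constants such as ::String cannot be picked up, a small deviation from the listing above.

```ruby
# Hypothetical stand-ins for Bio::Blast::Remote and Bio::Blast.
module Remote
  module Genomenet
    def exec_genomenet(query)
      "genomenet result for #{query}"   # canned result, no network access
    end
  end
end

class Factory
  attr_reader :server

  def server=(str)
    @server = str
    begin
      # inherit = false: only look inside Remote, not at top-level constants
      m = Remote.const_get(str.capitalize, false)
    rescue NameError
      m = nil                           # e.g. 'local' has no module
    end
    # Mix the module into the class the first time it is needed.
    self.class.class_eval { include m } if m && !is_a?(m)
  end

  def exec_local(query)
    "local result for #{query}"         # canned result, no blastall call
  end

  # Dispatch to exec_<server>, as Bio::Blast#query does.
  def query(q)
    __send__("exec_#{@server}", q)
  end
end

f = Factory.new
f.server = 'genomenet'
puts f.query('ACGT')   # prints genomenet result for ACGT
```

As in the listing, the module is included into the class, so every instance gains the exec_genomenet method once any instance has selected that server.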

Private Instance Methods

exec_genomenet_tab(query)

This method is obsolete.

Runs genomenet with '-m 8' option. Note that the format option is overwritten.

# File lib/bio/appl/blast.rb
def exec_genomenet_tab(query)
  warn "Bio::Blast#server=\"genomenet_tab\" is deprecated."
  @format = 8
  exec_genomenet(query)
end
exec_local(query)

Runs blastall locally with the query as its input, stores the output in @output and returns it.

# File lib/bio/appl/blast.rb
def exec_local(query)
  cmd = make_command_line
  @output = Bio::Command.query_command(cmd, query)
  return @output
end
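Conceptually, exec_local runs the argv Array built by make_command_line and feeds the query to the command's standard input. The sketch below shows one way to do that with Open3.capture2 from Ruby's standard library; run_with_stdin is a hypothetical helper and not the implementation of Bio::Command.query_command. A Ruby one-liner that upper-cases its input plays the role of blastall so the example runs without BLAST installed.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: run +cmd+ (an argv Array, so no shell is involved)
# with +input+ on its standard input and return its standard output.
def run_with_stdin(cmd, input)
  output, status = Open3.capture2(*cmd, stdin_data: input)
  raise "command failed: #{cmd.first}" unless status.success?
  output
end

# A Ruby one-liner plays the role of blastall here.
fake_blastall = [RbConfig.ruby, '-e', 'print STDIN.read.upcase']
print run_with_stdin(fake_blastall, ">query\nacgt\n")
```

Passing the command as an Array keeps database paths or option values containing spaces from being re-split by a shell.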
make_command_line()

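Builds the blastall command line as an Array: the blastall path followed by the options from make_command_line_options.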

# File lib/bio/appl/blast.rb
def make_command_line
  cmd = make_command_line_options
  cmd.unshift @blastall
  cmd
end
make_command_line_options()

Returns an Array containing the NCBI BLAST options.

# File lib/bio/appl/blast.rb
def make_command_line_options
  set_options
  cmd = []
  if @program
    cmd.concat([ '-p', @program ])
  end
  if @db
    cmd.concat([ '-d', @db ])
  end
  if @format
    cmd.concat([ '-m', @format.to_s ])
  end
  if @matrix
    cmd.concat([ '-M', @matrix ])
  end
  if @filter
    cmd.concat([ '-F', @filter ])
  end
  ncbiopts = NCBIOptions.new(@options)
  ncbiopts.make_command_line_options(cmd)
end
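make_command_line_options and make_command_line together turn the factory's attributes into an argv Array with the blastall path in front. The keyword-argument helper below is a hypothetical, simplified sketch of that assembly: it appends the remaining options as-is, whereas the real code merges them through NCBIOptions#make_command_line_options.

```ruby
# Hypothetical, simplified stand-in for make_command_line and
# make_command_line_options: build the argv Array for blastall.
def blastall_argv(program:, db:, blastall: 'blastall', format: nil,
                  matrix: nil, filter: nil, extra: [])
  cmd = []
  cmd.concat(['-p', program]) if program
  cmd.concat(['-d', db]) if db
  cmd.concat(['-m', format.to_s]) if format   # e.g. 7 -> "7"
  cmd.concat(['-M', matrix]) if matrix
  cmd.concat(['-F', filter]) if filter
  cmd.concat(extra)          # remaining options, already split into words
  cmd.unshift(blastall)      # the executable comes first
end

p blastall_argv(program: 'blastp', db: 'swissprot', format: 7,
                extra: ['-e', '0.0001'])
# prints ["blastall", "-p", "blastp", "-d", "swissprot", "-m", "7", "-e", "0.0001"]
```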
parse_result(str)

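Parses the BLAST output. For the default format (-m 0), returns nil, a single Bio::Blast::Default::Report, or an Array of them, depending on how many reports were found; for any other format, returns a Bio::Blast::Report.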

# File lib/bio/appl/blast.rb
def parse_result(str)
  if @format.to_i == 0 then
    ary = Bio::FlatFile.open(Bio::Blast::Default::Report,
                             StringIO.new(str)) { |ff| ff.to_a }
    case ary.size
    when 0
      return nil
    when 1
      return ary[0]
    else
      return ary
    end
  else
    Report.new(str, @parser)
  end
end
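For the default format, parse_result returns nil, a single report, or an Array, depending on how many reports were found. The sketch below isolates that convention with hypothetical helpers: collapse_reports mirrors the case statement, and as_report_list shows one way a caller can turn any of the three shapes back into an Array. Symbols stand in for report objects.

```ruby
# Hypothetical helpers illustrating the nil / single / Array convention.
def collapse_reports(reports)
  case reports.size
  when 0 then nil            # nothing parsed
  when 1 then reports.first  # exactly one report
  else reports               # several reports: keep the Array
  end
end

# One way for a caller to always get an Array back.
def as_report_list(result)
  result.is_a?(Array) ? result : [result].compact
end

p as_report_list(collapse_reports([:report1]))   # prints [:report1]
```

as_report_list checks is_a?(Array) rather than calling Array(), since Array() would call to_a on a report object that happens to define it.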
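set_options(opt = nil, program = nil, db = nil)

Normalizes the given options (a String or an Array) into an Array of Strings and stores the values of -m, -M, -F, -p and -d in the format, matrix, filter, program and db attributes. Without -m, and unless a format is already set, the format defaults to 7 (XML) if XMLParser or REXML is available and to 8 (tabular) otherwise. Explicit program and db arguments take precedence over -p and -d in the options.

# File lib/bio/appl/blast.rb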
def set_options(opt = nil, program = nil, db = nil)
  opt = @options unless opt

  # when opt is a String, splits to an array
  begin
    a = opt.to_ary
  rescue NameError #NoMethodError
    # backward compatibility
    a = Shellwords.shellwords(opt)
  end
  ncbiopt = NCBIOptions.new(a)

  if fmt = ncbiopt.get('-m') then
    @format = fmt.to_i
  else
    _ = Bio::Blast::Report #dummy to load XMLParser or REXML
    if defined?(XMLParser) or defined?(REXML)
      @format ||= 7
    else
      @format ||= 8
    end
  end

  mtrx = ncbiopt.get('-M')
  @matrix = mtrx if mtrx
  fltr = ncbiopt.get('-F')
  @filter = fltr if fltr

  # special treatment for '-p'
  if program then
    @program = program
    ncbiopt.delete('-p')
  else
    program = ncbiopt.get('-p')
    @program = program if program
  end

  # special treatment for '-d'
  if db then
    @db = db
    ncbiopt.delete('-d')
  else
    db = ncbiopt.get('-d')
    @db = db if db
  end

  # returns an array of string containing options
  return ncbiopt.options
end
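Two rules in set_options are easy to miss: an explicit program or db argument wins over -p or -d in the option list (and that -p or -d pair is removed), and without -m the format defaults to 7 when an XML parser is available, otherwise 8. normalize_options below is a hypothetical, simplified sketch of those rules on plain Arrays; it does not use NCBIOptions, ignores repeated flags, and does not keep previously set attributes.

```ruby
require 'shellwords'

# Hypothetical helper: the value that follows +flag+ in an argv-style Array.
def option_value(args, flag)
  i = args.index(flag)
  i && args[i + 1]
end

# Simplified sketch of some set_options rules (not BioRuby's NCBIOptions code).
def normalize_options(opt, program: nil, db: nil, xml_available: true)
  # Arrays are copied; Strings are split the way a shell would split them.
  args = opt.respond_to?(:to_ary) ? opt.to_ary.dup : Shellwords.shellwords(opt)
  settings = {}

  # -m wins if present; otherwise prefer XML (7) over tabular (8).
  fmt = option_value(args, '-m')
  settings[:format] = fmt ? fmt.to_i : (xml_available ? 7 : 8)

  # An explicit program/db argument wins and its -p/-d pair is dropped;
  # otherwise the value is read from the options.
  { '-p' => [:program, program], '-d' => [:db, db] }.each do |flag, (key, given)|
    if given
      settings[key] = given
      i = args.index(flag)
      args.slice!(i, 2) if i
    else
      settings[key] = option_value(args, flag)
    end
  end

  [settings, args]
end

settings, args = normalize_options('-p blastn -d nr -e 0.001', program: 'blastp')
p settings
p args       # prints ["-d", "nr", "-e", "0.001"]
```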