class BioDSL::UchimeRef
Run uchime_ref on sequences in the stream.
This is a wrapper for the usearch tool to run the program uchime_ref. Sequence-type records are searched against a reference database of non-chimeric sequences, and chimeric sequences are filtered out so that only non-chimeric sequences are output.
Please refer to the manual:
drive5.com/usearch/manual/uchime_ref.html
Usearch 7.0 must be installed for usearch to work. Read more here:
Usage
uchime_ref(<database: <file>>[, cpus: <uint>])
Options
- database: <file> - Database to search (in FASTA format).
- cpus: <uint> - Number of CPU cores to use (default=1).
Examples
Constants
- STATS
Public Class Methods
Constructor for UchimeRef.
@param options [Hash] Options hash.
@option options [String] :database
@option options [Integer] :cpus
@return [UchimeRef] Class instance.
# File lib/BioDSL/commands/uchime_ref.rb, line 70
def initialize(options)
  @options = options
  aux_exist('usearch')
  check_options
  @options[:cpus] ||= 1
  @options[:strand] ||= 'plus' # This option cant be changed in usearch7.0
end
Public Instance Methods
Return command lambda for uchime_ref.
@return [Proc] Command lambda.
# File lib/BioDSL/commands/uchime_ref.rb, line 81
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)
    TmpDir.create('input', 'output') do |tmp_in, tmp_out|
      process_input(input, output, tmp_in)
      run_uchime_ref(tmp_in, tmp_out)
      process_output(output, tmp_out)
    end
  end
end
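The command lambda above follows a three-stage pattern: write the input to a temporary file, run an external tool on it, then read the tool's output back into the stream. The following is a minimal standalone sketch of that pattern using only the Ruby standard library. It does not use BioDSL's `TmpDir` or usearch; `fake_tool` and `build_command` are hypothetical names, and the "drop lines containing N" rule is an arbitrary stand-in for chimera filtering.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the external tool: copies every line that does
# not contain an 'N' from tmp_in to tmp_out. This is NOT what uchime_ref does.
def fake_tool(tmp_in, tmp_out)
  kept = File.readlines(tmp_in).reject { |line| line.include?('N') }
  File.write(tmp_out, kept.join)
end

# Returns a lambda that stages input in a temporary directory, runs the
# tool, and appends the surviving sequences to the output collection.
def build_command
  lambda do |input, output|
    Dir.mktmpdir('uchime_sketch') do |dir|
      tmp_in  = File.join(dir, 'input')
      tmp_out = File.join(dir, 'output')

      # Stage 1: write one sequence per line to the temporary input file.
      File.write(tmp_in, input.map { |record| "#{record[:SEQ]}\n" }.join)
      # Stage 2: run the (stand-in) tool.
      fake_tool(tmp_in, tmp_out)
      # Stage 3: read the tool's output back as records.
      File.readlines(tmp_out).each { |line| output << { SEQ: line.chomp } }
    end
  end
end

results = []
build_command.call([{ SEQ: 'ACGT' }, { SEQ: 'ACNT' }, { SEQ: 'TTGA' }], results)
```

Because `Dir.mktmpdir` is given a block, the temporary directory is removed when the block exits, so no intermediate files are left behind.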
Private Instance Methods
Check options.
# File lib/BioDSL/commands/uchime_ref.rb, line 97
def check_options
  options_allowed(@options, :database, :cpus)
  options_required(@options, :database)
  options_files_exist(@options, :database)
  options_assert(@options, ':cpus >= 1')
  options_assert(@options, ":cpus <= #{BioDSL::Config::CORES_MAX}")
end
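The checks above amount to three rules: only known keys are accepted, `:database` is mandatory, and `:cpus` must fall within a fixed range. A rough standalone sketch of the same rules follows. `validate_options` is a hypothetical helper, not part of BioDSL; the `CORES_MAX` value of 64 is an assumption standing in for `BioDSL::Config::CORES_MAX`; and the file-existence check is omitted to keep the sketch self-contained.

```ruby
# Assumed upper bound for illustration; BioDSL reads the real limit from
# BioDSL::Config::CORES_MAX.
CORES_MAX = 64

# Hypothetical helper mirroring the allowed/required/range checks.
# Returns the options with the :cpus default of 1 applied.
def validate_options(options)
  unknown = options.keys - %i[database cpus]
  raise ArgumentError, "Disallowed option(s): #{unknown.join(', ')}" unless unknown.empty?
  raise ArgumentError, 'Required option missing: database' unless options.key?(:database)

  cpus = options.fetch(:cpus, 1)
  raise ArgumentError, ":cpus must be in 1..#{CORES_MAX}" unless (1..CORES_MAX).cover?(cpus)

  options.merge(cpus: cpus)
end

checked = validate_options(database: 'db.fna')
```

Note that the sketch applies the default inside the validator, whereas the constructor shown earlier applies it after `check_options` returns.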
Process the input stream: save records that contain sequences to a temporary FASTA file, and emit records without sequences directly to the output stream.
@param input [Enumerator] Input stream.
@param output [Enumerator::Yielder] Output stream.
@param tmp_in [String] Path to temporary FASTA file.
# File lib/BioDSL/commands/uchime_ref.rb, line 111
def process_input(input, output, tmp_in)
  BioDSL::Fasta.open(tmp_in, 'w') do |ios|
    input.each_with_index do |record, i|
      @status[:records_in] += 1
      if record[:SEQ]
        @status[:sequences_in] += 1
        @status[:residues_in] += record[:SEQ].length
        seq_name = record[:SEQ_NAME] || i.to_s
        entry = BioDSL::Seq.new(seq_name: seq_name, seq: record[:SEQ])
        ios.puts entry.to_fasta
      else
        output << record
        @status[:records_out] += 1
      end
    end
  end
end
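The routing logic in `process_input` can be illustrated without BioDSL. In the sketch below, `split_records` is a hypothetical name, records are plain hashes, and FASTA entries are written by hand as a header line plus a sequence line, which is a simplification of whatever `BioDSL::Seq#to_fasta` produces. As in the listing above, a record without a `:SEQ_NAME` is named after its position in the stream.

```ruby
require 'stringio'

# Sends records with a :SEQ key to a FASTA sink and passes all other
# records through untouched. Hypothetical helper, not BioDSL API.
def split_records(records, fasta_io, passthrough)
  records.each_with_index do |record, i|
    if record[:SEQ]
      name = record[:SEQ_NAME] || i.to_s # fall back to the stream index
      fasta_io.puts ">#{name}", record[:SEQ]
    else
      passthrough << record
    end
  end
end

records = [
  { SEQ_NAME: 'read1', SEQ: 'ACGT' },
  { COMMENT: 'no sequence here' },
  { SEQ: 'TTGA' }
]

fasta = StringIO.new
rest  = []
split_records(records, fasta, rest)
```

The third record has no name, so it is written under the header `>2`, its zero-based index in the stream.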
Process uchime_ref output data and emit to output stream.
@param output [Enumerator::Yielder] Output stream.
@param tmp_out [String] Path to file with uchime_ref data.
# File lib/BioDSL/commands/uchime_ref.rb, line 157
def process_output(output, tmp_out)
  Fasta.open(tmp_out) do |ios|
    ios.each do |entry|
      record = entry.to_bp
      output << record
      @status[:sequences_out] += 1
      @status[:residues_out] += entry.length
      @status[:records_out] += 1
    end
  end
end
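For readers unfamiliar with the FASTA-to-record step, here is a small standalone sketch of reading FASTA text back into hash records. `each_fasta_record` is a hypothetical helper, and the `SEQ_NAME`, `SEQ` and `SEQ_LEN` keys are an assumption about the shape of the record that `to_bp` returns; the source shown here does not confirm them.

```ruby
require 'stringio'

# Yields one hash per FASTA entry, joining wrapped sequence lines.
# Hypothetical helper; the key names are assumed, not taken from BioDSL.
def each_fasta_record(io)
  return enum_for(:each_fasta_record, io) unless block_given?

  name = nil
  seq  = +''
  io.each_line do |line|
    line = line.chomp
    next if line.empty?

    if line.start_with?('>')
      # A new header closes the previous entry, if there was one.
      yield({ SEQ_NAME: name, SEQ: seq, SEQ_LEN: seq.length }) if name
      name = line[1..-1]
      seq  = +''
    else
      seq << line
    end
  end
  # Emit the final entry, which has no following header to close it.
  yield({ SEQ_NAME: name, SEQ: seq, SEQ_LEN: seq.length }) if name
end

parsed = each_fasta_record(StringIO.new(">a\nAC\nGT\n>b\nTT\n")).to_a
```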
Run uchime_ref on the input file and save the result in the output file.
@param tmp_in [String] Path to input file.
@param tmp_out [String] Path to output file.
@raise [BioDSL::UsearchError] If the command fails for any reason other than an empty input file.
# File lib/BioDSL/commands/uchime_ref.rb, line 138
def run_uchime_ref(tmp_in, tmp_out)
  uchime_opts = {
    input: tmp_in,
    output: tmp_out,
    database: @options[:database],
    strand: @options[:strand],
    cpus: @options[:cpus],
    verbose: @options[:verbose]
  }
  BioDSL::Usearch.uchime_ref(uchime_opts)
rescue BioDSL::UsearchError => e
  raise unless e.message =~ /Empty input file/
end
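The `rescue` clause above swallows one specific failure (an empty input file) and re-raises everything else. The same selective-rescue idiom can be sketched in isolation. `ToolError` is a hypothetical stand-in for `BioDSL::UsearchError`, and `run_tolerating_empty_input` is an illustrative name only.

```ruby
# Hypothetical stand-in for BioDSL::UsearchError.
class ToolError < StandardError; end

# Runs the block; a ToolError whose message mentions an empty input file is
# treated as "nothing to do", and any other ToolError is re-raised unchanged.
def run_tolerating_empty_input
  yield
rescue ToolError => e
  raise unless e.message =~ /Empty input file/
end

# An empty-input failure is suppressed and the method returns nil.
suppressed = run_tolerating_empty_input { raise ToolError, 'Empty input file' }

# Any other failure still propagates to the caller.
propagated =
  begin
    run_tolerating_empty_input { raise ToolError, 'Database not found' }
    false
  rescue ToolError
    true
  end
```

Matching on the message text is fragile if the tool ever rewords its errors, but it keeps a stream with no sequence records from aborting the whole pipeline.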