class BioDSL::UchimeRef

Run uchime_ref on sequences in the stream.

This is a wrapper for the usearch tool that runs the uchime_ref program. Sequence records are searched against a reference database of non-chimeric sequences, and chimeric sequences are filtered out so that only non-chimeric sequences are output.

Please refer to the manual:

drive5.com/usearch/manual/uchime_ref.html

Usearch 7.0 must be installed for uchime_ref to work. Read more here:

www.drive5.com/usearch/

Usage

uchime_ref(<database: <file>>[, cpus: <uint>])

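Options

database: <file> - Reference database file (required; the file must exist).
cpus: <uint> - Number of CPU cores to use (default=1).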

Examples

Constants

STATS

Public Class Methods

new(options)

Constructor for UchimeRef.

@param options [Hash] Options hash.
@option options [String] :database
@option options [Integer] :cpus

@return [UchimeRef] Class instance.

# File lib/BioDSL/commands/uchime_ref.rb, line 70
def initialize(options)
  @options = options
  aux_exist('usearch')
  check_options
  @options[:cpus] ||= 1
  @options[:strand] ||= 'plus' # This option can't be changed in usearch 7.0
end
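
The defaulting step in the constructor can be tried in isolation. The following is a minimal sketch in plain Ruby that does not load BioDSL; `apply_defaults` is a hypothetical helper name, and the keys simply mirror those used in the constructor above.

```ruby
# Hypothetical helper mirroring the defaulting step in the constructor.
# `||=` assigns only when the current value is nil or false, so a value
# supplied by the caller is never overwritten.
def apply_defaults(options)
  options[:cpus]   ||= 1
  options[:strand] ||= 'plus'
  options
end

p apply_defaults(database: 'ref.fasta')          # :cpus and :strand filled in
p apply_defaults(database: 'ref.fasta', cpus: 4) # caller's :cpus is kept
```

Note that in the constructor above the defaults are applied after check_options has run, so :strand never passes through options_allowed.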

Public Instance Methods

lmb()

Return command lambda for uchime_ref.

@return [Proc] Command lambda.

# File lib/BioDSL/commands/uchime_ref.rb, line 81
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    TmpDir.create('input', 'output') do |tmp_in, tmp_out|
      process_input(input, output, tmp_in)
      run_uchime_ref(tmp_in, tmp_out)

      process_output(output, tmp_out)
    end
  end
end
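
The shape of such a command lambda, taking an input stream, an output stream and a status object, can be sketched without BioDSL. Everything below is hypothetical illustration: `UPCASE_COMMAND` and `run_command` are made-up names, the status object is assumed to be a Hash, and the driver is only one plausible way to obtain an Enumerator::Yielder for the output.

```ruby
# Hypothetical command lambda with the same (input, output, status) shape.
# It upcases :SEQ values and passes records without :SEQ through unchanged.
UPCASE_COMMAND = lambda do |input, output, status|
  status[:records_in]  = 0
  status[:records_out] = 0

  input.each do |record|
    status[:records_in] += 1
    record = record.merge(SEQ: record[:SEQ].upcase) if record[:SEQ]
    output << record
    status[:records_out] += 1
  end
end

# Hypothetical driver: Enumerator.new hands its block an
# Enumerator::Yielder, which serves as the output stream.
def run_command(command, records)
  status = {}
  result = Enumerator.new { |out| command.call(records.each, out, status) }.to_a
  [result, status]
end

result, status = run_command(UPCASE_COMMAND,
                             [{ SEQ_NAME: 'a', SEQ: 'acgt' }, { NOTE: 'no sequence' }])
puts result.length
puts status[:records_out]
```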

Private Instance Methods

check_options()

Check options.

# File lib/BioDSL/commands/uchime_ref.rb, line 97
def check_options
  options_allowed(@options, :database, :cpus)
  options_required(@options, :database)
  options_files_exist(@options, :database)
  options_assert(@options, ':cpus >= 1')
  options_assert(@options, ":cpus <= #{BioDSL::Config::CORES_MAX}")
end
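
The checks performed above (allowed keys, required keys, file existence and the :cpus range) can be approximated in plain Ruby. This is a sketch under stated assumptions, not BioDSL's option helpers: `validate_options` is a hypothetical function, `CORES_MAX` is a made-up ceiling standing in for BioDSL::Config::CORES_MAX, and the error class is chosen for illustration only.

```ruby
require 'tempfile'

# Hypothetical ceiling standing in for BioDSL::Config::CORES_MAX.
CORES_MAX = 8

# Hypothetical validator covering the same kinds of checks as check_options.
def validate_options(options)
  unknown = options.keys - %i[database cpus]
  raise ArgumentError, "Disallowed option(s): #{unknown.join(', ')}" unless unknown.empty?
  raise ArgumentError, 'Required option missing: database' unless options[:database]
  raise ArgumentError, "No such file: #{options[:database]}" unless File.file?(options[:database])

  cpus = options[:cpus]
  return if cpus.nil? # in this sketch an absent :cpus is simply not range-checked

  raise ArgumentError, "cpus must be in 1..#{CORES_MAX}" unless (1..CORES_MAX).cover?(cpus)
end

# A temporary file plays the role of the reference database.
Tempfile.create('ref') do |file|
  validate_options(database: file.path, cpus: 2) # passes silently
  begin
    validate_options(database: file.path, cpus: 0)
  rescue ArgumentError => e
    puts "rejected: #{e.message}"
  end
end
```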

process_input(input, output, tmp_in)

Process the input stream: save records that contain sequences to a temporary FASTA file, and emit records without sequences directly to the output stream.

@param input [Enumerator] Input stream.
@param output [Enumerator::Yielder] Output stream.
@param tmp_in [String] Path to temporary FASTA file.

# File lib/BioDSL/commands/uchime_ref.rb, line 111
def process_input(input, output, tmp_in)
  BioDSL::Fasta.open(tmp_in, 'w') do |ios|
    input.each_with_index do |record, i|
      @status[:records_in] += 1

      if record[:SEQ]
        @status[:sequences_in] += 1
        @status[:residues_in] += record[:SEQ].length
        seq_name = record[:SEQ_NAME] || i.to_s

        entry = BioDSL::Seq.new(seq_name: seq_name, seq: record[:SEQ])

        ios.puts entry.to_fasta
      else
        output << record
        @status[:records_out] += 1
      end
    end
  end
end
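
The partitioning done by process_input can be sketched with a StringIO in place of the temporary FASTA file. `split_records` is a hypothetical stand-in, and the FASTA formatting is deliberately simplified (one header line, one sequence line) rather than a reproduction of BioDSL::Seq#to_fasta.

```ruby
require 'stringio'

# Hypothetical stand-in for process_input: writes FASTA text to `fasta_io`
# and appends records without :SEQ to `passthrough`.
def split_records(records, fasta_io, passthrough)
  records.each_with_index do |record, i|
    if record[:SEQ]
      # Fall back to the record's position in the stream when no name is given.
      name = record[:SEQ_NAME] || i.to_s
      fasta_io.puts ">#{name}", record[:SEQ]
    else
      passthrough << record
    end
  end
end

fasta = StringIO.new
other = []
split_records([{ SEQ_NAME: 'read1', SEQ: 'ACGT' }, { COUNT: 3 }, { SEQ: 'TTGA' }],
              fasta, other)
puts fasta.string
```

Because the index comes from each_with_index over the whole stream, the unnamed sequence in this example is written as entry 2, not 1: records without sequences still advance the counter.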

process_output(output, tmp_out)

Process uchime_ref output data and emit to output stream.

@param output [Enumerator::Yielder] Output stream.
@param tmp_out [String] Path to file with uchime_ref data.

# File lib/BioDSL/commands/uchime_ref.rb, line 157
def process_output(output, tmp_out)
  Fasta.open(tmp_out) do |ios|
    ios.each do |entry|
      record = entry.to_bp

      output << record
      @status[:sequences_out] += 1
      @status[:residues_out] += entry.length
      @status[:records_out] += 1
    end
  end
end
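
The read-back step can be illustrated with a small FASTA parser and a tally of the three counters updated above. Both functions are hypothetical, and the record keys (:SEQ_NAME, :SEQ, :SEQ_LEN) are an assumption about what entry.to_bp returns; consult the BioDSL::Seq documentation for the actual keys.

```ruby
# Hypothetical minimal FASTA parser returning one hash per entry.
# The keys are assumed, not taken from BioDSL.
def parse_fasta(text)
  records = []
  text.each_line(chomp: true) do |line|
    if line.start_with?('>')
      records << { SEQ_NAME: line[1..], SEQ: +'' }
    elsif records.any?
      records.last[:SEQ] << line.strip # join wrapped sequence lines
    end
  end
  records.each { |record| record[:SEQ_LEN] = record[:SEQ].length }
end

# Hypothetical tally mirroring the counters updated in process_output.
def tally_output(records)
  status = Hash.new(0)
  records.each do |record|
    status[:sequences_out] += 1
    status[:residues_out]  += record[:SEQ_LEN]
    status[:records_out]   += 1
  end
  status
end

records = parse_fasta(">read1\nACGT\n>read2\nTT\nGA\n")
puts records.length
puts tally_output(records)[:residues_out]
```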

run_uchime_ref(tmp_in, tmp_out)

Run uchime_ref on the input file and save the result to the output file.

@param tmp_in [String] Path to input file.
@param tmp_out [String] Path to output file.

@raise [BioDSL::UsearchError] If command fails.

# File lib/BioDSL/commands/uchime_ref.rb, line 138
def run_uchime_ref(tmp_in, tmp_out)
  uchime_opts = {
    input: tmp_in,
    output: tmp_out,
    database: @options[:database],
    strand: @options[:strand],
    cpus: @options[:cpus],
    verbose: @options[:verbose]
  }

  BioDSL::Usearch.uchime_ref(uchime_opts)
rescue BioDSL::UsearchError => e
  raise unless e.message =~ /Empty input file/
end