class BioDSL::CollectOtus

Collect OTU data from records in the stream.

collect_otus counts the number of times each OTU is found in a set of samples. Only hit records (:TYPE equal to 'H') are counted. OTUs are identified by the :S_ID key and samples by the :SAMPLE key. If a :SEQ_COUNT key is present, its value is added to the OTU count instead of 1, which allows dereplicated sequences to be used.
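The counting rule can be illustrated outside of BioDSL with plain hashes. This is a minimal sketch, not the library's code path: the sample records and the variable name `otu_counts` are made up for illustration, while the key names and the upcasing of sample names follow the source listing further down this page.

```ruby
# Hypothetical input records; only the key names come from the documentation.
records = [
  { TYPE: 'H', S_ID: 'OTU_1', SAMPLE: 'a', SEQ_COUNT: 3 },
  { TYPE: 'H', S_ID: 'OTU_1', SAMPLE: 'b' },
  { TYPE: 'H', S_ID: 'OTU_2', SAMPLE: 'a', SEQ_COUNT: 2 },
  { TYPE: 'H', S_ID: 'OTU_1', SAMPLE: 'a' },
  { TYPE: 'N', S_ID: 'OTU_9', SAMPLE: 'a' } # not a hit, so it is ignored
]

# Outer hash: OTU id => inner hash. Inner hash: sample => count, default 0.
otu_counts = Hash.new { |hash, otu| hash[otu] = Hash.new(0) }

records.each do |record|
  next unless record[:TYPE] == 'H'

  otu    = record[:S_ID].to_sym
  sample = record[:SAMPLE].upcase.to_sym

  # A missing :SEQ_COUNT means the record stands for a single sequence.
  otu_counts[otu][sample] += record[:SEQ_COUNT] || 1
end

p otu_counts
```

Here OTU_1 ends up with a count of 4 for sample A (3 from the dereplicated record plus 1) and 1 for sample B, and OTU_2 with 2 for sample A.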

Usage

collect_otus()

Options

Examples

Constants

STATS

Public Class Methods

new(options)

Constructor for CollectOtus.

@param options [Hash] Options hash.

# File lib/BioDSL/commands/collect_otus.rb, line 52
def initialize(options)
  @options = options

  check_options
end

Public Instance Methods

lmb()

Return lambda for CollectOtus command.

@return [Proc] Command lambda.

# File lib/BioDSL/commands/collect_otus.rb, line 61
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    count_hash = process_input(input, output)
    samples    = collect_samples(count_hash)
    process_output(count_hash, samples, output)
  end
end

Private Instance Methods

add_to_count_hash(count_hash, record)

Add a given record to the count_hash.

@param count_hash [Hash] Hash with sample counts.

@param record [Hash] BioDSL record with sample and count.

# File lib/BioDSL/commands/collect_otus.rb, line 105
def add_to_count_hash(count_hash, record)
  id     = record[:S_ID].to_sym
  sample = record[:SAMPLE].upcase.to_sym
  count_hash[id][sample] += (record[:SEQ_COUNT] || 1)
  @status[:hits_in] += 1
end

check_options()

Check options.

# File lib/BioDSL/commands/collect_otus.rb, line 74
def check_options
  options_allowed(@options, nil)
end

collect_samples(count_hash)

Collect all samples in the count_hash into a sorted set.

@param count_hash [Hash] Hash with sample counts.

@return [SortedSet] Sample names.

# File lib/BioDSL/commands/collect_otus.rb, line 117
def collect_samples(count_hash)
  samples = SortedSet.new

  count_hash.values.each do |value|
    value.each_key { |key| samples << key }
  end

  samples
end
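
SortedSet was removed from Ruby's standard set library in Ruby 3.0 and moved to the separate sorted_set gem, so the method above needs that gem on newer Rubies. The following is a minimal sketch of an equivalent that relies only on core classes, assuming a sorted Array is an acceptable stand-in for the set. The name `sorted_sample_names` and the example data are hypothetical and not part of BioDSL.

```ruby
# Hypothetical helper: returns every sample name found in count_hash,
# de-duplicated and sorted, as a plain Array instead of a SortedSet.
def sorted_sample_names(count_hash)
  count_hash.values.flat_map(&:keys).uniq.sort
end

# Example count_hash in the shape built by process_input:
# OTU id => { sample => count }.
count_hash = {
  OTU_1: { B: 1, A: 4 },
  OTU_2: { A: 2, C: 7 }
}

p sorted_sample_names(count_hash)
```

Symbols are comparable, so `sort` orders the sample names the same way a sorted set would when iterated.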

process_input(input, output)

Read the input stream and add every hit record (:TYPE equal to 'H') to the count hash. All records, hits or not, are passed on to the output stream.

@param input [Enumerator] Input stream.

@param output [Enumerator::Yielder] Output stream.

@return [Hash] Returns the count_hash.

# File lib/BioDSL/commands/collect_otus.rb, line 84
def process_input(input, output)
  count_hash = Hash.new { |h, k| h[k] = Hash.new(0) }

  input.each do |record|
    @status[:records_in] += 1

    if record[:TYPE] && record[:TYPE] == 'H'
      add_to_count_hash(count_hash, record)
    end

    output << record
    @status[:records_out] += 1
  end

  count_hash
end

process_output(count_hash, samples, output)

Emit one OTU record per entry in the count_hash to the output stream. Each record carries a count for every sample in samples.

@param count_hash [Hash] Hash with sample counts.

@param samples [SortedSet] Set with sample names.

@param output [Enumerator::Yielder] Output stream.

# File lib/BioDSL/commands/collect_otus.rb, line 133
def process_output(count_hash, samples, output)
  count_hash.each do |key, value|
    record = {}
    record[:RECORD_TYPE] = 'OTU'
    record[:OTU]         = key.to_s

    samples.each do |sample|
      record["#{sample}_COUNT".to_sym] = value[sample]
    end

    output << record

    @status[:hits_out] += 1
    @status[:records_out] += 1
  end
end
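
The shape of the emitted OTU records can be sketched with plain hashes. This is an illustration under assumptions, not BioDSL itself: the counts, the sample list, and the variable name `otu_records` are made up, and a plain Array stands in for the sorted set of sample names.

```ruby
# Count hash in the shape built by process_input; inner hashes default to 0.
count_hash = Hash.new { |hash, otu| hash[otu] = Hash.new(0) }
count_hash[:OTU_1][:A] += 4
count_hash[:OTU_1][:B] += 1
count_hash[:OTU_2][:A] += 2

# Stand-in for the sorted set of sample names.
samples = [:A, :B]

# Build one record per OTU with a <SAMPLE>_COUNT key for every sample.
otu_records = count_hash.map do |otu, counts|
  record = { RECORD_TYPE: 'OTU', OTU: otu.to_s }
  samples.each { |sample| record[:"#{sample}_COUNT"] = counts[sample] }
  record
end

otu_records.each { |record| p record }
```

Because the inner hashes default to 0, an OTU that was never seen in a sample still gets an explicit zero for it: OTU_2 has `B_COUNT` set to 0 in this example.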