class BioDSL::CollapseOtus

Collapse OTUs based on identical taxonomy strings.

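collapse_otus collapses OTU-style records that share the same TAXONOMY string. Taxonomy strings are compared after their confidence values, such as (100), have been removed. Of each set of duplicates, the first record is kept and the sample counts (all keys ending in _COUNT) of the remaining duplicates are added to it. Records without a TAXONOMY key are passed through unchanged.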
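The grouping key can be illustrated in isolation. The sketch below reuses the gsub expression from collapse_tax (listed further down); the helper name taxonomy_key is hypothetical. Note that the pattern only matches integer confidence values, so a value such as (99.5) would be left in the key.

```ruby
# Hypothetical helper mirroring the key derivation in collapse_tax:
# integer confidence values in parentheses are stripped before comparison.
def taxonomy_key(taxonomy)
  taxonomy.gsub(/\(\d+\)/, '').to_sym
end

a = 'Bacteria(100);Firmicutes(100);Bacilli(98)'
b = 'Bacteria(100);Firmicutes(97);Bacilli(100)'

puts taxonomy_key(a) == taxonomy_key(b) # prints "true"
puts taxonomy_key(a)                    # prints "Bacteria;Firmicutes;Bacilli"
```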

Usage

collapse_otus

Options

collapse_otus takes no options.

Examples

Here is an OTU table with four rows, two of which (OTU_0 and OTU_3) share the same TAXONOMY string:

BD.new.read_table(input: "otu_table.txt").dump.run

{:OTU=>"OTU_1",
 :CM1_COUNT=>881,
 :CM10_COUNT=>234,
 :TAXONOMY=>
  "Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100); \
  Leuconostocaceae(100);Leuconostoc(100)"}
{:OTU=>"OTU_0",
 :CM1_COUNT=>3352,
 :CM10_COUNT=>4329,
 :TAXONOMY=>
  "Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100); \
  Streptococcaceae(100);Lactococcus(100)"}
{:OTU=>"OTU_5",
 :CM1_COUNT=>5,
 :CM10_COUNT=>0,
 :TAXONOMY=>
  "Bacteria(100);Proteobacteria(100);Gammaproteobacteria(100); \
  Pseudomonadales(100);Pseudomonadaceae(100);Pseudomonas(100)"}
{:OTU=>"OTU_3",
 :CM1_COUNT=>228,
 :CM10_COUNT=>200,
 :TAXONOMY=>
  "Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100); \
  Streptococcaceae(100);Lactococcus(100)"}

To collapse the redundant OTU, run the stream through collapse_otus:

BD.new.read_table(input: "otu_table.txt").collapse_otus.dump.run

{:OTU=>"OTU_1",
 :CM1_COUNT=>881,
 :CM10_COUNT=>234,
 :TAXONOMY=>
  "Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100); \
  Leuconostocaceae(100);Leuconostoc(100)"}
{:OTU=>"OTU_0",
 :CM1_COUNT=>3580,
 :CM10_COUNT=>4529,
 :TAXONOMY=>
  "Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100); \
  Streptococcaceae(100);Lactococcus(100)"}
{:OTU=>"OTU_5",
 :CM1_COUNT=>5,
 :CM10_COUNT=>0,
 :TAXONOMY=>
  "Bacteria(100);Proteobacteria(100);Gammaproteobacteria(100); \
  Pseudomonadales(100);Pseudomonadaceae(100);Pseudomonas(100)"}
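The collapsed counts can be checked by hand: 3352 + 228 = 3580 and 4329 + 200 = 4529. The sketch below reproduces the example in plain Ruby without BioDSL. It uses Enumerable#group_by instead of the hash-based approach of collapse_tax, and it assumes the line-wrapped TAXONOMY strings shown above are single unbroken strings.

```ruby
# Plain-Ruby reproduction of the example (no BioDSL required).
# Assumes the wrapped TAXONOMY strings above contain no line breaks.
LEUCONOSTOC = 'Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);' \
              'Leuconostocaceae(100);Leuconostoc(100)'
LACTOCOCCUS = 'Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);' \
              'Streptococcaceae(100);Lactococcus(100)'
PSEUDOMONAS = 'Bacteria(100);Proteobacteria(100);Gammaproteobacteria(100);' \
              'Pseudomonadales(100);Pseudomonadaceae(100);Pseudomonas(100)'

records = [
  { OTU: 'OTU_1', CM1_COUNT: 881,  CM10_COUNT: 234,  TAXONOMY: LEUCONOSTOC },
  { OTU: 'OTU_0', CM1_COUNT: 3352, CM10_COUNT: 4329, TAXONOMY: LACTOCOCCUS },
  { OTU: 'OTU_5', CM1_COUNT: 5,    CM10_COUNT: 0,    TAXONOMY: PSEUDOMONAS },
  { OTU: 'OTU_3', CM1_COUNT: 228,  CM10_COUNT: 200,  TAXONOMY: LACTOCOCCUS }
]

# group_by keeps groups in order of first appearance; the first record of
# each group is copied and the *_COUNT values of the others are added to it.
collapsed = records
            .group_by { |r| r[:TAXONOMY].gsub(/\(\d+\)/, '') }
            .map do |_key, group|
              kept = group.first.dup
              group.drop(1).each do |r|
                r.each { |k, v| kept[k] += v if k.to_s.end_with?('_COUNT') }
              end
              kept
            end

collapsed.each { |r| puts [r[:OTU], r[:CM1_COUNT], r[:CM10_COUNT]].join("\t") }
# Prints (tab-separated):
# OTU_1 881 234
# OTU_0 3580 4529
# OTU_5 5 0
```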

Constants

STATS

Public Class Methods

new(options)

Constructor for CollapseOtus.

@param options [Hash] Options Hash.

# File lib/BioDSL/commands/collapse_otus.rb, line 102
def initialize(options)
  @options = options

  check_options
end

Public Instance Methods

lmb()

Return the CollapseOtus command lambda.

@return [Proc] Lambda for the command.

# File lib/BioDSL/commands/collapse_otus.rb, line 111
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    hash = {}

    input.each do |record|
      @status[:records_in] += 1

      if record[:TAXONOMY]
        @status[:otus_in] += 1

        collapse_tax(hash, record)
      else
        output << record
        @status[:records_out] += 1
      end
    end

    write_tax(hash, output)
  end
end
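The lambda treats records differently depending on whether they carry a TAXONOMY key: records without one are written to the output immediately, while OTU records are buffered and only emitted by write_tax after the input is exhausted. As a consequence, pass-through records end up ahead of all OTU records in the output. The sketch below mimics that control flow, assuming plain Arrays as stand-ins for BioDSL's input and output streams and a Hash for the status counters; collapse_stream and the SEQ_NAME record are hypothetical.

```ruby
# Hypothetical stand-in for the lmb control flow: Arrays replace the
# BioDSL streams and a Hash replaces the status counters.
def collapse_stream(input)
  output = []
  stats  = Hash.new(0)
  groups = {}

  input.each do |record|
    stats[:records_in] += 1
    if record[:TAXONOMY]
      stats[:otus_in] += 1
      key = record[:TAXONOMY].gsub(/\(\d+\)/, '').to_sym
      if groups.key?(key)
        record.each { |k, v| groups[key][k] += v if k.to_s.end_with?('_COUNT') }
      else
        groups[key] = record # stored by reference, as in collapse_tax
      end
    else
      output << record # non-OTU records pass through immediately
      stats[:records_out] += 1
    end
  end

  # OTU records are only emitted once the input is exhausted.
  groups.each_value do |record|
    output << record
    stats[:otus_out] += 1
    stats[:records_out] += 1
  end

  [output, stats]
end

input = [
  { OTU: 'OTU_1', S_COUNT: 2, TAXONOMY: 'A(100);B(90)' },
  { SEQ_NAME: 'not_an_otu' },
  { OTU: 'OTU_2', S_COUNT: 3, TAXONOMY: 'A(100);B(80)' }
]

output, stats = collapse_stream(input)
p output.map { |r| r[:OTU] || r[:SEQ_NAME] } # prints ["not_an_otu", "OTU_1"]
p stats
```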

Private Instance Methods

check_options()

Check options.

# File lib/BioDSL/commands/collapse_otus.rb, line 137
def check_options
  options_allowed(@options, nil)
end

collapse_tax(hash, record)

Collapse identical taxonomies by removing duplicates and adding their counts.

@param hash [Hash] Hash with taxonomy records.
@param record [Hash] BioDSL record with taxonomy info.

# File lib/BioDSL/commands/collapse_otus.rb, line 146
def collapse_tax(hash, record)
  key = record[:TAXONOMY].gsub(/\(\d+\)/, '').to_sym

  if hash.key? key
    record.each do |k, v|
      hash[key][k] += v if k[-6..-1] == '_COUNT'
    end
  else
    hash[key] = record
  end
end
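One caveat in collapse_tax: the first record seen for a taxonomy is stored as-is, so if a later duplicate carries a _COUNT key that the stored record lacks, hash[key][k] is nil and the += raises NoMethodError. In a regular OTU table every row has the same columns, so this should not arise, but for heterogeneous input a variant that treats missing counts as zero could look like the sketch below. collapse_tax_safe is a hypothetical name and not part of BioDSL.

```ruby
# Hypothetical defensive variant of collapse_tax (not part of BioDSL):
# a *_COUNT key missing from the retained record is treated as 0.
def collapse_tax_safe(groups, record)
  key = record[:TAXONOMY].gsub(/\(\d+\)/, '').to_sym

  if groups.key?(key)
    kept = groups[key]
    record.each do |k, v|
      kept[k] = kept.fetch(k, 0) + v if k.to_s.end_with?('_COUNT')
    end
  else
    groups[key] = record # first record wins and keeps its OTU id
  end
end

groups = {}
collapse_tax_safe(groups, { OTU: 'OTU_0', CM1_COUNT: 10, TAXONOMY: 'A(100)' })
collapse_tax_safe(groups, { OTU: 'OTU_3', CM1_COUNT: 5, CM10_COUNT: 7, TAXONOMY: 'A(99)' })
p groups[:A]
```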

write_tax(hash, output)

Output collapsed taxonomy records.

@param hash [Hash] Hash with taxonomy records.
@param output [Enumerator::Yielder] Output stream.

# File lib/BioDSL/commands/collapse_otus.rb, line 162
def write_tax(hash, output)
  hash.each_value do |record|
    output << record
    @status[:otus_out] += 1
    @status[:records_out] += 1
  end
end