class BioDSL::CollectOtus
Collect OTU data from records in the stream.
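collect_otus counts the number of times each OTU is found in a set of samples. OTUs are given by the :S_ID key and samples by the :SAMPLE key. If a :SEQ_COUNT key is present, its value is added to the OTU count instead of 1, which allows dereplicated sequences to be used. Only hit records (:TYPE equal to 'H') are counted. All input records are passed through to the output, followed by one OTU record per OTU.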
Usage
collect_otus()
Options
collect_otus takes no options.
Examples
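The counting described above can be modelled in plain Ruby. The following is a minimal sketch that does not use BioDSL itself: the helper name otu_table and the sample records (including the 'N' record type used as a non-hit) are hypothetical and only illustrate the behaviour documented on this page.

```ruby
# Plain-Ruby model of the counting step; does not call BioDSL.
def otu_table(records)
  # Nested hash: OTU id => sample => count, with missing counts defaulting to 0.
  counts = Hash.new { |h, k| h[k] = Hash.new(0) }

  records.each do |record|
    next unless record[:TYPE] == 'H' # only hit records are counted

    otu    = record[:S_ID].to_sym
    sample = record[:SAMPLE].upcase.to_sym
    counts[otu][sample] += record[:SEQ_COUNT] || 1
  end

  # Every OTU row gets a column for every sample seen anywhere in the input.
  samples = counts.values.flat_map(&:keys).uniq.sort

  counts.map do |otu, per_sample|
    row = { RECORD_TYPE: 'OTU', OTU: otu.to_s }
    samples.each { |s| row[:"#{s}_COUNT"] = per_sample[s] }
    row
  end
end

# Hypothetical input records.
records = [
  { TYPE: 'H', S_ID: 'OTU_0', SAMPLE: 'Sample1', SEQ_COUNT: 3 },
  { TYPE: 'H', S_ID: 'OTU_0', SAMPLE: 'Sample2' },
  { TYPE: 'H', S_ID: 'OTU_1', SAMPLE: 'Sample2', SEQ_COUNT: 5 },
  { TYPE: 'N', S_ID: '*',     SAMPLE: 'Sample1' }
]

table = otu_table(records)
# table[0] => { RECORD_TYPE: 'OTU', OTU: 'OTU_0', SAMPLE1_COUNT: 3, SAMPLE2_COUNT: 1 }
# table[1] => { RECORD_TYPE: 'OTU', OTU: 'OTU_1', SAMPLE1_COUNT: 0, SAMPLE2_COUNT: 5 }
```

Note that OTU_1 was never seen in Sample1 but still receives a SAMPLE1_COUNT of 0, so every OTU record has the same set of keys.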
Constants
- STATS
Public Class Methods
Constructor for CollectOtus.
@param options [Hash] Options hash.
# File lib/BioDSL/commands/collect_otus.rb, line 52
def initialize(options)
  @options = options

  check_options
end
Public Instance Methods
Return lambda for CollectOtus command.
@return [Proc] Command lambda.
# File lib/BioDSL/commands/collect_otus.rb, line 61
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    count_hash = process_input(input, output)
    samples    = collect_samples(count_hash)
    process_output(count_hash, samples, output)
  end
end
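To illustrate the (input, output, status) calling convention, here is a small sketch of a lambda with the same shape driven by a plain Enumerator. How BioDSL itself wires commands together is not shown on this page, so the pass_through lambda and the driver code are assumptions made for illustration only.

```ruby
# Hypothetical pass-through command with the same (input, output, status) shape.
pass_through = lambda do |input, output, status|
  input.each do |record|
    status[:records_in] += 1
    output << record
    status[:records_out] += 1
  end
end

status = Hash.new(0)
input  = [{ SEQ_NAME: 'a' }, { SEQ_NAME: 'b' }].each

# Enumerator.new hands the block an Enumerator::Yielder, which accepts
# records via <<, matching the output parameter type documented here.
stream = Enumerator.new { |output| pass_through.call(input, output, status) }
result = stream.to_a
```

After the stream is consumed, result holds both records and status counts two records in and two records out.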
Private Instance Methods
Add a given record to the count_hash.
@param count_hash [Hash] Hash with sample counts.
@param record [Hash] BioDSL record with sample and count.
# File lib/BioDSL/commands/collect_otus.rb, line 105
def add_to_count_hash(count_hash, record)
  id     = record[:S_ID].to_sym
  sample = record[:SAMPLE].upcase.to_sym

  count_hash[id][sample] += (record[:SEQ_COUNT] || 1)

  @status[:hits_in] += 1
end
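Two consequences of this method are easy to miss: sample names are upcased, so names differing only in case are merged, and a hit record without a :SAMPLE key raises NoMethodError because nil has no upcase method. The sketch below reproduces the same three statements in a standalone lambda (the name add and the records are hypothetical, and the @status bookkeeping is left out).

```ruby
count_hash = Hash.new { |h, k| h[k] = Hash.new(0) }

# Same logic as add_to_count_hash, minus the status counter.
add = lambda do |record|
  id     = record[:S_ID].to_sym
  sample = record[:SAMPLE].upcase.to_sym
  count_hash[id][sample] += record[:SEQ_COUNT] || 1
end

add.call({ S_ID: 'OTU_0', SAMPLE: 'gut' })               # counts 1
add.call({ S_ID: 'OTU_0', SAMPLE: 'Gut', SEQ_COUNT: 4 }) # counts 4, same sample

# A record with no :SAMPLE key fails on nil.upcase.
error = begin
  add.call({ S_ID: 'OTU_1' })
  nil
rescue NoMethodError => e
  e
end
```

Here count_hash ends up as { OTU_0: { GUT: 5 } }, and error holds the NoMethodError raised for the record without a sample.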
Check options.
# File lib/BioDSL/commands/collect_otus.rb, line 74
def check_options
  options_allowed(@options, nil)
end
Collect all samples in the count_hash into a sorted set.
@param count_hash [Hash] Hash with sample counts.
@return [SortedSet] Sample names.
# File lib/BioDSL/commands/collect_otus.rb, line 117
def collect_samples(count_hash)
  samples = SortedSet.new

  count_hash.values.each do |value|
    value.keys.map { |key| samples << key }
  end

  samples
end
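SortedSet was removed from Ruby's standard set library in Ruby 3.0 and moved to the separate sorted_set gem, so the listing above needs that gem on newer Rubies. As a sketch of an alternative (not the library's code), the same sorted, duplicate-free list of sample names can be built with a plain Set; the method name sorted_sample_names is hypothetical, and it returns an Array rather than a SortedSet.

```ruby
require 'set'

# Collect the unique sample names across all OTUs and return them sorted.
def sorted_sample_names(count_hash)
  samples = Set.new
  count_hash.each_value { |per_sample| samples.merge(per_sample.keys) }
  samples.sort
end

# Hypothetical count hash in the shape built by process_input.
count_hash = {
  OTU_0: { SAMPLE2: 1, SAMPLE1: 3 },
  OTU_1: { SAMPLE2: 5 }
}

names = sorted_sample_names(count_hash)
# names => [:SAMPLE1, :SAMPLE2]
```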
Read the input stream and add every hit record to the count hash. All records, hits or not, are passed on to the output stream.
@param input [Enumerator] Input stream.
@param output [Enumerator::Yielder] Output stream.
@return [Hash] Returns the count_hash.
# File lib/BioDSL/commands/collect_otus.rb, line 84
def process_input(input, output)
  count_hash = Hash.new { |h, k| h[k] = Hash.new(0) }

  input.each do |record|
    @status[:records_in] += 1

    if record[:TYPE] && record[:TYPE] == 'H'
      add_to_count_hash(count_hash, record)
    end

    output << record

    @status[:records_out] += 1
  end

  count_hash
end
Write one OTU record per entry in the count_hash to the output stream. Each record holds the OTU name and a count for every sample in samples.
@param count_hash [Hash] Hash with sample counts.
@param samples [SortedSet] Set with sample names.
@param output [Enumerator::Yielder] Output stream.
# File lib/BioDSL/commands/collect_otus.rb, line 133
def process_output(count_hash, samples, output)
  count_hash.each do |key, value|
    record = {}
    record[:RECORD_TYPE] = 'OTU'
    record[:OTU]         = key.to_s

    samples.each do |sample|
      record["#{sample}_COUNT".to_sym] = value[sample]
    end

    output << record

    @status[:hits_out]    += 1
    @status[:records_out] += 1
  end
end