class BioDSL::MeanScores
Calculate the mean or local mean of quality SCORES in the stream.
mean_scores calculates either the global or the local mean value of the quality SCORES in the stream. The quality SCORES are encoded Phred-style as a character string.
The global (default) behaviour calculates SCORES_MEAN as the sum of all the scores divided by the length of the SCORES string.
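The following plain-Ruby sketch illustrates the global calculation. It decodes each character with a Phred+33 offset and divides the sum by the string length. The offset is an assumption, inferred from the example below, where `B` maps to 33. The method name `global_mean` is hypothetical, and this is not BioDSL's implementation of `scores_mean`.

```ruby
# Hypothetical helper: global mean of a quality string.
# Assumes Phred+33 encoding; an illustration, not BioDSL's scores_mean.
PHRED_OFFSET = 33

def global_mean(scores)
  raise ArgumentError, 'empty SCORES string' if scores.empty?

  # Sum of the decoded scores divided by the length of the string.
  total = scores.each_byte.sum { |b| b - PHRED_OFFSET }
  (total.to_f / scores.length).round(2)
end

puts global_mean('BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII') # prints 34.58
```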
The local behaviour calculates SCORES_MEAN_LOCAL with a sliding window. A mean is computed for every window position, and the smallest of these means is returned.
Thus, low-quality records can be filtered out using grab, whether they have an overall low mean quality or a local dip in quality.
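The sliding-window rule can be sketched the same way. The code computes the mean of every run of `window_size` consecutive scores and keeps the smallest. Several parts are assumptions:

- `local_min_mean` is a hypothetical name.
- The Phred+33 offset is inferred from the example below.
- `Array#select` only stands in for filtering with grab.
- The threshold of 20 is arbitrary.

```ruby
# Hypothetical helper: smallest mean over all sliding windows of a
# Phred+33 quality string (an illustration, not BioDSL's scores_mean_local).
def local_min_mean(scores, window_size = 5)
  values = scores.each_byte.map { |b| b - 33 } # Phred+33 offset assumed
  raise ArgumentError, 'SCORES shorter than window' if values.length < window_size

  # each_cons yields every run of window_size consecutive scores.
  means = values.each_cons(window_size).map { |w| w.sum.to_f / window_size }
  means.min.round(2)
end

# Keep only records without a low-quality dip (threshold chosen arbitrarily).
records = [
  { SEQ_NAME: 'dip',  SCORES: 'BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII' },
  { SEQ_NAME: 'good', SCORES: 'IIIIIIIIII' }
]
kept = records.select { |r| local_min_mean(r[:SCORES]) >= 20 }
puts kept.map { |r| r[:SEQ_NAME] }.inspect # prints ["good"]
```

Using the minimum rather than the mean of the window means lets a single short dip disqualify an otherwise good read, which is the behaviour described above.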
Usage
mean_scores([local: <bool>[, window_size: <uint>]])
Options
- local: <bool> - Calculate the local mean score (default=false).
- window_size: <uint> - Size of the sliding window (default=5).
Examples
Consider the following FASTQ entry in the file test.fq:
@HWI-EAS157_20FFGAAXX:2:1:888:434
TTGGTCGCTCGCTCGACCTCAGATCAGACGTGG
+
BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII
The values of the scores in decimal are:
SCORES: 33;34;35;36;37;38;39;40;40;40;40;40;40;40;11;11;11;11;11;40;37;37;40;40;40;40;40;40;40;40;40;40;40
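Decoding uses a fixed offset per character. This sketch assumes Phred+33 encoding, which matches `B` → 33, `I` → 40 and `,` → 11 in the list above, and reproduces that list:

```ruby
# Decode a quality string into integer scores (Phred+33 offset assumed).
scores = 'BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII'
values = scores.each_byte.map { |b| b - 33 }
puts values.join(';')
```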
To calculate the mean score do:
BD.new.read_fastq(input: "test.fq").mean_scores.dump.run

{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434", :SEQ=>"TTGGTCGCTCGCTCGACCTCAGATCAGACGTGG", :SEQ_LEN=>33, :SCORES=>"BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII", :SCORES_MEAN=>34.58}
To calculate local means for a sliding window, do:
BD.new.read_fastq(input: "test.fq").mean_scores(local: true).dump.run

{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434", :SEQ=>"TTGGTCGCTCGCTCGACCTCAGATCAGACGTGG", :SEQ_LEN=>33, :SCORES=>"BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII", :SCORES_MEAN_LOCAL=>11.0}
This indicates that the local minimum lies at the stretch of ,,,,, where (11 + 11 + 11 + 11 + 11) / 5 = 11.0.
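To see where the dip sits, the same sliding-window idea can also report the start position of the lowest-scoring window. This is a hypothetical sketch: the helper name `lowest_window` and the Phred+33 offset are assumptions. BioDSL's mean_scores itself only records the mean in :SCORES_MEAN_LOCAL.

```ruby
# Hypothetical helper: returns [min_mean, start_index] for the
# lowest-scoring window of a Phred+33 quality string (0-based index).
def lowest_window(scores, window_size = 5)
  values = scores.each_byte.map { |b| b - 33 }
  raise ArgumentError, 'SCORES shorter than window' if values.length < window_size

  means = values.each_cons(window_size).map { |w| w.sum.to_f / window_size }
  # min_by keeps the first window with the smallest mean.
  mean, index = means.each_with_index.min_by { |m, _i| m }
  [mean.round(2), index]
end

p lowest_window('BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII') # prints [11.0, 14]
```

Index 14 is the first `,` in the SCORES string. The 14 characters `BCDEFGHIIIIIII` come before it.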
Constants
- STATS
Public Class Methods
Constructor for MeanScores.
@param options [Hash] Options hash.
@option options [Boolean] :local
@option options [Fixnum] :window_size
@return [MeanScores] Class instance.
# File lib/BioDSL/commands/mean_scores.rb, line 100
def initialize(options)
  @options = options
  @min     = Float::INFINITY
  @max     = 0
  @sum     = 0
  @count   = 0

  check_options
  defaults
end
Public Instance Methods
Return command lambda for mean_scores.
@return [Proc] Command lambda.
# File lib/BioDSL/commands/mean_scores.rb, line 114
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      calc_mean(record) if record[:SCORES] && record[:SCORES].length > 0

      output << record
      @status[:records_out] += 1
    end

    @status[:mean_mean] = (@sum.to_f / @count).round(2)
  end
end
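The lambda follows a simple streaming contract:

- read each record from input;
- annotate it if it has SCORES;
- write it to output;
- update counters in the status hash.

The sketch below mimics that contract with plain Arrays and a Hash, so it runs without BioDSL. `mean_scores_lambda` and its annotating block are hypothetical. As a choice of this sketch, it reports `nil` for :mean_mean when no record carries SCORES, instead of dividing by zero.

```ruby
# Hypothetical stand-in for a BioDSL command lambda: Arrays replace the
# input/output streams and a plain Hash replaces the status object.
def mean_scores_lambda(&annotate)
  lambda do |input, output, status|
    status[:records_in] = 0
    status[:records_out] = 0
    sum = 0.0
    count = 0

    input.each do |record|
      status[:records_in] += 1
      if record[:SCORES] && !record[:SCORES].empty?
        sum += annotate.call(record) # block annotates and returns the mean
        count += 1
      end
      output << record # every record is passed on, annotated or not
      status[:records_out] += 1
    end

    status[:mean_mean] = count.zero? ? nil : (sum / count).round(2)
  end
end

# Usage with a toy annotator computing the global Phred+33 mean.
cmd = mean_scores_lambda do |r|
  total = r[:SCORES].each_byte.sum { |b| b - 33 }
  r[:SCORES_MEAN] = (total.to_f / r[:SCORES].length).round(2)
end
out = []
status = {}
cmd.call([{ SCORES: 'II' }, { SEQ: 'ACGT' }], out, status)
p status
```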
Private Instance Methods
Calculate the mean score for a given record and update the running count, sum, min and max.
@param record [Hash] BioDSL record.
# File lib/BioDSL/commands/mean_scores.rb, line 151
def calc_mean(record)
  entry = BioDSL::Seq.new_bp(record)

  if @options[:local]
    mean = entry.scores_mean_local(@options[:window_size]).round(2)
    record[:SCORES_MEAN_LOCAL] = mean
  else
    mean = entry.scores_mean.round(2)
    record[:SCORES_MEAN] = mean
  end

  @sum += mean
  @status[:min_mean] = mean if mean < @status[:min_mean]
  @status[:max_mean] = mean if mean > @status[:max_mean]
  @count += 1
end
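calc_mean folds each per-record mean into running statistics. Below is a self-contained sketch of that bookkeeping. It uses a hypothetical `RunningStats` class in place of the @status hash and the @sum/@count instance variables.

```ruby
# Hypothetical accumulator mirroring calc_mean's bookkeeping:
# running sum, count, minimum and maximum of per-record means.
class RunningStats
  attr_reader :min, :max, :count

  def initialize
    @min = Float::INFINITY  # any real mean is smaller
    @max = -Float::INFINITY # any real mean is larger
    @sum = 0.0
    @count = 0
  end

  def add(mean)
    @sum += mean
    @min = mean if mean < @min
    @max = mean if mean > @max
    @count += 1
    self
  end

  # Mean of the means, or nil before any value was added.
  def mean
    @count.zero? ? nil : (@sum / @count).round(2)
  end
end

stats = RunningStats.new
[34.58, 11.0, 40.0].each { |m| stats.add(m) }
p [stats.min, stats.max, stats.mean]
```

Starting the maximum at negative infinity, rather than 0, is a choice of this sketch. It keeps the accumulator correct for any input range.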
Check options
# File lib/BioDSL/commands/mean_scores.rb, line 135
def check_options
  options_allowed(@options, :local, :window_size)
  options_tie(@options, window_size: :local)
  options_allowed_values(@options, local: [true, false])
  options_assert(@options, ':window_size > 1')
end
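The four checks can be approximated in plain Ruby to make the rules explicit:

- only :local and :window_size are accepted;
- :window_size is tied to :local;
- :local must be a Boolean;
- :window_size must exceed 1.

`validate_mean_scores_options` is hypothetical and is not how BioDSL's `options_*` helpers work internally. Reading `options_tie` as ":window_size requires :local" is an interpretation.

```ruby
# Hypothetical validator approximating mean_scores' option checks.
ALLOWED_OPTIONS = %i[local window_size].freeze

def validate_mean_scores_options(options)
  unknown = options.keys - ALLOWED_OPTIONS
  raise ArgumentError, "unknown options: #{unknown.inspect}" unless unknown.empty?

  # Interpretation of options_tie: :window_size needs :local alongside it.
  if options.key?(:window_size) && !options.key?(:local)
    raise ArgumentError, ':window_size requires :local'
  end

  if options.key?(:local) && ![true, false].include?(options[:local])
    raise ArgumentError, ':local must be true or false'
  end

  if options.key?(:window_size)
    size = options[:window_size]
    raise ArgumentError, ':window_size must be an Integer > 1' unless size.is_a?(Integer) && size > 1
  end

  # Default applied only after validation, as the constructor does.
  options[:window_size] ||= 5
  options
end

p validate_mean_scores_options({ local: true })
```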
Set default options.
# File lib/BioDSL/commands/mean_scores.rb, line 143
def defaults
  @options[:window_size] ||= 5
end