class BioDSL::TrimSeq
Trim sequence ends removing residues with a low quality score.
trim_seq removes subquality residues from the ends of sequences in the stream based on quality SCORES in a FASTQ type quality score string. Trimming progresses until a stretch of good quality residues, whose length is specified with the length_min option, is found, thus preventing premature termination of the trimming by e.g. a single good quality residue at the end. The mode option indicates whether the sequence should be trimmed from the left end, the right end, or both (default=:both).
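The stretch rule described above can be sketched in plain Ruby. This is an illustration only and not BioDSL's implementation: the helpers left_trim_pos, right_trim_pos and quality_trim_sketch are hypothetical names, and the offset default of 64 is an assumption, chosen because the examples in this document appear to use base 64 encoded scores (the character T corresponds to quality 20).

```ruby
# Illustrative sketch only; none of these helpers exist in BioDSL.

# Index of the first residue, scanning from the left, that starts a run of
# `length_min` scores all >= `quality_min`. Returns the string length if no
# such run exists (everything would be trimmed).
def left_trim_pos(scores, quality_min: 20, length_min: 3, offset: 64)
  quals = scores.each_char.map { |c| c.ord - offset } # offset is an assumption
  runs  = quals.each_cons(length_min)
  runs.find_index { |run| run.all? { |q| q >= quality_min } } || quals.length
end

# Exclusive end index after trimming from the right: the same scan applied
# to the reversed score string.
def right_trim_pos(scores, **opts)
  scores.length - left_trim_pos(scores.reverse, **opts)
end

# Trim a sequence and its score string according to `mode`.
def quality_trim_sketch(seq, scores, mode: :both, **opts)
  start = mode == :right ? 0 : left_trim_pos(scores, **opts)
  stop  = mode == :left ? scores.length : right_trim_pos(scores, **opts)
  stop  = start if stop < start # no good stretch found: nothing survives
  [seq[start...stop], scores[start...stop]]
end

seq, scores = quality_trim_sketch('acgtacgt', 'ABhhhhCh')
puts seq    # prints "gtac"
puts scores # prints "hhhh"
```

In this small example the lone good residue at the right end does not stop the trimming, because it is not part of a stretch of three good residues.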
Usage
trim_seq([quality_min: <uint>[, length_min: <uint> [, mode: <:left|:right|:both>]]])
Options
- quality_min: <uint> - Minimum quality (default=20).
- length_min: <uint> - Minimum stretch length (default=3).
- mode: <string> - Trim mode :left|:right|:both (default=:both).
Examples
Consider the following FASTQ entry in the file test.fq:
@test
gatcgatcgtacgagcagcatctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
+
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
To trim both ends simply do:
BD.new.read_fastq(input: "test.fq").trim_seq.trim_seq.run

SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 62
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---
Use the quality_min option to change the minimum quality below which residues are discarded:
BD.new.
read_fastq(input: "test.fq").
trim_seq(quality_min: 25).
trim_seq.
run

SEQ_NAME: test
SEQ: cgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 57
SCORES: YZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---
To trim the left end only (use :right for the right end only), do:
BD.new.read_fastq(input: "test.fq").trim_seq(mode: :left).trim_seq.run

SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 62
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---
To increase the length of the stretch of good quality residues that must be matched, use the length_min option:
BD.new.read_fastq(input: "test.fq").trim_seq(length_min: 4).trim_seq.run

SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtct
SEQ_LEN: 42
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUT
---
Constants
- STATS
Public Class Methods
Constructor for the TrimSeq class.
@param [Hash] options Options hash.
@option options [Integer] :quality_min TrimSeq minimum quality (default=20).
@option options [Symbol] :mode TrimSeq mode (default=:both).
@option options [Integer] :length_min TrimSeq stretch length triggering trim (default=3).
@return [TrimSeq] Returns an instance of the TrimSeq class.
# File lib/BioDSL/commands/trim_seq.rb, line 123
def initialize(options)
  @options = options

  check_options
  defaults

  @mode = @options[:mode].to_sym
  @min  = @options[:quality_min]
  @len  = @options[:length_min]
end
Public Instance Methods
Return a lambda for the trim_seq command.
@return [Proc] Returns the trim_seq command lambda.
# File lib/BioDSL/commands/trim_seq.rb, line 137
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      trim_seq(record) if record[:SEQ] && record[:SCORES]

      output << record
      @status[:records_out] += 1
    end
  end
end
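The shape of the lambda above (read each record from input, optionally modify it, write it to output, and count records in status) can be tried outside BioDSL with a stand-in. In the sketch below the class UpcaseSeq is hypothetical, and plain Ruby arrays and a hash are assumed as substitutes for the real input and output streams and the status object.

```ruby
# Illustrative stand-in; this class is hypothetical and is not BioDSL::TrimSeq.
class UpcaseSeq
  # Return a lambda taking (input, output, status), like the lmb method above.
  def lmb
    lambda do |input, output, status|
      status[:records_in]  = 0
      status[:records_out] = 0

      input.each do |record|
        status[:records_in] += 1

        # Only touch records that carry a sequence; pass the rest through.
        record[:SEQ] = record[:SEQ].upcase if record[:SEQ]

        output << record
        status[:records_out] += 1
      end
    end
  end
end

# Arrays stand in for the streams here (an assumption for illustration).
input  = [{ SEQ_NAME: 'test', SEQ: 'gatc' }, { FOO: 'bar' }]
output = []
status = {}

UpcaseSeq.new.lmb.call(input, output, status)
```

Records without the relevant keys are passed through unchanged, which mirrors the guard on record[:SEQ] and record[:SCORES] in the method above.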
Private Instance Methods
Check the options.
# File lib/BioDSL/commands/trim_seq.rb, line 156
def check_options
  options_allowed(@options, :quality_min, :length_min, :mode)
  options_allowed_values(@options, mode: [:left, :right, :both])
  options_assert(@options, ':quality_min >= 0')
  options_assert(@options, ':quality_min <= 40')
  options_assert(@options, ':length_min > 0')
end
Set default options.
# File lib/BioDSL/commands/trim_seq.rb, line 165
def defaults
  @options[:quality_min] ||= 20
  @options[:mode]        ||= :both
  @options[:length_min]  ||= 3
end
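The combined effect of the two helpers above can be approximated in a standalone sketch. The method check_and_default and the constants ALLOWED_KEYS and ALLOWED_MODES are hypothetical names; the behaviour of BioDSL's own options_allowed, options_allowed_values and options_assert helpers is not shown in this document, so the error class and messages are assumptions. For simplicity the sketch applies the defaults first and validates afterwards.

```ruby
# Illustrative sketch; check_and_default is a hypothetical helper, not BioDSL API.
ALLOWED_KEYS  = %i[quality_min length_min mode].freeze
ALLOWED_MODES = %i[left right both].freeze

def check_and_default(options)
  unknown = options.keys - ALLOWED_KEYS
  raise ArgumentError, "Disallowed option(s): #{unknown.join(', ')}" unless unknown.empty?

  # Same defaults as the defaults method above, applied to a copy.
  opts = options.dup
  opts[:quality_min] ||= 20
  opts[:mode]        ||= :both
  opts[:length_min]  ||= 3

  # Same constraints as the assertions in check_options above.
  raise ArgumentError, "Bad mode: #{opts[:mode].inspect}" unless ALLOWED_MODES.include?(opts[:mode])
  raise ArgumentError, ':quality_min must be in 0..40' unless (0..40).cover?(opts[:quality_min])
  raise ArgumentError, ':length_min must be > 0' unless opts[:length_min] > 0

  opts
end

p check_and_default({ quality_min: 25 })
```

Because the defaults are assigned with ||=, an option explicitly passed as nil also falls back to its default.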
Trim sequence in a given record with sequence info.
@param record [Hash] BioDSL record
# File lib/BioDSL/commands/trim_seq.rb, line 174
def trim_seq(record)
  entry = BioDSL::Seq.new_bp(record)

  @status[:sequences_in] += 1
  @status[:residues_in] += entry.length

  case @mode
  when :both  then entry.quality_trim!(@min, @len)
  when :left  then entry.quality_trim_left!(@min, @len)
  when :right then entry.quality_trim_right!(@min, @len)
  end

  @status[:sequences_out] += 1
  @status[:residues_out] += entry.length

  record.merge! entry.to_bp
end