class BioDSL::ReadFasta
Read FASTA entries from one or more files.
read_fasta reads sequence entries from FASTA files. Each sequence entry consists of a header line, made up of a ‘>’ followed by the sequence name, and then one or more lines of sequence until the next entry or the end of the file. Each resulting Biopiece record has the following form:
{:SEQ_NAME=>"test", :SEQ=>"AGCATCGACTAGCAGCATTT", :SEQ_LEN=>20}
Input files may be compressed with gzip or bzip2.
For more about the FASTA format:
en.wikipedia.org/wiki/Fasta_format
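The entry layout described above can be sketched in plain Ruby. This is a minimal illustration only, not BioDSL's parser: `parse_fasta` is a hypothetical helper, it reads from any IO-like object, and it ignores compression and malformed input.

```ruby
require 'stringio'

# Hypothetical helper (not part of BioDSL): parse FASTA text from an IO-like
# object into hashes shaped like the record shown above.
def parse_fasta(io)
  records = []
  name = nil
  seq = +''

  io.each_line do |line|
    line = line.chomp
    next if line.empty?

    if line.start_with?('>')
      # A new header line closes the previous entry, if there was one.
      records << { SEQ_NAME: name, SEQ: seq, SEQ_LEN: seq.length } if name
      name = line[1..-1]
      seq = +''
    else
      # Sequence may span several lines; join them into one string.
      seq << line
    end
  end

  # Emit the final entry, which is closed by the end of the input.
  records << { SEQ_NAME: name, SEQ: seq, SEQ_LEN: seq.length } if name
  records
end

fasta = StringIO.new(">test\nAGCATCGACT\nAGCAGCATTT\n")
p parse_fasta(fasta)
```

The two sequence lines are concatenated into a single SEQ value, and SEQ_LEN is derived from it rather than stored in the file.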
Usage
read_fasta(input: <glob>[, first: <uint>|last: <uint>])
Options
- input <glob> - Input file or file glob expression.
- first <uint> - Only read in the first number of entries.
- last <uint> - Only read in the last number of entries.
Examples
To read all FASTA entries from a file:
read_fasta(input: "test.fna")
To read all FASTA entries from a gzipped file:
read_fasta(input: "test.fna.gz")
To read in only 10 records from a FASTA file:
read_fasta(input: "test.fna", first: 10)
To read in the last 10 records from a FASTA file:
read_fasta(input: "test.fna", last: 10)
To read all FASTA entries from multiple files:
read_fasta(input: "test1.fna,test2.fna")
To read FASTA entries from multiple files using a glob expression:
read_fasta(input: "*.fna")
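The last two examples pass several files through a single input string. How BioDSL expands that string is not shown on this page; the sketch below assumes a comma-separated list whose parts are each expanded as a glob. `expand_inputs` is a hypothetical stand-in, not BioDSL's own helper.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for the input expansion: accept a String or an Array,
# split a String on commas, and expand every part as a glob expression.
def expand_inputs(input)
  parts = input.is_a?(Array) ? input : input.split(',').map(&:strip)
  parts.flat_map { |part| Dir.glob(part).sort }
end

Dir.mktmpdir do |dir|
  # Create a few empty files to match against.
  %w[test1.fna test2.fna notes.txt].each do |name|
    FileUtils.touch(File.join(dir, name))
  end

  Dir.chdir(dir) do
    p expand_inputs('test1.fna,test2.fna')
    p expand_inputs('*.fna')
  end
end
```

Under these assumptions both calls resolve to the same two files, while notes.txt is left out.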
Constants
- STATS
Public Class Methods
Constructor for the ReadFasta class.
@param [Hash] options Options hash.
@option options [String, Array] :input String or Array with glob expressions.
@option options [Integer] :first Only read in the first number of records.
@option options [Integer] :last Only read in the last number of records.
@return [ReadFasta] Returns an instance of the class.
# File lib/BioDSL/commands/read_fasta.rb, line 93
def initialize(options)
  @options = options
  @count   = 0
  @buffer  = []

  check_options
end
Public Instance Methods
Return a lambda for the read_fasta command.
@return [Proc] Returns the read_fasta command lambda.
# File lib/BioDSL/commands/read_fasta.rb, line 104
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    read_input(input, output)

    options_glob(@options[:input]).each do |file|
      BioDSL::Fasta.open(file) do |ios|
        if @options[:first] && read_first(ios, output)
        elsif @options[:last] && read_last(ios)
        else
          read_all(ios, output)
        end
      end
    end

    write_buffer(output) if @options[:last]
  end
end
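Because @count is set once in the constructor and is not reset in the code shown, the first limit appears to apply to all input files together rather than to each file. The sketch below illustrates that behaviour with plain arrays standing in for opened FASTA files; the class `FirstAcrossFiles` is hypothetical and exists only for this illustration.

```ruby
# Arrays play the role of opened FASTA files and a plain Array plays the role
# of the output stream. Hypothetical class, not part of BioDSL.
class FirstAcrossFiles
  def initialize(first)
    @first = first
    @count = 0 # shared across all files, like @count in ReadFasta
  end

  def run(files)
    output = []

    files.each do |entries|
      entries.each do |entry|
        # Once the limit is reached, every remaining file stops immediately.
        break if @count == @first

        output << entry
        @count += 1
      end
    end

    output
  end
end

files = [%w[a b c], %w[d e f]]
p FirstAcrossFiles.new(4).run(files) # prints ["a", "b", "c", "d"]
```

With a limit of 4, the first file contributes three entries and the second only one.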
Private Instance Methods
Check the options.
# File lib/BioDSL/commands/read_fasta.rb, line 127
def check_options
  options_allowed(@options, :input, :first, :last)
  options_required(@options, :input)
  options_files_exist(@options, :input)
  options_unique(@options, :first, :last)
  options_assert(@options, ':first >= 0')
  options_assert(@options, ':last >= 0')
end
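The helper methods called above are defined elsewhere in BioDSL and are not shown on this page. As a rough sketch of what such checks could look like in plain Ruby, the hypothetical `validate_read_fasta_options` below assumes that options_unique means first and last are mutually exclusive (as the Usage line suggests), and it skips the file-existence check so it can run without files on disk.

```ruby
# Hypothetical, simplified validator mirroring the checks listed above.
# It is not BioDSL code and the error messages are made up.
def validate_read_fasta_options(options)
  allowed = %i[input first last]
  unknown = options.keys - allowed
  raise ArgumentError, "Disallowed option(s): #{unknown.join(', ')}" unless unknown.empty?
  raise ArgumentError, 'Required option missing: input' unless options[:input]

  # Assumption: at most one of :first and :last may be given.
  if options[:first] && options[:last]
    raise ArgumentError, 'Options first and last are mutually exclusive'
  end

  %i[first last].each do |key|
    value = options[key]
    raise ArgumentError, "#{key} must be >= 0" if value && value < 0
  end

  options
end

validate_read_fasta_options(input: 'test.fna', first: 10)

begin
  validate_read_fasta_options(input: 'test.fna', first: 10, last: 10)
rescue ArgumentError => e
  puts e.message
end
```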
Read in all entries from input and emit to output.
@param input [BioDSL::Fasta] FASTA file input stream.
@param output [Enumerable::Yielder] Output stream.
# File lib/BioDSL/commands/read_fasta.rb, line 199
def read_all(input, output)
  input.each do |entry|
    output << entry.to_bp

    @status[:records_out]   += 1
    @status[:sequences_out] += 1
    @status[:residues_out]  += entry.length
  end
end
Read in a specified number of entries from the input and emit to the output.
@param input [BioDSL::Fasta] FASTA file input stream.
@param output [Enumerable::Yielder] Output stream.
@return [Fixnum] Number of read entries.
# File lib/BioDSL/commands/read_fasta.rb, line 161
def read_first(input, output)
  first = @options[:first]

  input.each do |entry|
    break if @count == first

    output << entry.to_bp

    @status[:records_out]   += 1
    @status[:sequences_out] += 1
    @status[:residues_out]  += entry.length

    @count += 1
  end

  @count
end
Read and emit records from the input to the output stream.
@param input [Enumerable::Yielder] Input stream.
@param output [Enumerable::Yielder] Output stream.
# File lib/BioDSL/commands/read_fasta.rb, line 140
def read_input(input, output)
  return unless input

  input.each do |record|
    output << record

    @status[:records_in] += 1

    if record[:SEQ]
      @status[:sequences_in] += 1
      @status[:residues_in]  += record[:SEQ].length
    end
  end
end
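The pass-through step can be tried outside BioDSL with ordinary Ruby objects. In this sketch `pass_through` is a hypothetical standalone version of the method above: an Array stands in for each stream and a Hash with a default of 0 stands in for the status counters.

```ruby
# Forward every upstream record unchanged and count what passes through.
# Hypothetical standalone version of read_input; not BioDSL code.
def pass_through(input, output, status)
  return unless input

  input.each do |record|
    output << record
    status[:records_in] += 1

    # Only records carrying sequence data count as sequences.
    next unless record[:SEQ]

    status[:sequences_in] += 1
    status[:residues_in]  += record[:SEQ].length
  end
end

upstream = [
  { SEQ_NAME: 'test', SEQ: 'AGCATCGACTAGCAGCATTT', SEQ_LEN: 20 },
  { NOTE: 'a record without sequence data' }
]
output = []
status = Hash.new(0)

pass_through(upstream, output, status)
p status
```

Both records reach the output, but only the one with a :SEQ key contributes to the sequence and residue counts.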
Read in entries from input and cache the specified last number in a buffer.
@param input [BioDSL::Fasta] FASTA file input stream.
@return [Fixnum] Number of read entries.
# File lib/BioDSL/commands/read_fasta.rb, line 184
def read_last(input)
  last = @options[:last]

  input.each do |entry|
    @buffer << entry
    @buffer.shift if @buffer.size > last
  end

  @buffer.size
end
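The buffering technique used here keeps at most the requested number of entries in memory, however large the input is. A minimal standalone sketch of the same idea, where `last_n` is a hypothetical helper working on any enumerable:

```ruby
# Keep only the last `n` items of an enumerable using a bounded buffer.
# Hypothetical helper for illustration; not part of BioDSL.
def last_n(entries, n)
  buffer = []

  entries.each do |entry|
    buffer << entry
    # Drop the oldest item as soon as the buffer grows past the limit.
    buffer.shift if buffer.size > n
  end

  buffer
end

p last_n(%w[a b c d e], 2) # prints ["d", "e"]
```

Since @buffer in ReadFasta is an instance variable filled by every file in turn, the same trimming would apply across all input files, so the result would be the last entries overall rather than per file.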
Emit all entries in buffer to output.
@param output [Enumerable::Yielder] Output stream.
# File lib/BioDSL/commands/read_fasta.rb, line 212
def write_buffer(output)
  @buffer.each do |entry|
    output << entry.to_bp

    @status[:records_out]   += 1
    @status[:sequences_out] += 1
    @status[:residues_out]  += entry.length
  end
end