class BioDSL::UniqueValues
Select unique or non-unique records based on the value of a given key.
unique_values selects records from the stream by checking the value of a given key. If several records share the same value for that key, only the first of them is output. If the invert option is used, the non-unique records are selected instead.
Usage
unique_values(<key: <string>>[, invert: <bool>])
Options
- key: <string> - Key for which the value is checked for uniqueness.
- invert: <bool> - Select non-unique records (default=false).
Examples
Consider the following two-column table in the file `test.tab`:

    Human   H1
    Human   H2
    Human   H3
    Dog     D1
    Dog     D2
    Mouse   M1
To output only unique values for the first column, we first read the table with read_table and then pass the result to unique_values:

    BD.new.read_table(input: "test.tab").unique_values(key: :V0).dump.run

    {:V0=>"Human", :V1=>"H1"}
    {:V0=>"Dog", :V1=>"D1"}
    {:V0=>"Mouse", :V1=>"M1"}
To output the duplicate records instead, use the invert option:

    BD.new.
    read_table(input: "test.tab").
    unique_values(key: :V0, invert: true).
    dump.
    run

    {:V0=>"Human", :V1=>"H2"}
    {:V0=>"Human", :V1=>"H3"}
    {:V0=>"Dog", :V1=>"D2"}
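The selection in the two examples above can be reproduced in plain Ruby without BioDSL. The following is a minimal sketch, not the library's code: `select_by_key` is a hypothetical helper name, and the records are modelled as an array of plain hashes rather than a BioDSL stream.

```ruby
require 'set'

# Hypothetical helper (not part of BioDSL): mimics the selection that
# unique_values performs, applied to an array of record hashes.
def select_by_key(records, key, invert: false)
  seen = Set.new

  records.select do |record|
    value = record[key]
    next true unless value         # records lacking the key pass through

    found = seen.include?(value)   # has this value been seen before?
    seen.add(value)
    invert ? found : !found
  end
end

# The rows of test.tab from the example, as record hashes.
RECORDS = [
  { V0: 'Human', V1: 'H1' },
  { V0: 'Human', V1: 'H2' },
  { V0: 'Human', V1: 'H3' },
  { V0: 'Dog',   V1: 'D1' },
  { V0: 'Dog',   V1: 'D2' },
  { V0: 'Mouse', V1: 'M1' }
].freeze

unique     = select_by_key(RECORDS, :V0)
duplicates = select_by_key(RECORDS, :V0, invert: true)
```

Here `unique` holds the H1, D1 and M1 records and `duplicates` holds H2, H3 and D2, matching the two outputs shown above. Note that with invert the first occurrence of each value is never output, only the later ones.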
Constants
- STATS
Public Class Methods
Constructor for UniqueValues.
@param options [Hash] Options hash.
@option options [String,Symbol] :key
@option options [Boolean] :invert
@return [UniqueValues] Class instance.
    # File lib/BioDSL/commands/unique_values.rb, line 88
    def initialize(options)
      @options = options
      @lookup  = Set.new
      @key     = options[:key].to_sym
      @invert  = options[:invert]

      check_options
    end
Public Instance Methods
Return command lambda for unique_values.
@return [Proc] Command lambda.
    # File lib/BioDSL/commands/unique_values.rb, line 100
    def lmb
      lambda do |input, output, status|
        status_init(status, STATS)

        input.each do |record|
          @status[:records_in] += 1

          if output_record?(record)
            output << record
            @status[:records_out] += 1
          end
        end
      end
    end
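The shape of this lambda (read each record from the input, count it, forward it to the output if it passes a test, count it again) can be tried in isolation. The sketch below is a simplified stand-in under several assumptions: plain arrays replace the BioDSL streams, a plain Hash replaces the status object, the counters are zeroed by hand because the source does not show what status_init does, and the `:N` key and the even-number test are hypothetical placeholders for output_record?.

```ruby
# Simplified stand-in for a command lambda. Nothing here is BioDSL API;
# it only mirrors the input/output/status calling convention.
keep_even = lambda do |input, output, status|
  status[:records_in]  = 0
  status[:records_out] = 0

  input.each do |record|
    status[:records_in] += 1
    next unless record[:N].even?   # placeholder for output_record?(record)

    output << record
    status[:records_out] += 1
  end
end

input  = [{ N: 1 }, { N: 2 }, { N: 4 }]
output = []
status = {}

keep_even.call(input, output, status)
```

After the call, `output` contains the two records that passed the test, and `status` records three records in and two out.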
Private Instance Methods
Check options.
    # File lib/BioDSL/commands/unique_values.rb, line 118
    def check_options
      options_allowed(@options, :key, :invert)
      options_required(@options, :key)
      options_allowed_values(@options, invert: [true, false, nil])
    end
Determine whether a record should be output. A record that lacks the wanted key is always output. Otherwise the record is output if its value has not been seen before, unless the invert option was used, in which case only records whose value has already been seen are output.
@param record [Hash] BioDSL record.
@return [Boolean]
    # File lib/BioDSL/commands/unique_values.rb, line 134
    def output_record?(record)
      return true unless (value = record[@key])

      value = value.to_sym if value.is_a? String
      found = @lookup.include?(value)

      @lookup.add(value) unless found

      found && @invert || !found && !@invert
    end
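Two consequences of this method are easy to miss: String values are converted to symbols before the lookup, so "Human" and :Human count as the same value, and a record without the key bypasses the lookup entirely. The sketch below copies the decision logic into a hypothetical `KeyFilter` class (not part of BioDSL) so that both cases can be run on their own.

```ruby
require 'set'

# Hypothetical stand-in that reuses the body of output_record? so the
# logic can be exercised without BioDSL.
class KeyFilter
  def initialize(key, invert: false)
    @key    = key.to_sym
    @invert = invert
    @lookup = Set.new
  end

  def output_record?(record)
    return true unless (value = record[@key])

    value = value.to_sym if value.is_a? String
    found = @lookup.include?(value)

    @lookup.add(value) unless found

    found && @invert || !found && !@invert
  end
end

filter = KeyFilter.new(:V0)

first_seen  = filter.output_record?({ V0: 'Human' })  # value not seen before
as_symbol   = filter.output_record?({ V0: :Human })   # same value after to_sym
missing_key = filter.output_record?({ V1: 'H1' })     # record has no :V0 key
```

With invert left at its default, `first_seen` and `missing_key` are true and `as_symbol` is false, because the symbol :Human was already stored when the string "Human" was seen.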