class BioDSL::UniqueValues

Select unique or non-unique records based on the value of a given key.

unique_values selects records from the stream by checking the values of a given key. If several records share the same value for that key, only the first is output. If the invert option is used, the non-unique records (every repeat after the first occurrence) are selected instead.

Usage

unique_values(<key: <string>>[, invert: <bool>])

Options

* key: <string> - Key whose value is checked for uniqueness.
* invert: <bool> - Select non-unique records instead (default: false).

Examples

Consider the following two-column table in the file "test.tab":

Human   H1
Human   H2
Human   H3
Dog     D1
Dog     D2
Mouse   M1

To output only unique values for the first column we first read the table with read_table and then pass the result to unique_values:

BD.new.read_table(input: "test.tab").unique_values(key: :V0).dump.run

{:V0=>"Human", :V1=>"H1"}
{:V0=>"Dog", :V1=>"D1"}
{:V0=>"Mouse", :V1=>"M1"}

To output duplicate records instead, use the invert option:

BD.new.
read_table(input: "test.tab").
unique_values(key: :V0, invert: true).
dump.
run

{:V0=>"Human", :V1=>"H2"}
{:V0=>"Human", :V1=>"H3"}
{:V0=>"Dog", :V1=>"D2"}
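
The selection rule behind both examples can be sketched in plain Ruby without BioDSL. This is a minimal illustration, not the library's implementation: select_by_key and RECORDS are hypothetical names introduced here, and the records are ordinary hashes mirroring the table above.

```ruby
require 'set'

# Hypothetical stand-in for the command: plain Ruby, no BioDSL required.
# Keeps the first record seen for each value of +key+, or, with
# invert: true, keeps every later repeat instead.
def select_by_key(records, key, invert: false)
  seen = Set.new

  records.select do |record|
    value = record[key]
    next true unless value # records lacking the key pass through

    # Set#add? returns nil when the value is already in the set.
    first_time = !seen.add?(value).nil?
    invert ? !first_time : first_time
  end
end

# Same rows as test.tab, already parsed into records.
RECORDS = [
  { V0: 'Human', V1: 'H1' },
  { V0: 'Human', V1: 'H2' },
  { V0: 'Human', V1: 'H3' },
  { V0: 'Dog',   V1: 'D1' },
  { V0: 'Dog',   V1: 'D2' },
  { V0: 'Mouse', V1: 'M1' }
].freeze

select_by_key(RECORDS, :V0).map { |r| r[:V1] }
# => ["H1", "D1", "M1"]

select_by_key(RECORDS, :V0, invert: true).map { |r| r[:V1] }
# => ["H2", "H3", "D2"]
```

Note that the inverted selection does not include the first record of a duplicated group: H1 and D1 are absent because each was the first occurrence of its value.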

Constants

STATS

Public Class Methods

new(options)

Constructor for UniqueValues.

@param options [Hash] Options hash.
@option options [String,Symbol] :key
@option options [Boolean] :invert

@return [UniqueValues] Class instance.

# File lib/BioDSL/commands/unique_values.rb, line 88
def initialize(options)
  @options     = options
  @lookup      = Set.new
  @key         = options[:key].to_sym
  @invert      = options[:invert]

  check_options
end

Public Instance Methods

lmb()

Return the command lambda for unique_values.

@return [Proc] Command lambda.

# File lib/BioDSL/commands/unique_values.rb, line 100
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      if output_record?(record)
        output << record
        @status[:records_out] += 1
      end
    end
  end
end

Private Instance Methods

check_options()

Check options.

# File lib/BioDSL/commands/unique_values.rb, line 118
def check_options
  options_allowed(@options, :key, :invert)
  options_required(@options, :key)
  options_allowed_values(@options, invert: [true, false, nil])
end

output_record?(record)

Determine whether a record should be output. A record that lacks the wanted key is always output. Otherwise the record is output if its value has not been seen before; with the invert option the test is reversed, so only records whose value has already been seen are output.

@param record [Hash] BioDSL record.

@return [Boolean]

# File lib/BioDSL/commands/unique_values.rb, line 134
def output_record?(record)
  return true unless (value = record[@key])

  value = value.to_sym if value.is_a? String
  found = @lookup.include?(value)

  @lookup.add(value) unless found

  found && @invert || !found && !@invert
end
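
Two consequences of this predicate are easy to miss: records without the key pass through even when invert is set, and a String value is treated as equal to the Symbol of the same name because of the to_sym conversion. The sketch below re-creates the predicate in a small standalone class to show both. UniqueFilterSketch is a hypothetical name used only for illustration and is not part of BioDSL.

```ruby
require 'set'

# Hypothetical re-creation of the predicate outside BioDSL.
class UniqueFilterSketch
  def initialize(key, invert: false)
    @key    = key.to_sym
    @invert = invert
    @lookup = Set.new
  end

  # Same decision logic as the method documented above.
  def output_record?(record)
    return true unless (value = record[@key])

    value = value.to_sym if value.is_a? String
    found = @lookup.include?(value)

    @lookup.add(value) unless found

    found && @invert || !found && !@invert
  end
end

filter = UniqueFilterSketch.new(:V0)
filter.output_record?({ V0: 'Human' }) # => true  (first sighting)
filter.output_record?({ V0: :Human })  # => false (equal to "Human" after to_sym)
filter.output_record?({ V1: 'H9' })    # => true  (key absent, passed through)

inverted = UniqueFilterSketch.new(:V0, invert: true)
inverted.output_record?({ V1: 'H9' })  # => true  (key absent, still passed through)
inverted.output_record?({ V0: 'Dog' }) # => false (first sighting)
inverted.output_record?({ V0: 'Dog' }) # => true  (repeat)
```

If records without the key should be excluded, one option is to filter them out before they reach unique_values.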