class BioDSL::SplitValues

Split the values of a key into new key/value pairs.

split_values splits the value of a given key into multiple values that are added to the record. By default, the new keys are the given key with an index appended (ID_0, ID_1, and so on); the keys option specifies a list of keys to use instead. Records whose value does not contain the delimiter are passed through unchanged.

Usage

split_values(key: <string>[, delimiter: <string>[, keys: <list>]])

Options

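# key: <string>       - Key whose value to split.
# keys: <list>        - List of keys to use for the split values.
# delimiter: <string> - Delimiter to split on (default: '_').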

Examples

Consider the following records:

{ID: "FOO:count=10", SEQ: "gataag"}
{ID: "FOO_10_20", SEQ: "gataag"}

To split the value belonging to ID do:

split_values(key: :ID)

{:ID=>"FOO:count=10", :SEQ=>"gataag"}
{:ID=>"FOO_10_20", :SEQ=>"gataag", :ID_0=>"FOO", :ID_1=>10, :ID_2=>20}

Using a different delimiter:

split_values(key: "ID", delimiter: ':count=')

{:ID=>"FOO:count=10", :SEQ=>"gataag", :ID_0=>"FOO", :ID_1=>10}
{:ID=>"FOO_10_20", :SEQ=>"gataag"}

Using a different delimiter and a list of keys:

split_values(key: "ID", keys: ["ID", :COUNT], delimiter: ':count=')

{:ID=>"FOO", :SEQ=>"gataag", :COUNT=>10}
{:ID=>"FOO_10_20", :SEQ=>"gataag"}
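The three examples above can be reproduced outside BioDSL with a small stand-alone sketch. The helper name split_record is hypothetical and not part of the library, and it detects numeric values per record with Kernel#Integer and Kernel#Float, whereas the class itself decides the conversions once from the first record that splits.

```ruby
# Hypothetical stand-alone helper, not part of BioDSL: mimics the documented
# behaviour of split_values on a single record (a plain Hash).
def split_record(record, key:, delimiter: '_', keys: nil)
  value = record[key.to_sym]
  return record unless value.is_a?(String)

  parts = value.split(delimiter)
  return record if parts.size < 2 # delimiter not found: record is left as-is

  parts.each_with_index do |part, i|
    # Turn integer-like and float-like strings into numbers, keep the rest.
    converted = Integer(part, 10, exception: false) ||
                Float(part, exception: false) ||
                part
    target = keys ? keys[i] : "#{key}_#{i}"
    record[target.to_sym] = converted
  end

  record
end

indexed = split_record({ ID: 'FOO_10_20', SEQ: 'gataag' }, key: :ID)
named   = split_record({ ID: 'FOO:count=10', SEQ: 'gataag' },
                       key: 'ID', keys: ['ID', :COUNT], delimiter: ':count=')
```

Here indexed gains the keys ID_0, ID_1 and ID_2, while named has ID overwritten with "FOO" and a new COUNT of 10, matching the records shown above.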

Constants

STATS

Public Class Methods

new(options)

Constructor for SplitValues.

@param options [Hash] Options hash.
@option options [String,Symbol] :key
@option options [Array] :keys
@option options [String] :delimiter

@return [SplitValues] Class instance.

# File lib/BioDSL/commands/split_values.rb, line 84
def initialize(options)
  @options = options

  check_options

  @first       = true
  @convert     = []
  @keys        = @options[:keys]
  @key         = @options[:key].to_sym
  @delimiter   = @options[:delimiter] || '_'
end

Public Instance Methods

lmb()

Return command lambda for split_values.

@return [Proc] Command lambda.

# File lib/BioDSL/commands/split_values.rb, line 99
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      if (value = record[@key])
        values = value.split(@delimiter)

        if values.size > 1
          determine_types(values) if @first

          split_values(values, record)
        end
      end

      output << record

      @status[:records_out] += 1
    end
  end
end

Private Instance Methods

check_options()

Check options.

# File lib/BioDSL/commands/split_values.rb, line 126
def check_options
  options_allowed(@options, :key, :keys, :delimiter)
  options_required(@options, :key)
end

determine_types(values)

Given an array of values, determine which values must be converted to integers or floats and save the conversion method for each value index in an instance variable. This is done only once, for the first record whose value splits.

@param values [Array] List of values.

# File lib/BioDSL/commands/split_values.rb, line 135
def determine_types(values)
  values.each_with_index do |val, i|
    val = val.to_num

    if val.is_a? Integer # Fixnum was deprecated in Ruby 2.4 and removed in 3.2
      @convert[i] = :to_i
    elsif val.is_a? Float
      @convert[i] = :to_f
    end
  end

  @first = false
end

split_values(values, record)

Convert values and add to record.

@param values [Array] List of values.
@param record [Hash] BioDSL record.

# File lib/BioDSL/commands/split_values.rb, line 153
def split_values(values, record)
  values.each_with_index do |val, i|
    val = val.send(@convert[i]) if @convert[i]

    if @keys
      record[@keys[i].to_sym] = val
    else
      record["#{@key}_#{i}".to_sym] = val
    end
  end
end
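
Because determine_types runs only for the first record that splits, the conversions chosen there are reused for every later record. The following sketch illustrates that consequence in isolation. The class FirstSplitTyper and its detect method are hypothetical, and Kernel#Integer and Kernel#Float stand in for the to_num call, whose implementation is not shown here.

```ruby
# Hypothetical illustration, not BioDSL code: conversions are decided once,
# from the first list of values, and then applied to every later list.
class FirstSplitTyper
  def initialize
    @convert = nil
  end

  # Returns the values with the remembered conversions applied.
  def convert(values)
    @convert ||= values.map { |value| detect(value) }

    values.each_with_index.map do |value, i|
      @convert[i] ? value.send(@convert[i]) : value
    end
  end

  private

  # Assumed stand-in for the type detection: :to_i, :to_f, or nil for text.
  def detect(value)
    return :to_i if Integer(value, 10, exception: false)
    return :to_f if Float(value, exception: false)

    nil
  end
end

typer = FirstSplitTyper.new
first = typer.convert(%w[FOO 10 20])  # positions 1 and 2 are marked :to_i
later = typer.convert(%w[BAR x 3.5])  # the same conversions are reused
```

With this input, first is ["FOO", 10, 20] and later is ["BAR", 0, 3]: String#to_i turns "x" into 0 and truncates "3.5" to 3. A stream whose first record is not representative of the rest may therefore need its values converted in a separate step.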