class BioDSL::PlotHistogram

Plot a histogram of numerical values for a specified key.

plot_histogram creates a histogram plot of the values for a specified key from all records in the stream. Plotting is done using GNUplot, which supports several output types; the default is crude ASCII graphics.

For numeric values, GNUplot’s facility for setting the xrange labels is used, while non-numeric values are used directly as xrange labels.

GNUplot must be installed for plot_histogram to work. Read more here:

www.gnuplot.info/

Usage

plot_histogram(<key: <string>>[, value: <string>[, output: <file>
               [, force: <bool>[, terminal: <string>[, title: <string>
               [, xlabel: <string>[, ylabel: <string>
               [, ylogscale: <bool>[, test: <bool>]]]]]]]]])

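Options

key: <string>      - Key whose values are plotted (required).
value: <string>    - Key whose value is added to the count instead of 1.
output: <file>     - Output file.
force: <bool>      - Force overwriting an existing output file.
terminal: <string> - Terminal for output: dumb|post|svg|x11|aqua|png|pdf (default=dumb).
title: <string>    - Plot title (default="Histogram").
xlabel: <string>   - X-axis label (default=<key>).
ylabel: <string>   - Y-axis label (default="n").
ylogscale: <bool>  - Use a logarithmic y-axis.
test: <bool>       - Write the Gnuplot script to stderr instead of plotting.

Examples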

Here we plot a histogram of sequence lengths from a FASTA file:

read_fasta(input: "test.fna").plot_histogram(key: :SEQ_LEN).run

                                  Histogram
       +             +            +            +            +             +
  90 +++-------------+------------+------------+------------+-------------+++
      |                                                                    |
  80 ++                                                                  **++
      |                                                                  **|
  70 ++                                                                  **++
  60 ++                                                                  **++
      |                                                                  **|
  50 ++                                                                  **++
      |                                                                  **|
  40 ++                                                                  **++
      |                                                                  **|
  30 ++                                                                  **++
  20 ++                                                                  **++
      |                                                                  **|
  10 ++                                                                  **++
      |                                                              ******|
   0 +++-------------+------------+**--------**+--***-------+**--**********++
       +             +            +            +            +             +
       0             10           20           30           40            50
                                     SEQ_LEN

To render X11 output (i.e. instant view) use the terminal option:

read_fasta(input: "test.fna").
plot_histogram(key: :SEQ_LEN, terminal: :x11).run

To generate a PNG image and save to file:

read_fasta(input: "test.fna").
plot_histogram(key: :SEQ_LEN, terminal: :png, output: "plot.png").run

Constants

STATS

Public Class Methods

new(options)

Constructor for PlotHistogram.

@param options [Hash] Options hash.
@option options [String, Symbol] :key
@option options [String, Symbol] :value
@option options [String] :output
@option options [Boolean] :force
@option options [String, Symbol] :terminal
@option options [String] :title
@option options [String] :xlabel
@option options [String] :ylabel
@option options [Boolean] :ylogscale
@option options [Boolean] :test

@return [PlotHistogram] class instance.

# File lib/BioDSL/commands/plot_histogram.rb, line 128
def initialize(options)
  @options     = options
  @key         = options[:key]
  @value       = options[:value]
  @count_hash  = Hash.new(0)
  @gp          = nil

  aux_exist('gnuplot')
  check_options
  defaults
end

Public Instance Methods

lmb()

Return the command lambda for plot_histogram.

@return [Proc] command lambda.

# File lib/BioDSL/commands/plot_histogram.rb, line 143
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    process_input(input, output)
    plot_create
    plot_output
  end
end

Private Instance Methods

check_options()

Check options.

# File lib/BioDSL/commands/plot_histogram.rb, line 156
def check_options
  options_allowed(@options, :key, :value, :output, :force, :terminal,
                  :title, :xlabel, :ylabel, :ylogscale, :test)
  options_allowed_values(@options, terminal: [:dumb, :post, :svg, :x11,
                                              :aqua, :png, :pdf])
  options_allowed_values(@options, force: [nil, true, false])
  options_allowed_values(@options, test: [nil, true, false])
  options_required(@options, :key)
  options_files_exist_force(@options, :output)
end

defaults()

Set default values for options hash.

# File lib/BioDSL/commands/plot_histogram.rb, line 168
def defaults
  @options[:terminal] ||= :dumb
  @options[:title] ||= 'Histogram'
  @options[:xlabel] ||= @options[:key]
  @options[:ylabel] ||= 'n'

  @options[:ylogscale] &&
    @options[:ylabel] = "log10(#{@options[:ylabel]})"
end
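The defaulting step above can be tried in isolation. The following is a minimal sketch, assuming a plain Hash of options; apply_defaults is a hypothetical stand-alone helper, not part of BioDSL. It shows that the ylabel is wrapped in log10(...) only when ylogscale is set, and that a user-supplied ylabel is wrapped as well.

```ruby
# Hypothetical stand-alone version of the defaults step (not a BioDSL method).
def apply_defaults(options)
  options[:terminal] ||= :dumb
  options[:title]    ||= 'Histogram'
  options[:xlabel]   ||= options[:key]
  options[:ylabel]   ||= 'n'

  # Wrap the label only when a log-scaled y-axis was requested.
  options[:ylabel] = "log10(#{options[:ylabel]})" if options[:ylogscale]

  options
end

plain  = apply_defaults(key: :SEQ_LEN)
logged = apply_defaults(key: :SEQ_LEN, ylogscale: true, ylabel: 'count')

puts plain[:ylabel]  # n
puts logged[:ylabel] # log10(count)
```

The trailing `if` modifier is equivalent to the `&&` form used in the source and is only a readability choice.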
plot_create()

Create a Gnuplot plot from the data collected from the input stream.

# File lib/BioDSL/commands/plot_histogram.rb, line 213
def plot_create
  @gp = GnuPlotter.new
  plot_defaults
  plot_fix_ylogscale

  if @count_hash.empty?
    plot_empty
  elsif @count_hash.keys.first.is_a? Numeric
    plot_numeric
  else
    plot_string
  end

  plot_fix_xtics
end
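The branching in plot_create, together with the xtics rule applied afterwards, depends only on the first key of the count hash. A minimal sketch of that decision, assuming a plain Hash of counts; plot_kind and show_xtics? are hypothetical names introduced here for illustration:

```ruby
# Hypothetical helpers illustrating how the plot type is chosen.
# Only the first key is inspected, so mixed key types follow the first key.
def plot_kind(counts)
  return :empty if counts.empty?

  counts.keys.first.is_a?(Numeric) ? :numeric : :string
end

# Xtics are dropped only for string keys with more than 50 distinct values.
def show_xtics?(counts)
  !(counts.keys.first.is_a?(String) && counts.size > 50)
end

few_strings  = { 'A' => 3, 'C' => 1 }
many_strings = ('aa'..'zz').first(51).map { |s| [s, 1] }.to_h

puts plot_kind({})              # empty
puts plot_kind(10 => 2)         # numeric
puts plot_kind(few_strings)     # string
puts show_xtics?(few_strings)   # true
puts show_xtics?(many_strings)  # false
```

Note that any non-numeric key, including a Symbol, takes the string branch, while the xtics rule in the source applies to String keys only.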
plot_defaults()

Set the default values for the plot.

# File lib/BioDSL/commands/plot_histogram.rb, line 230
def plot_defaults
  @gp.set terminal:  @options[:terminal].to_s
  @gp.set title:     @options[:title]
  @gp.set xlabel:    @options[:xlabel]
  @gp.set ylabel:    @options[:ylabel]
  @gp.set autoscale: 'xfix'
  @gp.set style:     'fill solid 0.5 border'
  @gp.set xtics:     'out'
  @gp.set ytics:     'out'
end

plot_empty()

Set plot values to create an empty plot if no plot data was collected.

# File lib/BioDSL/commands/plot_histogram.rb, line 252
def plot_empty
  @gp.set yrange: '[-1:1]'
  @gp.set key:    'off'
  @gp.unset xtics: true
  @gp.unset ytics: true
end

plot_fix_xtics()

Determine whether xtics should be plotted and unset them if not. Xtics are not plotted when there are more than 50 string keys.

# File lib/BioDSL/commands/plot_histogram.rb, line 287
def plot_fix_xtics
  return unless @count_hash.keys.first.class == String &&
                @count_hash.size > 50
  @gp.unset xtics: true
end

plot_fix_ylogscale()

Set plot values accordingly if the ylogscale flag is set.

# File lib/BioDSL/commands/plot_histogram.rb, line 242
def plot_fix_ylogscale
  if @options[:ylogscale]
    @gp.set logscale: 'y'
    @gp.set yrange: '[1:*]'
  else
    @gp.set yrange: '[0:*]'
  end
end

plot_numeric()

If plot data have numeric xtic values use numeric xtic labels.

# File lib/BioDSL/commands/plot_histogram.rb, line 260
def plot_numeric
  x_max = @count_hash.keys.max || 0

  @gp.add_dataset(using: '1:2', with: 'boxes notitle') do |plotter|
    (0..x_max).each { |x| plotter << [x, @count_hash[x]] }
  end
end
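Because the count hash is created with Hash.new(0), plot_numeric can emit one box for every integer from 0 up to the largest key, with zero-height boxes for values that never occurred. A minimal sketch of that expansion, assuming non-negative integer keys; dense_series is a hypothetical helper, not part of BioDSL:

```ruby
# Hypothetical helper: expand sparse counts into a dense [x, count] series.
# Assumes the hash was built with Hash.new(0), so unseen x values read as 0.
def dense_series(counts)
  x_max = counts.keys.max || 0

  (0..x_max).map { |x| [x, counts[x]] }
end

counts = Hash.new(0)
[2, 2, 5].each { |len| counts[len] += 1 }

series = dense_series(counts)
p series # [[0, 0], [1, 0], [2, 2], [3, 0], [4, 0], [5, 1]]
```

Since the range always starts at 0 and advances in integer steps, negative or fractional keys would not be looked up by this loop.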
plot_output()

Output plot data.

# File lib/BioDSL/commands/plot_histogram.rb, line 294
def plot_output
  @gp.set output: @options[:output] if @options[:output]

  if @options[:test]
    $stderr.puts @gp.to_gp
  elsif @options[:terminal] == :dumb
    puts @gp.plot
  else
    @gp.plot
  end
end

plot_string()

If plot data have string xtic values, use these as xtic labels.

# File lib/BioDSL/commands/plot_histogram.rb, line 269
def plot_string
  plot_xtics_rotate

  @gp.add_dataset(using: '2:xticlabels(1)',
                  with: 'boxes notitle lc rgb "red"') do |plotter|
    @count_hash.each { |k, v| plotter << [k, v] }
  end
end

plot_xtics_rotate()

If the first xtic label is longer than 2 characters, rotate the xtic labels and clear the xlabel.

# File lib/BioDSL/commands/plot_histogram.rb, line 279
def plot_xtics_rotate
  return unless @count_hash.first.first.size > 2
  @gp.set xtics: 'rotate'
  @gp.set xlabel: ''
end

process_input(input, output)

Process the input stream, collect all plot data, and output records.

@param input [Enumerator] Input stream.
@param output [Enumerator::Yielder] Output stream.

# File lib/BioDSL/commands/plot_histogram.rb, line 182
def process_input(input, output)
  input.each do |record|
    @status[:records_in] += 1

    if (k = record[@key])
      if @value
        if (v = record[@value])
          @count_hash[k] += v
        else
          fail "value: #{@value} not found in record: #{record}"
        end
      else
        @count_hash[k] += 1
      end
    end

    process_output(output, record)
  end
end
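The counting rule in process_input can be illustrated without a BioDSL stream. The following is a minimal sketch, assuming records are plain Hashes; tally is a hypothetical helper and the COUNT field is an invented example key. Without a value key each record contributes 1 to its bin; with a value key the record's value is summed instead.

```ruby
# Hypothetical helper mirroring the counting step of process_input.
# records: an Enumerable of Hashes; key and value: Symbols naming fields.
def tally(records, key, value = nil)
  counts = Hash.new(0) # unseen bins start at 0

  records.each do |record|
    k = record[key]
    next unless k # records lacking the key are not counted

    if value
      v = record[value]
      raise "value: #{value} not found in record: #{record}" unless v

      counts[k] += v
    else
      counts[k] += 1
    end
  end

  counts
end

records = [
  { SEQ_NAME: 'a', SEQ_LEN: 10, COUNT: 3 },
  { SEQ_NAME: 'b', SEQ_LEN: 10, COUNT: 2 },
  { SEQ_NAME: 'c', SEQ_LEN: 12, COUNT: 5 },
  { SEQ_NAME: 'd' } # no SEQ_LEN, so it is skipped
]

by_occurrence = tally(records, :SEQ_LEN)         # one count per record
by_value      = tally(records, :SEQ_LEN, :COUNT) # sum of COUNT per length

p by_occurrence # {10=>2, 12=>1}
p by_value      # {10=>5, 12=>5}
```

In the source, records that lack the key are still passed on to the output stream; the sketch covers the counting only.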
process_output(output, record)

Output record to the output stream if such is defined.

@param output [Enumerator::Yielder] Output stream.
@param record [Hash] BioDSL record.

# File lib/BioDSL/commands/plot_histogram.rb, line 206
def process_output(output, record)
  return unless output
  output << record
  @status[:records_out] += 1
end