class GeneValidator::LengthRankValidation

This class contains the methods necessary for length validation by ranking the prediction length among the hit lengths.

Constants

THRESHOLD

Public Class Methods

new(prediction, hits)

Initializes the object.

Params:
prediction: a Sequence object representing the BLAST query
hits: a vector of Sequence objects (representing BLAST hits)

Calls superclass method
# File lib/genevalidator/validation_length_rank.rb, line 94
def initialize(prediction, hits)
  super
  @short_header = 'LengthRank'
  @header       = 'Length Rank'
  @description  = 'Check whether the rank of the prediction length lies' \
                  ' among 80% of all the BLAST hit lengths.'
  @cli_name     = 'lenr'
end

Public Instance Methods

run(hits = @hits, prediction = @prediction)

Calculates a percentage based on the rank of the prediction among the hit lengths.

Params:
hits (optional): a vector of Sequence objects
prediction (optional): a Sequence object

Output:
LengthRankValidationOutput object

# File lib/genevalidator/validation_length_rank.rb, line 111
def run(hits = @hits, prediction = @prediction)
  raise NotEnoughHitsError if hits.length < opt[:min_blast_hits]
  raise unless prediction.is_a?(Query) && hits[0].is_a?(Query)

  start = Time.now

  hits_lengths = hits.map { |x| x.length_protein.to_i }
                     .sort { |a, b| a <=> b }

  no_of_hits   = hits_lengths.length
  median       = hits_lengths.median.round
  query_length = prediction.length_protein
  mean         = hits_lengths.mean.round

  smallest_hit = hits_lengths[0]
  largest_hit  = hits_lengths[-1]

  if hits_lengths.standard_deviation <= 5
    msg = ''
    percentage = 100
  else
    if query_length < median
      extreme_hits = hits_lengths.find_all { |x| x < query_length }.length
      percentage   = ((extreme_hits.to_f / no_of_hits) * 100).round
      msg          = 'too&nbsp;short'
    else
      extreme_hits = hits_lengths.find_all { |x| x > query_length }.length
      percentage   = ((extreme_hits.to_f / no_of_hits) * 100).round
      msg          = 'too&nbsp;long'
    end
  end

  msg = '' if percentage >= THRESHOLD

  @validation_report = LengthRankValidationOutput.new(@short_header,
                                                      @header, @description,
                                                      msg, query_length,
                                                      no_of_hits, median,
                                                      mean, smallest_hit,
                                                      largest_hit,
                                                      extreme_hits,
                                                      percentage)
  @validation_report.run_time = Time.now - start
  @validation_report
rescue NotEnoughHitsError
  @validation_report = ValidationReport.new('Not enough evidence', :warning,
                                            @short_header, @header,
                                            @description)
rescue StandardError
  @validation_report = ValidationReport.new('Unexpected error', :error,
                                            @short_header, @header,
                                            @description)
  @validation_report.errors.push 'Unexpected Error'
end
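
The ranking logic in run can be tried in isolation. The following is a minimal standalone sketch, not the library's code: it works on plain integer lengths instead of Sequence objects, and the names length_rank, sample_sd, median_of and HYPOTHETICAL_THRESHOLD are made up for illustration. The value 20 is an assumption, since the actual value of THRESHOLD is not shown on this page, and the sample standard deviation used here may differ from the library's Array#standard_deviation helper.

```ruby
# Standalone sketch of the rank calculation performed by `run`.
# ASSUMPTION: 20 is an illustrative value; the real THRESHOLD is not shown here.
HYPOTHETICAL_THRESHOLD = 20

# Sample standard deviation (n - 1 denominator). The library's own
# Array#standard_deviation helper may be defined differently.
def sample_sd(values)
  return 0.0 if values.length < 2

  mean = values.sum.to_f / values.length
  Math.sqrt(values.sum { |v| (v - mean)**2 } / (values.length - 1))
end

# Median of an already sorted array, rounded as in `run`.
def median_of(sorted)
  mid = sorted.length / 2
  value = sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  value.round
end

# Returns [percentage, msg] for a query length and a list of hit lengths.
def length_rank(query_length, hit_lengths, threshold = HYPOTHETICAL_THRESHOLD)
  sorted = hit_lengths.sort

  # Hits of nearly identical length carry no ranking signal: always pass.
  return [100, ''] if sample_sd(sorted) <= 5

  if query_length < median_of(sorted)
    # Count hits that are even shorter than the query.
    extreme = sorted.count { |len| len < query_length }
    msg = 'too short'
  else
    # Count hits that are even longer than the query.
    extreme = sorted.count { |len| len > query_length }
    msg = 'too long'
  end

  percentage = (extreme.to_f / sorted.length * 100).round
  msg = '' if percentage >= threshold
  [percentage, msg]
end

hits = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
length_rank(150, hits) # => [10, "too short"]
length_rank(450, hits) # => [40, ""]
length_rank(950, hits) # => [10, "too long"]
```

The percentage is the share of hits that are more extreme than the query on its own side of the median, so a low value means the query sits near the edge of the hit length distribution.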