class SequenceServer::Sequence

Utility methods.

Public Class Methods

composition(sequence_string)

Copied from BioRuby's `Bio::Sequence` class.

> composition("asdfasdfffffasdf")

> {"a"=>3, "d"=>3, "f"=>7, "s"=>3}

# File lib/sequenceserver/sequence.rb, line 100
def composition(sequence_string)
  count = Hash.new(0)
  sequence_string.scan(/./) do |x|
    count[x] += 1
  end
  count
end
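
The counting logic above can be tried outside SequenceServer with a minimal standalone sketch. It assumes a top-level `composition` method standing in for the class method; that placement is for illustration only and is not how the library exposes it. On Ruby 1.9 and later, hashes keep insertion order, so the keys may print in a different order than in the example above, although the counts are the same.

```ruby
# Standalone copy of the counting logic, defined at top level purely for
# illustration (in SequenceServer it belongs to SequenceServer::Sequence).
def composition(sequence_string)
  count = Hash.new(0) # missing keys default to 0
  # /./ matches every character except a newline, so line breaks are skipped.
  sequence_string.scan(/./) { |x| count[x] += 1 }
  count
end

counts = composition("asdfasdfffffasdf")
counts["f"] # => 7
counts["z"] # => 0 (default value; "z" never occurs)

# Newlines in multi-line input are not counted.
composition("AC\nGT").keys # => ["A", "C", "G", "T"]
```

Because the hash is created with a default of 0, looking up a character that never occurs returns 0 rather than `nil`.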

guess_type(sequence)

Strips all non-letter characters and the ambiguity codes N and X. Returns `nil` if fewer than 10 usable characters remain. Returns `:nucleotide` if more than 90% of the remaining characters are A, C, G, T, or U, and `:protein` otherwise.

# File lib/sequenceserver/sequence.rb, line 79
def guess_type(sequence)
  # Clean the sequence: first remove non-letter characters, then
  # ambiguous characters.
  cleaned_sequence = sequence.gsub(/[^A-Z]/i, '').gsub(/[NX]/i, '')

  return if cleaned_sequence.length < 10 # conservative

  # Count putative NA in the sequence.
  na_count = 0
  composition = composition(cleaned_sequence)
  composition.each do |character, count|
    na_count += count if character.match(/[ACGTU]/i)
  end

  na_count > (0.9 * cleaned_sequence.length) ? :nucleotide : :protein
end
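
The behavior of `guess_type` at its thresholds can be illustrated with a minimal standalone sketch. It assumes both methods are redefined at top level for experimentation, and the sample sequences are made up for this example; neither is taken from SequenceServer itself.

```ruby
# Top-level stand-ins for the two class methods, for illustration only.
def composition(sequence_string)
  count = Hash.new(0)
  sequence_string.scan(/./) { |x| count[x] += 1 }
  count
end

def guess_type(sequence)
  # Drop non-letters first, then the ambiguity codes N and X.
  cleaned = sequence.gsub(/[^A-Z]/i, '').gsub(/[NX]/i, '')
  return if cleaned.length < 10 # too short to guess

  na_count = 0
  composition(cleaned).each do |character, count|
    na_count += count if character.match(/[ACGTU]/i)
  end

  na_count > (0.9 * cleaned.length) ? :nucleotide : :protein
end

# Hypothetical inputs:
guess_type("ATGCATGCATGCNNNN") # => :nucleotide (N removed, 12 of 12 are ACGTU)
guess_type("MKVLAAGIVGLLLAQ")  # => :protein    (5 of 15 are ACGTU)
guess_type("ACGT")             # => nil         (fewer than 10 usable characters)
guess_type("ACGTACGTAL")       # => :protein    (9 of 10 is exactly 90%, not more)
```

The last call shows that the comparison is strict: a sequence that is exactly 90% ACGTU is still classified as `:protein`.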