class SequenceServer::Doctor

Doctor detects inconsistencies likely to cause problems with SequenceServer operation.

Constants

AVOID_ID_REGEX
ERROR_NUMERIC_IDS
ERROR_PARSE_SEQIDS
ERROR_PROBLEMATIC_IDS

Attributes

all_seqids[R]
invalids[R]

Public Class Methods

all_sequence_ids(ignore)

Retrieves sequence ids (specified by %i) from all databases. Accession numbers are not used because they are problematic for several reasons.

# File lib/sequenceserver/doctor.rb, line 31
def all_sequence_ids(ignore)
  Database.map do |db|
    next if ignore.include? db

    out = `blastdbcmd -entry all -db #{db.name} -outfmt "%i" 2> /dev/null`
    {
      db:     db,
      seqids: out.to_s.split
    }
  end.compact
end
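
The shape of the records built above can be sketched without running BLAST+. This is only an illustration: `FakeDb`, the database path, and `sample_output` are made-up stand-ins, assuming blastdbcmd prints one id per line for `-outfmt "%i"`.

```ruby
# Hypothetical stand-in for a Database object; only `name` is needed here.
FakeDb = Struct.new(:name)

# Assumed sample of blastdbcmd output: one sequence id per line.
sample_output = "SI2.2.0_06267\nSI2.2.0_13722\n"

db = FakeDb.new('/db/proteins.fa')

# Same record shape that all_sequence_ids builds for each database:
# the database itself plus an array of its sequence ids.
record = { db: db, seqids: sample_output.to_s.split }

p record[:seqids]
```

Splitting on whitespace means an empty output (for example, when blastdbcmd fails and stderr is discarded) yields an empty `seqids` array rather than an error.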
bullet_list(values)

Pretty print database list.

# File lib/sequenceserver/doctor.rb, line 54
def bullet_list(values)
  list = ''
  values.each do |value|
    list << "      - #{value}\n"
  end
  list
end
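
To see the layout this produces, here is a standalone copy of the same logic run on two made-up database paths. It is a sketch for illustration and not part of the library.

```ruby
# Standalone copy of the bullet_list logic, for illustration only.
def bullet_list(values)
  list = +'' # unary plus gives a mutable string even under frozen literals
  values.each { |value| list << "      - #{value}\n" }
  list
end

# Hypothetical database paths.
puts bullet_list(%w[/db/proteins.fa /db/genome.fa])
```

Each value ends up on its own line, indented by six spaces and prefixed with a dash, so the list sits under the heading line of the diagnostic message.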
inspect_parse_seqids(seqids)

FASTA files formatted without the -parse_seqids option do not support fetching sequence ids with blastdbcmd's ‘%i’ specifier. In such cases blastdbcmd returns ‘N/A’ for every sequence, and this method checks for that value.

# File lib/sequenceserver/doctor.rb, line 47
def inspect_parse_seqids(seqids)
  seqids.map do |sq|
    sq[:db] if sq[:seqids].include? 'N/A'
  end.compact
end
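
A minimal sketch of this check on made-up data follows. The records mimic the shape returned by all_sequence_ids, with plain strings standing in for Database objects; the file names and ids are assumptions for illustration.

```ruby
# Hypothetical records; strings stand in for Database objects.
records = [
  { db: 'with_parse_seqids.fa',    seqids: %w[SI2.2.0_06267 SI2.2.0_13722] },
  { db: 'without_parse_seqids.fa', seqids: %w[N/A N/A] }
]

# Same check as inspect_parse_seqids: keep databases whose ids include 'N/A'.
flagged = records.map { |sq| sq[:db] if sq[:seqids].include?('N/A') }.compact

p flagged
```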
inspect_seqids(seqids, &block)

Returns an array of database objects that each have at least one sequence id satisfying the block passed to the method.

# File lib/sequenceserver/doctor.rb, line 23
def inspect_seqids(seqids, &block)
  seqids.map do |sq|
    sq[:db] unless sq[:seqids].select(&block).empty?
  end.compact
end
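
The following sketch shows how a block selects databases. The method body is copied from above so the example runs on its own; the records and the length-based predicate are hypothetical.

```ruby
# Standalone copy of inspect_seqids, for illustration only.
def inspect_seqids(seqids, &block)
  seqids.map do |sq|
    sq[:db] unless sq[:seqids].select(&block).empty?
  end.compact
end

# Hypothetical records; strings stand in for Database objects.
records = [
  { db: 'short_ids.fa', seqids: %w[a1 b2] },
  { db: 'long_ids.fa',  seqids: %w[a1 a_much_longer_identifier] }
]

# Made-up predicate: flag databases that contain any id over 10 characters.
long = inspect_seqids(records) { |id| id.length > 10 }

p long
```

A database is reported as soon as one of its ids satisfies the block; the matching ids themselves are not returned.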
new()
# File lib/sequenceserver/doctor.rb, line 98
def initialize
  @ignore     = []
  @all_seqids = Doctor.all_sequence_ids(@ignore)
end
show_message(error, values)

Print diagnostic error messages according to the type of error.

# File lib/sequenceserver/doctor.rb, line 64
def show_message(error, values)
  return if values.empty?

  case error
  when ERROR_PARSE_SEQIDS
    puts <<~MSG
      *** Doctor has found improperly formatted database:
      #{bullet_list(values)}
      Please reformat your databases with -parse_seqids switch (or use
      sequenceserver -m) for using SequenceServer as the current format
      may cause problems.

      These databases are ignored in further checks.
    MSG

  when ERROR_NUMERIC_IDS
    puts <<~MSG
      *** Doctor has found databases with numeric sequence ids:
      #{bullet_list(values)}
      Note that this may cause problems with sequence retrieval.
    MSG

  when ERROR_PROBLEMATIC_IDS
    puts <<~MSG
      *** Doctor has found databases with problematic sequence ids:
      #{bullet_list(values)}
      This causes some sequence to contain extraneous words like `gnl|`
      appended to their id string.
    MSG
  end
end

Public Instance Methods

check_id_format()

Warns users about sequence identifiers of the form abc|def, because BLAST+ then prepends gnl (for general) to the database identifiers. Only two identifier types need to be avoided when searching for this format: bbs|number and gi|number. Note that while sequence ids could have been arbitrary, using parse_seqids reduces our search space substantially.

# File lib/sequenceserver/doctor.rb, line 147
def check_id_format
  selector = proc { |id| id.match(AVOID_ID_REGEX) }

  Doctor.show_message(ERROR_PROBLEMATIC_IDS,
                      Doctor.inspect_seqids(@all_seqids, &selector))
end
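
The value of AVOID_ID_REGEX is not shown on this page, so the sketch below uses a hypothetical pattern, `HYPOTHETICAL_AVOID_ID_REGEX`, that is only assumed to behave as the description says: it flags ids of the form abc|def but leaves gi|number and bbs|number alone. The sample ids are made up.

```ruby
# ASSUMPTION: not the library's actual constant. A pattern that matches
# word|word ids unless they start with "gi" or "bbs".
HYPOTHETICAL_AVOID_ID_REGEX = /^(?!gi|bbs)\w+\|\w*\|?/

selector = proc { |id| id.match(HYPOTHETICAL_AVOID_ID_REGEX) }

# Made-up sample ids.
ids = ['abc|def', 'gi|12345', 'bbs|678', 'SI2.2.0_06267']

# Keep only the ids the selector flags as problematic.
flagged = ids.select(&selector)

p flagged
```

Because the selector returns a MatchData object or nil, it works directly as a truthy/falsy predicate for `select`.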
check_numeric_ids()

Check for the presence of numeric sequence ids within a database.

# File lib/sequenceserver/doctor.rb, line 133
def check_numeric_ids
  selector = proc { |id| !id.to_i.zero? }

  Doctor.show_message(ERROR_NUMERIC_IDS,
                      Doctor.inspect_seqids(@all_seqids, &selector))
end
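
The selector above relies on Ruby's String#to_i, which parses a leading integer and returns 0 when there is none. The sketch below runs the same proc on a few made-up ids to show which ones it treats as numeric.

```ruby
# Same predicate as in check_numeric_ids.
selector = proc { |id| !id.to_i.zero? }

# Made-up sample ids.
results = %w[12345 seq1 12abc 0].map { |id| [id, selector.call(id)] }.to_h

results.each { |id, numeric| puts "#{id.inspect} => #{numeric}" }
```

Two edge cases follow from String#to_i: an id that merely starts with digits (such as 12abc) is reported as numeric, while the id 0 is not.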
check_parse_seqids()

Finds databases that aren’t formatted with -parse_seqids and adds them to the ignore list.

# File lib/sequenceserver/doctor.rb, line 125
def check_parse_seqids
  without_parse_seqids = Doctor.inspect_parse_seqids(@all_seqids)
  Doctor.show_message(ERROR_PARSE_SEQIDS, without_parse_seqids)

  @ignore.concat(without_parse_seqids)
end
diagnose()
# File lib/sequenceserver/doctor.rb, line 105
def diagnose
  puts "\n1/3 Inspecting databases for proper -parse_seqids formatting.."
  check_parse_seqids
  remove_invalid_databases

  puts "\n2/3 Inspecting databases for numeric sequence ids.."
  check_numeric_ids

  puts "\n3/3 Inspecting databases for problematic sequence ids.."
  check_id_format
end
remove_invalid_databases()

Removes entries that are in the ignore list or not formatted with the -parse_seqids option.

# File lib/sequenceserver/doctor.rb, line 119
def remove_invalid_databases
  @all_seqids.delete_if { |sq| @ignore.include? sq[:db] }
end
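
As a rough illustration of the filtering step, the sketch below applies the same `delete_if` call to made-up records. Local variables stand in for the `@all_seqids` and `@ignore` instance variables, and strings stand in for Database objects.

```ruby
# Hypothetical stand-ins for @all_seqids and @ignore.
all_seqids = [
  { db: 'good.fa', seqids: %w[id1 id2] },
  { db: 'bad.fa',  seqids: %w[N/A N/A] }
]
ignore = ['bad.fa']

# delete_if mutates the array in place, dropping ignored databases.
all_seqids.delete_if { |sq| ignore.include?(sq[:db]) }

remaining = all_seqids.map { |sq| sq[:db] }
p remaining
```

Since the array is modified in place, the later numeric-id and id-format checks only see the databases that remain.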