module SequenceServer

Top level module / namespace.

Define constants used by SequenceServer module

www.ncbi.nlm.nih.gov/books/NBK1763/ (Appendices)

Define Config class.

Define Database class.

This file defines all possible exceptions that can be thrown by SequenceServer on startup.

Exceptions only ever inform another entity (downstream code or users) of an issue. Exceptions may or may not be recoverable.

Error classes should be seen as: the error code (class name), human readable message (to_s method), and necessary attributes to act on the error.

We define as many error classes as needed to be precise about the issue, thus making it easy for downstream code (bin/sequenceserver or config.ru) to act on them.
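
The convention above (class name as error code, `to_s` as message, attributes for acting on the error) can be sketched as follows. This is a minimal illustration only: `MissingDirectoryError` and `describe_failure` are hypothetical names and are not part of SequenceServer.

```ruby
# Hypothetical error class, for illustration only; not part of SequenceServer.
class MissingDirectoryError < StandardError
  # Attribute that downstream code can act on.
  attr_reader :path

  def initialize(description, path)
    @description = description
    @path = path
    super()
  end

  # Human-readable message.
  def to_s
    "Could not find #{@description}: #{@path}"
  end
end

# Downstream code rescues the precise class and uses its attributes.
def describe_failure
  raise MissingDirectoryError.new('database dir', '/no/such/dir')
rescue MissingDirectoryError => e
  "#{e.class}: #{e} (path: #{e.path})"
end
```

Because each failure has its own class, a caller can rescue exactly the cases it knows how to handle and let the rest propagate.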

Define Sequence class.

Define version number.

Constants

APIError

API errors have an http status, title, message, and additional information like stacktrace or information from program output.

BLAST_VERSION

The default version of BLAST that will be downloaded and configured for use.

DEFAULT_CONFIG_FILE

Default location of configuration file.

DOTDIR

Constant for denoting the path ~/.sequenceserver

Database

Captures a directory containing FASTA files and BLAST databases.

Formatting a FASTA for use with BLAST+ will create 3 or 6 files, collectively referred to as a BLAST database.

It is important that formatted BLAST database files have the same dirname and basename as the source FASTA, so that SequenceServer can tell formatted FASTA from unformatted. FASTA files must also be formatted with the `parse_seqids` option of `makeblastdb` for sequence retrieval to work.

SequenceServer will always place BLAST database files alongside the input FASTA, and use the `parse_seqids` option of `makeblastdb` to format databases.
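
The naming rule above can be illustrated with a small check. This is a sketch under stated assumptions, not SequenceServer's own detection logic: `formatted_fasta?` is a hypothetical helper, it only looks for the three core files of a single-volume database (`.nhr`, `.nin`, `.nsq` for nucleotide; `.phr`, `.pin`, `.psq` for protein), and it ignores multi-volume databases.

```ruby
# Extensions of the three core files that makeblastdb writes for nucleotide
# and protein databases respectively.
NUCLEOTIDE_EXTENSIONS = %w[nhr nin nsq].freeze
PROTEIN_EXTENSIONS = %w[phr pin psq].freeze

# Hypothetical helper: returns true if all three core database files of
# either type sit next to the FASTA file, sharing its dirname and basename.
def formatted_fasta?(fasta_path)
  [NUCLEOTIDE_EXTENSIONS, PROTEIN_EXTENSIONS].any? do |extensions|
    extensions.all? { |ext| File.exist?("#{fasta_path}.#{ext}") }
  end
end
```

For a FASTA at `/db/genes.fa`, the check looks for `/db/genes.fa.nhr` and its siblings, which is where database files end up when they are written alongside the input.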

Error
Sequence

Provides simple sequence processing utilities via class methods. An instance of the class serves as a simple data object that captures sequences fetched from BLAST databases.

NOTE:

What do we need in order to consistently construct FASTA from the output of
`blastdbcmd`?

It would seem rather straightforward. But it's not.

FASTA format:

  >id title
  actual sequence

ID of a sequence fetched from nr database should look like this:

  gi|322796550|gb|EFZ19024.1| -> self.id
                  accession   -> self.accession
                  ----------
                sequence id   -> self.seqid
               -------------
     ---------
     gi number                -> self.gi

while for local databases, the id should be the exact same,
as in the original FASTA file:

  SI2.2.0_06267 -> self.id == self.accession
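
The breakdown in the diagram can be sketched as a small parser. This is an illustration only and not how the Sequence class is implemented: `SequenceId` and `parse_sequence_id` are hypothetical names, and returning `nil` for `gi` and `seqid` on local ids is an assumption.

```ruby
# Hypothetical parser, for illustration; not SequenceServer's implementation.
SequenceId = Struct.new(:id, :gi, :seqid, :accession)

def parse_sequence_id(id)
  # Matches ids of the form gi|<number>|<db>|<accession>|
  match = id.match(/\Agi\|(\d+)\|(\w+\|([^|]+)\|)\z/)

  # Local database: the id is used verbatim and doubles as the accession.
  return SequenceId.new(id, nil, nil, id) unless match

  SequenceId.new(id, match[1], match[2], match[3])
end

nr = parse_sequence_id('gi|322796550|gb|EFZ19024.1|')
nr.gi        # => "322796550"
nr.seqid     # => "gb|EFZ19024.1|"
nr.accession # => "EFZ19024.1"
```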
VERSION

Attributes

config[R]

Holds SequenceServer configuration object for this process. This is available only after calling SequenceServer.init.

Public Class Methods

call(env)

Rack-interface.

Add our logger to Rack env and let Routes do the rest.

# File lib/sequenceserver.rb, line 115
def call(env)
  env['rack.logger'] = logger
  Routes.call(env)
end
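
The same pattern (put a logger into the Rack env, then delegate) can be sketched without SequenceServer or the rack gem, since a Rack application is any object that responds to `call(env)` and returns a status, headers, and body triple. `LoggingApp` and `INNER_APP` below are hypothetical stand-ins for the module and for `Routes`.

```ruby
require 'logger'

# Hypothetical stand-in for Routes: reads the logger back out of the env.
INNER_APP = lambda do |env|
  env['rack.logger'].info('handling request')
  [200, { 'content-type' => 'text/plain' }, ['ok']]
end

# Hypothetical wrapper that mirrors the listing above.
class LoggingApp
  def initialize(app, logger)
    @app = app
    @logger = logger
  end

  # Add our logger to the Rack env and let the inner app do the rest.
  def call(env)
    env['rack.logger'] = @logger
    @app.call(env)
  end
end
```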
development?()

Returns true if RACK_ENV is set to ‘development’. Raw JS and CSS files are served in development mode and the logger is made more verbose.

# File lib/sequenceserver.rb, line 43
def development?
  environment == 'development'
end
Also aliased as: verbose?
environment()

Returns the value of ENV['RACK_ENV']. This environment variable determines whether we are in development or in production mode (default).

# File lib/sequenceserver.rb, line 37
def environment
  ENV['RACK_ENV']
end
init(config = {})

SequenceServer initialisation routine.

# File lib/sequenceserver.rb, line 67
def init(config = {})
  # Reset makeblastdb cache, because configuration may have changed.
  @makeblastdb = nil

  # Use default config file if caller didn't specify one.
  config[:config_file] ||= DEFAULT_CONFIG_FILE

  # Initialise global configuration object from the above config hash.
  @config = Config.new(config)

  # When in development mode, cause SequenceServer to terminate if any
  # thread spawned by the main process raises an unhandled exception. In
  # production mode the expectation is to log at appropriate severity level
  # and continue operating.
  Thread.abort_on_exception = true if development?

  # Now locate binaries, scan databases directory, require any plugin files.
  load_extension
  init_binaries
  init_database

  # The above methods validate bin dir, database dir, and path to plugin
  # files. Port and host settings don't need to be validated: if running
  # in self-hosted mode, WEBrick will handle incorrect values and if
  # running via Apache+Passenger host and port settings are not used.
  # Let's validate remaining configuration keys next.

  # Validate number of threads to use with BLAST.
  check_num_threads

  # Doesn't make sense to activate JobRemover when testing. It anyway
  # keeps stumbling on the mock test jobs that miss a few keys.
  unless environment == 'test'
    @job_remover = JobRemover.new(@config[:job_lifetime])
  end

  # 'self' is the most meaningful object that can be returned by this
  # method.
  self
end
irb()

This method is invoked by the -i switch to start an IRB shell with SequenceServer loaded.

# File lib/sequenceserver.rb, line 173
def irb
  ARGV.clear
  require 'irb'
  IRB.setup nil
  IRB.conf[:MAIN_CONTEXT] = IRB::Irb.new.context
  require 'irb/ext/multi-irb'
  IRB.irb nil, self
end
logger()

Logger object used in the initialisation routine and throughout the application.

# File lib/sequenceserver.rb, line 50
def logger
  @logger ||= case environment
              when 'development'
                Logger.new(STDERR, Logger::DEBUG)
              when 'test'
                Logger.new(STDERR, Logger::WARN)
              else
                Logger.new(STDERR, Logger::INFO)
              end
end
makeblastdb()

MAKEBLASTDB service object.

# File lib/sequenceserver.rb, line 62
def makeblastdb
  @makeblastdb ||= MAKEBLASTDB.new(config[:database_dir])
end
on_start()

This method is called after WEBrick has bound to the host and port and is ready to accept connections.

# File lib/sequenceserver.rb, line 138
def on_start
  puts '** SequenceServer is ready.'
  puts "   Go to #{server_url} in your browser and start BLASTing!"
  if ip_address
    puts '   To share your setup, try one of the following addresses. These'
    puts '   may only work within your home, office, or university network.'
    puts "     -  http://#{ip_address}:#{config[:port]}"
    puts "     -  http://#{hostname}:#{config[:port]}" if hostname
    puts '   To share your setup with anyone in the world, ask your IT team'
    puts '   for a public IP address or consider the SequenceServer cloud'
    puts '   hosting service: https://sequenceserver.com/cloud'
    puts '   To disable sharing, set :host: key in config file to 127.0.0.1'
    puts '   and restart server.'
  end
  puts '   To terminate server, press CTRL+C'
  open_in_browser(server_url)
end
on_stop()

This method is called when WEBrick is terminated.

# File lib/sequenceserver.rb, line 157
def on_stop
  puts
  puts '** Thank you for using SequenceServer :).'
  puts '   Please cite: '
  puts '       Priyam A, Woodcroft BJ, Rai V, Moghul I, Munagala A, Ter F,'
  puts '       Chowdhary H, Pieniak I, Maynard LJ, Gibbins MA, Moon H,'
  puts '       Davis-Richardson A, Uludag M, Watson-Haigh N, Challis R,'
  puts '       Nakamura H, Favreau E, Gómez EA, Pluskal T, Leonard G,'
  puts '       Rumpf W & Wurm Y.'
  puts '       Sequenceserver: A modern graphical user interface for'
  puts '       custom BLAST databases.'
  puts '       Molecular Biology and Evolution (2019)'
end
run()

Run SequenceServer using WEBrick.

# File lib/sequenceserver.rb, line 121
def run
  Server.run(self)
rescue Errno::EADDRINUSE
  puts "** Could not bind to port #{config[:port]}."
  puts "   Is SequenceServer already accessible at #{server_url}?"
  puts '   No? Try running SequenceServer on another port, like so:'
  puts
  puts '       sequenceserver -p 4570.'
rescue Errno::EACCES
  puts "** Need root privilege to bind to port #{config[:port]}."
  puts '   It is not advisable to run SequenceServer as root.'
  puts '   Please use Apache/Nginx to bind to a privileged port.'
  puts '   Instructions available on http://sequenceserver.com.'
end
sys(command, options = {})

`sys` executes a shell command.

`sys` can write the stdout and/or stderr from a shell command to files, or return these values.

`sys` can get stdout, stderr, and exit status from a failed shell command.

Supply `sys` with the shell command and optionally:

  dir:    A directory to change to for the duration of the execution of the shell command.
  path:   A directory to change the PATH environment variable to for the duration of the execution of the shell command.
  stdout: A path to a file to store stdout.
  stderr: A path to a file to store stderr.

Usage:

  sys(command, dir: '/path/to/directory', path: '/path/to/directory',
      stdout: '/path/to/stdout_file', stderr: '/path/to/stderr_file')

# File lib/sequenceserver/sys.rb, line 25
def self.sys(command, options = {})
  # Available output channels
  channels = %i[stdout stderr]

  # Make temporary files to store output from stdout and stderr.
  temp_files = {
    stdout: Tempfile.new('sequenceserver-sys'),
    stderr: Tempfile.new('sequenceserver-sys')
  }

  # Log the command we are going to run - use -D option to view.
  logger.debug("Executing: #{command}")

  # Run command in a child process. This allows us to control PATH
  # and pwd of the running process.
  child_pid = fork do
    # Set the PATH environment variable to the binary directory or
    # safe directory.
    ENV['PATH'] = options[:path] if options[:path]

    # Change to the specified directory.
    Dir.chdir(options[:dir]) if options[:dir] && Dir.exist?(options[:dir])

    # Execute the shell command, redirect stdout and stderr to the
    # temporary files.
    exec(command, out: temp_files[:stdout].path.to_s, \
                  err: temp_files[:stderr].path.to_s)
  end

  # Wait for the termination of the child process.
  _, status = Process.wait2(child_pid)

  # If a full path was given for stdout and stderr files, move the
  # temporary files to this path. If the path given does not exist,
  # create it.
  channels.each do |channel|
    filename = options[channel]
    # Skip this channel if no path was given, but still process the others.
    next unless filename

    # If the given path has a directory component, ensure it exists.
    file_dir = File.dirname(filename)
    FileUtils.mkdir_p(file_dir) unless File.directory?(file_dir)

    # Now move the temporary file to the given path.
    # TODO: don't we need to explicitly close the temp file here?
    FileUtils.cp(temp_files[channel], filename)
  end

  # Read the remaining temp files into memory. For large outputs,
  # the caller should supply a file path to prevent loading the
  # output in memory.
  temp_files.each do |channel, tempfile|
    temp_files[channel] = tempfile.read
  end

  # Finally, return contents of the remaining temp files if the
  # command completed successfully or raise CommandFailed error.
  return temp_files.values if status.success?
  raise CommandFailed.new(status.exitstatus, **temp_files)
end
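
The overall contract of `sys` (return stdout and stderr on success, raise an error carrying the exit status and output on failure) can be sketched in a simplified form. This sketch swaps the fork/exec and temp-file approach of the listing for `Open3.capture3` from Ruby's standard library, so it always holds output in memory. `run_command` and `CommandError` are hypothetical names, not SequenceServer's API.

```ruby
require 'open3'

# Hypothetical error class carrying what a caller needs to act on a failure.
class CommandError < StandardError
  attr_reader :exitstatus, :stdout, :stderr

  def initialize(exitstatus, stdout, stderr)
    @exitstatus = exitstatus
    @stdout = stdout
    @stderr = stderr
    super("command exited with status #{exitstatus}")
  end
end

# Hypothetical, simplified stand-in for `sys`. Runs the command, optionally
# in the given directory, and returns [stdout, stderr] on success.
def run_command(command, dir: nil)
  options = dir ? { chdir: dir } : {}
  stdout, stderr, status = Open3.capture3(command, **options)
  return [stdout, stderr] if status.success?

  raise CommandError.new(status.exitstatus, stdout, stderr)
end
```

A caller would destructure the result, as in `out, err = run_command('echo hello')`, and rescue `CommandError` to inspect `exitstatus` and `stderr`.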
verbose?()
Alias for: development?

Private Class Methods

assert_blast_installed_and_compatible()
# File lib/sequenceserver.rb, line 254
def assert_blast_installed_and_compatible
  begin
    out, = sys('blastdbcmd -version', path: config[:bin])
  rescue CommandFailed
    fail BLAST_NOT_INSTALLED_OR_NOT_EXECUTABLE
  end
  # Guard against empty output: out.split[1] is nil if no version was printed.
  version = out.split[1].to_s
  fail BLAST_NOT_INSTALLED_OR_NOT_EXECUTABLE if version.empty?
  fail BLAST_NOT_COMPATIBLE, version unless is_compatible(version, BLAST_VERSION)
end
check_database_compatibility()
# File lib/sequenceserver.rb, line 216
def check_database_compatibility
  Database.each do |database|
    logger.debug "Found #{database.type} database '#{database.title}' at '#{database.path}'"
    if database.non_parse_seqids?
      logger.warn "Database '#{database.title}' was created without using the" \
                  ' -parse_seqids option of makeblastdb. FASTA download will' \
                  " not work correctly (path: '#{database.path}')."
    elsif database.v4?
      logger.warn "Database '#{database.title}' is of older format. Mixing" \
                  ' old and new format databases can be problematic' \
                  " (path: '#{database.path}')."
    end
  end
end
check_num_threads()
# File lib/sequenceserver.rb, line 231
def check_num_threads
  num_threads = Integer(config[:num_threads])
  fail NUM_THREADS_INCORRECT unless num_threads.positive?
  logger.debug "Will use #{num_threads} threads to run BLAST."
  if num_threads > 256
    logger.warn "Number of threads set at #{num_threads} is unusually high."
  end
rescue ArgumentError
  raise NUM_THREADS_INCORRECT
end
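
The validation above relies on `Kernel#Integer`, which, unlike `String#to_i`, raises `ArgumentError` for strings that are not whole numbers. A minimal sketch of the same check, with the hypothetical name `validate_num_threads` and a plain `ArgumentError` standing in for `NUM_THREADS_INCORRECT`:

```ruby
# Hypothetical validator mirroring the checks above. Returns the thread
# count as an Integer, or raises ArgumentError for unusable values.
def validate_num_threads(value)
  # Integer() rejects strings such as 'four' or '4.5' instead of silently
  # converting them the way String#to_i would.
  num_threads = Integer(value)
  raise ArgumentError, 'num_threads must be positive' unless num_threads.positive?

  num_threads
end

validate_num_threads('4') # => 4
```

Note that `Integer(nil)` raises `TypeError` rather than `ArgumentError`, so a missing value would need its own handling in code that rescues only `ArgumentError`.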
command?(command)

Returns `true` if the given command exists and is executable.

# File lib/sequenceserver.rb, line 311
def command?(command)
  system("which #{command} > /dev/null 2>&1")
end
hostname()

Returns machine’s hostname based on the local ip. If hostname cannot be determined returns nil.

# File lib/sequenceserver.rb, line 279
def hostname
  Resolv.getname(ip_address) rescue nil
end
init_binaries()
# File lib/sequenceserver.rb, line 184
def init_binaries
  if config[:bin]
    config[:bin] = File.expand_path config[:bin]
    unless File.exist?(config[:bin]) && File.directory?(config[:bin])
      fail ENOENT.new('bin dir', config[:bin])
    end
    logger.debug("Will use NCBI BLAST+ at: #{config[:bin]}")
  else
    logger.debug('Location of NCBI BLAST+ not provided. Assuming NCBI' \
                 ' BLAST+ to be present in: $PATH')
  end

  assert_blast_installed_and_compatible
end
init_database()
# File lib/sequenceserver.rb, line 199
def init_database
  fail DATABASE_DIR_NOT_SET unless config[:database_dir]

  config[:database_dir] = File.expand_path(config[:database_dir])
  unless File.exist?(config[:database_dir]) &&
         File.directory?(config[:database_dir])
    fail ENOENT.new('database dir', config[:database_dir])
  end

  logger.debug("Will look for BLAST+ databases in: #{config[:database_dir]}")

  fail NO_BLAST_DATABASE_FOUND, config[:database_dir] unless makeblastdb.any_formatted?

  Database.collection = makeblastdb.formatted_fastas
  check_database_compatibility unless config[:optimistic].to_s == 'true'
end
ip_address()

Returns a local, non-loopback IPv4 address, or nil if none is found.

# File lib/sequenceserver.rb, line 272
def ip_address
  addrinfo = Socket.ip_address_list.find { |ai| ai.ipv4? && !ai.ipv4_loopback? }
  addrinfo.ip_address if addrinfo
end
is_compatible(given, expected)

Returns true if the given version is the same as or higher than the minimum expected version string.

# File lib/sequenceserver.rb, line 317
def is_compatible(given, expected)
  # The spaceship operator (<=>) below returns -1, 0, 1 depending on
  # whether the left operand is lower, same, or higher than the
  # right operand. We want the left operand to be the same or higher.
  (parse_version(given) <=> parse_version(expected)) >= 0
end
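
Comparing arrays of integers rather than raw strings is what makes this check behave correctly for multi-digit components. The sketch below copies the two method bodies from this page into standalone methods (renamed `compatible?` for illustration) and contrasts them with plain string comparison.

```ruby
# Turn a version string into an array of its component numbers.
def parse_version(version_string)
  version_string.split('.').map(&:to_i)
end

# True if `given` is the same as or higher than `expected`.
def compatible?(given, expected)
  (parse_version(given) <=> parse_version(expected)) >= 0
end

compatible?('2.10.0', '2.9.0') # => true, because [2, 10, 0] > [2, 9, 0]
'2.10.0' >= '2.9.0'            # => false, because '1' sorts before '9'

# A trailing '+', as in '2.9.0+', is dropped by to_i.
parse_version('2.9.0+')        # => [2, 9, 0]
```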
load_extension()
# File lib/sequenceserver.rb, line 242
def load_extension
  return unless config[:require]

  config[:require] = File.expand_path config[:require]
  unless File.exist?(config[:require]) && File.file?(config[:require])
    fail ENOENT.new('extension file', config[:require])
  end

  logger.debug("Loading extension: #{config[:require]}")
  require config[:require]
end
open_in_browser(server_url)

Uses `open` on Mac or `xdg-open` on Linux to open the search form in the user's default browser. This function is called when SequenceServer is launched from the terminal. Errors, if any, are silenced.

# File lib/sequenceserver.rb, line 289
def open_in_browser(server_url)
  return if using_ssh? || verbose?
  if RUBY_PLATFORM =~ /linux/ && xdg?
    sys("xdg-open #{server_url}")
  elsif RUBY_PLATFORM =~ /darwin/
    sys("open #{server_url}")
  end
rescue
  # fail silently
end
parse_version(version_string)

Turns a version string into an array of its component numbers.

# File lib/sequenceserver.rb, line 325
def parse_version(version_string)
  version_string.split('.').map(&:to_i)
end
server_url()
# File lib/sequenceserver.rb, line 265
def server_url
  host = config[:host]
  host = 'localhost' if ['127.0.0.1', '0.0.0.0'].include?(host)
  "http://#{host}:#{config[:port]}"
end
using_ssh?()

# File lib/sequenceserver.rb, line 302
def using_ssh?
  true if ENV['SSH_CLIENT'] || ENV['SSH_TTY'] || ENV['SSH_CONNECTION']
end
xdg?()
# File lib/sequenceserver.rb, line 306
def xdg?
  true if ENV['DISPLAY'] && command?('xdg-open')
end