module SequenceServer
Top level module / namespace.
Define constants used by SequenceServer module.
www.ncbi.nlm.nih.gov/books/NBK1763/ (Appendices)
Define Config class.
Define Database class.
This file defines all possible exceptions that can be thrown by SequenceServer on startup.
Exceptions only ever inform another entity (downstream code or users) of an issue. Exceptions may or may not be recoverable.
Error classes should be seen as: the error code (class name), human readable message (to_s method), and necessary attributes to act on the error.
We define as many error classes as needed to be precise about the issue, thus making it easy for downstream code (bin/sequenceserver or config.ru) to act on them.
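As a rough illustration of the "error code, message, attributes" view described above, an error class might look like the sketch below. `MissingPathError`, its attributes, and `describe_failure` are hypothetical names chosen for this example; they are not taken from SequenceServer's source.

```ruby
# Hypothetical error class: the class name acts as the error code, to_s gives
# the human readable message, and attributes carry what callers need to react.
class MissingPathError < StandardError
  attr_reader :description, :path

  def initialize(description, path)
    @description = description
    @path = path
    super()
  end

  def to_s
    "Could not find #{description}: #{path}"
  end
end

# Hypothetical downstream code: rescue by class, then use the attributes.
def describe_failure
  raise MissingPathError.new('database dir', '/no/such/dir')
rescue MissingPathError => e
  "#{e.class}: #{e} (path=#{e.path})"
end
```

Because each failure has its own class, calling code can rescue exactly the cases it knows how to handle and let the rest propagate.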
Define Sequence class.
Define version number.
Constants
- APIError
API errors have an http status, title, message, and additional information like stacktrace or information from program output.
- BLAST_VERSION
The default version of BLAST that will be downloaded and configured for use.
- DEFAULT_CONFIG_FILE
Default location of configuration file.
- DOTDIR
Constant for denoting the path ~/.sequenceserver
- Database
Captures a directory containing FASTA files and BLAST databases.
Formatting a FASTA for use with BLAST+ will create 3 or 6 files, collectively referred to as a BLAST database.
It is important that formatted BLAST database files have the same dirname and basename as the source FASTA for SequenceServer to be able to tell formatted FASTA from unformatted. FASTA files must also be formatted with the `parse_seqids` option of `makeblastdb` for sequence retrieval to work.
SequenceServer will always place BLAST database files alongside input FASTA, and use the `parse_seqids` option of `makeblastdb` to format databases.
- Error
- Sequence
Provides simple sequence processing utilities via class methods. An instance of the class serves as a simple data object to capture sequences fetched from BLAST databases.
NOTE:
What all do we need to consistently construct FASTA from `blastdbcmd's` output? It would seem rather straightforward. But it's not.
FASTA format:
>id title
actual sequence
ID of a sequence fetched from nr database should look like this:
gi|322796550|gb|EFZ19024.1| -> self.id
EFZ19024.1 (accession) -> self.accession
gb|EFZ19024.1 (sequence id) -> self.seqid
322796550 (gi number) -> self.gi
while for local databases, the id should be the exact same, as in the original FASTA file:
SI2.2.0_06267 -> self.id == self.accession
- VERSION
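The id breakdown given in the Sequence note can be sketched as a small parser. `split_id` is a hypothetical helper written for this illustration, not the method SequenceServer uses, and leaving `seqid` and `gi` as nil for local ids is an assumption of the sketch.

```ruby
# Hypothetical helper: split a FASTA id into the parts named in the note.
# Ids that do not follow the gi|number|db|accession| pattern are treated as
# local ids, where id and accession are the same.
def split_id(id)
  match = id.match(/\Agi\|(\d+)\|(\w+)\|([^|]+)\|?\z/)
  return { id: id, accession: id, seqid: nil, gi: nil } unless match

  gi, db, accession = match.captures
  { id: id, accession: accession, seqid: "#{db}|#{accession}", gi: gi }
end

nr_parts = split_id('gi|322796550|gb|EFZ19024.1|')
local_parts = split_id('SI2.2.0_06267')
```

Here `nr_parts[:gi]` holds the gi number and `nr_parts[:seqid]` the database-qualified accession, while `local_parts[:id]` and `local_parts[:accession]` are identical.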
Attributes
Holds SequenceServer configuration object for this process. This is available only after calling SequenceServer.init.
Public Class Methods
Rack-interface.
Add our logger to Rack env and let Routes do the rest.
# File lib/sequenceserver.rb, line 115
def call(env)
  env['rack.logger'] = logger
  Routes.call(env)
end
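The delegation pattern used by `call` can be tried in isolation with the sketch below. `ROUTES` and `TinyApp` are hypothetical stand-ins invented for this example; only the Rack convention itself (an object responding to `call(env)` and returning status, headers, and body) is assumed.

```ruby
require 'logger'

# Hypothetical stand-in for the Routes application: any object responding to
# call(env) and returning [status, headers, body] is a valid Rack app.
ROUTES = lambda do |env|
  env['rack.logger'].info('handling request')
  [200, { 'content-type' => 'text/plain' }, ['ok']]
end

# Hypothetical wrapper module mirroring the pattern of `call` above: attach a
# logger to the env, then hand the request over to the inner app.
module TinyApp
  def self.logger
    @logger ||= Logger.new($stderr)
  end

  def self.call(env)
    env['rack.logger'] = logger
    ROUTES.call(env)
  end
end

status, _headers, body = TinyApp.call({})
```

Putting the logger into `env` means the inner application does not need to know how or where the logger was configured.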
Returns true if RACK_ENV is set to ‘development’. Raw JS and CSS files are served in development mode and the logger is made more verbose.
# File lib/sequenceserver.rb, line 43
def development?
  environment == 'development'
end
Returns ENV['RACK_ENV']. This environment variable determines if we are in development or in production mode (default).
# File lib/sequenceserver.rb, line 37
def environment
  ENV['RACK_ENV']
end
SequenceServer initialisation routine.
# File lib/sequenceserver.rb, line 67
def init(config = {})
  # Reset makeblastdb cache, because configuration may have changed.
  @makeblastdb = nil

  # Use default config file if caller didn't specify one.
  config[:config_file] ||= DEFAULT_CONFIG_FILE

  # Initialise global configuration object from the above config hash.
  @config = Config.new(config)

  # When in development mode, cause SequenceServer to terminate if any
  # thread spawned by the main process raises an unhandled exception. In
  # production mode the expectation is to log at appropriate severity level
  # and continue operating.
  Thread.abort_on_exception = true if development?

  # Now locate binaries, scan databases directory, require any plugin files.
  load_extension
  init_binaries
  init_database

  # The above methods validate bin dir, database dir, and path to plugin
  # files. Port and host settings don't need to be validated: if running
  # in self-hosted mode, WEBrick will handle incorrect values and if
  # running via Apache+Passenger host and port settings are not used.

  # Let's validate remaining configuration keys next.

  # Validate number of threads to use with BLAST.
  check_num_threads

  # Doesn't make sense to activate JobRemover when testing. It anyway
  # keeps stumbling on the mock test jobs that miss a few keys.
  unless environment == 'test'
    @job_remover = JobRemover.new(@config[:job_lifetime])
  end

  # 'self' is the most meaningful object that can be returned by this
  # method.
  self
end
This method is invoked by the -i switch to start an IRB shell with SequenceServer loaded.
# File lib/sequenceserver.rb, line 173
def irb
  ARGV.clear
  require 'irb'
  IRB.setup nil
  IRB.conf[:MAIN_CONTEXT] = IRB::Irb.new.context
  require 'irb/ext/multi-irb'
  IRB.irb nil, self
end
Logger object used in the initialisation routine and throughout the application.
# File lib/sequenceserver.rb, line 50
def logger
  @logger ||= case environment
              when 'development'
                Logger.new(STDERR, Logger::DEBUG)
              when 'test'
                Logger.new(STDERR, Logger::WARN)
              else
                Logger.new(STDERR, Logger::INFO)
              end
end
MAKEBLASTDB service object.
# File lib/sequenceserver.rb, line 62
def makeblastdb
  @makeblastdb ||= MAKEBLASTDB.new(config[:database_dir])
end
This method is called after WEBrick has bound to the host and port and is ready to accept connections.
# File lib/sequenceserver.rb, line 138
def on_start
  puts '** SequenceServer is ready.'
  puts " Go to #{server_url} in your browser and start BLASTing!"
  if ip_address
    puts ' To share your setup, try one of the following addresses. These'
    puts ' may only work within your home, office, or university network.'
    puts " - http://#{ip_address}:#{config[:port]}"
    puts " - http://#{hostname}:#{config[:port]}" if hostname
    puts ' To share your setup with anyone in the world, ask your IT team'
    puts ' for a public IP address or consider the SequenceServer cloud'
    puts ' hosting service: https://sequenceserver.com/cloud'
    puts ' To disable sharing, set :host: key in config file to 127.0.0.1'
    puts ' and restart server.'
  end
  puts ' To terminate server, press CTRL+C'
  open_in_browser(server_url)
end
This method is called when WEBrick is terminated.
# File lib/sequenceserver.rb, line 157
def on_stop
  puts
  puts '** Thank you for using SequenceServer :).'
  puts ' Please cite: '
  puts ' Priyam A, Woodcroft BJ, Rai V, Moghul I, Munagala A, Ter F,'
  puts ' Chowdhary H, Pieniak I, Maynard LJ, Gibbins MA, Moon H,'
  puts ' Davis-Richardson A, Uludag M, Watson-Haigh N, Challis R,'
  puts ' Nakamura H, Favreau E, Gómez EA, Pluskal T, Leonard G,'
  puts ' Rumpf W & Wurm Y.'
  puts ' Sequenceserver: A modern graphical user interface for'
  puts ' custom BLAST databases.'
  puts ' Molecular Biology and Evolution (2019)'
end
Run SequenceServer using WEBrick.
# File lib/sequenceserver.rb, line 121
def run
  Server.run(self)
rescue Errno::EADDRINUSE
  puts "** Could not bind to port #{config[:port]}."
  puts " Is SequenceServer already accessible at #{server_url}?"
  puts ' No? Try running SequenceServer on another port, like so:'
  puts
  puts ' sequenceserver -p 4570.'
rescue Errno::EACCES
  puts "** Need root privilege to bind to port #{config[:port]}."
  puts ' It is not advisable to run SequenceServer as root.'
  puts ' Please use Apache/Nginx to bind to a privileged port.'
  puts ' Instructions available on http://sequenceserver.com.'
end
‘sys’ executes a shell command.
‘sys’ can write the stdout and/or stderr from a shell command to files, or return these values.
‘sys’ can get from a failed shell command stdout, stderr, and exit status.
Supply ‘sys’ with the shell command and optionally:
dir: A directory to change to for the duration of the execution of the shell command.
path: A directory to change the PATH environment variable to for the duration of the execution of the shell command.
stdout: A path to a file to store stdout.
stderr: A path to a file to store stderr.
Usage:
sys(command, dir: '/path/to/directory', path: '/path/to/directory',
    stdout: '/path/to/stdout_file', stderr: '/path/to/stderr_file')
# File lib/sequenceserver/sys.rb, line 25
def self.sys(command, options = {})
  # Available output channels
  channels = %i[stdout stderr]

  # Make temporary files to store output from stdout and stderr.
  temp_files = {
    stdout: Tempfile.new('sequenceserver-sys'),
    stderr: Tempfile.new('sequenceserver-sys')
  }

  # Log the command we are going to run - use -D option to view.
  logger.debug("Executing: #{command}")

  # Run command in a child process. This allows us to control PATH
  # and pwd of the running process.
  child_pid = fork do
    # Set the PATH environment variable to the binary directory or
    # safe directory.
    ENV['PATH'] = options[:path] if options[:path]

    # Change to the specified directory.
    Dir.chdir(options[:dir]) if options[:dir] && Dir.exist?(options[:dir])

    # Execute the shell command, redirect stdout and stderr to the
    # temporary files.
    exec(command, out: temp_files[:stdout].path.to_s, \
                  err: temp_files[:stderr].path.to_s)
  end

  # Wait for the termination of the child process.
  _, status = Process.wait2(child_pid)

  # If a full path was given for stdout and stderr files, move the
  # temporary files to this path. If the path given does not exist,
  # create it.
  channels.each do |channel|
    filename = options[channel]
    break unless filename

    # If the given path has a directory component, ensure it exists.
    file_dir = File.dirname(filename)
    FileUtils.mkdir_p(file_dir) unless File.directory?(file_dir)

    # Now move the temporary file to the given path.
    # TODO: don't we need to explicitly close the temp file here?
    FileUtils.cp(temp_files[channel], filename)
  end

  # Read the remaining temp files into memory. For large outputs,
  # the caller should supply a file path to prevent loading the
  # output in memory.
  temp_files.each do |channel, tempfile|
    temp_files[channel] = tempfile.read
  end

  # Finally, return contents of the remaining temp files if the
  # command completed successfully or raise CommandFailed error.
  return temp_files.values if status.success?

  raise CommandFailed.new(status.exitstatus, **temp_files)
end
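For experimenting with the same contract outside SequenceServer (run a command with an optional working directory and PATH, return stdout and stderr, raise on failure), a much smaller sketch is possible. Note that it swaps in `Open3.capture3` from Ruby's standard library for the fork/exec and temporary-file approach above, so it always holds the output in memory. `run_command` is a hypothetical name, and the sketch assumes a POSIX system where `echo` and `false` are available.

```ruby
require 'open3'

# Simplified, hypothetical variant of the idea behind `sys`. It uses
# Open3.capture3 rather than fork/exec with temporary files.
def run_command(command, dir: nil, path: nil)
  env = path ? { 'PATH' => path } : {}
  opts = dir ? { chdir: dir } : {}
  stdout, stderr, status = Open3.capture3(env, command, **opts)
  raise "Command failed (#{status.exitstatus}): #{stderr}" unless status.success?

  [stdout, stderr]
end

out, err = run_command('echo hello', dir: '/')
```

Unlike `sys`, this variant cannot stream large outputs to files, which is the case the `stdout:` and `stderr:` options above are meant for.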
Private Class Methods
# File lib/sequenceserver.rb, line 254
def assert_blast_installed_and_compatible
  begin
    out, = sys('blastdbcmd -version', path: config[:bin])
  rescue CommandFailed
    fail BLAST_NOT_INSTALLED_OR_NOT_EXECUTABLE
  end
  version = out.split[1]
  fail BLAST_NOT_INSTALLED_OR_NOT_EXECUTABLE if version.empty?
  fail BLAST_NOT_COMPATIBLE, version unless is_compatible(version, BLAST_VERSION)
end
# File lib/sequenceserver.rb, line 216
def check_database_compatibility
  Database.each do |database|
    logger.debug "Found #{database.type} database '#{database.title}' at '#{database.path}'"
    if database.non_parse_seqids?
      logger.warn "Database '#{database.title}' was created without using the" \
                  ' -parse_seqids option of makeblastdb. FASTA download will' \
                  " not work correctly (path: '#{database.path}')."
    elsif database.v4?
      logger.warn "Database '#{database.title}' is of older format. Mixing" \
                  ' old and new format databases can be problematic' \
                  "(path: '#{database.path}')."
    end
  end
end
# File lib/sequenceserver.rb, line 231
def check_num_threads
  num_threads = Integer(config[:num_threads])
  fail NUM_THREADS_INCORRECT unless num_threads.positive?

  logger.debug "Will use #{num_threads} threads to run BLAST."
  if num_threads > 256
    logger.warn "Number of threads set at #{num_threads} is unusually high."
  end
rescue ArgumentError
  raise NUM_THREADS_INCORRECT
end
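The validation steps in `check_num_threads` (strict conversion with `Integer()`, then a positivity check) can be tried on their own with the sketch below. `valid_num_threads` is a hypothetical standalone method, and it raises a plain `ArgumentError` rather than SequenceServer's `NUM_THREADS_INCORRECT`.

```ruby
# Hypothetical standalone validator following the same steps as
# check_num_threads: convert with Integer(), then require a positive value.
# Integer() raises ArgumentError for strings such as 'four', unlike
# String#to_i, which would quietly return 0.
def valid_num_threads(value)
  num_threads = Integer(value)
  raise ArgumentError, 'number of threads must be positive' unless num_threads.positive?

  num_threads
end

threads = valid_num_threads('4')
```

Using `Integer()` instead of `to_i` is what lets a malformed setting surface as an error rather than being read as zero.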
Returns `true` if the given command exists and is executable.
# File lib/sequenceserver.rb, line 311
def command?(command)
  system("which #{command} > /dev/null 2>&1")
end
Returns the machine’s hostname based on the local IP. Returns nil if the hostname cannot be determined.
# File lib/sequenceserver.rb, line 279
def hostname
  Resolv.getname(ip_address)
rescue
  nil
end
# File lib/sequenceserver.rb, line 184
def init_binaries
  if config[:bin]
    config[:bin] = File.expand_path config[:bin]
    unless File.exist?(config[:bin]) && File.directory?(config[:bin])
      fail ENOENT.new('bin dir', config[:bin])
    end
    logger.debug("Will use NCBI BLAST+ at: #{config[:bin]}")
  else
    logger.debug('Location of NCBI BLAST+ not provided. Assuming NCBI' \
                 ' BLAST+ to be present in: $PATH')
  end

  assert_blast_installed_and_compatible
end
# File lib/sequenceserver.rb, line 199
def init_database
  fail DATABASE_DIR_NOT_SET unless config[:database_dir]

  config[:database_dir] = File.expand_path(config[:database_dir])
  unless File.exist?(config[:database_dir]) &&
         File.directory?(config[:database_dir])
    fail ENOENT.new('database dir', config[:database_dir])
  end

  logger.debug("Will look for BLAST+ databases in: #{config[:database_dir]}")

  fail NO_BLAST_DATABASE_FOUND, config[:database_dir] unless makeblastdb.any_formatted?

  Database.collection = makeblastdb.formatted_fastas
  check_database_compatibility unless config[:optimistic].to_s == 'true'
end
Returns a local IP address.
# File lib/sequenceserver.rb, line 272
def ip_address
  addrinfo = Socket.ip_address_list.find { |ai| ai.ipv4? && !ai.ipv4_loopback? }
  addrinfo.ip_address if addrinfo
end
Returns true if the given version is the same as or higher than the minimum expected version.
# File lib/sequenceserver.rb, line 317
def is_compatible(given, expected)
  # The spaceship operator (<=>) below returns -1, 0, 1 depending on
  # whether the left operand is lower, same, or higher than the
  # right operand. We want the left operand to be the same or higher.
  (parse_version(given) <=> parse_version(expected)) >= 0
end
# File lib/sequenceserver.rb, line 242
def load_extension
  return unless config[:require]

  config[:require] = File.expand_path config[:require]
  unless File.exist?(config[:require]) && File.file?(config[:require])
    fail ENOENT.new('extension file', config[:require])
  end

  logger.debug("Loading extension: #{config[:require]}")
  require config[:require]
end
Uses `open` on Mac or `xdg-open` on Linux to open the search form in the user’s default browser. This function is called when SequenceServer is launched from the terminal. Errors, if any, are silenced.
# File lib/sequenceserver.rb, line 289
def open_in_browser(server_url)
  return if using_ssh? || verbose?

  if RUBY_PLATFORM =~ /linux/ && xdg?
    sys("xdg-open #{server_url}")
  elsif RUBY_PLATFORM =~ /darwin/
    sys("open #{server_url}")
  end
rescue
  # fail silently
end
Turn version string into an array of its component numbers.
# File lib/sequenceserver.rb, line 325
def parse_version(version_string)
  version_string.split('.').map(&:to_i)
end
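The reason for turning version strings into arrays before comparing them is easier to see with a standalone sketch. `version_parts` and `at_least?` are hypothetical names that mirror `parse_version` and `is_compatible`; the version strings are made-up inputs.

```ruby
# Arrays compare element by element, so [2, 10, 0] ranks above [2, 9, 0],
# whereas the strings "2.10.0" and "2.9.0" would compare the other way round.
def version_parts(version_string)
  # String#to_i ignores trailing non-digits, so '0+' becomes 0.
  version_string.split('.').map(&:to_i)
end

def at_least?(given, expected)
  (version_parts(given) <=> version_parts(expected)) >= 0
end

newer_ok = at_least?('2.10.0', '2.9.0')
older_ok = at_least?('2.9.0', '2.10.0')
suffix_ok = at_least?('2.9.0+', '2.9.0')
```

A plain string comparison would wrongly rank "2.9.0" above "2.10.0", which is what the integer arrays avoid.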
# File lib/sequenceserver.rb, line 265
def server_url
  host = config[:host]
  host = 'localhost' if ['127.0.0.1', '0.0.0.0'].include?(host)
  "http://#{host}:#{config[:port]}"
end
# File lib/sequenceserver.rb, line 302
def using_ssh?
  true if ENV['SSH_CLIENT'] || ENV['SSH_TTY'] || ENV['SSH_CONNECTION']
end
# File lib/sequenceserver.rb, line 306
def xdg?
  true if ENV['DISPLAY'] && command?('xdg-open')
end