class EuPathDBGeneInformationFileExtractor

A class for extracting the information about one particular gene from a EuPathDB gene information file.

Attributes

filename[RW]

Path to the gene information file.

Public Class Methods

new(filename = nil)
# File lib/eupathdb_gene_information_table.rb, line 11
def initialize(filename = nil)
  @filename = filename
end
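Because `new` takes an optional filename and `filename` is marked [RW] (RDoc's notation for an `attr_accessor`), the path can be supplied at construction time or assigned later. A minimal sketch of that pattern, using a hypothetical stand-in class `ExtractorSketch` (not part of the library) and a made-up file name `genes.txt`:

```ruby
# Hypothetical stand-in mirroring the documented constructor and the
# read-write filename attribute; it is not the library class itself.
class ExtractorSketch
  attr_accessor :filename # RDoc lists attr_accessor attributes as [RW]

  def initialize(filename = nil)
    @filename = filename
  end
end

late = ExtractorSketch.new          # filename starts out nil
late.filename = 'genes.txt'         # ...and can be assigned afterwards
eager = ExtractorSketch.new('genes.txt')
```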

Public Instance Methods

extract_gene_info(wanted_gene_id, grep_hack_lines = nil)

Returns the EuPathDBGeneInformation object whose 'Gene Id' equals wanted_gene_id. If several genes in the file share that ID, only the first is returned; if none match, nil is returned.
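The lookup rule (first record whose 'Gene Id' equals the requested ID, otherwise nil) is exactly what Ruby's `Enumerable#find` does. A sketch under stated assumptions: `FakeGene` is a hypothetical struct whose `get_info` stands in for the library's gene objects, `first_gene_with_id` is a hypothetical helper, and the IDs and the `Note` key are made up for illustration:

```ruby
# Hypothetical stand-in for a parsed gene: a hash of fields read back
# through get_info, like the objects the gene information table yields.
FakeGene = Struct.new(:info) do
  def get_info(key)
    info[key]
  end
end

# Return the first gene whose 'Gene Id' matches, or nil if none does.
def first_gene_with_id(genes, wanted_gene_id)
  genes.find { |gene| gene.get_info('Gene Id') == wanted_gene_id }
end

genes = [
  FakeGene.new({ 'Gene Id' => 'GENE_A', 'Note' => 'first copy' }),
  FakeGene.new({ 'Gene Id' => 'GENE_B', 'Note' => 'other gene' }),
  FakeGene.new({ 'Gene Id' => 'GENE_A', 'Note' => 'second copy' })
]

first_gene_with_id(genes, 'GENE_A').get_info('Note') # => "first copy"
first_gene_with_id(genes, 'GENE_Z')                  # => nil
```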

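If grep_hack_lines is given as a non-zero integer, a shortcut is used to speed up the search. Instead of parsing the whole gene information file, grep first extracts each line matching “Gene Id: <wanted_gene_id>” together with the grep_hack_lines lines that follow it, and only that excerpt is fed to the parser. grep_hack_lines therefore has to be large enough to cover the gene's whole record; a smaller value hands the parser a truncated record.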
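The excerpt the shortcut produces is roughly what `grep -A N` prints: every line containing the pattern plus the N lines after it. A pure-Ruby sketch of that filtering step, with a hypothetical helper `grep_after` and made-up "Key: value" input lines (the real file layout is assumed, not reproduced); unlike GNU grep, it does not emit `--` separators between non-adjacent matches:

```ruby
# Hypothetical pure-Ruby approximation of `grep -F -A n pattern`:
# keep every line containing +pattern+ plus the +n+ lines that follow it.
def grep_after(lines, pattern, n)
  raise ArgumentError, 'n should be an integer' unless n.is_a?(Integer)

  remaining = 0 # trailing context lines still to keep after the last match
  lines.each_with_object([]) do |line, kept|
    if line.include?(pattern)
      remaining = n # a new match restarts the context window, as grep does
      kept << line
    elsif remaining > 0
      remaining -= 1
      kept << line
    end
  end
end

input = [
  "Gene Id: GENE_A\n",
  "Note: alpha\n",
  "Gene Id: GENE_B\n",
  "Note: beta\n",
  "Extra: beta detail\n"
]

grep_after(input, 'Gene Id: GENE_B', 1)
# => ["Gene Id: GENE_B\n", "Note: beta\n"]
```

Because this is a substring match, `Gene Id: GENE_A` would also pick up a line for `GENE_AB`; the exact comparison against `get_info('Gene Id')` after parsing is what filters such false positives out.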

# File lib/eupathdb_gene_information_table.rb, line 19
def extract_gene_info(wanted_gene_id, grep_hack_lines = nil)
  # Needs require 'tempfile' at the top of the file.
  path = @filename
  tempfile = nil
  if grep_hack_lines and grep_hack_lines.to_i != 0
    raise ArgumentError, "grep_hack_lines should be an integer" unless grep_hack_lines.is_a?(Integer)
    # grep however many lines from past the point. Rather dodgy, but faster.
    # Arguments are passed as an array, so no shell parses the gene ID;
    # -F matches the ID literally and -e protects IDs starting with '-'.
    tempfile = Tempfile.new('reubypathdb_grep_hack')
    system('grep', '-F', '-A', grep_hack_lines.to_s, '-e', "Gene Id: #{wanted_gene_id}",
           @filename, out: tempfile.path)
    path = tempfile.path
  end

  # Parse either the grepped excerpt or the whole gene information file,
  # closing the file handle afterwards.
  File.open(path) do |io|
    EuPathDBGeneInformationTable.new(io).each do |gene|
      # A plain block (not a lambda), so `return` exits extract_gene_info.
      return gene if wanted_gene_id == gene.get_info('Gene Id')
    end
  end
  nil
ensure
  # Close and delete the temporary file, if one was created.
  tempfile.close! if tempfile
end
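A Ruby pitfall worth keeping in mind when reading or modifying this method: `return` inside a lambda exits only the lambda, whereas `return` inside a plain block exits the enclosing method. A lambda doing `return gene if ...`, called from an `each` loop, therefore cannot stop the search or hand the gene back to the caller. A minimal, self-contained illustration; `find_with_lambda` and `find_with_block` are hypothetical names:

```ruby
# The lambda's `return` only ends that lambda call; the loop carries on
# and the method falls through to its final nil.
def find_with_lambda(ids, wanted)
  check = lambda do |id|
    return id if id == wanted
  end
  ids.each { |id| check.call(id) }
  nil
end

# A `return` in a plain block returns from the enclosing method itself.
def find_with_block(ids, wanted)
  ids.each do |id|
    return id if id == wanted
  end
  nil
end

find_with_lambda(%w[GENE_A GENE_B], 'GENE_B') # => nil
find_with_block(%w[GENE_A GENE_B], 'GENE_B')  # => "GENE_B"
```

Writing the search as a plain block, or with `Enumerable#find` where the table supports it, is enough for the first matching gene to reach the caller.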