class Bio::NCBI::REST

Description

The Bio::NCBI::REST class provides a REST client for the NCBI E-Utilities.

Entrez Programming Utilities Help:

Constants

NCBI_INTERVAL

Wait 1/3 second between requests. NCBI's restriction is: “Make no more than 3 requests every 1 second.”

NCBI also asks that any series of more than 100 requests be run on weekends or between 9 pm and 5 am Eastern Time on weekdays. This is not implemented yet in BioRuby.

Public Class Methods

efetch(*args)
    # File lib/bio/io/ncbirest.rb
    def self.efetch(*args)
      self.new.efetch(*args)
    end

einfo()
    # File lib/bio/io/ncbirest.rb
    def self.einfo
      self.new.einfo
    end

esearch(*args)
    # File lib/bio/io/ncbirest.rb
    def self.esearch(*args)
      self.new.esearch(*args)
    end

esearch_count(*args)
    # File lib/bio/io/ncbirest.rb
    def self.esearch_count(*args)
      self.new.esearch_count(*args)
    end

Public Instance Methods

efetch(ids, hash = {}, step = 100)

Retrieves database entries for the given IDs using the E-Utils (efetch) service.

For information on the possible arguments, see

Usage

ncbi = Bio::NCBI::REST.new
ncbi.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
ncbi.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"})
ncbi.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Bio::NCBI::REST.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
Bio::NCBI::REST.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Arguments:

  • ids: list of NCBI entry IDs (required)

  • hash: hash of E-Utils options, e.g. {“db” => “nuccore”, “rettype” => “gb”}

    • db: “sequences”, “nucleotide”, “protein”, “pubmed”, “omim”, …

    • retmode: “text” (default), “xml”, “html”, …

    • rettype: “gb”, “gbc”, “medline”, “count”,…

  • step: maximum number of entries retrieved at a time

Returns

String
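
The role of the step argument can be illustrated without touching the network. The following is a minimal sketch, assuming only the slicing behavior visible in the efetch source; id_batches is a hypothetical helper, not a BioRuby method, and it uses each_slice in place of the original index arithmetic.

```ruby
# Hypothetical helper (not part of BioRuby): splits an ID list into the
# comma-joined batches that would each be sent as one efetch request.
def id_batches(ids, step = 100)
  # Accept either an Array or a comma-separated String, as efetch does.
  list = ids.is_a?(Array) ? ids : ids.to_s.split(/\s*,\s*/)
  list.each_slice(step).map { |slice| slice.join(",") }
end

batches = id_batches((1..250).to_a, 100)
puts batches.size                   # => 3
puts batches.last.split(",").size   # => 50
```

With 250 IDs and the default step of 100, three requests would be issued, and their bodies are concatenated into the single returned String.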

    # File lib/bio/io/ncbirest.rb
    def efetch(ids, hash = {}, step = 100)
      serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
      opts = default_parameters.merge({ "retmode"  => "text" })
      opts.update(hash)

      case ids
      when Array
        list = ids
      else
        list = ids.to_s.split(/\s*,\s*/)
      end

      result = ""
      0.step(list.size, step) do |i|
        opts["id"] = list[i, step].join(',')
        unless opts["id"].empty?
          response = ncbi_post_form(serv, opts)
          result += response.body
        end
      end
      return result.strip
      #return result.strip.split(/\n\n+/)
    end

einfo()

Lists the NCBI database names using the E-Utils (einfo) service.

pubmed protein nucleotide nuccore nucgss nucest structure genome
books cancerchromosomes cdd gap domains gene genomeprj gensat geo
gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc
popset probe proteinclusters pcassay pccompound pcsubstance snp
taxonomy toolkit unigene unists

Usage

ncbi = Bio::NCBI::REST.new
ncbi.einfo

Bio::NCBI::REST.einfo

Returns

Array of Strings (database names)
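
The database names are pulled out of the XML reply with a regular expression rather than an XML parser. A minimal sketch of that extraction step, assuming a hand-written sample document (the XML below is illustrative, not captured NCBI output):

```ruby
# Hand-written sample shaped like an einfo reply; not real NCBI output.
sample = <<~XML
  <eInfoResult>
    <DbList>
      <DbName>pubmed</DbName>
      <DbName>protein</DbName>
      <DbName>nuccore</DbName>
    </DbList>
  </eInfoResult>
XML

# Same pattern as the einfo source: a non-greedy capture between the tags.
# scan returns one single-element array per match, so flatten is needed.
names = sample.scan(/<DbName>(.*?)<\/DbName>/m).flatten
p names   # => ["pubmed", "protein", "nuccore"]
```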

    # File lib/bio/io/ncbirest.rb
    def einfo
      serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
      opts = default_parameters.merge({})
      response = ncbi_post_form(serv, opts)
      result = response.body
      list = result.scan(/<DbName>(.*?)<\/DbName>/m).flatten
      return list
    end

esearch(str, hash = {}, limit = nil, step = 10000)

Searches the NCBI database for the given keywords using the E-Utils (esearch) service and returns an array of entry IDs.

For information on the possible arguments, see

Usage

ncbi = Bio::NCBI::REST.new
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
ncbi.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Arguments:

  • str: query string (required)

  • hash: hash of E-Utils options, e.g. {“db” => “nuccore”, “rettype” => “gb”}

    • db: “sequences”, “nucleotide”, “protein”, “pubmed”, “taxonomy”, …

    • retmode: “text”, “xml”, “html”, …

    • rettype: “gb”, “medline”, “count”, …

    • retmax: integer (default 100)

    • retstart: integer

    • field:

      • “titl”: Title [TI]

      • “tiab”: Title/Abstract [TIAB]

      • “word”: Text words [TW]

      • “auth”: Author [AU]

      • “affl”: Affiliation [AD]

      • “jour”: Journal [TA]

      • “vol”: Volume [VI]

      • “iss”: Issue [IP]

      • “page”: First page [PG]

      • “pdat”: Publication date [DP]

      • “ptyp”: Publication type [PT]

      • “lang”: Language [LA]

      • “mesh”: MeSH term [MH]

      • “majr”: MeSH major topic [MAJR]

      • “subh”: MeSH subheadings [SH]

      • “mhda”: MeSH date [MHDA]

      • “ecno”: EC/RN Number [rn]

      • “si”: Secondary source ID [SI]

      • “uid”: PubMed ID (PMID) [UI]

      • “fltr”: Filter [FILTER] [SB]

      • “subs”: Subset [SB]

    • reldate: 365

    • mindate: 2001

    • maxdate: 2002/01/01

    • datetype: “edat”

  • limit: maximum number of entries to be returned (0 for unlimited; nil for the “retmax” value in the hash or the internal default value (=100))

  • step: maximum number of entries retrieved at a time

Returns

Array of entry IDs, or the number of results (Integer) when “rettype” is “count”
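
How limit, step, and retstart interact is easiest to see by listing the request windows. The sketch below reproduces only the paging arithmetic from the esearch source; esearch_windows is a hypothetical helper, not a BioRuby method, and it sends no requests.

```ruby
# Hypothetical helper (not part of BioRuby): returns the [retstart, retmax]
# pair for each request the paging loop in esearch would issue.
def esearch_windows(limit, step = 10000, retstart = 0)
  windows = []
  0.step(limit, step) do |i|
    retmax = [step, limit - i].min   # never ask for more than remains
    windows << [i + retstart, retmax]
  end
  windows
end

p esearch_windows(25, 10)   # => [[0, 10], [10, 10], [20, 5]]
p esearch_windows(100)      # => [[0, 100]]
p esearch_windows(20, 10)   # => [[0, 10], [10, 10], [20, 0]]
```

The last call shows that when limit is an exact multiple of step, this arithmetic produces one trailing window with a retmax of 0.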

    # File lib/bio/io/ncbirest.rb
    def esearch(str, hash = {}, limit = nil, step = 10000)
      serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
      opts = default_parameters.merge({ "term" => str })
      opts.update(hash)

      case opts["rettype"]
      when "count"
        count = esearch_count(str, opts)
        return count
      else
        retstart = 0
        retstart = hash["retstart"].to_i if hash["retstart"]

        limit ||= hash["retmax"].to_i if hash["retmax"]
        limit ||= 100 # default limit is 100
        limit = esearch_count(str, opts) if limit == 0   # unlimit

        list = []
        0.step(limit, step) do |i|
          retmax = [step, limit - i].min
          opts.update("retmax" => retmax, "retstart" => i + retstart)
          response = ncbi_post_form(serv, opts)
          result = response.body
          list += result.scan(/<Id>(.*?)<\/Id>/m).flatten
        end
        return list
      end
    end

esearch_count(str, hash = {})

Arguments

same as esearch method

Returns

Integer (number of results)
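
The count is read from the first Count element of the reply. The following minimal sketch shows that step on two hand-written samples; both XML documents are illustrative assumptions rather than captured NCBI output, and the second Count element in the first sample is included only to show why the first match is taken.

```ruby
# Hand-written samples; the exact shape of a real reply is an assumption.
with_count = <<~XML
  <eSearchResult>
    <Count>42</Count>
    <TranslationStack>
      <TermSet><Term>tardigrada[All Fields]</Term><Count>40</Count></TermSet>
    </TranslationStack>
  </eSearchResult>
XML
without_count = "<eSearchResult><ERROR>Empty term</ERROR></eSearchResult>"

# Same extraction as the esearch_count source: first match, converted to Integer.
def first_count(xml)
  xml.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
end

p first_count(with_count)      # => 42
p first_count(without_count)   # => 0 (nil.to_i is 0, so no error is raised)
```

A reply with no Count element therefore yields 0 rather than an exception, which is worth keeping in mind when a zero result looks suspicious.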

    # File lib/bio/io/ncbirest.rb
    def esearch_count(str, hash = {})
      serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
      opts = default_parameters.merge({ "term" => str })
      opts.update(hash)
      opts.update("rettype" => "count")
      response = ncbi_post_form(serv, opts)
      result = response.body
      count = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
      return count
    end

Private Instance Methods

default_parameters()

(Private) default parameters


Returns

Hash

    # File lib/bio/io/ncbirest.rb
    def default_parameters
      Bio::NCBI::ENTREZ_DEFAULT_PARAMETERS
    end

ncbi_access_wait(wait = NCBI_INTERVAL)

(Private) Sleeps until allowed to access.


Arguments:

  • wait: minimum interval between accesses, in seconds (optional; defaults to NCBI_INTERVAL)

Returns

(undefined)
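
The waiting logic can be tried in isolation. This is a minimal sketch modeled on the method's approach (remember the last access time, sleep for the remainder of the interval, guard both with a mutex); the AccessThrottle class and its interface are hypothetical and not part of BioRuby, and a short 0.05 s interval is used so the demonstration finishes quickly.

```ruby
# Hypothetical stand-alone throttle; illustrative only, not a BioRuby class.
class AccessThrottle
  def initialize(interval)
    @interval = interval
    @mutex = Mutex.new
    @last_access = nil
  end

  # Blocks until at least @interval seconds have passed since the previous
  # call, then records the current time as the new last access.
  def wait
    @mutex.synchronize do
      if @last_access
        elapsed = Time.now - @last_access
        sleep(@interval - elapsed) if @interval > elapsed
      end
      @last_access = Time.now
    end
    nil
  end
end

throttle = AccessThrottle.new(0.05)
started = Time.now
3.times { throttle.wait }   # first call returns at once, the next two sleep
total = Time.now - started
puts format("3 calls took %.3f s", total)
```

Because the sleep happens inside the synchronized block, concurrent callers queue up behind one another instead of all waking at the same moment.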

    # File lib/bio/io/ncbirest.rb
    def ncbi_access_wait(wait = NCBI_INTERVAL)
      @@last_access_mutex ||= Mutex.new
      @@last_access_mutex.synchronize {
        if @@last_access
          duration = Time.now - @@last_access
          if wait > duration
            sleep wait - duration
          end
        end
        @@last_access = Time.now
      }
      nil
    end

ncbi_check_parameters(opts)

(Private) Checks the parameters that NCBI requires. Raises an error if the email or tool parameter is missing or empty.

NCBI announces that “Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests.”


Arguments:

  • (required) opts: Hash containing parameters

Returns

(undefined)
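
The check treats a missing key and an empty string the same way, because nil.to_s is the empty string. A minimal sketch of that behavior, assuming a hypothetical stand-alone function check_parameters with shortened error messages (it is not the BioRuby method itself):

```ruby
# Hypothetical stand-alone check modeled on ncbi_check_parameters.
def check_parameters(opts)
  %w[email tool].each do |key|
    # nil.to_s is "", so absent and empty values are both rejected.
    raise "Set #{key} parameter for the query" if opts[key].to_s.empty?
  end
  nil
end

# Both parameters present: passes silently and returns nil.
check_parameters("email" => "me@example.org", "tool" => "my_script")

# Missing tool: a RuntimeError is raised.
begin
  check_parameters("email" => "me@example.org")
rescue RuntimeError => e
  puts e.message   # => Set tool parameter for the query
end
```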

    # File lib/bio/io/ncbirest.rb
    def ncbi_check_parameters(opts)
      #return if Time.now < Time.gm(2010,5,31)
      if opts['email'].to_s.empty? then
        raise 'Set email parameter for the query, or set Bio::NCBI.default_email = "(email address of the author of this software)"'
      end
      if opts['tool'].to_s.empty? then
        raise 'Set tool parameter for the query, or set Bio::NCBI.default_tool = "(your tool name)"'
      end
      nil
    end

ncbi_post_form(serv, opts)

(Private) Sends query to NCBI.


Arguments:

  • (required) serv: (String) server URI string

  • (required) opts: (Hash) parameters

Returns

HTTP response object, as returned by Bio::Command.post_form

    # File lib/bio/io/ncbirest.rb
    def ncbi_post_form(serv, opts)
      ncbi_check_parameters(opts)
      ncbi_access_wait
      #$stderr.puts opts.inspect
      response = Bio::Command.post_form(serv, opts)
      response
    end