module Bio::Jaspar
JASPAR 2014 module
Provides read access to a JASPAR5 formatted database.
This module is a direct import of the Bio.motifs.jaspar module in Biopython, and the following documentation contains excerpts from that module.
Constants
Public Class Methods
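Return the pseudocounts of a given JASPAR motif. For each letter, the pseudocount is the square root of the average number of counts per column, multiplied by the letter's normalized background frequency. A uniform background is used when the motif has none.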
# File lib/bio-jaspar/jaspar.rb, line 240
def Jaspar.calculate_pseudocounts(motif)
  alphabet = motif.alphabet
  background = motif.background

  total = 0
  (0...motif.length).each do |i|
    total += alphabet.letters.map { |letter| motif.counts[letter][i].to_f }.inject(:+)
  end

  avg_nb_instances = total / motif.length
  sq_nb_instances = Math.sqrt(avg_nb_instances)

  if background
    background = Hash[background]
  else
    background = Hash[alphabet.letters.sort.map { |l| [l, 1.0] }]
  end

  total = background.values.inject(:+)
  pseudocounts = {}

  alphabet.letters.each do |letter|
    background[letter] /= total
    pseudocounts[letter] = sq_nb_instances * background[letter]
  end

  return pseudocounts
end
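As a worked illustration of the arithmetic above, the following standalone sketch applies the same calculation to a plain Hash of per-letter count arrays instead of a Motif object. The method name `pseudocounts_for` and the Hash-based inputs are assumptions made for this example and are not part of the bio-jaspar API.

```ruby
# Hypothetical helper (not part of bio-jaspar): mirrors the arithmetic of
# Jaspar.calculate_pseudocounts on a plain Hash of per-letter count arrays.
def pseudocounts_for(counts, background = nil)
  letters = counts.keys.sort
  length = counts[letters.first].length

  # Sum every count in the matrix, then average over the number of columns.
  total = 0.0
  (0...length).each do |i|
    total += letters.map { |l| counts[l][i].to_f }.inject(:+)
  end
  sqrt_avg = Math.sqrt(total / length)

  # Fall back to a uniform background; copy so the caller's Hash is untouched.
  bg = background ? background.dup : Hash[letters.map { |l| [l, 1.0] }]
  bg_total = bg.values.inject(:+)

  # Pseudocount = sqrt(average counts per column) * normalized background.
  Hash[letters.map { |l| [l, sqrt_avg * bg[l] / bg_total] }]
end

# Every column sums to 16, so sqrt(16) = 4 is spread over four letters.
example_counts = {
  "A" => [4, 0, 16],
  "C" => [4, 16, 0],
  "G" => [4, 0, 0],
  "T" => [4, 0, 0]
}
p pseudocounts_for(example_counts)
```

With a uniform background each letter receives a pseudocount of 1.0 here, because the square root of the average column total (4.0) is split evenly across the four letters.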
Read motif(s) from a file in one of several JASPAR formats.
Return a record of PFM(s), calling the appropriate routine for the format passed: “pfm”, “sites”, or “jaspar” (matched case-insensitively). Any other format raises an ArgumentError.
# File lib/bio-jaspar/jaspar.rb, line 190
def Jaspar.read(handle, format)
  format = format.downcase
  if format == "pfm"
    record = _read_pfm(handle)
    return record
  elsif format == "sites"
    record = _read_sites(handle)
    return record
  elsif format == "jaspar"
    record = _read_jaspar(handle)
    return record
  else
    raise ArgumentError, "Unknown JASPAR format #{format}"
  end
end
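Utility function to split a JASPAR matrix ID into its components.
The components are the base ID and the version number, e.g. ‘MA0047.2’ is returned as [‘MA0047’, ‘2’]. The version is returned as a string. If the ID does not consist of exactly two dot-separated parts, the whole ID is returned as the base ID and the version is nil.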
# File lib/bio-jaspar/jaspar.rb, line 273
def Jaspar.split_jaspar_id(id)
  id_split = id.split(".")

  base_id = nil
  version = nil

  if id_split.length == 2
    base_id = id_split[0]
    version = id_split[1]
  else
    base_id = id
  end

  return base_id, version
end
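The splitting rule can be tried without the library. The sketch below is a condensed standalone copy of the logic shown above; the name `split_id` is hypothetical and used only for this illustration.

```ruby
# Hypothetical standalone copy of the splitting rule (not the library method).
def split_id(id)
  parts = id.split(".")
  # Only an ID of the exact form "BASE.VERSION" is split; anything else is
  # returned whole, with a nil version.
  parts.length == 2 ? [parts[0], parts[1]] : [id, nil]
end

p split_id("MA0047.2")    # → ["MA0047", "2"]
p split_id("MA0047")      # → ["MA0047", nil]
p split_id("MA0047.2.1")  # → ["MA0047.2.1", nil]
```

Note that the version comes back as a String, so a caller that needs a number has to convert it (for example with `Integer(version)`).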
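Return the text representation of motifs in “pfm” or “jaspar” format. In “pfm” format only the first motif is written; in “jaspar” format every motif is written, each preceded by a header line. Any other format raises an ArgumentError.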
# File lib/bio-jaspar/jaspar.rb, line 208
def Jaspar.write(motifs, format)
  letters = JASPAR_ORDERED_DNA_LETTERS
  lines = []
  if format == "pfm"
    motif = motifs[0]
    counts = motif.counts
    letters.each do |letter|
      terms = counts[letter].map { |value| "%6.2f" % value }
      line = "#{terms.join(" ")}\n"
      lines << line
    end
  elsif format == "jaspar"
    motifs.each do |m|
      counts = m.counts
      line = ">#{m.matrix_id} #{m.name}\n"
      lines << line
      letters.each do |letter|
        terms = counts[letter].map { |value| "%6.2f" % value }
        line = "#{letter} [#{terms.join(" ")}]\n"
        lines << line
      end
    end
  else
    raise ArgumentError, "Unknown JASPAR format #{format}"
  end

  text = lines.join("")

  return text
end
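To show what the “jaspar” layout looks like on the page, here is a minimal standalone sketch of the same formatting applied to a plain Hash of counts. The method name `jaspar_text` is hypothetical, and the A, C, G, T row order is an assumption about the value of JASPAR_ORDERED_DNA_LETTERS.

```ruby
# Hypothetical standalone formatter (not part of bio-jaspar).
def jaspar_text(matrix_id, name, counts)
  letters = %w[A C G T]  # assumed value of JASPAR_ORDERED_DNA_LETTERS
  lines = [">#{matrix_id} #{name}\n"]
  letters.each do |letter|
    # Each count is right-aligned in a 6-character field with two decimals.
    terms = counts[letter].map { |value| format("%6.2f", value) }
    lines << "#{letter} [#{terms.join(' ')}]\n"
  end
  lines.join
end

counts = { "A" => [0, 3], "C" => [94, 75], "G" => [1, 0], "T" => [2, 19] }
puts jaspar_text("MA0001.1", "AGL3", counts)
```

The first row of this example is rendered as `A [  0.00   3.00]`: the fixed field width keeps the columns aligned as long as each value fits in six characters.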
Private Class Methods
Read motifs from a JASPAR formatted file (PRIVATE).
The format is one or more records in either of the following forms:
- JASPAR 2010 matrix_only format, e.g.
    >MA0001.1 AGL3
    A  [ 0  3 79 40 66 48 65 11 65  0 ]
    C  [94 75  4  3  1  2  5  2  3  3 ]
    G  [ 1  0  3  4  1  0  5  3 28 88 ]
    T  [ 2 19 11 50 29 47 22 81  1  6 ]
- JASPAR 2010-2014 PFMs format, e.g.
    >MA0001.1 AGL3
     0  3 79 40 66 48 65 11 65  0
    94 75  4  3  1  2  5  2  3  3
     1  0  3  4  1  0  5  3 28 88
     2 19 11 50 29 47 22 81  1  6
# File lib/bio-jaspar/jaspar.rb, line 366
def Jaspar._read_jaspar(handle)
  alphabet = DNA
  counts = {}

  record = Record.new

  head_pat = /^>\s*(\S+)(\s+(\S+))?/
  row_pat_long = /\s*([ACGT])\s*\[\s*(.*)\s*\]/
  row_pat_short = /\s*(.+)\s*/

  identifier = nil
  name = nil
  row_count = 0
  nucleotides = ["A","C","G","T"]

  handle.each do |line|
    line = line.strip

    head_match = line.match(head_pat)
    row_match_long = line.match(row_pat_long)
    row_match_short = line.match(row_pat_short)

    if head_match
      identifier = head_match[1]
      if head_match[3]
        name = head_match[3]
      else
        name = identifier
      end
    elsif row_match_long
      letter, counts_str = row_match_long[1..2]
      words = counts_str.split
      counts[letter] = words.map(&:to_f)
      row_count += 1
      if row_count == 4
        record << Motif.new(identifier, name, :alphabet => alphabet, :counts => counts)
        identifier = nil
        name = nil
        counts = {}
        row_count = 0
      end
    elsif row_match_short
      words = row_match_short[1].split
      counts[nucleotides[row_count]] = words.map(&:to_f)
      row_count += 1
      if row_count == 4
        record << Motif.new(identifier, name, :alphabet => alphabet, :counts => counts)
        identifier = nil
        name = nil
        counts = {}
        row_count = 0
      end
    end
  end

  return record
end
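The header and bracketed-row patterns used above can be exercised in isolation. The following sketch is a simplified, standalone parser for the matrix_only layout only; it returns plain Hashes rather than Motif objects, and the method name `parse_matrix_only` is hypothetical, not part of the bio-jaspar API.

```ruby
# Hypothetical standalone parser (not part of bio-jaspar) for the bracketed
# matrix_only layout. It reuses the two regular expressions from the listing.
def parse_matrix_only(text)
  head_pat = /^>\s*(\S+)(\s+(\S+))?/          # ">ID" or ">ID NAME"
  row_pat  = /\s*([ACGT])\s*\[\s*(.*)\s*\]/   # "A [ 0 3 79 ... ]"

  motifs = []
  current = nil
  text.each_line do |line|
    line = line.strip
    if (m = line.match(head_pat))
      # The name falls back to the identifier when the header has one field.
      current = { id: m[1], name: m[3] || m[1], counts: {} }
    elsif current && (m = line.match(row_pat))
      current[:counts][m[1]] = m[2].split.map(&:to_f)
      # A motif is complete once all four nucleotide rows have been read.
      if current[:counts].size == 4
        motifs << current
        current = nil
      end
    end
  end
  motifs
end

example = <<~JASPAR
  >MA0001.1 AGL3
  A  [ 0  3 79 40 66 48 65 11 65  0 ]
  C  [94 75  4  3  1  2  5  2  3  3 ]
  G  [ 1  0  3  4  1  0  5  3 28 88 ]
  T  [ 2 19 11 50 29 47 22 81  1  6 ]
JASPAR

motif = parse_matrix_only(example).first
puts motif[:name]            # prints "AGL3"
p motif[:counts]["C"].first  # prints 94.0
```

Unlike the library routine, this sketch does not handle the unlabelled PFMs layout, where rows are assigned to A, C, G and T by position.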
Read the motif from a JASPAR .pfm file (PRIVATE).
# File lib/bio-jaspar/jaspar.rb, line 293
def Jaspar._read_pfm(handle)
  alphabet = DNA
  counts = {}

  letters = JASPAR_ORDERED_DNA_LETTERS
  letters.zip(handle).each do |letter, line|
    words = line.split
    if words[0] == letter
      words = words[1..-1]
    end
    counts[letter] = words.map(&:to_f)
  end

  motif = Motif.new(nil, nil, :alphabet => alphabet, :counts => counts)
  motif.mask = "*" * motif.length
  record = Record.new
  record << motif

  return record
end
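The row-to-letter pairing above can be sketched on its own with a StringIO standing in for the file handle. The method name `read_pfm_counts` is hypothetical, and the A, C, G, T row order is an assumption about the value of JASPAR_ORDERED_DNA_LETTERS.

```ruby
require "stringio"

# Hypothetical standalone reader (not part of bio-jaspar): pairs the first
# four lines of the input with the letters A, C, G, T in that order.
def read_pfm_counts(handle)
  letters = %w[A C G T]  # assumed value of JASPAR_ORDERED_DNA_LETTERS
  rows = handle.each_line.first(letters.size)
  pairs = letters.zip(rows).map do |letter, line|
    words = line.split
    words.shift if words.first == letter  # drop an optional leading row label
    [letter, words.map(&:to_f)]
  end
  Hash[pairs]
end

pfm = StringIO.new("0 3 79\n94 75 4\n1 0 3\n2 19 11\n")
counts = read_pfm_counts(pfm)
p counts["A"]  # → [0.0, 3.0, 79.0]
```

Like the listing it is modelled on, this sketch expects at least four rows; a shorter input leaves a letter paired with nil and raises NoMethodError on `split`.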
Read the motif from JASPAR .sites file (PRIVATE).
# File lib/bio-jaspar/jaspar.rb, line 315
def Jaspar._read_sites(handle)
  alphabet = DNA
  instances = []

  handle_enum = handle.to_enum

  handle.each do |line|
    unless line.start_with?(">")
      break
    end

    line = handle_enum.next
    instance = ""
    line.strip.each_char do |c|
      if c == c.upcase
        instance += c
      end
    end
    instance = Bio::Sequence.auto(instance)
    instances << instance
  end

  instances = Bio::Motifs::Instances.new(instances, alphabet)
  motif = Motif.new(nil, nil, :alphabet => alphabet, :instances => instances)
  motif.mask = "*" * motif.length
  record = Record.new
  record << motif

  return record
end
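The listing above relies on a sites file alternating between a ">" header line and a sequence line in which only the upper-case characters belong to the binding site. A minimal standalone sketch of that extraction step follows; the method name `extract_sites`, the String input, and the sample records are assumptions made for this illustration, and plain Strings are returned instead of Bio::Sequence objects.

```ruby
# Hypothetical standalone helper (not part of bio-jaspar): collects the
# upper-case portion of every sequence line that follows a ">" header.
def extract_sites(text)
  lines = text.each_line
  instances = []
  # Kernel#loop rescues the StopIteration raised by `next` at end of input.
  loop do
    header = lines.next
    break unless header.start_with?(">")
    sequence = lines.next
    # Lower-case flanking bases are dropped; only the site itself is kept.
    instances << sequence.strip.each_char.select { |c| c == c.upcase }.join
  end
  instances
end

sites = ">MA0001 AGL3 1\nacgtCCATATATGGaatc\n>MA0001 AGL3 2\nttCCAAATAAGGcag\n"
p extract_sites(sites)  # → ["CCATATATGG", "CCAAATAAGG"]
```

Because the test is `c == c.upcase`, any character without a lower-case form (a digit, for instance) would also be kept, which matches the behaviour of the listing.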