module Bio::Jaspar

JASPAR 2014 module

Provides read access to a JASPAR5 formatted database.

This module is a direct port of the Bio.motifs.jaspar module in Biopython, and parts of this documentation are excerpted from that module's documentation.

Constants

DNA

Unambiguous DNA bases

JASPAR_ORDERED_DNA_LETTERS

DNA bases in the order used for JASPAR output

Public Class Methods

calculate_pseudocounts(motif)

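Return the pseudocounts of a given JASPAR motif as a hash keyed by letter. Each pseudocount is the square root of the motif's average number of instances per position, multiplied by that letter's background frequency (normalized to sum to 1; a uniform background is used when the motif has none).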

# File lib/bio-jaspar/jaspar.rb, line 240
def Jaspar.calculate_pseudocounts(motif)
        alphabet = motif.alphabet
        background = motif.background

        total = 0
        (0...motif.length).each do |i|
                total += alphabet.letters.map { |letter| motif.counts[letter][i].to_f }.inject(:+)
        end

        avg_nb_instances = total / motif.length
        sq_nb_instances = Math.sqrt(avg_nb_instances)

        if background
                background = Hash[background]
        else
                background = Hash[alphabet.letters.sort.map { |l| [l, 1.0] }]
        end

        total = background.values.inject(:+)
        pseudocounts = {}

        alphabet.letters.each do |letter|
                background[letter] /= total
                pseudocounts[letter] = sq_nb_instances * background[letter]
        end

        return pseudocounts
end
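
The pseudocount rule can be checked without the gem. The following is a minimal sketch that assumes a motif is a plain Hash mapping each letter to an Array of per-position counts (the method above takes a motif object instead); `sketch_pseudocounts` is a hypothetical name used only for illustration.

```ruby
# Gem-free sketch of the rule in Jaspar.calculate_pseudocounts. Assumption:
# a motif is given as a Hash mapping each letter to an Array of per-position
# counts, rather than as a motif object.
def sketch_pseudocounts(counts, background = nil)
  letters = counts.keys
  length = counts.values.first.length

  # Average number of instances per position = total counts / motif length.
  total = letters.sum { |l| counts[l].sum.to_f }
  sq_nb_instances = Math.sqrt(total / length)

  # Uniform background by default; either way, normalize it to sum to 1.
  background ||= letters.to_h { |l| [l, 1.0] }
  bg_total = background.values.sum.to_f

  letters.to_h { |l| [l, sq_nb_instances * background[l] / bg_total] }
end

counts = { "A" => [4, 0], "C" => [4, 16], "G" => [4, 0], "T" => [4, 0] }
sketch_pseudocounts(counts)
```

Here 32 counts are spread over 2 positions, so the average is 16 instances per position and, with a uniform background, every pseudocount is sqrt(16) × 0.25 = 1.0.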

read(handle, format)

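Read motif(s) from a file in one of several JASPAR formats.

The format name is matched case-insensitively and dispatched to the corresponding private reader: “pfm” (_read_pfm), “sites” (_read_sites) or “jaspar” (_read_jaspar). Returns the Record of motif(s) produced by that reader; any other format raises ArgumentError.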

# File lib/bio-jaspar/jaspar.rb, line 190
def Jaspar.read(handle, format)
        format = format.downcase
        if format == "pfm"
                record = _read_pfm(handle)
                return record
        elsif format == "sites"
                record = _read_sites(handle)
                return record
        elsif format == "jaspar"
                record = _read_jaspar(handle)
                return record
        else
                raise ArgumentError, "Unknown JASPAR format #{format}"
        end
                        
end

split_jaspar_id(id)

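Utility function to split a JASPAR matrix ID into its components.

The components are the base ID and the version, e.g. ‘MA0047.2’ is returned as [‘MA0047’, ‘2’]; note that the version is a String, not an Integer. If the ID does not split on “.” into exactly two parts, the whole ID is returned as the base ID and the version is nil.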

# File lib/bio-jaspar/jaspar.rb, line 273
def Jaspar.split_jaspar_id(id)
        id_split = id.split(".")

        base_id = nil
        version = nil

        if id_split.length == 2
                base_id = id_split[0]
                version = id_split[1]
        else
                base_id = id
        end

        return base_id, version
end
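
In the code above, `version` is taken straight from `id.split(".")`, so it is a String; callers that need a numeric version have to convert it themselves. A minimal sketch, using a hypothetical helper `parse_jaspar_id` (not part of bio-jaspar) that applies the same splitting rule and then converts the version with `Integer(version, 10)`, which raises ArgumentError if the suffix is not numeric:

```ruby
# Hypothetical helper, not part of bio-jaspar: same splitting rule as
# Jaspar.split_jaspar_id, but the version is converted to an Integer.
def parse_jaspar_id(id)
  parts = id.split(".")
  # Only an ID that splits into exactly two parts is treated as BASE.VERSION.
  return [id, nil] unless parts.length == 2

  base_id, version = parts
  [base_id, Integer(version, 10)] # raises ArgumentError for non-numeric suffixes
end

base_id, version = parse_jaspar_id("MA0047.2")
```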

write(motifs, format)

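Return the text representation of motifs in “pfm” or “jaspar” format. The “pfm” format writes only the first motif, as four unlabeled rows (A, C, G, T); the “jaspar” format writes every motif as a “>matrix_id name” header followed by one labeled row per letter. Any other format raises ArgumentError.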

# File lib/bio-jaspar/jaspar.rb, line 208
def Jaspar.write(motifs, format)
        letters = JASPAR_ORDERED_DNA_LETTERS
        lines = []
        if format == "pfm"
                motif = motifs[0]
                counts = motif.counts
                letters.each do |letter|
                        terms = counts[letter].map { |value| "%6.2f" % value }
                        line = "#{terms.join(" ")}\n"
                        lines << line
                end
        elsif format == "jaspar"
                motifs.each do |m|
                        counts = m.counts
                        line = ">#{m.matrix_id} #{m.name}\n"
                        lines << line

                        letters.each do |letter|
                                terms = counts[letter].map { |value| "%6.2f" % value }
                                line = "#{letter} [#{terms.join(" ")}]\n"
                                lines << line
                        end
                end
        else
                raise ArgumentError, "Unknown JASPAR format #{format}"
        end
                
        text = lines.join("")
        return text  
end
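
The “jaspar” layout written above can be reproduced in isolation. A minimal sketch, assuming the A, C, G, T row order and a plain Hash of counts in place of Motif objects; `format_jaspar` is a hypothetical name used only for illustration:

```ruby
# Gem-free sketch of the "jaspar" branch of Jaspar.write. Assumes the row
# order A, C, G, T and a Hash of counts instead of Motif objects.
def format_jaspar(matrix_id, name, counts)
  lines = [">#{matrix_id} #{name}\n"]
  %w[A C G T].each do |letter|
    # Each count is right-aligned in a 6-character field with 2 decimals.
    terms = counts[letter].map { |value| "%6.2f" % value }
    lines << "#{letter} [#{terms.join(" ")}]\n"
  end
  lines.join
end

puts format_jaspar("MA0001.1", "AGL3",
                   { "A" => [0, 3], "C" => [94, 75], "G" => [1, 0], "T" => [2, 19] })
```

With the first two columns of MA0001.1 this prints the header line followed by rows such as `C [ 94.00  75.00]`; the “pfm” branch uses the same `"%6.2f"` fields but omits the header and the letter labels.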

Private Class Methods

_read_jaspar(handle)

Read motifs from a JASPAR formatted file (PRIVATE).

The file holds one or more records in either of the following layouts, e.g.
  • JASPAR 2010 matrix_only format

    >MA0001.1 AGL3
    A  [ 0  3 79 40 66 48 65 11 65  0 ]
    C  [94 75  4  3  1  2  5  2  3  3 ]
    G  [ 1  0  3  4  1  0  5  3 28 88 ]
    T  [ 2 19 11 50 29 47 22 81  1  6 ]

  • JASPAR 2010-2014 PFMs format

    >MA0001.1 AGL3
    0 3 79 40 66 48 65 11 65 0
    94 75 4 3 1 2 5 2 3 3
    1 0 3 4 1 0 5 3 28 88
    2 19 11 50 29 47 22 81 1 6

# File lib/bio-jaspar/jaspar.rb, line 366
def Jaspar._read_jaspar(handle)
        alphabet = DNA
        counts = {}

        record = Record.new

        head_pat = /^>\s*(\S+)(\s+(\S+))?/
        row_pat_long = /\s*([ACGT])\s*\[\s*(.*)\s*\]/
        row_pat_short = /\s*(.+)\s*/

        identifier = nil
        name = nil
        row_count = 0
        nucleotides = ["A","C","G","T"]
        handle.each do |line|
                line = line.strip
                
                head_match = line.match(head_pat)
                row_match_long = line.match(row_pat_long)
                row_match_short = line.match(row_pat_short)

                if head_match
                        identifier = head_match[1]
                        if head_match[3]
                                name = head_match[3]
                        else
                                name = identifier
                        end
                elsif row_match_long
                        letter, counts_str = row_match_long[1..2]
                        words = counts_str.split
                        counts[letter] = words.map(&:to_f)
                        row_count += 1
                        if row_count == 4
                                record << Motif.new(identifier, 
                                                    name, 
                                                    :alphabet => alphabet, 
                                                    :counts => counts)
                                identifier = nil
                                name = nil
                                counts = {}
                                row_count = 0
                        end
                elsif row_match_short
                        words = row_match_short[1].split
                        counts[nucleotides[row_count]] = words.map(&:to_f)
                        row_count += 1
                        if row_count == 4
                                record << Motif.new(identifier, 
                                                    name, 
                                                    :alphabet => alphabet, 
                                                    :counts => counts)
                                identifier = nil
                                name = nil
                                counts = {}
                                row_count = 0
                        end
                end         
        end

        return record
end
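
The reader above accepts both record layouts because labeled rows are tried before unlabeled ones. A minimal, gem-free sketch of that logic, using the same three regular expressions but returning plain Hashes instead of Motif objects (an assumption so it runs on its own); `parse_jaspar_records` is a hypothetical name:

```ruby
# Gem-free sketch of the line-by-line logic in Jaspar._read_jaspar. It uses
# the same three regular expressions, but returns each record as a plain
# Hash instead of a Motif (an assumption so that it runs on its own).
HEAD_PAT      = /^>\s*(\S+)(\s+(\S+))?/
ROW_PAT_LONG  = /\s*([ACGT])\s*\[\s*(.*)\s*\]/
ROW_PAT_SHORT = /\s*(.+)\s*/
NUCLEOTIDES   = %w[A C G T].freeze

def parse_jaspar_records(text)
  records = []
  id = name = nil
  counts = {}
  row_count = 0

  text.each_line do |raw|
    line = raw.strip

    # Header: ">ID [NAME]"; the name falls back to the ID when absent.
    if (m = line.match(HEAD_PAT))
      id = m[1]
      name = m[3] || id
      next
    end

    if (m = line.match(ROW_PAT_LONG))
      # Labeled row, e.g. "A [ 0 3 79 ]": the letter comes from the line.
      letter, values = m[1], m[2]
    elsif (m = line.match(ROW_PAT_SHORT))
      # Unlabeled row: the letter is implied by the row's position (A, C, G, T).
      letter, values = NUCLEOTIDES[row_count], m[1]
    else
      next # blank line
    end

    counts[letter] = values.split.map(&:to_f)
    row_count += 1
    next unless row_count == 4

    # Four rows complete a record; reset state for the next one.
    records << { id: id, name: name, counts: counts }
    id = name = nil
    counts = {}
    row_count = 0
  end

  records
end
```

Fed either of the two MA0001.1 examples from the format description, this sketch yields the same single record with ID “MA0001.1”, name “AGL3” and ten counts per letter.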

_read_pfm(handle)

Read the motif from a JASPAR .pfm file (PRIVATE).

# File lib/bio-jaspar/jaspar.rb, line 293
def Jaspar._read_pfm(handle)
        alphabet = DNA
        counts = {}

        letters = JASPAR_ORDERED_DNA_LETTERS
        letters.zip(handle).each do |letter, line|
                words = line.split
                if words[0] == letter
                        words = words[1..-1]
                end
                counts[letter] = words.map(&:to_f)
        end

        motif = Motif.new(nil, nil, :alphabet => alphabet, :counts => counts)
        motif.mask = "*" * motif.length
        record = Record.new
        record << motif

        return record
end
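
A .pfm file is simply four rows of counts in A, C, G, T order, each optionally prefixed by its letter. A minimal, gem-free sketch of that parsing step, returning a plain Hash instead of a Record; `parse_pfm` is a hypothetical name, and unlike the method above it raises ArgumentError when fewer than four rows are present:

```ruby
# Gem-free sketch of Jaspar._read_pfm: the first four lines are the A, C, G
# and T rows, in that order, and a row may optionally start with its letter.
def parse_pfm(text)
  letters = %w[A C G T]
  rows = text.each_line.first(4)
  raise ArgumentError, "expected 4 rows, got #{rows.length}" if rows.length < 4

  counts = {}
  letters.zip(rows).each do |letter, line|
    words = line.split
    # A row may carry its letter as a label, e.g. "A 0 3 79"; drop it.
    words = words.drop(1) if words.first == letter
    counts[letter] = words.map(&:to_f)
  end
  counts
end

parse_pfm("A 0 3 79\n94 75 4\nG 1 0 3\n2 19 11\n")
```

The illustrative input mixes labeled and unlabeled rows only to show that both are accepted.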

_read_sites(handle)

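Read the motif from a JASPAR .sites file (PRIVATE).

Each record is a “>” header line followed by a sequence line; the upper-case characters of that sequence line form one site instance. Reading stops at the first line in header position that does not start with “>”.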

# File lib/bio-jaspar/jaspar.rb, line 315
def Jaspar._read_sites(handle)
        alphabet = DNA
        instances = []

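        # Use one external enumerator for both the header and the sequence
        # line, so the two reads stay in step for IO and Array handles alike.
        lines = handle.to_enum(:each)

        loop do
                # StopIteration at the end of the input terminates the loop.
                line = lines.next
                break unless line.start_with?(">")

                # The line after a ">..." header holds the site sequence;
                # only its upper-case characters form the binding site.
                line = lines.next
                instance = line.strip.each_char.select { |c| c == c.upcase }.join
                instances << Bio::Sequence.auto(instance)
        end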

        instances = Bio::Motifs::Instances.new(instances, alphabet)
        motif = Motif.new(nil, nil, :alphabet => alphabet, :instances => instances)
        motif.mask = "*" * motif.length
        record = Record.new
        record << motif

        return record
end
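
The site-extraction step can be tried without BioRuby. A minimal sketch that returns the site strings instead of building Bio::Sequence objects and an Instances collection; `extract_sites` is a hypothetical name and the sequences in the usage line are made up for illustration:

```ruby
# Gem-free sketch of the core of Jaspar._read_sites. Records alternate between
# a ">" header line and a sequence line; only the upper-case characters of the
# sequence line are kept as the site instance. Reading stops at the first
# line that should be a header but does not start with ">".
def extract_sites(text)
  lines = text.each_line # external enumerator over the input lines
  sites = []

  loop do
    header = lines.next # StopIteration at end of input ends the loop
    break unless header.start_with?(">")

    sequence = lines.next
    sites << sequence.strip.each_char.select { |c| c == c.upcase }.join
  end

  sites
end

extract_sites(">site_1\nggcCACGTGatt\n>site_2\naCACGTGtc\n")
```

Both made-up records reduce to the upper-case core `CACGTG`; the lower-case flanking bases are discarded, as in the method above.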