class GeneValidator::LengthClusterValidation
This class contains the methods necessary for length validation by hit length clusterization
Attributes
Public Class Methods
Initilizes the object Params: type
: type of the predicted sequence (:nucleotide or :protein) prediction
: a Sequence
object representing the blast query hits
: a vector of Sequence
objects (representing blast hits) dilename
: String
with the name of the fasta file
# File lib/genevalidator/validation_length_cluster.rb, line 85 def initialize(prediction, hits) super @short_header = 'LengthCluster' @header = 'Length Cluster' @description = 'Check whether the prediction length fits most of the' \ ' BLAST hit lengths, by 1D hierarchical clusterization.' \ ' Meaning of the output displayed: Query_length' \ ' [Main Cluster Length Interval]' @cli_name = 'lenc' end
Public Instance Methods
Clusterization by length from a list of sequences Params:
debug
(optional)-
true to display debug information, false by default
lst
-
array of
Query
objects predicted_seq
-
Query
objetc
- output 1
-
array of
Cluster
objects - output 2
-
the index of the most dense cluster
# File lib/genevalidator/validation_length_cluster.rb, line 147 def clusterization_by_length(_debug = false, lst = @hits, predicted_seq = @prediction) raise TypeError unless lst[0].is_a?(Query) && predicted_seq.is_a?(Query) contents = lst.map { |x| x.length_protein.to_i }.sort { |a, b| a <=> b } hc = HierarchicalClusterization.new(contents) clusters = hc.hierarchical_clusterization max_density = 0 max_density_cluster_idx = 0 clusters.each_with_index do |item, i| next unless item.density > max_density max_density = item.density max_density_cluster_idx = i end [clusters, max_density_cluster_idx] rescue TypeError => error error_location = error.backtrace[0].scan(%r{([^/]+:\d+):.*})[0][0] warn "Type error at #{error_location}." warn ' Possible cause: one of the arguments of the' \ ' "clusterization_by_length" method has not the proper type.' exit 1 end
Generates a json file containing data used for plotting the histogram of the length distribution given a lust of Cluster
objects output
: plot_path where to save the graph clusters
: array of Cluster
objects max_density_cluster
: index of the most dense cluster prediction
: Sequence
object Output: Plot
object
# File lib/genevalidator/validation_length_cluster.rb, line 183 def plot_histo_clusters(clusters = @clusters, max_density_cluster = @max_density_cluster, prediction = @prediction) data = clusters.each_with_index.map do |cluster, i| cluster.lengths.collect do |k, v| { 'key' => k, 'value' => v, 'main' => (i == max_density_cluster) } end end Plot.new(data, :bars, 'Length Cluster Validation: Distribution of BLAST hit lengths', 'Query Sequence, black;Most Dense Cluster,red;Other Hits, blue', 'Sequence Length', 'Number of Sequences', prediction.length_protein) end
Validates the length of the predicted gene by comparing the length of the prediction to the most dense cluster The most dense cluster is obtained by hierarchical clusterization Plots are generated if required (see plot
variable) Output: LengthClusterValidationOutput
object
# File lib/genevalidator/validation_length_cluster.rb, line 103 def run raise NotEnoughHitsError if hits.length < opt[:min_blast_hits] raise unless prediction.is_a?(Query) && hits[0].is_a?(Query) start = Time.now # get [clusters, max_density_cluster_idx] clusterization = clusterization_by_length @clusters = clusterization[0] @max_density_cluster = clusterization[1] limits = @clusters[@max_density_cluster].get_limits query_length = @prediction.length_protein @validation_report = LengthClusterValidationOutput.new(@short_header, @header, @description, query_length, limits) plot1 = plot_histo_clusters @validation_report.plot_files.push(plot1) @validation_report.run_time = Time.now - start @validation_report rescue NotEnoughHitsError @validation_report = ValidationReport.new('Not enough evidence', :warning, @short_header, @header, @description) rescue StandardError @validation_report = ValidationReport.new('Unexpected error', :error, @short_header, @header, @description) @validation_report.errors.push 'Unexpected Error' end