module CsvImportAnalyzer::DelimiterIdentifier
Public Instance Methods
delimiter()
click to toggle source
Types of delimiters that the gem has to lookout for. Could be changed in future or to custom delimiters returns a @delimiter instance variable array
# File lib/csv-import-analyzer/analyzer/delimiter_identifier.rb, line 10 def delimiter @delimiter ||= [",", ";", "\t", "|"] end
delimiter_count()
click to toggle source
Routine to intialize the delimiter_count
hash with the delimiters defined above with a base count of 0 Returns @delimiter_count instance variable
# File lib/csv-import-analyzer/analyzer/delimiter_identifier.rb, line 18 def delimiter_count @delimiter_count ||= Hash[delimiter.map {|v| [v,0]}] @delimiter_count end
identify_delimiter(filename_or_sample)
click to toggle source
Method to analyze input data and determine delimiter Input can be either a csv file or even a array of strings returns delimiter
# File lib/csv-import-analyzer/analyzer/delimiter_identifier.rb, line 28 def identify_delimiter(filename_or_sample) #filename_or_sample input can be either a File or an Array or a string - Return delimiter for File or an Array of strings (if found) if filename_or_sample.class == String if File::exists?(filename_or_sample) current_line_number = 0 File.foreach(filename_or_sample) do |line| count_occurances_delimiter(line) current_line_number += 1 if current_line_number > 3 break end end else # count_occurances_delimiter(filename_or_sample) return FileNotFound.new end return_plausible_delimiter elsif filename_or_sample.class == Array filename_or_sample.each do |line| count_occurances_delimiter(line) end return_plausible_delimiter else InvalidInput.new end end
Private Instance Methods
count_occurances_delimiter(line)
click to toggle source
Find the count of delimiter occurances in a line CSV files can have delimiters escaped between quotes valid count = total_count - delimiters inside quotes
# File lib/csv-import-analyzer/analyzer/delimiter_identifier.rb, line 67 def count_occurances_delimiter(line) delimiter_count.keys.each do |key| #Count the occurances of delimiter in a line total_count_delimiter = line.substr_count(key) #count the occurances of delimiter between quotes inside a line to disregard them quoted_delimiter_count = getting_contents_of_quoted_values(line).substr_count(key) delimiter_count[key] += total_count_delimiter - quoted_delimiter_count end end
getting_contents_of_quoted_values(input)
click to toggle source
# File lib/csv-import-analyzer/analyzer/delimiter_identifier.rb, line 57 def getting_contents_of_quoted_values(input) #return a join of all the strings inside quotes inside a line input.scan(/".*?"/).join end
return_plausible_delimiter()
click to toggle source
Plausible delimiter would be the one i.e. of most occurance of the set of rows
# File lib/csv-import-analyzer/analyzer/delimiter_identifier.rb, line 80 def return_plausible_delimiter return delimiter_count.key(delimiter_count.values.max) end