module ActsAsTokenizable::StringUtils
Public Class Methods
alphanumerics(str)
Returns an array of the string's parts, in their original order, where:
* the numeric parts are converted to integers
* the non-numeric parts are kept as text
This is useful for sorting alphanumerically. For example:
  ["A1", "A12", "A2"].sort_by { |x| x.alphanumerics } => ["A1", "A2", "A12"]
Inspired by: blog.labnotes.org/2007/12/13/rounded-corners-173-beautiful-code/
# File lib/acts_as_tokenizable/string_utils.rb, line 45
def self.alphanumerics(str)
  str.split(/(\d+)/).map { |v| v =~ /\d/ ? v.to_i : v }
end
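To make the sorting behaviour concrete, here is a minimal sketch that compares a plain string sort with a sort keyed on the split parts. It uses a standalone top-level copy of the method body from the listing (an assumption made so the snippet runs without the gem); the `labels` data is invented for illustration.

```ruby
# Standalone copy of the method body from the listing, so the snippet
# runs without the gem installed.
def alphanumerics(str)
  str.split(/(\d+)/).map { |v| v =~ /\d/ ? v.to_i : v }
end

labels = %w[A1 A12 A2] # illustrative data

parts        = alphanumerics('A12')                     # => ["A", 12]
plain_sort   = labels.sort                              # => ["A1", "A12", "A2"]
natural_sort = labels.sort_by { |x| alphanumerics(x) }  # => ["A1", "A2", "A12"]

p parts, plain_sort, natural_sort
```

The plain sort compares character by character, so "A12" lands before "A2"; the keyed sort compares 12 and 2 as integers.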
numeric?(str)
Returns true if the string can be parsed as a number, false otherwise.
# File lib/acts_as_tokenizable/string_utils.rb, line 6
def self.numeric?(str)
  true if Float(str)
rescue
  false
end
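Because the check delegates to `Kernel#Float`, "numeric" here means whatever `Float` accepts as a whole string. The sketch below uses a standalone copy of the method (an assumption, so it runs without the gem) on a few invented sample strings.

```ruby
# Standalone copy of the method from the listing.
def numeric?(str)
  true if Float(str)
rescue StandardError
  false
end

samples = ['42', '-3.14', '1e3', 'abc', '12abc', ''] # illustrative inputs
results = samples.map { |s| [s, numeric?(s)] }.to_h

p results
# => {"42"=>true, "-3.14"=>true, "1e3"=>true, "abc"=>false, "12abc"=>false, ""=>false}
```

Note that exponent notation such as "1e3" counts as numeric, while a string with trailing letters such as "12abc" does not.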
remove_words(str, words_array, separator = ' ')
Removes the given words from a string. As a side effect, every run of word separators in the result is replaced by the separator string.
# File lib/acts_as_tokenizable/string_utils.rb, line 20
def self.remove_words(str, words_array, separator = ' ')
  (words(str) - words_array).join separator
end
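The separator side effect is easiest to see with input that mixes commas, periods, and spaces. This is a small sketch using standalone copies of `words` and `remove_words` (an assumption, so it runs without the gem); the sample sentence is invented.

```ruby
# Standalone copies of the two methods from the listings.
def words(str)
  str.split(/[\s|\.|,]+/)
end

def remove_words(str, words_array, separator = ' ')
  (words(str) - words_array).join separator
end

text = 'the quick, brown.fox' # illustrative input

default_sep = remove_words(text, %w[the])                # => "quick brown fox"
custom_sep  = remove_words(text, %w[the], '-')           # => "quick-brown-fox"
case_check  = remove_words('The quick fox', %w[the])     # => "The quick fox"

p default_sep, custom_sep, case_check
```

The comparison is an exact string match, so "The" is not removed when only "the" is listed.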
replace_words(str, replacements, separator = ' ')
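Replaces certain words in a string. replacements pairs each list of candidate words with the word that replaces them. As a side effect, every run of word separators in the result is replaced by the separator string.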
# File lib/acts_as_tokenizable/string_utils.rb, line 26
def self.replace_words(str, replacements, separator = ' ')
  replaced_words = words(str)
  replacements.each do |candidates, replacement|
    candidates.each do |candidate|
      replaced_words = replaced_words.collect do |w|
        w == candidate ? replacement : w
      end
    end
  end
  replaced_words.join separator
end
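The shape of `replacements` is not spelled out in the description; judging from the nested `each` loops, a Hash whose keys are arrays of candidate words works. The sketch below assumes that shape, with standalone copies of the methods and invented address data.

```ruby
# Standalone copies of the two methods from the listings.
def words(str)
  str.split(/[\s|\.|,]+/)
end

def replace_words(str, replacements, separator = ' ')
  replaced_words = words(str)
  replacements.each do |candidates, replacement|
    candidates.each do |candidate|
      replaced_words = replaced_words.collect do |w|
        w == candidate ? replacement : w
      end
    end
  end
  replaced_words.join separator
end

# Assumed shape: { [candidate, ...] => replacement }
abbreviations = { %w[st str] => 'street', %w[apt] => 'apartment' }
expanded = replace_words('12 main st, apt 4', abbreviations)
# => "12 main street apartment 4"

# Rules run in order, so a later rule can rewrite an earlier rule's output.
chained = replace_words('a', { %w[a] => 'b', %w[b] => 'c' })
# => "c"

p expanded, chained
```

The chaining in the second call follows from the rules being applied one after another to the same word list, which is worth keeping in mind when a replacement word also appears as a candidate.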
to_token(str, max_length = 255)
Converts the string into something that can be used as an index key, truncated to max_length characters.
# File lib/acts_as_tokenizable/string_utils.rb, line 50
def self.to_token(str, max_length = 255)
  # to_slug and normalize are provided by the 'babosa' gem
  # remove all non-alphanumeric but hyphen (-)
  str = str.to_slug.normalize.strip.downcase.gsub(/[\s|\.|,]+/, '')
  # remove duplicates, except on pure numbers
  str = str.squeeze unless numeric?(str)
  str[0..(max_length - 1)]
end
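The last two steps of `to_token` (squeezing repeated characters and truncating) use only core Ruby, so they can be tried in isolation. The sketch below deliberately skips the babosa normalization step; `squeeze_and_truncate` is a hypothetical helper name introduced here for illustration and is not part of the module.

```ruby
# Standalone copy of numeric? from the listing.
def numeric?(str)
  true if Float(str)
rescue StandardError
  false
end

# HYPOTHETICAL helper: reproduces only the final two steps of to_token.
# The babosa to_slug.normalize step is intentionally left out.
def squeeze_and_truncate(str, max_length = 255)
  str = str.squeeze unless numeric?(str) # collapse runs of repeated characters
  str[0..(max_length - 1)]               # keep at most max_length characters
end

squeezed  = squeeze_and_truncate('bookkeeper')     # => "bokeper"
untouched = squeeze_and_truncate('1100')           # => "1100"
truncated = squeeze_and_truncate('bookkeeper', 4)  # => "boke"

p squeezed, untouched, truncated
```

Pure numbers are exempt from squeezing because collapsing "1100" to "10" would change its value.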
words(str)
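Returns an array of strings containing the words in this string. Words are split on runs of whitespace, periods, and commas; because | is literal inside a regexp character class, pipe characters act as separators too.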
# File lib/acts_as_tokenizable/string_utils.rb, line 14
def self.words(str)
  str.split(/[\s|\.|,]+/)
end
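A short sketch of the splitting behaviour, using a standalone copy of the method (an assumption, so it runs without the gem) and invented sample strings. It also shows one edge case of `String#split` with a regexp: a separator at the very start of the input yields a leading empty string.

```ruby
# Standalone copy of the method from the listing.
def words(str)
  str.split(/[\s|\.|,]+/)
end

mixed   = words('red, green.blue|cyan  magenta')
# => ["red", "green", "blue", "cyan", "magenta"]

leading = words(' red green')
# => ["", "red", "green"]

p mixed, leading
```

Callers that cannot tolerate the empty first element may want to strip the input beforehand.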
words_to_token(str, max_length = 255, separator = ' ')
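Tokenizes each word individually, drops duplicate tokens, and joins the remaining tokens with the separator. max_length limits the joined result, not the individual tokens.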
# File lib/acts_as_tokenizable/string_utils.rb, line 60
def self.words_to_token(str, max_length = 255, separator = ' ')
  words(str)
    .collect { |w| to_token(w) }
    .uniq
    .join(separator)
    .slice(0, max_length)
end
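The pipeline shape (tokenize, de-duplicate, join, slice) can be sketched without babosa by swapping in a simplified tokenizer. Here `stand_in_token` and `words_to_token_sketch` are hypothetical names, and the stand-in only downcases each word, so results for real input may differ from the gem's.

```ruby
# Standalone copy of words from the listing.
def words(str)
  str.split(/[\s|\.|,]+/)
end

# HYPOTHETICAL stand-in for to_token: only downcases. The real method
# also normalizes via babosa and squeezes repeated characters.
def stand_in_token(word)
  word.downcase
end

# Same pipeline as words_to_token, with the stand-in tokenizer.
def words_to_token_sketch(str, max_length = 255, separator = ' ')
  words(str)
    .collect { |w| stand_in_token(w) }
    .uniq
    .join(separator)
    .slice(0, max_length)
end

deduped   = words_to_token_sketch('Hello hello, World')     # => "hello world"
truncated = words_to_token_sketch('Hello hello, World', 8)  # => "hello wo"
hyphened  = words_to_token_sketch('Hello World', 255, '-')  # => "hello-world"

p deduped, truncated, hyphened
```

Duplicates are detected after tokenization, which is why "Hello" and "hello" collapse into one token, and truncation can cut through the middle of the last token.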