module Nomener::Cleaner
Module with helper functions to clean strings
Currently exposes .reformat .cleanup! .dustoff
Constants
- DIRTY_STUFF
regex for name characters we aren’t going to use
- TRAILER_TRASH
regex for stuff at the end we want to get out
Public Class Methods
Internal: Clean up a string where there are numerous consecutive and
trailing non-name characters. Modifies given string in place.
args - strings to clean up
Returns nothing
# File lib/nomener/cleaner.rb, line 55 def self.cleanup!(*args) args.each do |dirty| next unless dirty.is_a?(String) dirty.gsub! DIRTY_STUFF, ' ' dirty.squeeze! ' ' # remove any trailing commas or whitespace dirty.gsub! TRAILER_TRASH, '' dirty.strip! end end
Internal: Clean up a given string. Quotes from en.wikipedia.org/wiki/Quotation_mark
Needs to be fixed up for matching and non-english quotes
name - the string to clean
Returns a string which is (ideally) pretty much the same as it was given.
# File lib/nomener/cleaner.rb, line 31 def self.reformat(name) @@allowable = %r![^\p{Alpha}\-&\/\ \.\,\'\"\(\) #{Nomener.config.left} #{Nomener.config.right} #{Nomener.config.single} ] !x unless @@allowable # remove illegal characters, translate fullwidth down nomen = name.dup.scrub.tr("\uFF02\uFF07", "\u0022\u0027") nomen = replace_doubles(nomen) replace_singles(nomen) .gsub(/@@allowable/, ' ') .squeeze(' ') .strip end
Private Class Methods
Internal: Get the quotes from a string
Currently unused. To Be Removed.
str - the string of two characters for the left and right quotes
Returns an array of the [left, right] quotes
# File lib/nomener/cleaner.rb, line 100 def self.quotes_from(str = '""') left, right = str.split(/\B/) right = left unless right [left, right] end
Internal: Replace various double quote characters with given char
str - the string to find replacements in
Returns the string with the quotes replaced
# File lib/nomener/cleaner.rb, line 72 def self.replace_doubles(str) left, right = [Nomener.config.left, Nomener.config.right] # replace left and right double quotes str.tr("\u0022\u00AB\u201C\u201E\u2036\u300E\u301D\u301F\uFE43", left) .tr("\u0022\u00BB\u201D\u201F\u2033\u300F\u301E\uFE44", right) end
Internal: Replace various single quote characters with given chars
str - the string to find replacements in
Returns the string with the quotes replaced
# File lib/nomener/cleaner.rb, line 86 def self.replace_singles(str) # replace left and right single quotes str.tr("\u0027\u2018\u201A\u2035\u2039\u300C\uFE41\uFF62", Nomener.config.single) .tr("\u0027\u2019\u201B\u2032\u203A\u300D\uFE42\uFF62", Nomener.config.single) end