class Object
Constants
- U
This version of UnicodeUtils implements algorithms as defined by version 6.2.0 of the Unicode standard. Each public method is declared as a module_function of the UnicodeUtils module and defined in a separate file under the unicode_utils directory.

As a convenience, the toplevel unicode_utils file loads all methods (needs lots of memory!). Also as a convenience for irb usage, the file unicode_utils/u assigns the UnicodeUtils module to the toplevel U constant and loads all methods:

  $ irb -r unicode_utils/u
  irb(main):001:0> U.grep /angstrom/
  => [#<U+212B "Å" ANGSTROM SIGN utf8:e2,84,ab>]

If a method takes a character as argument (usually named char), that argument can be an integer, a string (in which case only the first code point counts), or any other object that responds to ord by returning an integer.

All methods are non-destructive. String return values are in the same encoding as the strings passed as arguments, which must be in one of the Unicode encodings.

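The rule for char arguments described above can be illustrated with a small sketch. The helper name code_point_of is hypothetical and is not part of UnicodeUtils; it only shows how an integer, a string, or an object responding to ord could each be reduced to a single code point, and the library's own handling may differ in detail.

```ruby
# Hypothetical helper, NOT part of UnicodeUtils: mirrors the documented
# convention for "char" arguments using only core Ruby.
def code_point_of(char)
  case char
  when Integer
    char                       # already a code point
  when String
    raise ArgumentError, "empty string has no code point" if char.empty?
    char.codepoints.first      # only the first code point counts
  else
    unless char.respond_to?(:ord)
      raise ArgumentError, "expected Integer, String, or object responding to ord"
    end
    char.ord                   # any other object that answers ord
  end
end

code_point_of(0x212B)        # => 8491
code_point_of("\u212Bbc")    # => 8491 (trailing characters are ignored)
```

Since Integer and String both respond to ord in core Ruby, the three branches could be collapsed into a single char.ord call; they are kept separate here only to match the wording of the description.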
High-level methods are:
UnicodeUtils.upcase - full conversion to uppercase
UnicodeUtils.downcase - full conversion to lowercase
UnicodeUtils.titlecase - full conversion to titlecase
UnicodeUtils.casefold - case folding (case insensitive string comparison)
UnicodeUtils.nfd - Normalization Form D
UnicodeUtils.nfc - Normalization Form C
UnicodeUtils.nfkd - Normalization Form KD
UnicodeUtils.nfkc - Normalization Form KC
UnicodeUtils.each_grapheme - grapheme boundaries
UnicodeUtils.each_word - word boundaries
UnicodeUtils.char_name - character names
UnicodeUtils.grep - find code points by character name
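
As a rough illustration of the methods listed above, the following sketch loads a few of them from their individual files (avoiding the memory cost of loading everything) and applies them to one string. It assumes the unicode_utils gem is installed; the wrapper name unicode_utils_demo is hypothetical, and it returns nil when the gem cannot be loaded.

```ruby
# Sketch only: assumes the unicode_utils gem is installed.
# unicode_utils_demo is a hypothetical name, not part of the library.
def unicode_utils_demo(str)
  # Load only the methods used here; each one lives in its own file.
  require "unicode_utils/upcase"
  require "unicode_utils/casefold"
  require "unicode_utils/nfd"
  require "unicode_utils/each_grapheme"

  # Collect grapheme clusters via the block form of each_grapheme.
  graphemes = []
  UnicodeUtils.each_grapheme(str) { |g| graphemes << g }

  {
    upcase:    UnicodeUtils.upcase(str),    # full (not simple) case conversion
    casefold:  UnicodeUtils.casefold(str),  # for case insensitive comparison
    nfd:       UnicodeUtils.nfd(str),       # Normalization Form D
    graphemes: graphemes
  }
rescue LoadError
  nil # gem not available in this environment
end

result = unicode_utils_demo("weiß")
puts result.inspect
```

With the gem available, the upcase entry for "weiß" should be "WEISS", since full case conversion may change the length of the string. The argument itself is left unmodified, in line with the non-destructive behaviour described earlier.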