class Tilia::VObject::StringUtil
Useful utilities for working with various strings.
Public Class Methods
convert_to_utf8(str)
Tries its best to convert the input string to UTF-8, then strips control characters from the result.
Currently only ISO-8859-1 and UTF-8 input are supported, but this may be expanded if we receive other examples.
@param [String] str
@return [String]
    # File lib/tilia/v_object/string_util.rb, line 26
    def self.convert_to_utf8(str)
      str = str.encode('UTF-8', guess_encoding(str))

      # Removing any control characters
      str.gsub(/(?:[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F])/, '')
    end
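The two steps in this method (transcode, then strip control characters) can be tried without the library. The following is only a sketch: it assumes the source encoding is already known to be ISO-8859-1, whereas the method itself obtains it from guess_encoding, and the variable names are illustrative.

```ruby
# Bytes for "cafe" with an e-acute in ISO-8859-1 (0xE9), followed by a BEL byte.
raw = [0x63, 0x61, 0x66, 0xE9, 0x07].pack('C*')

# Assumption: the source encoding is known up front. The second argument
# tells String#encode how to interpret the incoming bytes.
utf8 = raw.encode('UTF-8', 'ISO-8859-1')

# Same character class as the listing above: remove C0 control characters
# other than tab, line feed and carriage return, and also DEL.
clean = utf8.gsub(/[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F]/, '')

puts clean.encoding
puts clean.bytes.inspect
```

The result is tagged as UTF-8, the e-acute becomes a two-byte sequence, and the BEL byte is gone.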
guess_encoding(str)
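Detects the encoding of a string.
Currently only supports 'UTF-8', 'ISO-8859-1' and 'Windows-1252'. Falls back to 'UTF-8' unless CharDet reports a Windows or ISO encoding with a confidence above 0.4.
@param [String] str
@return [String] encoding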
    # File lib/tilia/v_object/string_util.rb, line 39
    def self.guess_encoding(str)
      cd = CharDet.detect(str)

      # Best solution I could find ...
      if cd['confidence'] > 0.4 && cd['encoding'] =~ /(?:windows|iso)/i
        cd['encoding']
      else
        'UTF-8'
      end
    end
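The decision rule in this listing can be looked at in isolation. The sketch below is not the library's code: pick_encoding is a hypothetical helper, and the hashes passed to it are hand-written stand-ins for what CharDet.detect returns (only the 'encoding' and 'confidence' keys used in the listing are assumed).

```ruby
# Hypothetical helper: applies the same threshold and name filter as the
# listing above to a detection result supplied by the caller.
def pick_encoding(detection)
  name = detection['encoding'].to_s # tolerate a missing encoding name
  confident = detection['confidence'].to_f > 0.4
  legacy = name.match?(/(?:windows|iso)/i)

  confident && legacy ? name : 'UTF-8'
end

# Hand-written detection results, for illustration only.
puts pick_encoding('encoding' => 'ISO-8859-1', 'confidence' => 0.73)
puts pick_encoding('encoding' => 'windows-1252', 'confidence' => 0.30)
puts pick_encoding('encoding' => 'ascii', 'confidence' => 1.0)
```

Only the first call keeps the detected name. The second is below the confidence threshold and the third does not match the windows/iso filter, so both fall back to 'UTF-8'.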
mb_strcut(string, length)
Cuts the string to at most length bytes, without splitting a multibyte UTF-8 character.
@param [String] string
@param [Fixnum] length
@return [String] cut string
    # File lib/tilia/v_object/string_util.rb, line 55
    def self.mb_strcut(string, length)
      return '' if string == ''

      string = string.clone
      tmp = ''
      while tmp.bytesize <= length
        tmp += string[0]
        string[0] = ''
      end

      # Last char was utf-8 multibyte
      if tmp.bytesize > length
        string[0] = tmp[-1] + string[0].to_s
        tmp[-1] = ''
      end
      tmp
    end
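As listed, the loop keeps taking characters until tmp exceeds length bytes, so it appears to run out of input (string[0] becomes nil) when the string is not longer than length bytes. A possible alternative, shown only as a sketch, slices by bytes and then drops any partial trailing character. cut_to_bytes is a hypothetical name and not part of the library. Note that String#scrub also removes any other invalid byte sequences in the slice, which the listing above does not do.

```ruby
# Hypothetical alternative to mb_strcut: byte-based cut that never returns
# half of a multibyte character.
def cut_to_bytes(string, length)
  # byteslice may cut through a character; the leftover bytes are then
  # invalid in the string's encoding and scrub('') removes them.
  string.byteslice(0, length).to_s.scrub('')
end

word = "h\u00E9llo" # "h", e-acute (2 bytes in UTF-8), "l", "l", "o"

puts cut_to_bytes(word, 2).bytesize  # the cut would split the e-acute
puts cut_to_bytes(word, 3).bytesize  # the cut lands on a character boundary
puts cut_to_bytes('abc', 10)         # shorter than the limit: unchanged
```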
utf8?(str)
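Returns true if the string is tagged as UTF-8, is validly encoded, and contains none of the control characters \x00-\x08, \x0B, \x0C, \x0E or \x0F; returns false otherwise. Raises ArgumentError if str is not a String.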
@param [String] str
@return [Boolean]
    # File lib/tilia/v_object/string_util.rb, line 10
    def self.utf8?(str)
      fail ArgumentError, 'str needs to be a String' unless str.is_a?(String)

      # Control characters
      return false if str =~ /[\x00-\x08\x0B-\x0C\x0E\x0F]/

      str.encoding.to_s == 'UTF-8' && str.valid_encoding?
    end
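The checks can be exercised on a few inputs with a standalone sketch. looks_like_utf8? is a hypothetical stand-in, not the library method, and it differs in one deliberate way: it tests the encoding and its validity before running the regular expression, on the assumption that matching a pattern against malformed input is best avoided.

```ruby
# Hypothetical stand-in for utf8?, with the checks reordered (see above).
def looks_like_utf8?(str)
  raise ArgumentError, 'str needs to be a String' unless str.is_a?(String)

  return false unless str.encoding == Encoding::UTF_8 && str.valid_encoding?

  # Same control-character class as the listing above.
  !str.match?(/[\x00-\x08\x0B-\x0C\x0E\x0F]/)
end

bad_bytes = [0xFF].pack('C*').force_encoding('UTF-8') # not valid UTF-8
binary    = [0x61].pack('C*')                         # tagged ASCII-8BIT

puts looks_like_utf8?("caf\u00E9") # plain, valid UTF-8 text
puts looks_like_utf8?("a\x07b")    # contains a BEL control character
puts looks_like_utf8?(bad_bytes)
puts looks_like_utf8?(binary)
```

Only the first call is accepted; the other three fail on a control character, an invalid byte sequence, and a non-UTF-8 encoding tag respectively.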