class Tilia::VObject::StringUtil

Useful utilities for working with various strings.

Public Class Methods

convert_to_utf8(str)

This method tries its best to convert the input string to UTF-8.

Currently only ISO-8859-1 and UTF-8 input are supported, but this may be expanded if we receive other examples.

@param [String] str

@return [String]

# File lib/tilia/v_object/string_util.rb, line 26
def self.convert_to_utf8(str)
  str = str.encode('UTF-8', guess_encoding(str))

  # Removing any control characters
  str.gsub(/(?:[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F])/, '')
end
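
As a rough illustration of the two steps above (transcode, then strip control characters), here is a sketch that skips encoding detection and takes the source encoding as an explicit argument. The helper name `to_clean_utf8` is hypothetical and not part of Tilia; only Ruby's built-in `String#encode` and `String#gsub` are used.

```ruby
# Hypothetical helper, not part of Tilia: transcode from a known source
# encoding to UTF-8, then drop the same control characters as above.
def to_clean_utf8(bytes, source_encoding)
  utf8 = bytes.encode('UTF-8', source_encoding)
  # Tab (\x09), line feed (\x0A) and carriage return (\x0D) are kept.
  utf8.gsub(/[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F]/, '')
end

# "caf\xE9" is "café" in ISO-8859-1; \x07 is the BEL control character.
latin1 = "caf\xE9\x07\n".b
result = to_clean_utf8(latin1, 'ISO-8859-1')
puts result.inspect
```

The result is tagged as UTF-8, the BEL byte is gone, and the trailing newline survives because line feed is outside the stripped ranges.
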
guess_encoding(str)

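Detects the encoding of a string.

Currently only supports 'UTF-8', 'ISO-8859-1' and 'Windows-1252'. A detected encoding is returned only when the detector's confidence exceeds 0.4 and the encoding name contains "windows" or "iso"; otherwise 'UTF-8' is assumed.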

@param [String] str

@return [String] encoding

# File lib/tilia/v_object/string_util.rb, line 39
def self.guess_encoding(str)
  cd = CharDet.detect(str)

  # Best solution I could find ...
  if cd['confidence'] > 0.4 && cd['encoding'] =~ /(?:windows|iso)/i
    cd['encoding']
  else
    'UTF-8'
  end
end
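
The decision rule can be exercised without the detector itself. The sketch below assumes a detection result shaped like the hash read in the code above (an 'encoding' string and a 'confidence' number). `pick_encoding` is a hypothetical name, and the sample hashes are made up for illustration rather than real detector output.

```ruby
# Hypothetical stand-in for the rule in guess_encoding; it takes a detection
# result hash instead of calling CharDet, so it runs with no extra gems.
def pick_encoding(detection)
  confident = detection['confidence'] > 0.4
  legacy = detection['encoding'] =~ /(?:windows|iso)/i
  confident && legacy ? detection['encoding'] : 'UTF-8'
end

# Made-up sample results, for illustration only.
puts pick_encoding({ 'encoding' => 'windows-1252', 'confidence' => 0.73 }) # windows-1252
puts pick_encoding({ 'encoding' => 'ISO-8859-1', 'confidence' => 0.2 })    # UTF-8
puts pick_encoding({ 'encoding' => 'ascii', 'confidence' => 1.0 })         # UTF-8
```

Falling back to 'UTF-8' for low-confidence or unrecognised results means the later transcoding step treats the input as already being UTF-8.
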
mb_strcut(string, length)

Cuts the string after a given byte length, without splitting a multibyte character.

@param [String] string

@param [Fixnum] length

@return [String] cut string

# File lib/tilia/v_object/string_util.rb, line 55
def self.mb_strcut(string, length)
  return '' if string == ''

  string = string.clone
  tmp = ''
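  # Stop once the input is exhausted; otherwise string[0] is nil and += raises TypeError
  while tmp.bytesize <= length && !string.empty?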
    tmp += string[0]
    string[0] = ''
  end

  # Last char was utf-8 multibyte
  if tmp.bytesize > length
    string[0] = tmp[-1] + string[0].to_s
    tmp[-1] = ''
  end
  tmp
end
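
To make the cutting rule concrete, the following sketch reimplements it with `String#each_char` and `String#bytesize` rather than the loop above. `cut_bytes` is a hypothetical name; this is an illustration of the behaviour under that assumption, not the library's code.

```ruby
# Hypothetical helper: keep whole characters while the running byte count
# stays within the limit, so a multibyte character is never split.
def cut_bytes(string, max_bytes)
  out = String.new(encoding: string.encoding)
  string.each_char do |char|
    break if out.bytesize + char.bytesize > max_bytes
    out << char
  end
  out
end

text = "h\u00E9llo" # "héllo"; 'é' takes two bytes in UTF-8
puts cut_bytes(text, 2)  # h
puts cut_bytes(text, 3)  # hé
puts cut_bytes(text, 99) # héllo
```

With a limit of 2, the two-byte 'é' would end at byte 3, so the cut falls before it and only "h" is returned.
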
utf8?(str)

Returns true if the string is valid UTF-8 and false otherwise. Strings containing certain control characters are also rejected.

@param [String] str

@return [Boolean]

# File lib/tilia/v_object/string_util.rb, line 10
def self.utf8?(str)
  fail ArgumentError, 'str needs to be a String' unless str.is_a?(String)
  # Control characters
  return false if str =~ /[\x00-\x08\x0B-\x0C\x0E\x0F]/

  str.encoding.to_s == 'UTF-8' && str.valid_encoding?
end
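
The final line of `utf8?` combines two checks: the string must be tagged as UTF-8, and its bytes must form valid UTF-8. The sketch below uses only core `String` methods to show why both are needed. The helper `tagged_and_valid_utf8?` is hypothetical, omits the control-character check, and the sample strings are illustrative.

```ruby
# Hypothetical helper mirroring only the last line of utf8? above.
def tagged_and_valid_utf8?(str)
  str.encoding == Encoding::UTF_8 && str.valid_encoding?
end

valid  = "caf\u00E9"                                # well-formed UTF-8
broken = "caf\xE9".dup.force_encoding('UTF-8')      # Latin-1 byte tagged as UTF-8
binary = valid.b                                    # same bytes, binary encoding tag

puts tagged_and_valid_utf8?(valid)   # true
puts tagged_and_valid_utf8?(broken)  # false
puts tagged_and_valid_utf8?(binary)  # false
```

The `binary` case fails on the encoding tag even though its bytes are well-formed, while the `broken` case carries the right tag but fails `valid_encoding?`.
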