module URI
Public Class Methods
decode(str)
Default decoder. Delegates to utf_decode.
# File lib/unicorn-cuba-base/uri_ext.rb, line 21
def self.decode(str)
  self.utf_decode(str)
end
Also aliased as: pct_decode
utf_decode(str)
From en.wikipedia.org/wiki/Percent-encoding: "The generic URI syntax mandates that new URI schemes that provide for the representation of character data in a URI must, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8, and then percent-encode those values."
The deprecated JavaScript escape() function is also sometimes used; it emits %uXXXX sequences for characters outside the Latin-1 range.
# File lib/unicorn-cuba-base/uri_ext.rb, line 11
def self.utf_decode(str)
  pct_decode(str) # decode %XX bits
    .force_encoding('UTF-8') # make sure the string is interpreted as UTF-8
    .tap{|uri| validate_string_encoding(uri)}
    .gsub(/%u([0-9a-z]{4})/) {|s| [$1.to_i(16)].pack("U")} # decode %uXXXX encoded chars (JavaScript escape())
    .tap{|uri| validate_string_encoding(uri)}
end
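The decoding steps can be tried outside the library with a minimal sketch like the one below. It is not the library's implementation: sketch_utf_decode is a hypothetical name, the %XX step is hand-rolled instead of calling pct_decode, and ArgumentError stands in for URI::InvalidURIError. One deliberate difference is the /i flag on the %uXXXX pattern. The listed pattern uses [0-9a-z], which matches lowercase hex digits only, so this sketch assumes uppercase digits should be accepted as well.

```ruby
# Hypothetical standalone helper; not part of unicorn-cuba-base.
def sketch_utf_decode(str)
  # Step 1: replace each %XX with its raw byte, working on a binary copy.
  bytes = str.b.gsub(/%([0-9a-fA-F]{2})/) { [$1].pack("H2") }

  # Step 2: reinterpret the bytes as UTF-8 and reject malformed sequences.
  text = bytes.force_encoding("UTF-8")
  raise ArgumentError, "invalid UTF-8: #{text.inspect}" unless text.valid_encoding?

  # Step 3: expand %uXXXX escapes into code points. The /i flag is this
  # sketch's assumption so that uppercase hex digits also match.
  decoded = text.gsub(/%u([0-9a-f]{4})/i) { [$1.hex].pack("U") }

  # Step 4: validate again, since an expanded escape could be malformed.
  raise ArgumentError, "invalid UTF-8: #{decoded.inspect}" unless decoded.valid_encoding?
  decoded
end

puts sketch_utf_decode("caf%C3%A9")    # UTF-8 percent-encoding
puts sketch_utf_decode("%u4E2D%u6587") # %uXXXX escapes
```

Validating twice mirrors the two tap calls in the listing: once after the bytes are tagged as UTF-8, and once after the %uXXXX escapes are expanded.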
Private Class Methods
validate_string_encoding(uri)
# File lib/unicorn-cuba-base/uri_ext.rb, line 27
def self.validate_string_encoding(uri)
  raise URI::InvalidURIError, "invalid UTF-8 encoding in URI: #{uri.inspect}" if not uri.valid_encoding?
end
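To see what this guard accepts and rejects, here is a small sketch that relies only on Ruby's standard library. sketch_validate is a hypothetical stand-in for the private method above, and the byte sequences are illustrative inputs chosen for this example.

```ruby
require "uri"

# Hypothetical stand-in for the private validate_string_encoding method.
def sketch_validate(text)
  raise URI::InvalidURIError, "invalid UTF-8 encoding in URI: #{text.inspect}" unless text.valid_encoding?
  text
end

# A complete three-byte UTF-8 sequence (U+4E2D) and a truncated copy of it.
complete  = [0xE4, 0xB8, 0xAD].pack("C*").force_encoding("UTF-8")
truncated = [0xE4, 0xB8].pack("C*").force_encoding("UTF-8")

sketch_validate(complete) # passes and returns the string unchanged

begin
  sketch_validate(truncated)
rescue URI::InvalidURIError => e
  puts e.message # reports the malformed bytes via String#inspect
end
```

force_encoding only relabels the bytes and never checks them, which is why an explicit valid_encoding? test is needed after it.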