class SiteDiff::UriWrapper
SiteDiff URI Wrapper.
Constants
- DEFAULT_CURL_OPTS
TODO: Move these CURL OPTS to Config.DEFAULT_CONFIG.
Public Class Methods
Canonicalize a path.
@param [String] path
A base relative path. Example: /foo/bar
```ruby
# File lib/sitediff/uriwrapper.rb, line 193
def self.canonicalize(path)
  # Ignore trailing slashes for all paths except "/" (front page).
  path = path.chomp('/') unless path == '/'
  # If the path is empty, assume that it's the front page.
  path.empty? ? '/' : path
end
```
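Because canonicalize is a pure string function, you can check its behavior in isolation. The sketch below copies its body into a top-level method so it runs without the sitediff gem. It then exercises the cases the method handles. Note that String#chomp('/') removes at most one trailing slash.

```ruby
# Standalone copy of SiteDiff::UriWrapper.canonicalize, so it runs
# without loading the sitediff gem.
def canonicalize(path)
  # Ignore trailing slashes for all paths except "/" (front page).
  path = path.chomp('/') unless path == '/'
  # If the path is empty, assume that it's the front page.
  path.empty? ? '/' : path
end

puts canonicalize('/foo/bar/') # prints /foo/bar
puts canonicalize('/')         # prints /
puts canonicalize('')          # prints /
puts canonicalize('/foo//')    # prints /foo/ (only one slash removed)
```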
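Creates a UriWrapper. uri may be a String or an object that responds to scheme, such as an Addressable::URI. Trailing slashes are stripped from local paths.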
```ruby
# File lib/sitediff/uriwrapper.rb, line 51
def initialize(uri, curl_opts = DEFAULT_CURL_OPTS, debug = true)
  @uri = uri.respond_to?(:scheme) ? uri : Addressable::URI.parse(uri)
  # remove trailing '/'s from local URIs
  @uri.path.gsub!(%r{/*$}, '') if local?
  @curl_opts = curl_opts
  @debug = debug
end
```
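The constructor treats any URI without a scheme as a local filesystem path, which is the same test local? uses, and strips trailing slashes from it. Below is a minimal sketch of that normalization. It has three assumptions:

- It uses Ruby's standard-library URI instead of Addressable::URI. The stdlib parser is stricter, for example about spaces.
- normalize_location is a hypothetical helper name.
- It returns a new string instead of mutating the path in place.

```ruby
require 'uri'

# Hypothetical helper sketching UriWrapper#initialize's normalization.
# Assumption: stdlib URI stands in for Addressable::URI.
def normalize_location(location)
  uri = URI.parse(location)
  # Same test as UriWrapper#local?: no scheme means a filesystem path.
  if uri.scheme.nil?
    # Drop every trailing '/' from local paths.
    location.sub(%r{/+\z}, '')
  else
    # Remote URLs are left untouched.
    location
  end
end

puts normalize_location('/tmp/site///')         # prints /tmp/site
puts normalize_location('https://example.com/') # prints https://example.com/
```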
Public Instance Methods
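Appends other to this URI and returns a new UriWrapper. In SiteDiff, a "path" may include parts of the URI path, query, and fragment. A '/' separator is inserted only for local paths or when the URI path is empty.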
FIXME: this is not used anymore
```ruby
# File lib/sitediff/uriwrapper.rb, line 88
def +(other)
  # 'path' for SiteDiff includes (parts of) path, query, and fragment.
  sep = ''
  sep = '/' if local? || @uri.path.empty?
  self.class.new(@uri.to_s + sep + other)
end
```
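The separator rule in + depends on only two facts: whether the URI is local and whether its path is empty. Below is a standalone sketch of the same rule. It uses stdlib URI in place of Addressable::URI. join_path is a hypothetical name, and it returns a plain String rather than a new UriWrapper.

```ruby
require 'uri'

# Hypothetical sketch of the separator rule in UriWrapper#+.
# Assumption: stdlib URI stands in for Addressable::URI, and a String
# is returned instead of a new UriWrapper.
def join_path(base, other)
  uri = URI.parse(base)
  local = uri.scheme.nil?
  # Insert '/' for local paths or when the URI has no path at all;
  # otherwise nothing is inserted, so `other` must carry its own '/'.
  sep = (local || uri.path.empty?) ? '/' : ''
  base + sep + other
end

puts join_path('/tmp/site', 'about')                 # prints /tmp/site/about
puts join_path('https://example.com', 'about')       # prints https://example.com/about
puts join_path('https://example.com/base', '/about') # prints https://example.com/base/about
```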
Returns the character encoding of an HTTP response, taken from the charset parameter of its Content-Type header, or nil if none is specified.
```ruby
# File lib/sitediff/uriwrapper.rb, line 105
def charset_encoding(http_headers)
  if (content_type = http_headers['Content-Type'])
    if (md = /;\s*charset=([-\w]*)/.match(content_type))
      md[1]
    end
  end
end
```
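The sketch below shows how the extracted charset is meant to be used. The method's body is copied unchanged, and a plain Hash stands in for the Typhoeus response headers, which is an assumption. The header value then relabels raw bytes with String#force_encoding, as typhoeus_request does. The regex is case-sensitive, so a header written as `Charset=` would not be recognized.

```ruby
# Standalone copy of UriWrapper#charset_encoding; a plain Hash stands in
# for the Typhoeus response headers (an assumption for this sketch).
def charset_encoding(http_headers)
  if (content_type = http_headers['Content-Type'])
    if (md = /;\s*charset=([-\w]*)/.match(content_type))
      md[1]
    end
  end
end

headers = { 'Content-Type' => 'text/html; charset=ISO-8859-1' }
encoding = charset_encoding(headers) # => "ISO-8859-1"

# Raw Latin-1 bytes for "café", relabeled with the declared charset.
body = "caf\xE9".dup.force_encoding(encoding)
puts body.encode('UTF-8') # prints café

p charset_encoding({ 'Content-Type' => 'text/html' }) # prints nil
```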
Is this a local filesystem path?
```ruby
# File lib/sitediff/uriwrapper.rb, line 82
def local?
  @uri.scheme.nil?
end
```
Returns the “password” part of the URI.
```ruby
# File lib/sitediff/uriwrapper.rb, line 67
def password
  @uri.password
end
```
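Queue reading this URL, with a completion handler to run afterward.
The handler should be callable with a single ReadResult argument.
This method may run the read immediately instead of queuing it. For local paths, the file is read right away via read_file. Remote URLs are queued on hydra as a Typhoeus request.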
```ruby
# File lib/sitediff/uriwrapper.rb, line 180
def queue(hydra, &handler)
  if local?
    read_file(&handler)
  else
    hydra.queue(typhoeus_request(&handler))
  end
end
```
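queue hides the difference between local and remote sources from its caller. Local files complete immediately, while remote requests complete only when the hydra runs. The self-contained sketch below shows that dispatch using hypothetical stand-ins, not SiteDiff or Typhoeus classes:

- ReadResult replaces SiteDiff's result class.
- FakeHydra mimics only the queue-then-run shape of Typhoeus::Hydra.
- SketchWrapper replaces UriWrapper and uses a simplified local? check.

```ruby
# Hypothetical stand-ins; not the real SiteDiff or Typhoeus classes.
ReadResult = Struct.new(:content, :error)

# Mimics only the queue/run shape of Typhoeus::Hydra.
class FakeHydra
  def initialize
    @pending = []
  end

  def queue(request)
    @pending << request
  end

  # Runs every queued request, in order.
  def run
    @pending.shift.call until @pending.empty?
  end
end

class SketchWrapper
  def initialize(location, contents)
    @location = location
    @contents = contents
  end

  # Simplification of UriWrapper#local? (which checks for a nil scheme).
  def local?
    !@location.include?('://')
  end

  # Same shape as UriWrapper#queue: local reads run now, remote ones later.
  def queue(hydra, &handler)
    if local?
      handler.call(ReadResult.new(@contents, nil))
    else
      hydra.queue(-> { handler.call(ReadResult.new(@contents, nil)) })
    end
  end
end

log = []
hydra = FakeHydra.new
SketchWrapper.new('/tmp/site', 'local page').queue(hydra) { |r| log << r.content }
SketchWrapper.new('https://example.com', 'remote page').queue(hydra) { |r| log << r.content }
p log # prints ["local page"]
hydra.run
p log # prints ["local page", "remote page"]
```

Callers can therefore queue every URL the same way and call run once, without caring which ones were local.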
Reads a file and yields a ReadResult to the completion handler. See #queue.
```ruby
# File lib/sitediff/uriwrapper.rb, line 97
def read_file
  File.open(@uri.to_s, 'r:UTF-8') { |f| yield ReadResult.new(f.read) }
rescue Errno::ENOENT, Errno::ENOTDIR, Errno::EACCES, Errno::EISDIR => e
  yield ReadResult.error(e.message)
end
```
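read_file does not raise for the common filesystem failures. Instead, it converts them into an error result and still calls the handler. Below is a runnable sketch of that pattern:

- ReadResult is a hypothetical Struct standing in for SiteDiff's ReadResult. The real class builds errors via ReadResult.error.
- read_local is a hypothetical name.

```ruby
require 'tempfile'

# Hypothetical stand-in for SiteDiff's ReadResult.
ReadResult = Struct.new(:content, :error)

# Mirrors UriWrapper#read_file: yield the content on success, or an
# error result (instead of raising) on common filesystem failures.
def read_local(path)
  File.open(path, 'r:UTF-8') { |f| yield ReadResult.new(f.read, nil) }
rescue Errno::ENOENT, Errno::ENOTDIR, Errno::EACCES, Errno::EISDIR => e
  yield ReadResult.new(nil, e.message)
end

Tempfile.create('page') do |f|
  f.write('<html></html>')
  f.flush
  read_local(f.path) { |r| puts r.content } # prints <html></html>
end

# A missing file still reaches the handler, as an error result.
read_local('/no/such/file') { |r| puts r.error }
```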
Converts the URI to a string.
```ruby
# File lib/sitediff/uriwrapper.rb, line 73
def to_s
  uri = @uri.dup
  uri.user = nil
  uri.password = nil
  uri.to_s
end
```
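to_s drops the credentials so they do not appear in the printed URL. typhoeus_request sends them separately as the :userpwd curl option. The sketch below shows both halves and makes these assumptions:

- stdlib URI stands in for Addressable::URI.
- The URL and credentials are made up.

The sketch uses string interpolation for the user:password string. With String#+, as in the original, a nil password would raise TypeError.

```ruby
require 'uri'

uri = URI.parse('https://alice:secret@example.com/docs')

# Like UriWrapper#to_s: copy the URI and clear the credentials,
# leaving the original object untouched.
public_uri = uri.dup
public_uri.user = nil
public_uri.password = nil
puts public_uri # prints https://example.com/docs

# Like typhoeus_request: build the "user:password" basic-auth value.
userpwd = "#{uri.user}:#{uri.password}" if uri.user
puts userpwd # prints alice:secret
```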
Returns a Typhoeus::Request that fetches @uri. The request's completion callbacks wrap the given handler, which is assumed to accept a single ReadResult argument.
```ruby
# File lib/sitediff/uriwrapper.rb, line 117
def typhoeus_request
  params = @curl_opts.dup
  # Allow basic auth
  params[:userpwd] = @uri.user + ':' + @uri.password if @uri.user

  req = Typhoeus::Request.new(to_s, params)

  req.on_success do |resp|
    body = resp.body
    # Typhoeus does not respect HTTP headers when setting the encoding
    # resp.body; coerce if possible.
    if (encoding = charset_encoding(resp.headers))
      body.force_encoding(encoding)
    end
    # Should be wrapped with rescue I guess? Maybe this entire function?
    # Should at least be an option in the Cli to disable this.
    # "stop on first error"
    begin
      yield ReadResult.new(body, encoding)
    rescue ArgumentError => e
      raise if @debug

      yield ReadResult.error(
        "Parsing error for #{@uri}: #{e.message}"
      )
    rescue StandardError => e
      raise if @debug

      yield ReadResult.error(
        "Unknown parsing error for #{@uri}: #{e.message}"
      )
    end
  end

  req.on_failure do |resp|
    if resp&.status_message
      msg = resp.status_message
      yield ReadResult.error(
        "HTTP error when loading #{@uri}: #{msg}",
        resp.response_code
      )
    elsif (msg = resp.options[:return_code])
      yield ReadResult.error(
        "Connection error when loading #{@uri}: #{msg}",
        resp.response_code
      )
    else
      yield ReadResult.error(
        "Unknown error when loading #{@uri}: #{msg}",
        resp.response_code
      )
    end
  end

  req
end
```
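The on_failure callback classifies a failed request by what the response can tell it: an HTTP status message, otherwise a curl return code, otherwise nothing. Below is a standalone sketch of that classification. It makes these assumptions:

- FakeResponse is a hypothetical Struct exposing only the three fields the callback reads.
- failure_message is a hypothetical name that returns a [message, response_code] pair instead of yielding a ReadResult.
- :couldnt_connect is only an example value.

In the original's last branch, msg has just been assigned a falsy value, so that message ends with an empty value. The sketch omits it.

```ruby
# Hypothetical stand-in exposing only what on_failure reads.
FakeResponse = Struct.new(:status_message, :options, :response_code)

# Mirrors the branching of on_failure in UriWrapper#typhoeus_request.
# Returns [message, response_code] instead of building a ReadResult.
def failure_message(uri, resp)
  if resp&.status_message
    ["HTTP error when loading #{uri}: #{resp.status_message}", resp.response_code]
  elsif (code = resp.options[:return_code])
    ["Connection error when loading #{uri}: #{code}", resp.response_code]
  else
    ["Unknown error when loading #{uri}", resp.response_code]
  end
end

url = 'https://example.com/'
p failure_message(url, FakeResponse.new('Not Found', {}, 404))
# prints ["HTTP error when loading https://example.com/: Not Found", 404]
p failure_message(url, FakeResponse.new(nil, { return_code: :couldnt_connect }, 0))
# prints ["Connection error when loading https://example.com/: couldnt_connect", 0]
p failure_message(url, FakeResponse.new(nil, {}, 0))
# prints ["Unknown error when loading https://example.com/", 0]
```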
Returns the “user” part of the URI.
```ruby
# File lib/sitediff/uriwrapper.rb, line 61
def user
  @uri.user
end
```