module Sitemaps::Fetcher
Simple, single-purpose HTTP client. Uses `Net::HTTP` directly to avoid pulling in additional dependencies.
Public Class Methods
fetch(uri)
Fetch the given URI.
Handles redirects (up to 10), and will additionally inflate a body delivered without a `Content-Encoding` header but whose path ends in `.gz`.
@param uri [String, URI] the URI to fetch.
@return [String]
@raise [FetchError] if the server responds with an HTTP status that's not 2xx.
@raise [MaxRedirectError] if more than 10 redirects have occurred while attempting to fetch the resource.
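For orientation, a minimal usage sketch. It assumes the gem is loaded via `require "sitemaps"` and that `FetchError` and `MaxRedirectError` are nested under `Sitemaps::Fetcher` (as the `raise` calls in the source suggest); the URL is a hypothetical example:

```ruby
require "sitemaps"

begin
  # A bare String is accepted; fetch prefixes "http://" when no scheme is given.
  body = Sitemaps::Fetcher.fetch("example.com/sitemap.xml.gz")
  puts body.bytesize
rescue Sitemaps::Fetcher::FetchError => e
  warn "non-2xx response: #{e.message}"
rescue Sitemaps::Fetcher::MaxRedirectError => e
  warn "too many redirects: #{e.message}"
end
```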
# File lib/sitemaps/fetcher.rb, line 18
def self.fetch(uri)
  attempts = 0

  # we only work on URI objects
  unless uri.is_a? URI
    uri = "http://#{uri}" unless uri =~ %r{^https?://}
    uri = URI.parse(uri)
  end

  until attempts >= @max_attempts
    resp = Net::HTTP.get_response(uri)

    # on a good 2xx response, return the body
    if resp.code.to_s =~ /2\d\d/
      if resp.header["Content-Encoding"].blank? && uri.path =~ /\.gz$/
        return Zlib::GzipReader.new(StringIO.new(resp.body)).read
      else
        return resp.body
      end

    # on a 3xx response, handle the redirect
    elsif resp.code.to_s =~ /3\d\d/
      location = URI.parse(resp.header['location'])
      location = uri + resp.header['location'] if location.relative?

      uri = location
      attempts += 1
      next

    # otherwise (4xx, 5xx) throw an exception
    else
      raise FetchError, "Failed to fetch URI, #{uri}, failed with response code: #{resp.code}"
    end
  end

  # if we got here, we ran out of attempts
  raise MaxRedirectError, "Failed to fetch URI #{uri}, redirected too many times" if attempts >= @max_attempts
end
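The two less obvious branches above can be illustrated in isolation: the gzip fallback for a `.gz` path with no `Content-Encoding` header, and the resolution of a relative `Location` header against the current URI. This is a standalone sketch using only standard-library calls; the XML snippet and URLs are made up:

```ruby
require "zlib"
require "stringio"
require "uri"

# Gzip fallback: a body with no Content-Encoding header but a path ending in
# ".gz" is treated as raw gzip data and inflated in memory.
compressed = Zlib.gzip("<urlset></urlset>")              # stand-in for resp.body
Zlib::GzipReader.new(StringIO.new(compressed)).read      # => "<urlset></urlset>"

# Redirect handling: a relative Location header is resolved against the
# current URI before the next attempt.
current  = URI.parse("http://example.com/a/sitemap.xml")
location = URI.parse("/sitemaps/en.xml")
location = current + "/sitemaps/en.xml" if location.relative?
location.to_s                                            # => "http://example.com/sitemaps/en.xml"
```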