class NetworkUtils::UrlInfo

Simple class to get URL info (validation/existence, headers, content type). It lets you get all of this without actually downloading huge files like CSVs, images, videos, etc.

Public Class Methods

new(url, request_timeout = 10)

Initialise a UrlInfo for a particular URL

@param [String] url the URL you want to get info about
@param [Integer] request_timeout max time to wait for headers from the server (seconds)

# File lib/network_utils/url_info.rb, line 29
def initialize(url, request_timeout = 10)
  @url = String.new(url.to_s).force_encoding('UTF-8')
  @request_timeout = request_timeout
end
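
The normalisation done in the constructor can be tried on its own. The sketch below is only an illustration: `normalise_url` is a hypothetical helper name, not part of the class. It shows that `nil` becomes an empty string and that a frozen input is copied rather than mutated.

```ruby
# Hypothetical stand-in for the constructor's URL normalisation.
def normalise_url(url)
  # to_s turns nil into "", String.new returns an unfrozen copy,
  # and force_encoding relabels the bytes as UTF-8 without transcoding.
  String.new(url.to_s).force_encoding('UTF-8')
end

frozen = 'http://example.com/file.csv'.freeze
copy = normalise_url(frozen)

puts copy.frozen?              # false
puts copy.encoding             # UTF-8
puts normalise_url(nil).empty? # true
```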

Public Instance Methods

content_type()

A shortcut method to get the Content-Type of the remote resource

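@return [Array<String>, nil] media types from the remote resource's Content-Type header (parameters such as charset are stripped), or nil if the header is unavailable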

# File lib/network_utils/url_info.rb, line 72
def content_type
  headers&.fetch('content-type', nil)
         &.split(/,\s/)
         &.map { |ct| ct.split(/;\s/).first }
end
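
To see what this parsing yields, here is a minimal sketch that applies the same chain to a plain Hash. `parse_content_type` and the sample header values are assumptions made for illustration; the real method reads from `headers`.

```ruby
# Hypothetical helper mirroring the parsing chain used by content_type.
def parse_content_type(headers)
  headers&.fetch('content-type', nil)
         &.split(/,\s/)                       # several types may be comma-separated
         &.map { |ct| ct.split(/;\s/).first } # drop parameters such as charset
end

p parse_content_type('content-type' => 'text/html; charset=utf-8') # ["text/html"]
p parse_content_type('content-type' => 'text/csv, application/json') # ["text/csv", "application/json"]
p parse_content_type(nil) # nil
```

Note that the result is an Array (or nil), so callers should not treat it as a single String.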

headers()

Get the remote resource's HTTP headers. Caches the result and returns the memoised version on later calls.

@return [Hash, nil] remote resource HTTP headers list or nil

# File lib/network_utils/url_info.rb, line 82
def headers
  return @headers if @headers # reuse memoised headers instead of sending another HEAD request
  return nil if @url.to_s.empty?
  return nil unless (encoded_url = encode(@url))

  Timeout.timeout(@request_timeout + CODE_TIMEOUT_EXTRA) do
    response = HTTParty.head(encoded_url, timeout: @request_timeout)
    raise response.response if response.response.is_a?(Net::HTTPServerError) ||
                               response.response.is_a?(Net::HTTPClientError)

    @headers ||= response.headers
  end
rescue SocketError, ThreadError, Errno::ENETUNREACH, Errno::ECONNREFUSED,
       Errno::EADDRNOTAVAIL, Timeout::Error, TypeError,
       Net::HTTPServerError, Net::HTTPClientError, Net::OpenTimeout
  nil
end

is?(type)

Check the Content-Type of the resource

@param [String, Symbol, Array] type the prefix (before “/”) or the full Content-Type value
@return [Boolean] true if the Content-Type matches any of the given types

# File lib/network_utils/url_info.rb, line 38
def is?(type)
  return false if type.to_s.empty?

  expected_types = Array.wrap(type).map(&:to_s)
  # Fetch once; match when any expected type is a prefix of any reported type
  types = content_type
  return false unless types

  expected_types.any? { |t| types.any? { |ct| ct.start_with?(t) } }
end
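
The intended matching rule (a type matches when it is a prefix of any reported Content-Type) can be sketched in isolation. `matches_type?` is a hypothetical helper, and it uses the core `Array()` conversion rather than `Array.wrap` so that it runs without ActiveSupport.

```ruby
# Hypothetical helper: content_types is the Array that content_type returns.
def matches_type?(content_types, type)
  expected = Array(type).map(&:to_s)
  return false if content_types.nil? || expected.empty?

  # True when any expected type is a prefix of any reported type.
  expected.any? { |t| content_types.any? { |ct| ct.start_with?(t) } }
end

p matches_type?(['image/png'], :image)           # true
p matches_type?(['image/png'], 'text')           # false
p matches_type?(['text/csv'], %w[image text/csv]) # true
```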

size()

A shortcut method to get the remote resource size

@return [Integer] remote resource size (bytes), 0 if there's nothing

# File lib/network_utils/url_info.rb, line 65
def size
  headers&.fetch('content-length', 0).to_i
end

valid?()

Check offline URL validity

@return [Boolean] true if the URL is valid from the point of view of the standard

# File lib/network_utils/url_info.rb, line 50
def valid?
  @url.match?(UrlRegex.get(mode: :validation))
end
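
For comparison, an offline check can also be built on Ruby's standard `uri` library. This is a different technique from the `UrlRegex` pattern used above, may accept or reject different inputs, and `well_formed_http_url?` is a hypothetical name.

```ruby
require 'uri'

# Hypothetical offline check built on the standard library instead of UrlRegex.
def well_formed_http_url?(url)
  uri = URI.parse(url.to_s)
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

p well_formed_http_url?('https://example.com/a.csv') # true
p well_formed_http_url?('not a url')                 # false
```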

valid_online?()

Check online URL validity (& format validity as well)

@return [Boolean] true if the URL is valid from the point of view of the standard & exists (has headers)

# File lib/network_utils/url_info.rb, line 58
def valid_online?
  valid? && headers
end

Private Instance Methods

encode(url)

Percent-encode the URL using Addressable::URI.encode

# File lib/network_utils/url_info.rb, line 101
def encode(url)
  Addressable::URI.encode(url)
end