class HtmlKit::Document

Parses an HTML document fetched from an `http` or `https` URL. You can use it to validate an HTML document. It uses Nokogiri internally. Warning: HTML5 tags are not currently supported.

For example (from irb):

irb(main):001:0> require 'html_kit'
=> true
irb(main):002:0> doc = HtmlKit::Document.new('http://www.nokogiri.org/index.html')
=> #<HtmlKit::Document:0x007fbb5408cfe8 @url="http://www.nokogiri.org/index.html">
irb(main):003:0> doc.valid?
=> false

Public Class Methods

new(url)
# File lib/html_kit/document.rb, line 27
def initialize(url)
  @url = url.strip

  raise HtmlKit::Errors::InvalidUrlError unless supported_scheme?
end
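
The constructor only strips surrounding whitespace and checks the scheme; it does not fetch anything. The sketch below mirrors the constructor and scheme check from the source listings in a stand-in class, to show one way a caller might handle a rejected URL. The `Sketch` namespace, the `url` reader, and `build_document` are assumptions made for illustration so the example runs without the gem installed.

```ruby
# Stand-in that mirrors the listed constructor and scheme check.
# The Sketch namespace is hypothetical and is not part of html_kit.
module Sketch
  class InvalidUrlError < StandardError; end

  class Document
    # Reader added only for this illustration.
    attr_reader :url

    def initialize(url)
      @url = url.strip
      raise InvalidUrlError unless supported_scheme?
    end

    private

    def supported_scheme?
      !(@url =~ /^https:|^http:/).nil?
    end
  end
end

# Hypothetical helper: returns a Document, or nil when the scheme is rejected.
def build_document(url)
  Sketch::Document.new(url)
rescue Sketch::InvalidUrlError
  nil
end

build_document("  https://example.com/ ")  # whitespace is stripped first
build_document("ftp://example.com/")       # rejected: not http or https
```

Returning `nil` is just one choice; a caller could equally let the error propagate or log it.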

Public Instance Methods

errors()
# File lib/html_kit/document.rb, line 37
def errors
  document.errors
end
html5?()
# File lib/html_kit/document.rb, line 41
def html5?
  document.internal_subset.html5_dtd?
end
valid?()
# File lib/html_kit/document.rb, line 33
def valid?
  errors.empty?
end

Private Instance Methods

document()
# File lib/html_kit/document.rb, line 51
def document
  Nokogiri::HTML(html_content)
end
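
As listed, `document` builds a new parse on every call, so `errors`, `valid?` and `html5?` each trigger a fresh fetch. One possible adjustment is to memoize the result with `||=`. The sketch below is not the library's code: `MemoizedDocument`, `fetch_count`, and the stubbed `html_content` and `parse` methods are hypothetical stand-ins, so it runs without network access or Nokogiri.

```ruby
# Hypothetical stand-in for illustration; not part of html_kit.
class MemoizedDocument
  attr_reader :fetch_count

  def initialize
    @fetch_count = 0
  end

  def errors
    document[:errors]
  end

  def valid?
    errors.empty?
  end

  private

  # Parse once and reuse the result on later calls.
  def document
    @document ||= parse(html_content)
  end

  # Stub standing in for Net::HTTP.get; counts how often it runs.
  def html_content
    @fetch_count += 1
    "<html><body><p>hello</p></body></html>"
  end

  # Stub standing in for Nokogiri::HTML; returns a plain hash.
  def parse(html)
    { source: html, errors: [] }
  end
end

doc = MemoizedDocument.new
doc.valid?
doc.errors
doc.fetch_count  # 1: the second call reused the memoized parse
```

The trade-off is that a memoized object no longer reflects changes to the remote page; a new instance would be needed to re-check it.
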
html_content()
# File lib/html_kit/document.rb, line 55
def html_content
  Net::HTTP.get(URI.parse(@url))
end
supported_scheme?()
# File lib/html_kit/document.rb, line 47
def supported_scheme?
  !(@url =~ /^https:|^http:/).nil?
end
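
In Ruby, `^` matches at the start of any line, not only at the start of the string, so the pattern above also accepts a multi-line string whose second line begins with `http:`. The sketch below compares it with a check anchored by `\A`, assuming a stricter match is what is wanted. Both helper names are hypothetical and are not part of html_kit.

```ruby
# Hypothetical helpers for illustration; not part of html_kit.

# Same pattern as the listing above: ^ also matches after a newline.
def line_anchored?(url)
  !(url =~ /^https:|^http:/).nil?
end

# \A anchors to the start of the whole string instead.
def string_anchored?(url)
  url.match?(/\Ahttps?:/)
end

tricky = "javascript:alert(1)\nhttp://example.com/"

line_anchored?(tricky)                    # => true
string_anchored?(tricky)                  # => false
string_anchored?("https://example.com/")  # => true
```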