class HtmlKit::Document
Fetches and parses the HTML document at an `http` or `https` URL so that you can validate it. Parsing is done with Nokogiri. Warning: HTML5 tags are not currently supported.
For example (from irb):
irb(main):001:0> require 'html_kit'
=> true
irb(main):002:0> doc = HtmlKit::Document.new('http://www.nokogiri.org/index.html')
=> #<HtmlKit::Document:0x007fbb5408cfe8 @url="http://www.nokogiri.org/index.html">
irb(main):003:0> doc.valid?
=> false
Public Class Methods
new(url)
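Creates a document for url, stripping surrounding whitespace. Raises HtmlKit::Errors::InvalidUrlError unless the URL begins with http: or https:.
# File lib/html_kit/document.rb, line 27
def initialize(url)
  @url = url.strip
  raise HtmlKit::Errors::InvalidUrlError unless supported_scheme?
end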
Public Instance Methods
errors()
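Returns the errors Nokogiri recorded while parsing the page (an Array of Nokogiri::XML::SyntaxError).
# File lib/html_kit/document.rb, line 37
def errors
  document.errors
end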
html5?()
Returns true if the page declares an HTML5 DOCTYPE, as reported by Nokogiri's DTD#html5_dtd?.
# File lib/html_kit/document.rb, line 41
def html5?
  document.internal_subset.html5_dtd?
end
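To make concrete which declarations count as an HTML5 DOCTYPE, here is a rough stdlib-only sketch. The helper name `html5_doctype?` and the regex are hypothetical and are a swapped-in heuristic: Nokogiri's `html5_dtd?` inspects the parsed DTD node rather than matching raw markup, and this sketch ignores anything (such as comments or a byte-order mark) placed before the DOCTYPE.

```ruby
# Hypothetical heuristic, not Nokogiri's implementation:
# matches "<!DOCTYPE html>" (case-insensitive) and the legacy-compat form
# "<!DOCTYPE html SYSTEM "about:legacy-compat">" at the start of the markup.
HTML5_DOCTYPE = /\A\s*<!DOCTYPE\s+html(?:\s+SYSTEM\s+(["'])about:legacy-compat\1)?\s*>/i

def html5_doctype?(markup)
  HTML5_DOCTYPE.match?(markup)
end

puts html5_doctype?("<!DOCTYPE html>\n<html></html>")                   # prints true
puts html5_doctype?('<!doctype html system "about:legacy-compat">')     # prints true
puts html5_doctype?('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">') # prints false
```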
valid?()
Returns true when the page parsed without errors, i.e. when errors is empty.
# File lib/html_kit/document.rb, line 33
def valid?
  errors.empty?
end
Private Instance Methods
document()
# File lib/html_kit/document.rb, line 51
def document
  Nokogiri::HTML(html_content)
end
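As listed, `document` is rebuilt on every call and `html_content` issues a new `Net::HTTP.get` each time, so calling both `valid?` and `html5?` on the same instance downloads and parses the page twice. A minimal sketch of caching the result with `||=` follows. The class name `CachedDocument` and the injected `fetcher:`/`parser:` callables are hypothetical stand-ins for `Net::HTTP.get` and `Nokogiri::HTML`, used so the caching can be shown without network access or Nokogiri installed.

```ruby
# Hypothetical sketch: memoize the fetched-and-parsed document so that
# repeated calls (errors, html5?, ...) reuse a single download.
class CachedDocument
  attr_reader :fetch_count

  # fetcher: callable taking a URL and returning the HTML body
  #          (stand-in for Net::HTTP.get(URI.parse(url))).
  # parser:  callable taking the HTML body and returning a parsed document
  #          (stand-in for Nokogiri::HTML).
  def initialize(url, fetcher:, parser:)
    @url = url.strip
    @fetcher = fetcher
    @parser = parser
    @document = nil
    @fetch_count = 0
  end

  def document
    # ||= only evaluates the right-hand side while @document is nil or false.
    @document ||= @parser.call(html_content)
  end

  private

  def html_content
    @fetch_count += 1
    @fetcher.call(@url)
  end
end

doc = CachedDocument.new(
  ' http://example.com/ ',
  fetcher: ->(url) { "<p>fetched #{url}</p>" },
  parser: ->(html) { html.upcase }
)
doc.document
doc.document
puts doc.fetch_count # prints 1
```

`||=` caches only truthy values, which is enough when the parser always returns a document object, as `Nokogiri::HTML` does. The trade-off is that an instance keeps reporting on the page as it was first fetched.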
html_content()
# File lib/html_kit/document.rb, line 55
def html_content
  Net::HTTP.get(URI.parse(@url))
end
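`Net::HTTP.get` returns the response body whatever the status code and does not follow redirects, so an `http` URL that redirects to `https` yields the body of the redirect response rather than the target page. A minimal sketch of following redirects with `Net::HTTP.get_response` follows. The helper name `fetch_following_redirects`, its `limit:` default of 5 and the injectable `get:` callable are hypothetical choices, the last one so the loop can be exercised offline.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of HtmlKit): request a URL, following up to
# `limit` redirects, and return the first non-redirect response.
def fetch_following_redirects(url, limit: 5, get: ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI.parse(url)
  redirects = 0
  loop do
    response = get.call(uri)
    return response unless response.is_a?(Net::HTTPRedirection)

    redirects += 1
    raise "too many redirects (limit #{limit})" if redirects > limit

    # The Location header may be relative, so resolve it against the current URI.
    uri = uri + response['location']
  end
end

# Real use (needs network):
#   response = fetch_following_redirects('http://www.nokogiri.org/index.html')
#   html = response.body if response.is_a?(Net::HTTPSuccess)
```

The loop returns any non-redirect response, including 4xx and 5xx ones, so a caller should check `response.is_a?(Net::HTTPSuccess)` before using `response.body`.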
supported_scheme?()
# File lib/html_kit/document.rb, line 47
def supported_scheme?
  !(@url =~ /^https:|^http:/).nil?
end
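`supported_scheme?` matches the URL against `/^https:|^http:/`. In Ruby, `^` anchors at the start of any line rather than only the start of the string, and `strip` removes only leading and trailing whitespace, so a value with an embedded newline followed by `http:` passes. The match is also case-sensitive, so `HTTP://...` is rejected even though URI schemes are case-insensitive. A minimal alternative sketch, assuming stdlib `URI` parsing is acceptable, accepts only URLs that parse to `URI::HTTP` (Ruby's `URI::HTTPS` is a subclass of it) and name a host. The method name `http_url?` is hypothetical.

```ruby
require 'uri'

# Hypothetical alternative to supported_scheme?: accept a URL only if the
# stdlib parser recognises it as http/https and it names a host.
def http_url?(url)
  uri = URI.parse(url.strip)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::Error
  # Malformed input (e.g. embedded whitespace or newlines) is rejected.
  false
end

ORIGINAL_PATTERN = /^https:|^http:/
tricky = "ftp://example.com/\nhttp://example.com/"
puts !(tricky =~ ORIGINAL_PATTERN).nil? # prints true (original check passes)
puts http_url?(tricky)                  # prints false
```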