module Web
Find the contact
Constants
- HTTP_REGEX
Captures http:// and https://
Attributes
Public Instance Methods
TODO: Some DNS providers answer an unknown host with a redirect instead of a 404.
Redirects need to be prevented for this check to be reliable.
Blindly tests whether a URL can be fetched. If there is a 404 error, this returns nil.
# File lib/gimme_poc/web.rb, line 93
def blind_test(url)
  LogMessages.blind_testing(url)
  get(url)
end
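One way to approach the TODO above is to treat only a direct success response as "goes through" and to count a redirect as a failure. The sketch below uses Ruby's standard Net::HTTP response classes rather than Mechanize, and `reached_directly?` is a hypothetical helper name, not part of GimmePOC. The responses are built by hand so the example runs without network access.

```ruby
require 'net/http'

# Hypothetical helper (not part of GimmePOC): true only for 2xx responses.
# Redirects (3xx) and errors (4xx/5xx) are both treated as "did not go through".
def reached_directly?(response)
  response.is_a?(Net::HTTPSuccess)
end

# Hand-built responses so the sketch needs no network access.
ok       = Net::HTTPOK.new('1.1', '200', 'OK')
redirect = Net::HTTPFound.new('1.1', '302', 'Found')
missing  = Net::HTTPNotFound.new('1.1', '404', 'Not Found')

puts reached_directly?(ok)       # true
puts reached_directly?(redirect) # false
puts reached_directly?(missing)  # false
```

With Mechanize itself, the equivalent change would presumably be made through the `redirect_ok` setting that mech_setup currently sets to true; that is an assumption about the fix, not something this page confirms.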
Mechanize needs absolute URLs to work. If http:// or https:// isn't present, prepend http://.
# File lib/gimme_poc/web.rb, line 40
def format_url(str)
  LazyDomain.autohttp(str)
end
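The prefixing rule described above can be sketched in plain Ruby. This is only an illustration of the idea: `with_scheme` and `SCHEME_REGEX` are hypothetical names, and the real LazyDomain.autohttp may behave differently.

```ruby
# Assumed pattern for a leading http:// or https:// (not the library's own constant).
SCHEME_REGEX = %r{\Ahttps?://}i

# Hypothetical stand-in for LazyDomain.autohttp: leaves absolute URLs alone
# and prefixes everything else with http://.
def with_scheme(str)
  url = str.to_s.strip
  return url if url.match?(SCHEME_REGEX)

  "http://#{url}"
end

puts with_scheme('example.com')         # http://example.com
puts with_scheme('https://example.com') # https://example.com
```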
Go to a page using Mechanize. Sleeps for a split second (0.1 seconds) first so as not to overload any servers.
Returns nil if a bad URL is given.
# File lib/gimme_poc/web.rb, line 13
def get(str)
  prepare_get_request(str)
  @page = @agent.get(@url)
rescue Exception => e
  LogMessages.warn_err(e)
end
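Expects a relative path, finds the first link on the current page whose href matches it, and merges that link with the page URL. Returns a string, or nil if no matching link is found.
The pattern is prefixed with \b (word boundary) so the match has to start at the beginning of a word.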
# File lib/gimme_poc/web.rb, line 74
def link_with_href(str)
  merged_link(@page.link_with(href: /\b#{str}/).uri.to_s)
rescue
  nil
end
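The effect of the leading \b can be seen without Mechanize by applying the same kind of pattern to a few sample hrefs. The helper name `href_pattern` and the sample paths are made up for illustration.

```ruby
# Hypothetical helper: builds the same kind of pattern link_with_href uses.
def href_pattern(str)
  /\b#{str}/
end

# Sample hrefs, invented for this sketch.
hrefs = ['/contact', '/about/contact-us', '/mycontacts', 'contact.html']

matches = hrefs.select { |href| href.match?(href_pattern('contact')) }
puts matches.inspect
# ["/contact", "/about/contact-us", "contact.html"]
```

'/mycontacts' is rejected because 'contact' starts in the middle of a word there. Note that the string is interpolated as a regular expression, so a caller passing characters such as '.' or '?' may want to wrap it in Regexp.escape first.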
# File lib/gimme_poc/web.rb, line 27
def mech_setup
  @agent = Mechanize.new do |a|
    a.user_agent_alias = 'Mac Safari'
    a.open_timeout = 7
    a.read_timeout = 7
    a.idle_timeout = 7
    a.redirect_ok = true
  end
end
Used in case of relative paths. Merging the path with the current page's URI guarantees a correct absolute URL. Takes a URL string as its argument and returns the merged URI as a string.
# File lib/gimme_poc/web.rb, line 65
def merged_link(url_str)
  @page.uri.merge(url_str).to_s
end
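The merge itself is standard URI resolution, so it can be tried with Ruby's uri library alone. In this sketch a hand-built URI stands in for `@page.uri`; `merge_against` and the example.com addresses are hypothetical.

```ruby
require 'uri'

# Hypothetical helper: resolves href against a base URL string,
# the way merged_link resolves it against the current page's URI.
def merge_against(base, href)
  URI.parse(base).merge(href).to_s
end

base = 'http://example.com/about/team.html'

puts merge_against(base, 'contact')                # http://example.com/about/contact
puts merge_against(base, '/contact')               # http://example.com/contact
puts merge_against(base, 'http://other.example/x') # http://other.example/x
```

A path without a leading slash is resolved relative to the current directory of the page, a path with a leading slash is resolved against the host, and an absolute URL is returned unchanged.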
Outputs domain of a url. Useful if subdomains are given to GimmePOC and they don't work.
For example: Given maps.google.com, returns 'google.com'.
# File lib/gimme_poc/web.rb, line 55
def orig_domain(str)
  LazyDomain.parse(str).domain
rescue PublicSuffix::DomainInvalid => err
  LogMessages.invalid_domain(err)
end
# File lib/gimme_poc/web.rb, line 20
def prepare_get_request(str)
  mech_setup
  @url = format_url(str)
  LogMessages.sending_get_request(url)
  sleep(0.1)
end
Boolean. Returns true if the host part of the URL is not identical to the original domain.
If the URL has a path, the URL (with its scheme removed) is split on forward slashes and the leftmost item, the host, is used for the comparison.
# File lib/gimme_poc/web.rb, line 84
def subdomain?(str)
  (unformat_url(str).split('/')[0] != orig_domain(str))
end
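The comparison can be sketched without the PublicSuffix lookup by passing the registered domain in directly. `host_part` and `subdomain_of?` are hypothetical names, and the scheme pattern is an assumption about what HTTP_REGEX matches.

```ruby
# Hypothetical helper: strips an assumed http(s):// prefix, then keeps
# everything before the first forward slash.
def host_part(url)
  url.sub(%r{\Ahttps?://}, '').split('/')[0]
end

# Hypothetical stand-in for subdomain?: the registered domain is supplied
# by the caller instead of being looked up through orig_domain.
def subdomain_of?(url, registered_domain)
  host_part(url) != registered_domain
end

puts host_part('http://maps.google.com/maps/place')                   # maps.google.com
puts subdomain_of?('http://maps.google.com/maps/place', 'google.com') # true
puts subdomain_of?('https://google.com/about', 'google.com')          # false
```

Under this rule any host that differs from the registered domain counts as a subdomain, including a www prefix.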
Strips the http:// or https:// prefix for the subdomain check. This returns a new string and is not a permanent change to the url variable.
# File lib/gimme_poc/web.rb, line 45
def unformat_url(str)
  str.gsub(HTTP_REGEX, '')
end
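The "not a permanent change" behaviour follows from using gsub rather than gsub!. A minimal sketch, assuming a pattern along the lines of the one below (the actual value of HTTP_REGEX is not shown on this page); `strip_scheme` and `SCHEME_PATTERN` are hypothetical names.

```ruby
# Assumed equivalent of HTTP_REGEX; the real constant may be written differently.
SCHEME_PATTERN = %r{https?://}

# Hypothetical stand-in for unformat_url.
def strip_scheme(str)
  str.gsub(SCHEME_PATTERN, '')
end

url = 'https://maps.google.com/maps'
stripped = strip_scheme(url)

puts stripped # maps.google.com/maps
puts url      # https://maps.google.com/maps (the original string is untouched)
```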