module Scrapifier::Methods

Methods which will be included into the String class.

Public Instance Methods

find_uri(which = 0) click to toggle source

Find URIs in the String.

Example:

>> 'Wow! What an awesome site: http://adtangerine.com!'.find_uri
=> 'http://adtangerine.com'
>> 'Very cool: http://adtangerine.com and www.twitflink.com'.find_uri 1
=> 'www.twitflink.com'

Arguments:

which: (Integer)
  - Which URI in the String: first (0), second (1) and so on.
# File lib/scrapifier/methods.rb, line 53
def find_uri(which = 0)
  which = scan(sf_regex(:uri))[which.to_i][0]
  which =~ sf_regex(:protocol) ? which : "http://#{which}"
rescue NoMethodError
  nil
end
scrapify(options = {}) click to toggle source

Get metadata from an URI using the screen scraping technique.

Example:

>> 'Wow! What an awesome site: http://adtangerine.com!'.scrapify
=> {
     :title => "AdTangerine | Advertising Platform for Social Media",
     :description => "AdTangerine is an advertising platform that...",
     :images => [
       "http://adtangerine.com/assets/logo_adt_og.png",
       "http://adtangerine.com/assets/logo_adt_og.png
     ],
     :uri => "http://adtangerine.com"
   }

Arguments:

options: (Hash)
  - which: (Integer)
      Which URI in the String will be used. It starts from 0 to N.
  - images: (Symbol or Array)
      Image extensions which are allowed to be returned as result.
# File lib/scrapifier/methods.rb, line 30
def scrapify(options = {})
  uri, meta = find_uri(options[:which]), {}
  return meta if uri.nil?

  if !(uri =~ sf_regex(:image))
    meta = sf_eval_uri(uri, options[:images])
  elsif !sf_check_img_ext(uri, options[:images]).empty?
    [:title, :description, :uri, :images].each { |k| meta[k] = uri }
  end

  meta
end