class Jekyll::Embed

The idea with this class is to find the best safe representation of a link. For a YouTube video, that could be a sandboxed iframe: it loads the video and lets visitors play it, while restricting what YouTube can do in the page and send home about your users. Other social networks, however, try to take control of their containers by modifying the page; they resist sandboxing and don't work correctly inside it. For those, we clean up unwanted HTML tags such as <script> and return the remaining HTML, which you can style with CSS. Twitter is one such network.

Others are only available through OGP, so we retrieve the metadata and render a template, which you can provide in your own theme too.

We also try microformats, and we would look at Schema.org too, but there doesn't seem to be a gem for it yet.

If the URL doesn't provide anything at all, we use the URL, the page title and the date of last visit.

Isn't it nice that the corporations that require us to use oEmbed, OGP, Twitter Cards, Schema.org and other metadata don't use them themselves?

Also, we cache heavily so we don't hit rate limits or lose the representation if the service is down or the URL is removed. We may be tempted to store the resources (images, videos, audio) locally, but we have to take into account that people have legitimate reasons to remove media from the Internet.

Constants

A_ATTRIBUTES
DEFAULT_CONFIG

The default referrer policy only sends the origin (the scheme, host and port, not the full URL), and only if the remote URL is HTTPS.

@see developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referrer-Policy

The default sandbox restrictions only allow running scripts inside the iframe and opening new tabs.

@see developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe#attr-sandbox

IFRAME_ATTRIBUTES

Attributes to apply, per HTML element

IMAGE_ATTRIBUTES
INCLUDE_FALLBACK
INCLUDE_OGP

Templates

MEDIA_ATTRIBUTES

Public Class Methods

cache()

@return [Jekyll::Embed::Cache]

    # File lib/jekyll/embed.rb
    def cache
      @cache ||= Jekyll::Embed::Cache.new('Jekyll::Embed')
    end
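The Jekyll::Embed::Cache class isn't shown in this section. `oembed`, `ogp` and `fallback` all call `cache.getset(url) { ... }`, which appears to return a stored value or compute and store one. Below is a minimal in-memory sketch of those semantics. `MemoryCache` is a hypothetical name, and the assumption that a block which raises stores nothing is what `oembed` relies on when it raises instead of returning nil.

```ruby
# Hypothetical in-memory stand-in for Jekyll::Embed::Cache#getset.
class MemoryCache
  def initialize
    @store = {}
  end

  # Return the cached value for +key+, or run the block and cache its result.
  # If the block raises, nothing is stored and the error propagates.
  def getset(key)
    return @store[key] if @store.key?(key)

    value = yield
    @store[key] = value
  end

  def key?(key)
    @store.key?(key)
  end
end

cache = MemoryCache.new
calls = 0
cache.getset('https://example.com') { calls += 1; '<p>cached</p>' }
cache.getset('https://example.com') { calls += 1; '<p>recomputed</p>' }
calls # => 1

begin
  cache.getset('https://example.org') { raise ArgumentError, 'no embed' }
rescue ArgumentError
  # Nothing was cached, so the next strategy can try this URL again
end
cache.key?('https://example.org') # => false
```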
cleanup(html, url)

Apply the configured attributes to iframes, audio, video, images and links, then pass every src and href through UrlPrivacy.clean.

@param [Nokogiri::HTML::DocumentFragment] html
@param [String] url The URL being embedded
@return [Nokogiri::HTML::DocumentFragment] The same fragment, modified in place

    # File lib/jekyll/embed.rb
    def cleanup(html, url)
      # Add our own attributes
      html.css('iframe').each do |iframe|
        IFRAME_ATTRIBUTES.each do |attr|
          iframe[attr] = value_for_attr(attr)
        end

        # Only remote embeds get allow-same-origin (see allow_same_origin)
        iframe['sandbox'] += allow_same_origin(url)
      end

      html.css('audio, video').each do |media|
        MEDIA_ATTRIBUTES.each do |attr|
          media[attr] = value_for_attr(attr)
        end

        media['src'] = UrlPrivacy.clean media['src']
      end

      html.css('img').each do |img|
        IMAGE_ATTRIBUTES.each do |attr|
          img[attr] = value_for_attr(attr)
        end
      end

      html.css('a').each do |a|
        A_ATTRIBUTES.each do |attr|
          a[attr] = value_for_attr(attr)
        end
      end

      html.css('[src]').each do |element|
        element['src'] = CGI.escapeHTML(UrlPrivacy.clean(CGI.unescapeHTML(element['src'])))
      end

      html.css('[href]').each do |element|
        element['href'] = CGI.escapeHTML(UrlPrivacy.clean(CGI.unescapeHTML(element['href'])))
      end

      # Return the cleaned up HTML
      html
    end
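`cleanup` passes every `src` and `href` through `UrlPrivacy.clean`, which isn't documented here. Assuming its job is to strip tracking parameters, a rough stand-in using only the standard library could look like this. `clean_url` and the parameter list are hypothetical and don't reflect the gem's actual behavior.

```ruby
require 'uri'

# Hypothetical stand-in for UrlPrivacy.clean: drop common tracking
# parameters from a URL's query string. The real gem may remove a
# different (and likely longer) list.
TRACKING_PARAMS = /\A(utm_[a-z]+|fbclid|gclid)\z/

def clean_url(url)
  uri = URI.parse(url)
  return url unless uri.query

  kept = URI.decode_www_form(uri.query).reject { |name, _| name.match?(TRACKING_PARAMS) }
  # Drop the "?" entirely when no parameters survive
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
end

puts clean_url('https://example.com/watch?v=abc&utm_source=feed&fbclid=123')
# prints https://example.com/watch?v=abc
```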
config()

@return [Hash]

    # File lib/jekyll/embed.rb
    def config
      @config ||= Jekyll::Utils.deep_merge_hashes(DEFAULT_CONFIG, (site.config['embed'] || {}))
    end
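`config` layers the site's `embed` settings over `DEFAULT_CONFIG` with `Jekyll::Utils.deep_merge_hashes`. The sketch below shows the idea with a simplified recursive merge; `deep_merge` is hypothetical and is not Jekyll's implementation. The attribute names and values in `defaults` are assumptions, since the real `DEFAULT_CONFIG` isn't listed here.

```ruby
# Simplified stand-in for Jekyll::Utils.deep_merge_hashes: nested hashes
# are merged recursively, any other value from +overrides+ wins.
def deep_merge(defaults, overrides)
  defaults.merge(overrides) do |_key, default_value, override_value|
    if default_value.is_a?(Hash) && override_value.is_a?(Hash)
      deep_merge(default_value, override_value)
    else
      override_value
    end
  end
end

# Hypothetical defaults; the real DEFAULT_CONFIG isn't shown here.
defaults = {
  'attributes' => {
    'referrerpolicy' => 'strict-origin',
    'sandbox' => %w[allow-scripts allow-popups],
    'loading' => 'lazy'
  }
}

# What a site might set under `embed:` in _config.yml, once parsed.
site_embed = { 'attributes' => { 'loading' => 'eager' }, 'scrub' => ['video'] }

config = deep_merge(defaults, site_embed)
config['attributes']['loading']        # => "eager"
config['attributes']['referrerpolicy'] # => "strict-origin"
config['scrub']                        # => ["video"]
```

Because nested hashes are merged rather than replaced, a site only needs to list the attributes it wants to change.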
embed(url)

Render the URL as HTML:

  1. Try oEmbed for video and images

  2. If the oEmbed response is rich, clean it up

  3. If the page has OGP metadata, render the OGP template

  4. Otherwise, render the fallback template

@param [String] url
@return [String, nil] HTML, the URL itself if it isn't valid, or nil if every strategy fails

    # File lib/jekyll/embed.rb
    def embed(url)
      # Don't mutate the caller's string (it may be frozen)
      url = url.strip

      # Quick check
      raise URI::Error unless url.start_with? 'http'

      # Just to verify the URL is valid
      URI.parse url

      oembed(url) || ogp(url) || fallback(url)
    rescue URI::Error
      Jekyll.logger.warn "#{url.inspect} is not a valid URL"

      url
    end
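The control flow of `embed` boils down to: normalize and validate the URL, then take the first strategy result that isn't nil. This sketch reproduces that flow with hypothetical lambdas standing in for `oembed`, `ogp` and `fallback`, and with `warn` in place of Jekyll's logger.

```ruby
require 'uri'

# Sketch of embed's strategy chain: validate the URL, then return the
# first non-nil result. +strategies+ are hypothetical callables.
def embed_with(url, strategies)
  url = url.strip
  raise URI::InvalidURIError, 'not an http(s) URL' unless url.start_with?('http')

  URI.parse(url) # raises URI::InvalidURIError on malformed input
  strategies.each do |strategy|
    html = strategy.call(url)
    return html if html
  end
  nil
rescue URI::Error
  warn "#{url.inspect} is not a valid URL"
  url
end

oembed   = ->(_url) { nil }                        # no provider for this URL
ogp      = ->(url) { "<article>#{url}</article>" } # OGP metadata found
fallback = ->(_url) { raise 'not reached' }        # never called here

embed_with('  https://example.com/post ', [oembed, ogp, fallback])
# => "<article>https://example.com/post</article>"
embed_with('ftp://example.com', [oembed, ogp, fallback])
# => "ftp://example.com", after printing a warning
```

Short-circuiting means later strategies, which may need a network request, only run when earlier ones come up empty.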
fallback(url)

Last resort: fetch the page and render the fallback template with its title, meta description, URL and the first image in its main content (the first <article>, else <section>, else <main>, else <body>). Returns nil if fetching or parsing fails.

@param [String] url
@return [String, nil]

    # File lib/jekyll/embed.rb
    def fallback(url)
      cache.getset(url) do
        # Parse as a full document so <title>, <meta> and <body> are kept
        html        = Nokogiri::HTML(get(url).body)
        element     = html.css('article').first
        element   ||= html.css('section').first
        element   ||= html.css('main').first
        element   ||= html.css('body').first
        title       = html.css('title').first
        # The description lives in the content attribute, not in the text
        description = html.css('meta[name="description"]').first&.[]('content')

        context = info.dup
        context[:registers][:page] = payload['page'] = {
          'title' => text(title),
          'description' => description&.strip,
          'url' => url,
          'image' => element&.css('img')&.first&.[]('src')
        }

        fallback_template.render! payload, context
      end
    rescue Faraday::Error, Nokogiri::SyntaxError
      nil
    end
get(url)

@param [String] url
@return [Faraday::Response]

    # File lib/jekyll/embed.rb
    def get(url)
      @get_cache ||= {}
      @get_cache[url] ||= http_client.get url
    end
http_client()

@return [Faraday::Connection]

    # File lib/jekyll/embed.rb
    def http_client
      @http_client ||= Faraday.new do |builder|
        builder.use FaradayMiddleware::FollowRedirects
        builder.use :http_cache, shared_cache: false, store: cache, serializer: Marshal
      end
    end
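`FaradayMiddleware::FollowRedirects` handles redirects before the HTTP cache and the parsers see a response. As an illustration of the general technique (not that middleware's code), the sketch below follows `Location` headers up to a hop limit. `follow_redirects` is hypothetical, `fetch` is a hypothetical callable returning `[status, headers, body]` so the example runs without network access, and the limit of 3 is arbitrary.

```ruby
require 'uri'

REDIRECT_STATUSES = [301, 302, 303, 307, 308].freeze

# Keep requesting the Location target until a non-redirect response,
# allowing at most +limit+ redirects.
def follow_redirects(url, fetch, limit: 3)
  (limit + 1).times do
    status, headers, body = fetch.call(url)
    return [url, body] unless REDIRECT_STATUSES.include?(status)

    # Location may be relative, so resolve it against the current URL
    url = URI.join(url, headers.fetch('location')).to_s
  end
  raise "too many redirects (limit #{limit})"
end

responses = {
  'http://example.com/v' => [301, { 'location' => 'https://example.com/v' }, ''],
  'https://example.com/v' => [302, { 'location' => '/video/1' }, ''],
  'https://example.com/video/1' => [200, {}, '<html>ok</html>']
}
fetch = ->(url) { responses.fetch(url) }

follow_redirects('http://example.com/v', fetch)
# => ["https://example.com/video/1", "<html>ok</html>"]
```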
oembed(url)

Try for oEmbed.

@param [String] url
@return [String, nil] Sanitized HTML or nil

    # File lib/jekyll/embed.rb
    def oembed(url)
      cache.getset(url) do
        oembed = OEmbed::Providers.get url

        # Raise instead of returning nil, so that a nil result isn't cached
        raise OEmbed::Error unless oembed.respond_to? :html

        # Cleanup.  We don't allow running remote scripts locally,
        # period.
        cleanup(Loofah.fragment(oembed.html).scrub!(:prune), url).to_s
      end
    rescue OEmbed::Error
      nil
    end
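`OEmbed::Providers.get` finds a registered provider for the URL and requests its oEmbed endpoint. The sketch below builds such a request by hand using the oEmbed spec's `url`, `format` and `maxwidth` parameters. `oembed_request_url` is hypothetical, and the single-entry provider table and its regular expression are simplifications for illustration.

```ruby
require 'uri'

# Hypothetical provider table: URL pattern => oEmbed endpoint.
PROVIDERS = {
  %r{\Ahttps?://(www\.)?youtube\.com/watch} => 'https://www.youtube.com/oembed'
}.freeze

# Build the oEmbed request URL for +url+, or return nil when no provider
# matches (the case where oembed gives up and ogp gets a turn).
def oembed_request_url(url, maxwidth: nil)
  _pattern, endpoint = PROVIDERS.find { |pattern, _| pattern.match?(url) }
  return nil unless endpoint

  params = { 'url' => url, 'format' => 'json' }
  params['maxwidth'] = maxwidth if maxwidth
  "#{endpoint}?#{URI.encode_www_form(params)}"
end

oembed_request_url('https://www.youtube.com/watch?v=abc')
# => "https://www.youtube.com/oembed?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dabc&format=json"
```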
ogp(url)

Try for OGP.

@param [String] url
@return [String, nil]

    # File lib/jekyll/embed.rb
    def ogp(url)
      cache.getset(url) do
        ogp = OGP::OpenGraph.new get(url).body
        context = info.dup
        context[:registers][:page] = payload['page'] = ogp.data

        ogp_template.render! payload, context
      end
    rescue OGP::MalformedSourceError, OGP::MissingAttributeError, Faraday::Error
      nil
    end
site()

@return [Jekyll::Site]
@raise [Jekyll::Errors::InvalidConfigurationError] if no site was set with Jekyll::Embed.site=

    # File lib/jekyll/embed.rb
    def site
      unless @site
        raise Jekyll::Errors::InvalidConfigurationError,
          "Site is missing, configure with `Jekyll::Embed.site = site`"
      end

      @site
    end
site=(site)

This is an initializer of sorts: it stores the site, adds this plugin's _includes directory to the include load paths, and adjusts Loofah's element safelist.

@param [Jekyll::Site] site
@return [Jekyll::Site]

    # File lib/jekyll/embed.rb
    def site=(site)
      raise ArgumentError, "Site must be a Jekyll::Site" unless site.is_a? Jekyll::Site

      @site = site

      # Add the _includes dir so we can provide default templates that
      # can be overridden locally or by the theme.
      site.includes_load_paths << File.expand_path(File.join(__dir__, '..', '..', '_includes'))
      # Since we're embedding, we're allowing iframes
      Loofah::HTML5::SafeList::ALLOWED_ELEMENTS_WITH_LIBXML2 << 'iframe'

      # Remove the elements the site disallows under `scrub`
      config['scrub']&.each do |scrub|
        Loofah::HTML5::SafeList::ALLOWED_ELEMENTS_WITH_LIBXML2.delete(scrub)
      end

      payload['embed'] = config['attributes']

      site
    end
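The safelist changes made by `site=` can be sketched with a plain `Set` standing in for `Loofah::HTML5::SafeList::ALLOWED_ELEMENTS_WITH_LIBXML2`. The starting elements and `configure_safelist` are hypothetical. Note the order: `iframe` is added first, so listing it under `scrub` still removes it.

```ruby
require 'set'

# Stand-in for Loofah's element safelist; a hypothetical subset.
allowed = Set.new(%w[a p img video audio blockquote])

# Mirror site=: allow iframes, then drop whatever the site lists under scrub.
def configure_safelist(allowed, scrub)
  allowed << 'iframe'
  Array(scrub).each { |tag| allowed.delete(tag) } # nil means nothing to scrub
  allowed
end

configure_safelist(allowed, %w[video audio])
allowed.include?('iframe') # => true
allowed.include?('video')  # => false
```

Because the real list is a constant shared by the whole process, these changes affect every later Loofah sanitization, not only embeds.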
text(node)

Return the node's text with runs of whitespace (including line breaks) collapsed into single spaces, or nil if there is no node.

    # File lib/jekyll/embed.rb
    def text(node)
      node&.text&.gsub(/\s+/, ' ')&.strip
    end
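Collapsing every run of whitespace into a single space keeps words apart when a title or description spans several lines, whereas deleting line breaks outright glues the neighboring words together. The sketch works on plain strings so it runs without Nokogiri; `normalize_text` is hypothetical.

```ruby
# Collapse any run of whitespace (spaces, tabs, \r, \n) into one space
# and trim both ends; nil stays nil.
def normalize_text(text)
  text&.gsub(/\s+/, ' ')&.strip
end

raw = "  Embedding\r\n   media\nsafely  "
normalize_text(raw) # => "Embedding media safely"

# Deleting line breaks instead joins the words around them
raw.tr("\n", '').tr("\r", '').strip.squeeze(' ') # => "Embedding mediasafely"
```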

Private Class Methods

allow_same_origin(url)

Only iframes embedded from other sites get allow-same-origin added to their sandbox. When the embedded URL belongs to this site, the token is left out, because allow-scripts combined with allow-same-origin would let a same-origin iframe remove its own sandbox. If the site has no url configured, every URL is treated as remote.

@param [String] url
@return [String] " allow-same-origin" or an empty string

    # File lib/jekyll/embed.rb
    def allow_same_origin(url)
      unless site.config['url']
        Jekyll.logger.warn "Add url to _config.yml to determine if the site can embed itself"
        return ' allow-same-origin'
      end

      @allow_same_origin ||= {}
      @allow_same_origin[url] ||= url.start_with?(site.config['url']) ? '' : ' allow-same-origin'
    end
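`allow_same_origin` decides whether a URL belongs to the site with `start_with?`, so a host such as `example.com.evil.test` matches a site URL of `https://example.com` (the embed then just loses `allow-same-origin`, which is the restrictive outcome). Below is a sketch of comparing full origins (scheme, host and port) with `URI` instead. `same_origin?` and `sandbox_suffix` are hypothetical helpers, and the site URL is assumed to be absolute.

```ruby
require 'uri'

# True when both URLs share scheme, host and port.
def same_origin?(url, site_url)
  a = URI.parse(url)
  b = URI.parse(site_url)
  [a.scheme&.downcase, a.host&.downcase, a.port] ==
    [b.scheme&.downcase, b.host&.downcase, b.port]
rescue URI::InvalidURIError
  false
end

# Same decision as allow_same_origin: only remote embeds get the token.
def sandbox_suffix(url, site_url)
  same_origin?(url, site_url) ? '' : ' allow-same-origin'
end

site = 'https://example.com'
sandbox_suffix('https://example.com/posts/1', site)       # => ""
sandbox_suffix('https://example.com.evil.test/x', site)   # => " allow-same-origin"
sandbox_suffix('https://www.youtube.com/embed/abc', site) # => " allow-same-origin"
```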
fallback_template()

    # File lib/jekyll/embed.rb
    def fallback_template
      @fallback_template ||= site.liquid_renderer.file('fallback.html').parse(INCLUDE_FALLBACK)
    end
info()

    # File lib/jekyll/embed.rb
    def info
      @info ||= {
        registers: { site: site },
        strict_filters: site.config.dig('liquid', 'strict_filters'),
        strict_variables: site.config.dig('liquid', 'strict_variables')
      }
    end
ogp_template()

    # File lib/jekyll/embed.rb
    def ogp_template
      @ogp_template ||= site.liquid_renderer.file('ogp.html').parse(INCLUDE_OGP)
    end
payload()

Memoized because Jekyll::Site#site_payload returns a new object every time.

@return [Jekyll::Drops::UnifiedPayloadDrop]

    # File lib/jekyll/embed.rb
    def payload
      @payload ||= site.site_payload
    end
value_for_attr(attr)

Return the configured value for an attribute: strings are used as-is, arrays are joined with spaces, anything else gives nil.

@param [String] attr
@return [String, nil]

    # File lib/jekyll/embed.rb
    def value_for_attr(attr)
      @value_for_attr ||= {}
      @value_for_attr[attr] ||=
        case (value = config.dig('attributes', attr))
        when String then value
        when Array then value.join(' ')
        end
    end
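`value_for_attr` turns config values into attribute strings: strings pass through and arrays of tokens (as a `sandbox` list would be) are joined with spaces. Below is a standalone sketch with a hypothetical `attributes` hash; `attr_value` is not part of the plugin.

```ruby
# Same case/when logic as value_for_attr, reading from a plain hash.
def attr_value(attributes, attr)
  case (value = attributes[attr])
  when String then value
  when Array then value.join(' ')
  end
end

# Hypothetical `attributes` section of the embed config.
attributes = {
  'sandbox' => %w[allow-scripts allow-popups],
  'referrerpolicy' => 'strict-origin'
}

attr_value(attributes, 'sandbox')        # => "allow-scripts allow-popups"
attr_value(attributes, 'referrerpolicy') # => "strict-origin"
attr_value(attributes, 'loading')        # => nil
```

Since the original memoizes with `||=`, an attribute that resolves to nil isn't cached and is looked up again on each call.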