class Jekyll::Embed
The idea with this class is to find the best safe representation of a link. For a YouTube video that could be a sandboxed iframe, which loads the video and lets you play it while preventing YouTube from phoning home with data about your users. Other social networks try to take control of their containers by modifying the page; they resist sandboxing and don't work correctly inside it. For those, we clean up unwanted HTML tags such as <script> and return the remaining HTML, which you can style with CSS. Twitter is one example.
Others are only available through OGP, so we retrieve the metadata and render a template, which you can provide in your own theme too.
We also try microformats, and we would look at Schema.org too, but there doesn't seem to be a gem for it yet.
If the URL doesn't provide anything at all we get the URL, title and date of last visit.
Isn't it nice that the corporations that require us to use OEmbed, OGP, Twitter Cards, Schema.org and other metadata don't use them themselves?
Also we're going to use heavy caching so we don't hit rate limits or lose the representation if the service is down or the URL is removed. We may be tempted to store the resources locally (images, videos, audio) but we have to take into account that people have legitimate reasons to remove media from the Internet.
Constants
- A_ATTRIBUTES
- DEFAULT_CONFIG
  The default referrer policy only sends the origin URL (not the full URL, only the protocol/scheme and domain part) if the remote URL is HTTPS.
  @see {developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referrer-Policy}
  The default sandbox restrictions only allow scripts in the context of the iframe and opening new tabs.
  @see {developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe#attr-sandbox}
- IFRAME_ATTRIBUTES
  Attributes to apply by HTMLElement
- IMAGE_ATTRIBUTES
- INCLUDE_FALLBACK
- INCLUDE_OGP
  Templates
- MEDIA_ATTRIBUTES
Public Class Methods
@return [Jekyll::Embed::Cache]
# File lib/jekyll/embed.rb
def cache
  @cache ||= Jekyll::Embed::Cache.new('Jekyll::Embed')
end
# File lib/jekyll/embed.rb
def cleanup(html, url)
  # Add our own attributes
  html.css('iframe').each do |iframe|
    IFRAME_ATTRIBUTES.each do |attr|
      iframe[attr] = value_for_attr(attr)
    end

    # Embedding itself require allow-same-origin
    iframe['sandbox'] += allow_same_origin(url)
  end

  html.css('audio, video').each do |media|
    MEDIA_ATTRIBUTES.each do |attr|
      media[attr] = value_for_attr(attr)
    end

    media['src'] = UrlPrivacy.clean media['src']
  end

  html.css('img').each do |img|
    IMAGE_ATTRIBUTES.each do |attr|
      img[attr] = value_for_attr(attr)
    end
  end

  html.css('a').each do |a|
    A_ATTRIBUTES.each do |attr|
      a[attr] = value_for_attr(attr)
    end
  end

  html.css('[src]').each do |element|
    element['src'] = CGI.escapeHTML(UrlPrivacy.clean(CGI.unescapeHTML(element['src'])))
  end

  html.css('[href]').each do |element|
    element['href'] = CGI.escapeHTML(UrlPrivacy.clean(CGI.unescapeHTML(element['href'])))
  end

  # Return the cleaned up HTML
  html
end
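The last two loops in the listing unescape each `src` and `href`, clean the URL, and escape it again. The sketch below illustrates that round trip on plain strings using only the standard library. `strip_tracking` and the `TRACKING` list are hypothetical stand-ins for `UrlPrivacy.clean`, whose actual rules are not shown in this page.

```ruby
require 'cgi'
require 'uri'

# Hypothetical list of query parameters to drop; the real cleaner may differ.
TRACKING = %w[utm_source utm_medium utm_campaign fbclid].freeze

# Hypothetical stand-in for UrlPrivacy.clean: removes known tracking parameters.
def strip_tracking(url)
  uri = URI.parse(url)
  return url unless uri.query

  kept = URI.decode_www_form(uri.query).reject { |key, _| TRACKING.include?(key) }
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
end

# Same order of operations as the listing: unescape, clean, escape again.
def clean_attribute(value)
  CGI.escapeHTML(strip_tracking(CGI.unescapeHTML(value)))
end

puts clean_attribute('https://example.org/v?id=1&amp;utm_source=feed')
puts clean_attribute('https://example.org/v?a=1&amp;b=2&amp;fbclid=x')
```

Unescaping first matters because attribute values carry `&amp;` rather than `&`, which would otherwise hide the parameter boundaries from the cleaner.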
@return [Hash]
# File lib/jekyll/embed.rb
def config
  @config ||= Jekyll::Utils.deep_merge_hashes(DEFAULT_CONFIG, (site.config['embed'] || {}))
end
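To illustrate what merging the defaults with the site's `embed` key implies, here is a minimal deep-merge sketch in plain Ruby. The `deep_merge` helper only approximates `Jekyll::Utils.deep_merge_hashes`, and the keys and values in `EXAMPLE_DEFAULTS` are assumptions, since the contents of `DEFAULT_CONFIG` are not shown here.

```ruby
# Hypothetical defaults; the real DEFAULT_CONFIG is not reproduced in this page.
EXAMPLE_DEFAULTS = {
  'scrub' => [],
  'attributes' => { 'referrerpolicy' => 'strict-origin', 'loading' => 'lazy' }
}.freeze

# Recursively merge `override` into `base`; values from `override` win,
# except that nested hashes are merged key by key.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# What a site might put under `embed:` in _config.yml (assumed values).
user_config = { 'attributes' => { 'loading' => 'eager' } }

merged = deep_merge(EXAMPLE_DEFAULTS, user_config)
puts merged.inspect
```

A site therefore only needs to list the settings it wants to change; everything else keeps its default.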
Render the URL as HTML
- Try oembed for video and image
- If rich oembed, cleanup
- If OGP, render templates
- Else, render fallback template
@param [String] URL
@return [String] HTML
# File lib/jekyll/embed.rb
def embed(url)
  url.strip!

  # Quick check
  raise URI::Error unless url.start_with? 'http'

  # Just to verify the URL is valid
  URI.parse url

  oembed(url) || ogp(url) || fallback(url)
rescue URI::Error
  Jekyll.logger.warn "#{url.inspect} is not a valid URL"

  url
end
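The control flow of `embed` can be sketched without any of its dependencies: validate the URL, then try each strategy in order and keep the first non-nil result. The lambdas in `STRATEGIES` are hypothetical placeholders for `oembed`, `ogp` and `fallback`, and `render` is an illustrative name, not part of the class.

```ruby
require 'uri'

# Hypothetical strategies standing in for oembed, ogp and fallback.
# Each returns HTML or nil when it cannot handle the URL.
STRATEGIES = [
  ->(url) { url.include?('video') ? "<iframe src=\"#{url}\"></iframe>" : nil },
  ->(url) { url.include?('article') ? "<article>#{url}</article>" : nil },
  ->(url) { "<a href=\"#{url}\">#{url}</a>" }
].freeze

def render(url)
  url = url.strip

  # Quick check, then a full parse to reject malformed URLs.
  raise URI::Error unless url.start_with?('http')

  URI.parse(url)

  # The first strategy that returns something wins, like `a || b || c`.
  STRATEGIES.each do |strategy|
    html = strategy.call(url)
    return html if html
  end

  nil
rescue URI::Error
  # Invalid URLs are returned untouched instead of aborting the build.
  url
end

puts render('https://example.org/video/1')
puts render(' https://example.org/page ')
puts render('ftp://example.org/file')
```

Unlike the original, this sketch does not mutate its argument; it strips a copy instead.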
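Last resort: fetch the page itself, take its title, description and first image, and render the fallback template. Returns nil if the request or the HTML parsing fails.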
# File lib/jekyll/embed.rb
def fallback(url)
  cache.getset(url) do
    html = Nokogiri::HTML.fragment get(url).body
    element = html.css('article').first
    element ||= html.css('section').first
    element ||= html.css('main').first
    element ||= html.css('body').first
    title = html.css('title').first
    description = html.css('meta[name="description"]').first

    context = info.dup
    context[:registers][:page] = payload['page'] = {
      'title' => text(title),
      'description' => text(description),
      'url' => url,
      'image' => element&.css('img')&.first&.public_send(:[], 'src')
    }

    fallback_template.render! payload, context
  end
rescue Faraday::Error, Nokogiri::SyntaxError
  nil
end
@param [String] URL
@return [Faraday::Response]
# File lib/jekyll/embed.rb
def get(url)
  @get_cache ||= {}
  @get_cache[url] ||= http_client.get url
end
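The `||=` memoization used by `get` means each URL is requested at most once per build, so `ogp` and `fallback` can share one response. A minimal sketch of the same pattern, where `MemoizedFetcher` and its block are hypothetical stand-ins for the class and its Faraday client:

```ruby
# Hypothetical wrapper; the block plays the role of `http_client.get`.
class MemoizedFetcher
  attr_reader :calls

  def initialize(&fetch)
    @fetch = fetch
    @responses = {}
    @calls = 0
  end

  # Returns the stored response for a URL, fetching it only the first time.
  def get(url)
    @responses[url] ||= begin
      @calls += 1
      @fetch.call(url)
    end
  end
end

fetcher = MemoizedFetcher.new { |url| "body of #{url}" }
fetcher.get('https://example.org')
fetcher.get('https://example.org')
puts fetcher.calls
```

Note that `||=` stores only truthy values, so a fetch that returns nil or false would be retried on the next call.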
@return [Faraday::Connection]
# File lib/jekyll/embed.rb
def http_client
  @http_client ||= Faraday.new do |builder|
    builder.use FaradayMiddleware::FollowRedirects
    builder.use :http_cache, shared_cache: false, store: cache, serializer: Marshal
  end
end
Try for OEmbed
@param [String] URL
@return [String,NilClass] Sanitized HTML or nil
# File lib/jekyll/embed.rb
def oembed(url)
  cache.getset(url) do
    oembed = OEmbed::Providers.get url

    # Prevent caching of nil?
    raise OEmbed::Error unless oembed.respond_to? :html

    # Cleanup. We don't allow running remote scripts locally,
    # period.
    cleanup(Loofah.fragment(oembed.html).scrub!(:prune), url).to_s
  end
rescue OEmbed::Error
  nil
end
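The `raise` inside the `getset` block keeps failed lookups out of the cache: nothing is stored unless the block returns. The sketch below shows that idea with a tiny in-memory cache. `TinyCache`, `NoEmbed` and `lookup` are hypothetical, and the sketch assumes `getset` only stores a value once the block completes.

```ruby
# Hypothetical in-memory stand-in for the cache used by the class.
class TinyCache
  def initialize
    @store = {}
  end

  # Return the cached value, or run the block and store its result.
  # If the block raises, nothing is stored.
  def getset(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end

  def key?(key)
    @store.key?(key)
  end
end

class NoEmbed < StandardError; end

# `result` simulates what a provider returned for the URL.
def lookup(cache, url, result)
  cache.getset(url) do
    # Raising here leaves the cache untouched for this URL.
    raise NoEmbed if result.nil?

    result
  end
rescue NoEmbed
  nil
end

demo_cache = TinyCache.new
lookup(demo_cache, 'https://a.example', nil)
lookup(demo_cache, 'https://b.example', '<p>ok</p>')
puts demo_cache.key?('https://a.example')
puts demo_cache.key?('https://b.example')
```

Because the failed URL is never cached, the next strategy (or a later build) gets a fresh chance at it.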
Try for OGP.
@param [String] URL
@return [String,NilClass]
# File lib/jekyll/embed.rb
def ogp(url)
  cache.getset(url) do
    ogp = OGP::OpenGraph.new get(url).body
    context = info.dup
    context[:registers][:page] = payload['page'] = ogp.data

    ogp_template.render! payload, context
  end
rescue OGP::MalformedSourceError, OGP::MissingAttributeError, Faraday::Error
  nil
end
# File lib/jekyll/embed.rb
def site
  unless @site
    raise Jekyll::Errors::InvalidConfigurationError,
          "Site is missing, configure with `Jekyll::Embed.site = site`"
  end

  @site
end
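Sets the site. This is an initializer of sorts: it registers the bundled _includes directory, allows iframes in Loofah's safe list, removes any elements listed under `scrub` in the configuration, and exposes the configured attributes to templates as `embed`.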
@param [Jekyll::Site]
@return [Jekyll::Site]
# File lib/jekyll/embed.rb
def site=(site)
  raise ArgumentError, "Site must be a Jekyll::Site" unless site.is_a? Jekyll::Site

  @site = site

  # Add the _includes dir so we can provide default templates that
  # can be overriden locally or by the theme.
  site.includes_load_paths << File.expand_path(File.join(__dir__, '..', '..', '_includes'))
  # Since we're embedding, we're allowing iframes
  Loofah::HTML5::SafeList::ALLOWED_ELEMENTS_WITH_LIBXML2 << 'iframe'

  # Other elements that are disallowed
  config['scrub']&.each do |scrub|
    Loofah::HTML5::SafeList::ALLOWED_ELEMENTS_WITH_LIBXML2.delete(scrub)
  end

  payload['embed'] = config['attributes']

  site
end
# File lib/jekyll/embed.rb
def text(node)
  node&.text&.tr("\n", '')&.tr("\r", '')&.strip&.squeeze(' ')
end
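The chain in `text` removes line breaks, trims the ends and collapses runs of spaces. A small sketch of the same chain applied to a plain string rather than a Nokogiri node (`normalize` is an illustrative name):

```ruby
# Same chain as `text`, applied directly to a String (or nil).
def normalize(str)
  str&.tr("\n", '')&.tr("\r", '')&.strip&.squeeze(' ')
end

puts normalize("  Hello \n  World  ").inspect
puts normalize("a\nb").inspect
puts normalize(nil).inspect
```

Line breaks are deleted rather than replaced with a space, so words separated only by a newline end up joined.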
Private Class Methods
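Returns the sandbox token to append for an embedded URL. As written, the method adds `allow-same-origin` when the URL does not start with the site's own `url`, or when `url` is not configured, and adds nothing when the iframe comes from the same site.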
@param [String] URL
@return [String]
# File lib/jekyll/embed.rb
def allow_same_origin(url)
  unless site.config['url']
    Jekyll.logger.warn "Add url to _config.yml to determine if the site can embed itself"
    return ' allow-same-origin'
  end

  @allow_same_origin ||= {}
  @allow_same_origin[url] ||= url.start_with?(site.config['url']) ? '' : ' allow-same-origin'
end
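The decision made in the listing can be restated as a pure function that takes the site URL as an argument. `sandbox_suffix` and the base sandbox value are hypothetical; the branches mirror the code as listed, without the logging and memoization.

```ruby
# Returns the string appended to an iframe's sandbox attribute.
# `site_url` stands in for site.config['url'] and may be nil.
def sandbox_suffix(site_url, url)
  # Without a configured site URL the token is always added.
  return ' allow-same-origin' if site_url.nil?

  # URLs under the site's own address get no extra token.
  url.start_with?(site_url) ? '' : ' allow-same-origin'
end

base = 'allow-scripts allow-popups' # assumed default sandbox value
puts base + sandbox_suffix('https://my.example', 'https://video.example/watch/1')
puts base + sandbox_suffix('https://my.example', 'https://my.example/post/')
```

The leading space in the returned token is what lets the caller append it directly with `+=`.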
# File lib/jekyll/embed.rb
def fallback_template
  @fallback_template ||= site.liquid_renderer.file('fallback.html').parse(INCLUDE_FALLBACK)
end
# File lib/jekyll/embed.rb
def info
  @info ||= {
    registers: { site: site },
    strict_filters: site.config.dig('liquid', 'strict_filters'),
    strict_variables: site.config.dig('liquid', 'strict_variables')
  }
end
# File lib/jekyll/embed.rb
def ogp_template
  @ogp_template ||= site.liquid_renderer.file('ogp.html').parse(INCLUDE_OGP)
end
Caches the payload because Jekyll::Site#site_payload returns a new object every time.
@return [Jekyll::Drops::UnifiedPayloadDrop]
# File lib/jekyll/embed.rb
def payload
  @payload ||= site.site_payload
end
@param [String]
@return [String]
# File lib/jekyll/embed.rb
def value_for_attr(attr)
  @value_for_attr ||= {}
  @value_for_attr[attr] ||=
    case (value = config.dig('attributes', attr))
    when String then value
    when Array then value.join(' ')
    end
end
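The `case` in `value_for_attr` lets a configuration value be written either as a single string or as a list of tokens. A minimal sketch of that conversion, where `attr_value` and the sample attributes are assumptions rather than the plugin's defaults:

```ruby
# Hypothetical attribute settings, as they might appear under `attributes`.
SAMPLE_ATTRIBUTES = {
  'referrerpolicy' => 'strict-origin',
  'sandbox' => %w[allow-scripts allow-popups]
}.freeze

# Strings pass through, arrays become space-separated tokens,
# anything else (including a missing key) yields nil.
def attr_value(attributes, name)
  case (value = attributes[name])
  when String then value
  when Array then value.join(' ')
  end
end

puts attr_value(SAMPLE_ATTRIBUTES, 'referrerpolicy')
puts attr_value(SAMPLE_ATTRIBUTES, 'sandbox')
puts attr_value(SAMPLE_ATTRIBUTES, 'missing').inspect
```

A nil result means the attribute is effectively unset when assigned to an element.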