module Typeset

Contains all of our typeset-related class methods. Mix this module into a class, or just call `Typeset.typeset` directly.

Constants

DefaultLigatures

List of ligatures to process by default

DefaultMethods

The default typesetting methods and their configuration. Add new methods here in whatever order makes sense.

DefaultOptions

Ligatures

Map of raw text sequences to Unicode ligatures

Public Class Methods

apply_to_text_nodes(html, &func)

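Parse an HTML fragment with Nokogiri and apply a block to every descendant text node. Each node is stripped before the block sees it, and any leading or trailing whitespace is restored afterwards as a single space.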

# File lib/typeset.rb, line 15
def self.apply_to_text_nodes(html, &func)
  doc = Nokogiri::HTML("<div id='rtypeset_internal'>#{html}</div>", nil,"UTF-8",Nokogiri::XML::ParseOptions::NOENT)
  doc.search('//text()').each do |node|
    old_content = node.content
    new_content = func.call(node.content.strip)
    if old_content =~ /^(\s+)/
      new_content = " #{new_content}"
    end
    if old_content =~ /(\s+)$/
      new_content = "#{new_content} "
    end
    node.replace(new_content)
  end
  content = doc.css("#rtypeset_internal")[0].children.map { |child| child.to_html }
  return content.join("")
end
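
The whitespace handling above can be tried without Nokogiri. The following is a minimal sketch only: `transform_preserving_edges` is a hypothetical helper, not part of Typeset, and it uses the string anchors `\A` and `\z` where the listing uses the line anchors `^` and `$`.

```ruby
# Hypothetical helper, not part of Typeset: transform the stripped content,
# then restore one space wherever the original had edge whitespace.
def transform_preserving_edges(content)
  new_content = yield(content.strip)
  new_content = " #{new_content}" if content =~ /\A\s+/
  new_content = "#{new_content} " if content =~ /\s+\z/
  new_content
end

result = transform_preserving_edges("  hello world\n") { |s| s.upcase }
puts result.inspect
```

Whatever whitespace surrounded the node, the result carries at most one space on each side, so runs of spaces and newlines at node edges are collapsed.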
hanging_punctuation(text, options)

Add push/pull spans for hanging punctuation to text.

# File lib/typeset/hanging_punctuation.rb, line 27
def self.hanging_punctuation(text, options)
  return text if text.length < 2

  aligns = "CcOoYTAVvWwY".split('')
  words = text.split(/\s+/)
  words.each_with_index do |word, i|
    [[aligns, false],
     [HangingPunctuation::SingleWidth, 'single'],
     [HangingPunctuation::DoubleWidth, 'double']].each do |pair|
      pair[0].each do |signal|
        if word[0] == signal
          words[i] = "#{HangingPunctuation.pull(pair[1], signal)}#{word.slice(1,word.length)}"

          if i > 0 # words[-1] would wrap around to the last word
            words[i-1] = "#{words[i-1]}#{HangingPunctuation.push(pair[1] ? pair[1] : signal)}"
          end
        end
      end
    end
  end

  return words.join(" ")
end
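
When looking up the word before index `i`, keep in mind that Ruby arrays accept negative indices, so `words[i-1]` at `i == 0` returns the last word rather than `nil`. A small sketch of an explicit guard, where `previous_word` is a hypothetical helper and not part of Typeset:

```ruby
# Hypothetical helper: return the word before index i, or nil for the first word.
def previous_word(words, i)
  i > 0 ? words[i - 1] : nil
end

words = %w[alpha beta gamma]
puts words[0 - 1].inspect            # a negative index wraps to the last element
puts previous_word(words, 0).inspect # the first word has no predecessor
puts previous_word(words, 1).inspect
```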
hyphenate(text, options)

Hyphenate text by inserting soft hyphens (U+00AD). Specify the hyphenation language by passing an options hash to your typeset call, e.g.:

Typeset.typeset("do hyphenation on this", {:language => "en_gb"})

# File lib/typeset/hyphenate.rb, line 8
def self.hyphenate(text, options)
  options[:language] ||= 'en_us'
  hyphen = Text::Hyphen.new(:language => options[:language], :left => 0, :right => 0)

  text = hyphen.visualise(text, "\u00AD")

  return text
end
ligatures(text, options)

Find and replace sequences of text with their Unicode ligature equivalents. Override the set of ligatures to find by passing in a custom options hash, e.g.:

Typeset.typeset("flue", {:ligatures => ["fl", "ue"]})
# -> returns "ﬂᵫ"

# File lib/typeset/ligatures.rb, line 21
def self.ligatures(text, options)
  options[:ligatures] ||= DefaultLigatures

  options[:ligatures].each do |ligature|
    text.gsub!(ligature, Ligatures[ligature])
  end

  return text
end
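
Because the replacements run in list order, the order of the ligature list can change the result. The sketch below illustrates this with a small stand-in map; `SAMPLE_LIGATURES` and `apply_ligatures` are hypothetical names, and the real `Ligatures` constant may contain different entries. Unlike the listing above, this version works on a copy instead of mutating its argument.

```ruby
# Hypothetical subset of a ligature map; the real Ligatures constant is not shown here.
SAMPLE_LIGATURES = {
  "ffi" => "\uFB03", # LATIN SMALL LIGATURE FFI
  "fi"  => "\uFB01", # LATIN SMALL LIGATURE FI
  "fl"  => "\uFB02"  # LATIN SMALL LIGATURE FL
}.freeze

# Replace each requested sequence in order, leaving the input string untouched.
def apply_ligatures(text, wanted)
  result = text.dup
  wanted.each { |seq| result.gsub!(seq, SAMPLE_LIGATURES.fetch(seq)) }
  result
end

puts apply_ligatures("office floor", ["ffi", "fl"])
puts apply_ligatures("office", ["fi", "ffi"]) # "fi" wins, so "ffi" never matches
```

Listing longer sequences first lets them take precedence over the shorter sequences they contain.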
punctuation(text, options)

Make dashes, ellipses, and start/end punctuation a little prettier.

# File lib/typeset/punctuation.rb, line 3
def self.punctuation(text, options)
  # Dashes
  text.gsub!('--', '–')
  text.gsub!(' – ', "\u2009–\u2009")

  # Ellipses
  text.gsub!('...', '…')

  # Non-breaking space for start/end punctuation with spaces.
  # '\1' is a per-match backreference to the captured punctuation mark.
  start_punc = /([«¿¡\[\(]) /
  text.gsub!(start_punc, '\1&nbsp;')
  end_punc = / ([\!\?:;\.,‽»\]\)])/
  text.gsub!(end_punc, '&nbsp;\1')

  return text
end
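
One Ruby detail matters for replacements that reuse a captured group. A replacement string built with `"#{$1}"` is evaluated once, before `gsub!` runs, so every match receives the capture from the earlier `=~` test. A `'\1'` backreference is resolved separately for each match. The sketch below compares the two; both method names are hypothetical and the pattern is a simplified version of the one used for opening punctuation.

```ruby
OPENING = /([\[\(]) /

# Interpolated form: $1 comes from the =~ test, so it is the FIRST match's capture.
def replace_with_interpolation(text)
  result = text.dup
  result.gsub!(OPENING, "#{$1}&nbsp;") if result =~ OPENING
  result
end

# Backreference form: '\1' is substituted per match.
def replace_with_backreference(text)
  text.gsub(OPENING, '\1&nbsp;')
end

sample = "( a ) and [ b ]"
puts replace_with_interpolation(sample)
puts replace_with_backreference(sample)
```

With the interpolated form the second bracket is rewritten as a parenthesis, which is why the backreference form is the safer choice whenever a string can contain more than one match.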
quotes(text, options)

A poor man’s SmartyPants implementation. Converts single and double quotes, tick marks, backticks, and primes into prettier Unicode equivalents.

# File lib/typeset/quotes.rb, line 4
def self.quotes(text, options)
  # Unencode encoded characters, so our regex mess below works
  text.gsub!('&#39;',"\'")
  text.gsub!('&quot;',"\"")

  # Each replacement uses "\\1"-style backreferences so that every match
  # reuses its own captures (an interpolated $1 would repeat the first match's).
  text.gsub!(/(\W|^)"(\S+)/, "\\1\u201c\\2") # beginning "
  text.gsub!(/(\u201c[^"]*)"([^"]*$|[^\u201c"]*\u201c)/, "\\1\u201d\\2") # ending "
  text.gsub!(/([^0-9])"/, "\\1\u201d") # remaining " at end of word
  text.gsub!(/(\W|^)'(\S)/, "\\1\u2018\\2") # beginning '
  text.gsub!(/([a-z])'([a-z])/i, "\\1\u2019\\2") # contractions and possessives
  text.gsub!(/((\u2018[^']*)|[a-z])'([^0-9]|$)/i, "\\1\u2019\\3") # ending '
  text.gsub!(/(\u2018)([0-9]{2}[^\u2019]*)(\u2018([^0-9]|$)|$|\u2019[a-z])/i, "\u2019\\2\\3") # abbrev. years like '93
  text.gsub!(/(\B|^)\u2018(?=([^\u2019]*\u2019\b)*([^\u2019\u2018]*\W[\u2019\u2018]\b|[^\u2019\u2018]*$))/i, "\\1\u2019") # backwards apostrophe
  text.gsub!(/'''/, "\u2034") # triple prime
  text.gsub!(/("|'')/, "\u2033") # double prime
  text.gsub!(/'/, "\u2032")

  # Allow escaped quotes
  text.gsub!('\\\“','\"')
  text.gsub!('\\\”','\"')
  text.gsub!('\\\’','\'')
  text.gsub!('\\\‘','\'')

  return text
end
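
The rules are order-sensitive: opening quotes are resolved first, and later rules treat whatever straight quotes remain as closing marks or apostrophes. A heavily reduced sketch of that idea follows. `curl_basic` is a hypothetical helper covering only three cases, not a substitute for the method above.

```ruby
# Hypothetical, reduced illustration of rule ordering; not part of Typeset.
def curl_basic(text)
  text
    .gsub(/(\W|^)"(\S)/, "\\1\u201c\\2")       # opening double quote before a word
    .gsub(/"/, "\u201d")                        # any double quote still straight closes
    .gsub(/([a-z])'([a-z])/i, "\\1\u2019\\2")   # apostrophe inside a word
end

puts curl_basic(%q{She said "it's fine" twice})
```

Swapping the first two rules would turn every double quote into a closing mark, since nothing would be left for the opening rule to match.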
small_caps(text, options)

Identify likely acronyms (runs of three or more uppercase letters at the start of a word) and wrap them in a ‘small-caps’ span.

# File lib/typeset/small_caps.rb, line 3
def self.small_caps(text, options)
  words = text.split(" ")
  words.each_with_index do |word, i|
    if word =~ /^\W*([[:upper:]][[:upper:]][[:upper:]]+)\W*/
      leading,trailing = word.split($1)
      words[i] = "#{leading}<span class=\"small-caps\">#{$1}</span>#{trailing}"
    end
  end
  return words.map { |x| x.strip }.join(" ")
end
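
For comparison, the same detection can be expressed as a single substitution over the whole string. This is an alternative sketch rather than the library's approach: `wrap_acronyms` is a hypothetical name, it matches on word boundaries anywhere in the text, and it leaves the original whitespace untouched instead of splitting and rejoining on spaces.

```ruby
# Hypothetical alternative: wrap every run of 3+ uppercase letters in a span.
def wrap_acronyms(text)
  text.gsub(/\b[[:upper:]]{3,}\b/) do |acronym|
    %(<span class="small-caps">#{acronym}</span>)
  end
end

puts wrap_acronyms("The (NASA) and ESA teams, not the UK.")
```

Two-letter abbreviations such as "UK" are skipped, matching the three-letter minimum of the pattern in the listing above.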
spaces(text, options)

Replace wide (normal) spaces around math operators with thin spaces (U+2009).

# File lib/typeset/spaces.rb, line 3
def self.spaces(text, options)
  text.gsub!(" / ", "\u2009/\u2009")
  text.gsub!(" × ", "\u2009×\u2009")
  text.gsub!(" % ", "\u2009%\u2009")
  text.gsub!(" + ", "\u2009+\u2009")

  return text
end
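
The four substitutions share one shape, so they can also be written as a single pattern with a character class. A minimal sketch, assuming the same four operators; `THIN_SPACE` and `thin_space_operators` are hypothetical names, and this version returns a new string rather than mutating its argument.

```ruby
THIN_SPACE = "\u2009"

# Hypothetical helper: swap the spaces around /, ×, % and + for thin spaces.
def thin_space_operators(text)
  text.gsub(/ ([\/\u00D7%+]) /, "#{THIN_SPACE}\\1#{THIN_SPACE}")
end

puts thin_space_operators("3 + 4 / 2").inspect
```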
typeset(html, options=Typeset::DefaultOptions)

The main entry point for Typeset. Pass in raw HTML or text, along with an optional options hash. Methods named in options[:disable] are skipped.

# File lib/typeset.rb, line 51
def self.typeset(html, options=Typeset::DefaultOptions)
  methods = Typeset::DefaultMethods.dup
  options[:disable] ||= DefaultOptions[:disable]
  methods.reject! { |method| options[:disable].include?(method[0]) }

  methods.each do |func, use_text_nodes|
    new_html = html
    if use_text_nodes
      new_html = Typeset.apply_to_text_nodes(html) { |content| Typeset.send(func, content, options) }
    else
      new_html = Typeset.send(func, html, options).strip
    end
    html = new_html
  end
  return html
end
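
The dispatch pattern used here (keep a list of method names, drop the disabled ones, then thread the text through the rest) can be sketched in isolation. Everything in `MiniTypeset` is hypothetical and stands in for the real methods; it also reads `:disable` with `fetch` so that the caller's options hash is never modified.

```ruby
# Hypothetical miniature of the pipeline; not the real Typeset module.
module MiniTypeset
  STEPS = [:dashes, :ellipses].freeze

  def self.dashes(text, _options)
    text.gsub("--", "\u2013")
  end

  def self.ellipses(text, _options)
    text.gsub("...", "\u2026")
  end

  # Run every enabled step in order, feeding each step the previous result.
  def self.run(text, options = {})
    disabled = options.fetch(:disable, [])
    enabled = STEPS.reject { |name| disabled.include?(name) }
    enabled.reduce(text) { |acc, name| public_send(name, acc, options) }
  end
end

puts MiniTypeset.run("wait -- what...")
puts MiniTypeset.run("wait -- what...", disable: [:ellipses])
```

Keeping the step list as data makes the processing order explicit and lets callers switch individual steps off without touching the methods themselves.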