module Typeset
Contains all of our typeset-related class methods. Mix this module into a class, or just call `Typeset.typeset` directly.
Constants
- DefaultLigatures
List of ligatures to process by default
- DefaultMethods
The default typesetting methods and their configuration. Add new methods here in whatever order makes sense.
- DefaultOptions
- Ligatures
Map of raw text sequences to unicode ligatures
Public Class Methods
Parse an HTML fragment with Nokogiri and apply a function to all of the descendant text nodes
    # File lib/typeset.rb, line 15
    def self.apply_to_text_nodes(html, &func)
      doc = Nokogiri::HTML("<div id='rtypeset_internal'>#{html}</div>", nil, "UTF-8", Nokogiri::XML::ParseOptions::NOENT)
      doc.search('//text()').each do |node|
        old_content = node.content
        new_content = func.call(node.content.strip)

        # Restore a single space where the original node had edge whitespace
        if old_content =~ /^(\s+)/
          new_content = " #{new_content}"
        end
        if old_content =~ /(\s+)$/
          new_content = "#{new_content} "
        end
        node.replace(new_content)
      end
      content = doc.css("#rtypeset_internal")[0].children.map { |child| child.to_html }
      return content.join("")
    end
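The strip-then-restore step is easy to miss in the listing, so here is a minimal sketch of it without Nokogiri. The helper name `transform_preserving_edges` is hypothetical and not part of Typeset, and it anchors on the whole string (`\A`, `\z`) rather than on line boundaries as the listing does.

```ruby
# Hypothetical helper (not part of Typeset): approximates how
# apply_to_text_nodes strips a text node, transforms it, and then
# puts back one plain space on each side that had whitespace.
def transform_preserving_edges(content)
  result = yield(content.strip)
  result = " #{result}" if content =~ /\A\s+/  # leading run becomes one space
  result = "#{result} " if content =~ /\s+\z/  # trailing run becomes one space
  result
end

padded = transform_preserving_edges("  \n hello world\t") { |t| t.upcase }
puts padded.inspect
```

Note that any run of edge whitespace, including newlines and tabs, collapses to a single space on the way back.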
Add push/pull spans for hanging punctuation to text.
    # File lib/typeset/hanging_punctuation.rb, line 27
    def self.hanging_punctuation(text, options)
      return text if text.length < 2
      aligns = "CcOoYTAVvWwY".split('')
      words = text.split(/\s+/)
      words.each_with_index do |word, i|
        [[aligns, false],
         [HangingPunctuation::SingleWidth, 'single'],
         [HangingPunctuation::DoubleWidth, 'double']].each do |pair|
          pair[0].each do |signal|
            if word[0] == signal
              words[i] = "#{HangingPunctuation.pull(pair[1], signal)}#{word.slice(1,word.length)}"
              if not words[i-1].nil?
                words[i-1] = "#{words[i-1]}#{HangingPunctuation.push(pair[1] ? pair[1] : signal)}"
              end
            end
          end
        end
      end
      return words.join(" ")
    end
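One Ruby detail worth keeping in mind when reading the listing: a negative array index counts from the end, so `words[i-1]` at `i == 0` refers to the last word rather than returning `nil`. The sketch below shows the behavior and one possible guard; `previous_word` is a hypothetical helper, not something the library defines.

```ruby
words = %w[alpha beta gamma]

# Index -1 wraps around to the end of the array.
puts words[0 - 1]  # prints "gamma"

# Hypothetical helper: return the word before index i,
# or nil when i points at the first word.
def previous_word(words, i)
  i.zero? ? nil : words[i - 1]
end

puts previous_word(words, 0).inspect
puts previous_word(words, 1).inspect
```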
Hyphenate text, inserting soft hyphenation markers. Specify the language for hyphenation by passing an options hash to your typeset call, e.g.:
Typeset.typeset("do hyphenation on this", {:language => "en_gb"})
    # File lib/typeset/hyphenate.rb, line 8
    def self.hyphenate(text, options)
      options[:language] ||= 'en_us'
      hyphen = Text::Hyphen.new(:language => options[:language], :left => 0, :right => 0)
      # Insert a soft hyphen (U+00AD) at each permitted break point
      text = hyphen.visualise(text, "\u00AD")
      return text
    end
Find and replace sequences of text with their unicode ligature equivalents. Override the set of ligatures to find by passing in a custom options hash, e.g.:
Typeset.typeset("flue", {:ligatures => ["fl", "ue"]}) # -> returns "ﬂᵫ"
    # File lib/typeset/ligatures.rb, line 21
    def self.ligatures(text, options)
      options[:ligatures] ||= DefaultLigatures
      options[:ligatures].each do |ligature|
        # gsub! modifies the caller's string in place
        text.gsub!(ligature, Ligatures[ligature])
      end
      return text
    end
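The lookup-and-replace idea can be sketched on its own as follows. `SAMPLE_LIGATURES` and `apply_ligatures` are hypothetical names, and the three-entry map is an assumption made for illustration; the real `Ligatures` constant lives in lib/typeset/ligatures.rb and may differ.

```ruby
# Hypothetical, trimmed-down map for illustration only.
SAMPLE_LIGATURES = {
  "fl" => "\uFB02", # LATIN SMALL LIGATURE FL
  "fi" => "\uFB01", # LATIN SMALL LIGATURE FI
  "ue" => "\u1D6B"  # LATIN SMALL LETTER UE
}.freeze

# Non-mutating variant: apply each requested sequence in list order,
# passing the result of one replacement to the next.
def apply_ligatures(text, wanted, map = SAMPLE_LIGATURES)
  wanted.reduce(text) { |acc, seq| acc.gsub(seq, map.fetch(seq)) }
end

puts apply_ligatures("flue", ["fl", "ue"])
```

Because replacements run in list order, an earlier sequence can consume characters that a later one would otherwise have matched. Unlike the listing, this variant leaves the input string untouched.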
Make dashes, ellipses, and start/end punctuation a little prettier.
    # File lib/typeset/punctuation.rb, line 3
    def self.punctuation(text, options)
      # Dashes
      text.gsub!('--', '–')
      text.gsub!(' – ', "\u2009–\u2009")

      # Ellipses
      text.gsub!('...', '…')

      # Non-breaking space for start/end punctuation with spaces.
      start_punc = /([«¿¡\[\(]) /
      if text =~ start_punc
        text.gsub!(start_punc, "#{$1}\u00A0")
      end

      end_punc = / ([\!\?:;\.,‽»\]\)])/
      if text =~ end_punc
        text.gsub!(end_punc, "\u00A0#{$1}")
      end
      return text
    end
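In the listing, the replacement string containing `#{$1}` is interpolated before `gsub!` is called, so it carries the capture from the preceding `=~` test (the first match in the text). A minimal sketch of a per-match alternative uses the block form of `gsub`, where `$1` is set for each match. `bind_punctuation` and `NBSP` are hypothetical names; this is not the library's code.

```ruby
NBSP = "\u00A0" # NO-BREAK SPACE

# Hypothetical alternative: the block runs once per match, so each
# opening or closing mark keeps its own character.
def bind_punctuation(text)
  text
    .gsub(/([«¿¡\[(]) /) { "#{$1}#{NBSP}" }   # mark followed by a space
    .gsub(/ ([!?:;.,‽»\])])/) { "#{NBSP}#{$1}" } # space followed by a mark
end

puts bind_punctuation("« oui » ( non )").inspect
```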
A poor man's SmartyPants implementation. Converts single and double quotes, tick marks, backticks, and primes into prettier Unicode equivalents.
    # File lib/typeset/quotes.rb, line 4
    def self.quotes(text, options)
      # Unencode encoded characters, so our regex mess below works
      text.gsub!('&#39;',"\'")
      text.gsub!('&quot;',"\"")

      if text =~ /(\W|^)"(\S+)/
        text.gsub!(/(\W|^)"(\S+)/, "#{$1}\u201c#{$2}") # beginning "
      end
      if text =~ /(\u201c[^"]*)"([^"]*$|[^\u201c"]*\u201c)/
        text.gsub!(/(\u201c[^"]*)"([^"]*$|[^\u201c"]*\u201c)/, "#{$1}\u201d#{$2}") # ending "
      end
      if text =~ /([^0-9])"/
        text.gsub!(/([^0-9])"/, "#{$1}\u201d") # remaining " at end of word
      end
      if text =~ /(\W|^)'(\S)/
        text.gsub!(/(\W|^)'(\S)/, "#{$1}\u2018#{$2}") # beginning '
      end
      if text =~ /([a-z])'([a-z])/i
        text.gsub!(/([a-z])'([a-z])/i, "#{$1}\u2019#{$2}") # conjunction's possession
      end
      if text =~ /((\u2018[^']*)|[a-z])'([^0-9]|$)/i
        text.gsub!(/((\u2018[^']*)|[a-z])'([^0-9]|$)/i, "#{$1}\u2019#{$3}") # ending '
      end
      if text =~ /(\u2018)([0-9]{2}[^\u2019]*)(\u2018([^0-9]|$)|$|\u2019[a-z])/i
        text.gsub!(/(\u2018)([0-9]{2}[^\u2019]*)(\u2018([^0-9]|$)|$|\u2019[a-z])/i, "\u2019#{$2}#{$3}") # abbrev. years like '93
      end
      if text =~ /(\B|^)\u2018(?=([^\u2019]*\u2019\b)*([^\u2019\u2018]*\W[\u2019\u2018]\b|[^\u2019\u2018]*$))/i
        text.gsub!(/(\B|^)\u2018(?=([^\u2019]*\u2019\b)*([^\u2019\u2018]*\W[\u2019\u2018]\b|[^\u2019\u2018]*$))/i, "#{$1}\u2019") # backwards apostrophe
      end

      text.gsub!(/'''/, "\u2034") # triple prime
      text.gsub!(/("|'')/, "\u2033") # double prime
      text.gsub!(/'/, "\u2032")

      # Allow escaped quotes
      text.gsub!('\\\“','\"')
      text.gsub!('\\\”','\"')
      text.gsub!('\\\’','\'')
      text.gsub!('\\\‘','\'')
      return text
    end
Identify likely acronyms, and wrap them in a ‘small-caps’ span.
    # File lib/typeset/small_caps.rb, line 3
    def self.small_caps(text, options)
      words = text.split(" ")
      words.each_with_index do |word, i|
        # Three or more consecutive uppercase letters count as an acronym
        if word =~ /^\W*([[:upper:]][[:upper:]][[:upper:]]+)\W*/
          leading,trailing = word.split($1)
          words[i] = "#{leading}<span class=\"small-caps\">#{$1}</span>#{trailing}"
        end
      end
      return words.map { |x| x.strip }.join(" ")
    end
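The acronym rule (three or more consecutive uppercase letters) can be illustrated with a single `gsub`. This is a sketch under different assumptions than the listing: `wrap_acronyms` is a hypothetical name, and it only matches runs that form a whole word, so it skips cases like "NASAs" that the listing would wrap.

```ruby
# Hypothetical sketch, not the library's code: wrap whole-word runs of
# three or more uppercase letters in a small-caps span.
def wrap_acronyms(text)
  text.gsub(/\b[[:upper:]]{3,}\b/) do |run|
    %(<span class="small-caps">#{run}</span>)
  end
end

puts wrap_acronyms("The NASA and EU teams (ESA).")
```

Two-letter runs such as "EU" are left alone, matching the three-letter minimum in the listing's regular expression.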
Replace wide (normal) spaces around math operators with thin spaces (U+2009).
    # File lib/typeset/spaces.rb, line 3
    def self.spaces(text, options)
      text.gsub!(" / ", "\u2009/\u2009")
      text.gsub!(" × ", "\u2009×\u2009")
      text.gsub!(" % ", "\u2009%\u2009")
      text.gsub!(" + ", "\u2009+\u2009")
      return text
    end
The main entry point for Typeset. Pass in raw HTML or text, along with an optional options hash.
    # File lib/typeset.rb, line 51
    def self.typeset(html, options=Typeset::DefaultOptions)
      methods = Typeset::DefaultMethods.dup
      options[:disable] ||= DefaultOptions[:disable]
      # Drop any method the caller has disabled
      methods.reject! { |method| options[:disable].include?(method[0]) }

      methods.each do |func, use_text_nodes|
        new_html = html
        if use_text_nodes
          # Run the method on each text node, leaving markup untouched
          new_html = Typeset.apply_to_text_nodes(html) { |content| Typeset.send(func, content, options) }
        else
          new_html = Typeset.send(func, html, options).strip
        end
        html = new_html
      end
      return html
    end
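The control flow above (copy the method list, drop the disabled entries, then feed each result into the next step) can be sketched in isolation. Everything here is a hypothetical stand-in: `STEPS`, its three lambdas, and `run_pipeline` are assumptions made for illustration, and the real `DefaultMethods` pairs method names with a text-node flag rather than with callables.

```ruby
# Hypothetical stand-ins for Typeset::DefaultMethods.
STEPS = [
  [:ellipses, ->(s) { s.gsub("...", "\u2026") }],
  [:dashes,   ->(s) { s.gsub("--", "\u2013") }],
  [:shout,    ->(s) { s.upcase }]
].freeze

# Run every step whose name is not disabled, in list order,
# passing each step's output to the next one.
def run_pipeline(text, disable: [])
  STEPS
    .reject { |name, _| disable.include?(name) }
    .reduce(text) { |acc, (_, step)| step.call(acc) }
end

puts run_pipeline("wait--what...", disable: [:shout])
puts run_pipeline("a--b")
```

Order matters in a pipeline like this, since later steps see the output of earlier ones, which is why the constant's description asks for new methods to be added in whatever order makes sense.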