class RubyPants

RubyPants – SmartyPants ported to Ruby

Synopsis

RubyPants is a Ruby port of the smart-quotes library SmartyPants.

The original “SmartyPants” is a free web publishing plug-in for Movable Type, Blosxom, and BBEdit that easily translates plain ASCII punctuation characters into “smart” typographic punctuation HTML entities.

Description

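RubyPants can perform the following transformations: straight quotes (" and ') into curly quotes; backtick-style quotes (``like this'') into curly quotes; dashes (-- and ---) into en- and em-dashes; and three consecutive dots (...) into an ellipsis.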

This means you can write, edit, and save your posts using plain old ASCII straight quotes, plain dashes, and plain dots, but your published posts (and final HTML output) will appear with smart quotes, em-dashes, and proper ellipses.

RubyPants does not modify characters within <pre>, <code>, <kbd>, <math>, <style>, or <script> tag blocks. Typically, these tags are used to display text where smart quotes and other “smart punctuation” would not be appropriate, such as source code or example markup.

Backslash Escapes

If you need to use literal straight quotes (or plain hyphens and periods), RubyPants accepts the following backslash escape sequences to force non-smart punctuation. It does so by transforming the escape sequence into a decimal-encoded HTML entity:

\\    \"    \'    \.    \-    \`

This is useful, for example, when you want to use straight quotes as foot and inch marks: 6'2" tall; a 17" iMac. (Use 6\'2\" and 17\", respectively.)

Algorithmic Shortcomings

One situation in which quotes will get curled the wrong way is when apostrophes are used at the start of leading contractions. For example:

'Twas the night before Christmas.

In the case above, RubyPants will turn the apostrophe into an opening single-quote, when in fact it should be a closing one. I don’t think this problem can be solved in the general case; every word processor I’ve tried gets this wrong as well. In such cases, it’s best to use the proper HTML entity for closing single-quotes (&#8217;) by hand.
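
The behaviour described above can be reproduced with a minimal sketch. It assumes a hypothetical helper, curl_single_quotes, that applies only a few single-quote rules loosely modelled on educate_quotes and emits literal Unicode quotes; it is not the library's full implementation.

```ruby
# Hypothetical helper: a reduced set of single-quote rules, for illustration only.
def curl_single_quotes(text)
  text
    .gsub(/(\s)'(?=\w)/, "\\1\u2018")    # opening quote after whitespace
    .gsub(/([^\s\[{(\-])'/, "\\1\u2019") # closing quote after a non-space character
    .gsub(/'(\s|s\b|$)/, "\u2019\\1")    # closing quote before whitespace or end of text
    .gsub(/'/, "\u2018")                 # anything left is treated as an opening quote
end

puts curl_single_quotes("'Twas the night before Christmas.")
puts curl_single_quotes("I don't know.")
```

In this sketch the leading apostrophe of 'Twas has no preceding character to give it context, so it falls through to the last rule and is curled as an opening quote, while the apostrophe in "don't" is curled correctly.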

Bugs

To file bug reports or feature requests (other than for the shortcoming described above), please send email to: chneukirchen@gmail.com

If the bug involves quotes being curled the wrong way, please send example text to illustrate.

Authors

John Gruber did all of the hard work of writing this software in Perl for Movable Type and almost all of this useful documentation. Chad Miller ported it to Python to use with Pyblosxom.

Christian Neukirchen provided the Ruby port, as a general-purpose library that follows the *Cloth API.

Copyright and License

SmartyPants license:

Copyright © 2003 John Gruber (daringfireball.net) All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

This software is provided by the copyright holders and contributors “as is” and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

RubyPants license

RubyPants is a derivative work of SmartyPants and smartypants.py.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

This software is provided by the copyright holders and contributors “as is” and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

John Gruber
  daringfireball.net
SmartyPants
  daringfireball.net/projects/smartypants
Chad Miller
  web.chad.org
Christian Neukirchen
  kronavita.de/chris

Constants

VERSION

Public Class Methods

new(string, options=[2])

Create a new RubyPants instance with the text in string.

Allowed elements in the options array:

0
  do nothing
1
  enable all, using only em-dash shortcuts
2
  enable all, using old school en- and em-dash shortcuts (default)
3
  enable all, using inverted old school en- and em-dash shortcuts
-1
  stupefy (translate HTML entities to their ASCII counterparts)

If you don’t like any of these defaults, you can pass symbols to change RubyPants’ behavior:

:quotes
  quotes
:backticks
  backtick quotes (“double” only)
:allbackticks
  backtick quotes (“double” and ‘single’)
:dashes
  dashes
:oldschool
  old school dashes
:inverted
  inverted old school dashes
:ellipses
  ellipses
:convertquotes
  convert &quot; entities to " for Dreamweaver users
:stupefy
  translate RubyPants HTML entities to their ASCII counterparts

Calls superclass method
    # File lib/rubypants-unicode/rubypants-unicode.rb
    def initialize(string, options=[2])
      super string
      @options = [*options]
    end
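
The constructor normalizes its second argument with the splat idiom `[*options]`, so a bare number, a bare symbol, or an array are all accepted. The short sketch below isolates that idiom; normalize_options is a hypothetical helper name used only for illustration.

```ruby
# Hypothetical helper mirroring the `[*options]` idiom from the constructor.
def normalize_options(options = [2])
  [*options]
end

p normalize_options                     # default preset
p normalize_options(1)                  # a bare preset number
p normalize_options(:quotes)            # a bare symbol
p normalize_options([:quotes, :dashes]) # an array is passed through unchanged
```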

Public Instance Methods

to_html()

Apply SmartyPants transformations.

    # File lib/rubypants-unicode/rubypants-unicode.rb
    def to_html
      do_quotes = do_backticks = do_dashes = do_ellipses = do_stupefy = nil
      convert_quotes = false

      if @options.include? 0
        # Do nothing.
        return self
      elsif @options.include? 1
        # Do everything, turn all options on.
        do_quotes = do_backticks = do_ellipses = true
        do_dashes = :normal
      elsif @options.include? 2
        # Do everything, turn all options on, use old school dash shorthand.
        do_quotes = do_backticks = do_ellipses = true
        do_dashes = :oldschool
      elsif @options.include? 3
        # Do everything, turn all options on, use inverted old school
        # dash shorthand.
        do_quotes = do_backticks = do_ellipses = true
        do_dashes = :inverted
      elsif @options.include?(-1)
        do_stupefy = true
      else
        do_quotes =                @options.include? :quotes
        do_backticks =             @options.include? :backticks
        do_backticks = :both    if @options.include? :allbackticks
        do_dashes = :normal     if @options.include? :dashes
        do_dashes = :oldschool  if @options.include? :oldschool
        do_dashes = :inverted   if @options.include? :inverted
        do_ellipses =              @options.include? :ellipses
        convert_quotes =           @options.include? :convertquotes
        do_stupefy =               @options.include? :stupefy
      end

      # Parse the HTML
      tokens = tokenize

      # Keep track of when we're inside <pre>, <code>, <kbd>, <script>,
      # <style> or <math> tags.
      in_pre = false

      # The result is accumulated here.
      result = ""

      # This is a cheat, used to get some context for one-character
      # tokens that consist of just a quote char. What we do is remember
      # the last character of the previous text token, to use as context
      # to curl single-character quote tokens correctly.
      prev_token_last_char = nil

      tokens.each { |token|
        if token.first == :tag
          result << token[1]
          if token[1] =~ %r!<(/?)(?:pre|code|kbd|script|style|math)[\s>]!
            in_pre = ($1 != "/")  # Opening or closing tag?
          end
        else
          t = token[1]

          # Remember last char of this token before processing.
          last_char = t[-1].chr

          unless in_pre
            t = process_escapes t

            t.gsub!(/&quot;/, '"')  if convert_quotes

            if do_dashes
              t = educate_dashes t            if do_dashes == :normal
              t = educate_dashes_oldschool t  if do_dashes == :oldschool
              t = educate_dashes_inverted t   if do_dashes == :inverted
            end

            t = educate_ellipses t  if do_ellipses

            # Note: backticks need to be processed before quotes.
            if do_backticks
              t = educate_backticks t
              t = educate_single_backticks t  if do_backticks == :both
            end

            if do_quotes
              if t == "'"
                # Special case: single-character ' token
                if prev_token_last_char =~ /\S/
                  t = "’"
                else
                  t = "‘"
                end
              elsif t == '"'
                # Special case: single-character " token
                if prev_token_last_char =~ /\S/
                  t = "”"
                else
                  t = "“"
                end
              else
                # Normal case:
                t = educate_quotes t
              end
            end

            t = stupefy_entities t  if do_stupefy
          end

          prev_token_last_char = last_char
          result << t
        end
      }

      # Done
      result
    end
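
To illustrate how the in_pre flag in the listing above protects the contents of certain tags, here is a minimal standalone sketch. The tag regex is copied from the listing; the helper name educable_text and the token arrays are assumptions made for the example.

```ruby
# Hypothetical helper: given [:tag | :text, value] tokens, return the text
# values that would be eligible for smart punctuation.
def educable_text(tokens)
  protected_tag = %r!<(/?)(?:pre|code|kbd|script|style|math)[\s>]!
  in_pre = false
  tokens.each_with_object([]) do |(kind, value), out|
    if kind == :tag
      m = protected_tag.match(value)
      in_pre = (m[1] != "/") if m # an opening tag sets the flag, a closing tag clears it
    elsif !in_pre
      out << value
    end
  end
end

tokens = [
  [:tag, "<p>"], [:text, "Use "], [:tag, "<code>"], [:text, "a -- b"],
  [:tag, "</code>"], [:text, " carefully."], [:tag, "</p>"]
]
p educable_text(tokens)
```

Because the flag is a single boolean, a closing </code> nested inside a <pre> block would switch processing back on in this sketch before the </pre> is reached.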

Protected Instance Methods

educate_backticks(str)

Return the string, with “``backticks''”-style double quotes translated into HTML curly quote entities.

    # File lib/rubypants-unicode/rubypants-unicode.rb
    def educate_backticks(str)
      str.gsub("``", '“').gsub("''", '”')
    end

educate_dashes(str)

Return the string, with each instance of “--” translated to an em-dash HTML entity.

    # File lib/rubypants-unicode/rubypants-unicode.rb
    def educate_dashes(str)
      str.gsub(/--(?!>)/, '—')
    end
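
The negative lookahead `(?!>)` in the listing above leaves a double hyphen alone when it is immediately followed by `>`, as at the end of an HTML comment. A small sketch, using the hypothetical name double_hyphen_to_em_dash and a Unicode escape in place of the literal character:

```ruby
# Hypothetical standalone version of the "--" rule; "\u2014" is the em-dash.
def double_hyphen_to_em_dash(str)
  str.gsub(/--(?!>)/, "\u2014")
end

puts double_hyphen_to_em_dash("Wait -- what?")
puts double_hyphen_to_em_dash("a --> b") # left unchanged: "--" is followed by ">"
```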

educate_dashes_inverted(str)

Return the string, with each instance of “--” translated to an em-dash HTML entity, and each “---” translated to an en-dash HTML entity. Two reasons why: First, unlike the en- and em-dash syntax supported by educate_dashes_oldschool, it’s compatible with existing entries written before SmartyPants 1.1, back when “--” was only used for em-dashes. Second, em-dashes are more common than en-dashes, and so it sort of makes sense that the shortcut should be shorter to type. (Thanks to Aaron Swartz for the idea.)

    # File lib/rubypants-unicode/rubypants-unicode.rb
    def educate_dashes_inverted(str)
      str.gsub(/---/, '–').gsub(/--(?!>)/, '—')
    end

educate_dashes_oldschool(str)

Return the string, with each instance of “--” translated to an en-dash HTML entity, and each “---” translated to an em-dash HTML entity.

    # File lib/rubypants-unicode/rubypants-unicode.rb
    def educate_dashes_oldschool(str)
      str.gsub(/---/, '—').gsub(/--(?!>)/, '–')
    end
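
The old school and inverted variants differ only in which dash each shortcut maps to. The sketch below puts them side by side; the helper names are hypothetical and the dashes are written as Unicode escapes.

```ruby
# "\u2013" is the en-dash, "\u2014" is the em-dash.
def dashes_oldschool(str)
  str.gsub(/---/, "\u2014").gsub(/--(?!>)/, "\u2013")
end

def dashes_inverted(str)
  str.gsub(/---/, "\u2013").gsub(/--(?!>)/, "\u2014")
end

sample = "1990--2000 --- roughly"
puts dashes_oldschool(sample)
puts dashes_inverted(sample)
```

In both variants the three-hyphen pattern is replaced first; if the two-hyphen pattern ran first, it would consume the first two hyphens of every “---” and leave a stray hyphen behind.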

educate_ellipses(str)

Return the string, with each instance of “...” translated to an ellipsis HTML entity. Also converts the case where there are spaces between the dots.

    # File lib/rubypants-unicode/rubypants-unicode.rb
    def educate_ellipses(str)
      str.gsub('...', '…').gsub('. . .', '…')
    end
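
As a quick illustration of both forms handled above, the following sketch applies the same two substitutions in a standalone method (dots_to_ellipsis is a hypothetical name; "\u2026" is the ellipsis character):

```ruby
# Hypothetical standalone version of the ellipsis rule.
def dots_to_ellipsis(str)
  str.gsub("...", "\u2026").gsub(". . .", "\u2026")
end

puts dots_to_ellipsis("Wait...")
puts dots_to_ellipsis("Well. . . maybe")
```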

educate_quotes(str)

Return the string, with “educated” curly quote HTML entities.

    # File lib/rubypants-unicode/rubypants-unicode.rb
    def educate_quotes(str)
      punct_class = '[!"#\$\%\'()*+,\-.\/:;<=>?\@\[\\\\\]\^_`{|}~]'

      str = str.dup

      # Special case if the very first character is a quote followed by
      # punctuation at a non-word-break. Close the quotes by brute
      # force:
      str.gsub!(/^'(?=#{punct_class}\B)/, '’')
      str.gsub!(/^"(?=#{punct_class}\B)/, '”')

      # Special case for double sets of quotes, e.g.:
      #   <p>He said, "'Quoted' words in a larger quote."</p>
      str.gsub!(/"'(?=\w)/, '“‘')
      str.gsub!(/'"(?=\w)/, '‘“')

      # Special case for decade abbreviations (the '80s):
      str.gsub!(/'(?=\d\ds)/, '’')

      close_class = %![^\ \t\r\n\\[\{\(\-]!
      dec_dashes = '–|—'

      # Get most opening single quotes:
      str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)'(?=\w)/,
               '\1‘')
      # Single closing quotes:
      str.gsub!(/(#{close_class})'/, '\1’')
      str.gsub!(/'(\s|s\b|$)/, '’\1')
      # Any remaining single quotes should be opening ones:
      str.gsub!(/'/, '‘')

      # Get most opening double quotes:
      str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)"(?=\w)/,
               '\1“')
      # Double closing quotes:
      str.gsub!(/(#{close_class})"/, '\1”')
      str.gsub!(/"(\s|s\b|$)/, '”\1')
      # Any remaining quotes should be opening ones:
      str.gsub!(/"/, '“')

      str
    end
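
The order of the rules above matters: special cases such as decade abbreviations run before the general opening-quote rule. The sketch below isolates just those two rules to show the effect; the method curl and its decades keyword are hypothetical and exist only for this illustration.

```ruby
# Hypothetical reduced rule set: the decade special case, then the general
# "opening quote after whitespace" rule.
def curl(text, decades: true)
  text = text.gsub(/'(?=\d\ds)/, "\u2019") if decades # the '80s: closing quote
  text.gsub(/(\s)'(?=\w)/, "\\1\u2018")                # otherwise: opening quote
end

puts curl("music of the '80s")
puts curl("music of the '80s", decades: false)
```

With the special case enabled, the apostrophe becomes a closing quote; with it disabled, the general rule sees whitespace followed by a word character and produces an opening quote instead.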

educate_single_backticks(str)

Return the string, with “`backticks'”-style single quotes translated into HTML curly quote entities.

    # File lib/rubypants-unicode/rubypants-unicode.rb
    def educate_single_backticks(str)
      str.gsub("`", '‘').gsub("'", '’')
    end
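
In to_html, the double-backtick rule is applied before the single-backtick rule. The following sketch, with hypothetical method names and Unicode escapes for the quote characters, shows why that order matters:

```ruby
# Double-backtick rule: ``text'' becomes curly double quotes.
def double_backticks(str)
  str.gsub("``", "\u201C").gsub("''", "\u201D")
end

# Single-backtick rule: `text' becomes curly single quotes.
def single_backticks(str)
  str.gsub("`", "\u2018").gsub("'", "\u2019")
end

sample = "``a'' and `b'"
puts single_backticks(double_backticks(sample)) # doubles first, then singles
puts single_backticks(sample)                   # singles only: doubles are split up
```

Running the single rule on its own turns each `` into two opening single quotes, which is why the double rule has to go first.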

process_escapes(str)

Return the string after processing the following backslash escape sequences. This is useful if you want to force a “dumb” quote or other character to appear.

Escaped are:

\\    \"    \'    \.    \-    \`
    # File lib/rubypants-unicode/rubypants-unicode.rb
    def process_escapes(str)
      str.gsub('\\\\', '&#92;').
        gsub('\"', '&#34;').
        gsub("\\\'", '&#39;').
        gsub('\.', '&#46;').
        gsub('\-', '&#45;').
        gsub('\`', '&#96;')
    end
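
Each escape above maps to the decimal entity for the escaped character's code point. As an alternative formulation (not the library's code), the same mapping can be sketched in a single pass with String#ord; escapes_to_entities is a hypothetical name.

```ruby
# Hypothetical single-pass variant: replace a backslash followed by one of
# \ " ' . - ` with the decimal entity of that character.
def escapes_to_entities(str)
  str.gsub(/\\([\\"'.\-`])/) { "&##{$1.ord};" }
end

# The Ruby literal below denotes the text: 6\'2\" tall
puts escapes_to_entities("6\\'2\\\" tall")
```

Because the result contains entities rather than quote characters, the later quote-curling steps have nothing to match and the marks stay straight when rendered.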

stupefy_entities(str)

Return the string, with each RubyPants HTML entity translated to its ASCII counterpart.

Note: This is not reversible (but exactly the same as in SmartyPants)

    # File lib/rubypants-unicode/rubypants-unicode.rb
    def stupefy_entities(str)
      str.
        gsub(/–/, '-').      # en-dash
        gsub(/—/, '--').     # em-dash

        gsub(/‘/, "'").      # open single quote
        gsub(/’/, "'").      # close single quote

        gsub(/“/, '"').      # open double quote
        gsub(/”/, '"').      # close double quote

        gsub(/…/, '...')     # ellipsis
    end
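
The note about reversibility can be made concrete with a short sketch. The method stupefy below is a hypothetical standalone equivalent of the listing above, written with Unicode escapes.

```ruby
# Hypothetical standalone version of the stupefy mapping.
def stupefy(str)
  str.gsub("\u2013", "-").gsub("\u2014", "--")
     .gsub(/[\u2018\u2019]/, "'")
     .gsub(/[\u201C\u201D]/, '"')
     .gsub("\u2026", "...")
end

puts stupefy("\u201CHi\u201D \u2013 ok\u2026")
puts stupefy("\u2018a\u2019") == stupefy("\u2019a\u2019") # true: the distinction is lost
```

Opening and closing quotes collapse to the same ASCII character, so the original direction of each quote cannot be recovered from the output alone.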

tokenize()

Return an array of the tokens comprising the string. Each token is either a tag (possibly with nested tags contained therein, such as <a href="<MTFoo>">), or a run of text between tags. Each element of the array is a two-element array; the first is either :tag or :text; the second is the actual value.

Based on the _tokenize() subroutine from Brad Choate’s MTRegex plugin. <www.bradchoate.com/past/mtregex.php>

This is actually the easier variant using tag_soup, as used by Chad Miller in the Python port of SmartyPants.

    # File lib/rubypants-unicode/rubypants-unicode.rb
    def tokenize
      tag_soup = /([^<]*)(<[^>]*>)/

      tokens = []

      prev_end = 0
      scan(tag_soup) {
        tokens << [:text, $1]  if $1 != ""
        tokens << [:tag, $2]

        prev_end = $~.end(0)
      }

      if prev_end < size
        tokens << [:text, self[prev_end..-1]]
      end

      tokens
    end
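
The tokenizer above is defined on the RubyPants string itself. To try the same approach outside the class, a standalone sketch might look like the following, where tokenize_html is a hypothetical function that takes the HTML as an argument.

```ruby
# Hypothetical standalone version of the tag_soup tokenizer.
def tokenize_html(html)
  tokens = []
  prev_end = 0
  html.scan(/([^<]*)(<[^>]*>)/) do
    tokens << [:text, $1] unless $1.empty? # text before the tag, if any
    tokens << [:tag, $2]
    prev_end = $~.end(0)                   # remember where this match ended
  end
  # Whatever follows the last tag is a final text token.
  tokens << [:text, html[prev_end..-1]] if prev_end < html.size
  tokens
end

p tokenize_html("<p>Hi <b>there</b>!</p> bye")
```

Note that with this simple pattern a tag ends at the first `>`, so in this sketch an input such as `<a href="<MTFoo>">` yields the tag `<a href="<MTFoo>` followed by a short text token.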