class AoBane::Parser
Constants
- AtxHeaderRegexp
Regexp for matching ATX-style headers
- AutoAnchorEmailRegexp
- AutoAnchorURLRegexp
AoBane change: URLs and addresses are matched loosely (BlueCloth is very strict).
Loose examples that are accepted:
<skype:tetra-dice> (other protocol)
<ema+il@example.com> (ex: Gmail alias)
Addresses that are not supported:
<"Abc@def"@example.com> (see quoted-string in RFC 5321)
- BlockQuoteRegexp
Pattern for matching Markdown blockquote blocks
- BoldRegexp
Pattern to match strong emphasis in Markdown text
- CaptionRegexp
- CodeBlockRegexp
Pattern for matching codeblocks
- CodeEscapeRegexp
Regexp to match special characters in a code block
- DDLineRegexp
- DefinitionListRegexp
- EmptyElementSuffix
The tag-closing string – set to '>' for HTML
- Encoders
Encoder functions to turn characters of an email address into encoded entities.
- EscapeTable
Table of MD5 sums for escaped characters
- FencedCodeBlockRegexp
- FootnoteDefinitionRegexp
Footnotes defs are in the form: [^id]: footnote contents.
- FootnoteIdRegexp
- HTMLCommentRegexp
Matching constructs for tokenizing X/HTML
- HTMLTagCloseRegexp
- HTMLTagOpenRegexp
- HTMLTagPart
- HeaderRegexp
- HruleBlockRegexp
Special case for <hr />.
- IdRegexp
- InlineImageRegexp
Next, handle inline images:  Don't forget: encode * and _
- InlineLinkRegexp
- ItalicRegexp
Pattern to match normal emphasis in Markdown text
- LinkRegexp
Link defs are in the form: ^[id]: url “optional title”
- ListItemRegexp
Pattern for transforming list items
- ListMarkerAny
- ListMarkerOl
Patterns to match and transform lists
- ListMarkerUl
- ListRegexp
- LooseBlockRegexp
More-liberal block-matching
- LooseBlockTags
- LooseTagPattern
- MetaTag
- PreChunk
- RefLinkIdRegexp
Pattern to match the linkid part of an anchor tag for reference-style links.
- ReferenceImageRegexp
Reference-style images
- SetextHeaderRegexp
Regexp for matching Setext-style headers
- StrictBlockRegexp
Nested blocks:
<div> <div> tags for inner block must be indented. </div> </div>
- StrictBlockTags
The list of tags which are considered block-level constructs and an alternation pattern suitable for use in regexps made from the list
- StrictTagPattern
- TOCRegexp
- TOCStartLevelRegexp
- TabWidth
Tab width for detab! if none is specified
- TableRegexp
- TableSeparatorCellRegexp
- XMLProcInstRegexp
Attributes
AoBane Extension: display warnings at the top of the output HTML (default: true)
Filters for controlling what gets output for untrusted input. (But really, you're filtering bad stuff out of untrusted input at submission-time via untainting, aren't you?)
Filters for controlling what gets output for untrusted input. (But really, you're filtering bad stuff out of untrusted input at submission-time via untainting, aren't you?)
RedCloth-compatibility accessor. Line-folding is part of Markdown syntax, so this isn't used by anything.
AoBane Extension: add an id to each header, for the TOC and anchors (default: true)
Public Class Methods
Create a new AoBane parser.
# File lib/AoBane.rb, line 460
def initialize(*restrictions)
  @log = Logger::new( $deferr )
  @log.level = $DEBUG ?
    Logger::DEBUG :
    ($VERBOSE ? Logger::INFO : Logger::WARN)
  @scanner = nil

  # Add any restrictions, and set the line-folding attribute to reflect
  # what happens by default.
  @filter_html = nil
  @filter_styles = nil
  restrictions.flatten.each {|r| __send__("#{r}=", true) }
  @fold_lines = true

  @use_header_id = true
  @display_warnings = true

  @log.debug "String is: %p" % self
end
Public Instance Methods
Do block-level transforms on a copy of str using the specified render state rs and return the results.
# File lib/AoBane.rb, line 820
def apply_block_transforms( str, rs )
  rs.block_transform_depth += 1

  # Port: This was called '_runBlockGamut' in the original

  @log.debug "Applying block transforms to:\n %p" % str
  text = str
  text = pretransform_fenced_code_blocks( text, rs )
  text = pretransform_block_separators(text, rs)

  text = transform_headers( text, rs )
  text = transform_toc(text, rs)

  text = transform_hrules( text, rs )
  text = transform_lists( text, rs )
  text = transform_definition_lists( text, rs ) # AoBane Extension
  text = transform_code_blocks( text, rs )
  text = transform_block_quotes( text, rs )
  text = transform_tables(text, rs)
  text = hide_html_blocks( text, rs )

  text = form_paragraphs( text, rs )

  rs.block_transform_depth -= 1
  @log.debug "Done with block transforms:\n %p" % text
  return text
end
Apply Markdown span transforms to a copy of the specified str with the given render state rs and return it.
# File lib/AoBane.rb, line 852
def apply_span_transforms( str, rs )
  @log.debug "Applying span transforms to:\n %p" % str

  str = transform_code_spans( str, rs )
  str = transform_auto_links( str, rs )
  str = encode_html( str )
  str = transform_images( str, rs )
  str = transform_anchors( str, rs )
  str = transform_italic_and_bold( str, rs )

  # Hard breaks
  str.gsub!( / {2,}\n/, "<br#{EmptyElementSuffix}\n" )

  @log.debug "Done with span transforms:\n %p" % str
  return str
end
Convert tabs in str to spaces and return the result. (Reworked from the original BlueCloth into a function-style method that returns the converted string.)
# File lib/AoBane.rb, line 805
def detab( str, tabwidth=TabWidth )
  re = str.split( /\n/ ).collect {|line|
    line.gsub( /(.*?)\t/ ) do
      $1 + ' ' * (tabwidth - $1.length % tabwidth)
    end
  }.join("\n")

  re
end
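The tab-stop arithmetic in detab can be tried in isolation. The following is a minimal sketch rather than the library's code: detab_sketch is a hypothetical name, and the default width of 4 is an assumption standing in for TabWidth.

```ruby
# Expand tabs to the next tab stop, line by line.
# Hypothetical stand-alone version; the width of 4 is an assumed default.
def detab_sketch(str, tabwidth = 4)
  str.split(/\n/).map { |line|
    # $1 is the text before the tab. Every earlier segment has already been
    # padded to a multiple of tabwidth, so its length alone decides how many
    # spaces are needed to reach the next stop.
    line.gsub(/(.*?)\t/) { $1 + ' ' * (tabwidth - $1.length % tabwidth) }
  }.join("\n")
end

puts detab_sketch("ab\tcd")
```

Because the lines are produced with split, trailing newlines in the input are not preserved; parse_text appends its own newlines after calling detab.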
# File lib/AoBane.rb, line 709 def document_to_html(doc) rs = RenderState.new if doc.numbering? then rs.numbering = true end rs.numbering_start_level = doc.numbering_start_level rs.header_id_type = doc.header_id_type body_html = nil if doc.encoding_type then Util.change_kcode(doc.kcode){ body_html = parse_text(doc.body, rs) } else body_html = parse_text(doc.body, rs) end out = Util.generate_blank_string_io(doc.body) # XHTML decleration out.puts %Q|<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">| # html start out.puts %Q|<html>| # head out.puts %Q|<head>| if doc.encoding_type and (charset = EncodingType.convert_to_charset(doc.encoding_type)) then out.puts %Q|<meta http-equiv="Content-Type" content="text/html; charset=#{charset}" />| end h1 = rs.headers.find{|x| x.level == 1} h1_content = (h1 ? h1.content : nil) title = Util.escape_html(doc.title || h1_content || 'no title (Generated by AoBane)') out.puts %Q|<title>#{title}</title>| %w(description keywords).each do |name| if doc[name] then content = Util.escape_html(doc[name]) out.puts %Q|<meta name="#{name}" content="#{content}" />| end end if doc['css'] then href = Util.escape_html(doc.css) out.puts %Q|<link rel="stylesheet" type="text/css" href="#{href}" />| end if doc['rdf-feed'] then href = Util.escape_html(doc['rdf-feed']) out.puts %Q|<link rel="alternate" type="application/rdf+xml" href="#{href}" />| end if doc['rss-feed'] then href = Util.escape_html(doc['rss-feed']) out.puts %Q|<link rel="alternate" type="application/rss+xml" href="#{href}" />| end if doc['atom-feed'] then href = Util.escape_html(doc['atom-feed']) out.puts %Q|<link rel="alternate" type="application/atom+xml" href="#{href}" />| end out.puts %Q|</head>| # body out.puts %Q|<body>| out.puts out.puts body_html out.puts out.puts %Q|</body>| # html end out.puts %Q|</html>| return out.string end
Return a copy of the given str with any backslashed special character in it replaced with MD5 placeholders.
# File lib/AoBane.rb, line 1080
def encode_backslash_escapes( str )
  # Make a copy with any double-escaped backslashes encoded
  text = str.gsub( /\\\\/, EscapeTable['\\\\'][:md5] )

  EscapeTable.each_pair {|char, esc|
    next if char == '\\\\'
    next unless char =~ /\\./
    text.gsub!( esc[:re], esc[:md5] )
  }

  return text
end
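The placeholder idea can be sketched on its own: each escaped character is swapped for the MD5 hex digest of that character, which contains no Markdown metacharacters, and is swapped back once the transforms are finished. The names below (SKETCH_ESCAPES, encode_escapes_sketch, decode_escapes_sketch) are hypothetical, the table is reduced to four characters, and the single-pass regexp is a simplification of the two-step loop above.

```ruby
require 'digest/md5'

# Hypothetical, reduced table: character => MD5 placeholder.
# The real EscapeTable covers more characters and also stores regexps.
SKETCH_ESCAPES = ['\\', '*', '_', '`'].each_with_object({}) do |char, table|
  table[char] = Digest::MD5.hexdigest(char)
end

# Replace each backslash-escaped character with its placeholder.
def encode_escapes_sketch(str)
  str.gsub(/\\([\\*_`])/) { SKETCH_ESCAPES[$1] }
end

# Swap the placeholders back for the literal characters.
def decode_escapes_sketch(str)
  SKETCH_ESCAPES.reduce(str) { |text, (char, md5)| text.gsub(md5) { char } }
end

encoded = encode_escapes_sketch('\*not bold\*')
puts encoded
puts decode_escapes_sketch(encoded)
```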
Escape any characters special to HTML and encode any characters special to Markdown in a copy of the given str and return it.
# File lib/AoBane.rb, line 2128
def encode_code( str, rs )
  #str.gsub( %r{&}, '&amp;' ).
    #gsub( %r{<}, '&lt;' ).
    #gsub( %r{>}, '&gt;' ).
    #gsub( CodeEscapeRegexp ) {|match| EscapeTable[match][:md5]}
  return str
end
Transform a copy of the given email addr into an escaped version safer for posting publicly.
# File lib/AoBane.rb, line 1629
def encode_email_address( addr )
  rval = ''
  ("mailto:" + addr).each_byte {|b|
    case b
    when ?:
      rval += ":"
    when ?@
      rval += Encoders[ rand(2) ][ b ]
    else
      r = rand(100)
      rval += (
        r > 90 ? Encoders[2][ b ] :
        r < 45 ? Encoders[1][ b ] :
        Encoders[0][ b ]
      )
    end
  }

  return %{<a href="%s">%s</a>} % [ rval, rval.sub(/.+?:/, '') ]
end
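The obfuscation amounts to writing each byte as a decimal entity, a hex entity, or the raw character, chosen at random. The sketch below illustrates that idea under stated assumptions: obfuscate_sketch and decode_entities_sketch are hypothetical names, the random generator is passed in so the result can be reproduced, and '@' is compared by byte value ('@'.ord), since on Ruby 1.9 and later the ?@ literal is a String and does not match an Integer byte in a case statement.

```ruby
# Same three encoders as the Encoders constant: decimal entity, hex entity, raw.
SKETCH_ENCODERS = [
  lambda { |byte| "&#%03d;" % byte },
  lambda { |byte| "&#x%X;" % byte },
  lambda { |byte| byte.chr },
]

# Encode each byte with a randomly chosen encoder; '@' is never left raw.
def obfuscate_sketch(addr, rng = Random.new)
  addr.each_byte.map { |b|
    if b == '@'.ord
      SKETCH_ENCODERS[rng.rand(2)][b]
    else
      r = rng.rand(100)
      (r > 90 ? SKETCH_ENCODERS[2] : r < 45 ? SKETCH_ENCODERS[1] : SKETCH_ENCODERS[0])[b]
    end
  }.join
end

# Hypothetical helper: roughly what a browser does when decoding the entities.
def decode_entities_sketch(html)
  html.gsub(/&#x([0-9A-F]+);/) { $1.hex.chr }.gsub(/&#(\d+);/) { $1.to_i.chr }
end

encoded = obfuscate_sketch("ema+il@example.com")
puts encoded
puts decode_entities_sketch(encoded)
```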
Return a copy of str with angle brackets and ampersands HTML-encoded.
# File lib/AoBane.rb, line 2233
def encode_html( str )
  #str.gsub( /&(?!#?[x]?(?:[0-9a-f]+|\w+);)/i, "&amp;" ).
    #gsub( %r{<(?![a-z/?\$!])}i, "&lt;" )
  return str
end
Escape any markdown characters in a copy of the given str and return it.
# File lib/AoBane.rb, line 2146
def escape_md( str )
  str.
    gsub( /\*|_/ ){|symbol| EscapeTable[symbol][:md5]}
end
Escape special characters in the given str
# File lib/AoBane.rb, line 1036
def escape_special_chars( str )
  @log.debug " Escaping special characters"
  text = ''

  # The original Markdown source has something called '$tags_to_skip'
  # declared here, but it's never used, so I don't define it.

  tokenize_html( str ) {|token, str|
    @log.debug " Adding %p token %p" % [ token, str ]
    case token

    # Within tags, encode * and _
    when :tag
      text += str.
        gsub( /\*/, EscapeTable['*'][:md5] ).
        gsub( /_/, EscapeTable['_'][:md5] )

    # Encode backslashed stuff in regular text
    when :text
      text += encode_backslash_escapes( str )
    else
      raise TypeError, "Unknown token type %p" % token
    end
  }

  @log.debug " Text with escapes is now: %p" % text
  return text
end
# File lib/AoBane.rb, line 2136
def escape_to_header_id(str)
  URI.escape(escape_md(str.gsub(/<\/?[^>]*>/, "").gsub(/\s/, "_")).gsub("/", ".2F")).gsub("%", ".")
end
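The method strips tags, turns whitespace into underscores, percent-encodes the rest, and then writes "." in place of "%". Since URI.escape is no longer available in Ruby 3.0 and later, the sketch below shows the same shape with a hand-written percent-encoder. It is an illustration only: header_id_sketch is a hypothetical name, the escape_md step is left out, and the set of characters left unescaped is an assumption that is stricter than what URI.escape used.

```ruby
# Build an anchor id from header HTML.
# Hypothetical variant; the "safe" character set below is an assumption.
def header_id_sketch(title_html)
  text = title_html.gsub(/<\/?[^>]*>/, '').gsub(/\s/, '_')
  # Percent-encode every byte of each character outside the safe set...
  encoded = text.gsub(/[^a-zA-Z0-9_.\-]/) { |ch|
    ch.bytes.map { |b| '%%%02X' % b }.join
  }
  # ...then write '.' instead of '%', as the method above does.
  encoded.gsub('%', '.')
end

puts header_id_sketch("<em>Intro</em> to A/B")
```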
Wrap all remaining paragraph-looking text in a copy of str inside <p> tags and return it.
# File lib/AoBane.rb, line 1793
def form_paragraphs( str, rs )
  @log.debug " Forming paragraphs"
  grafs = str.
    sub( /\A\n+/, '' ).
    sub( /\n+\z/, '' ).
    split( /\n{2,}/ )

  rval = grafs.collect {|graf|

    # Unhashify HTML blocks if this is a placeholder
    if rs.html_blocks.key?( graf )
      rs.html_blocks[ graf ]

    # no output if this is a block separator
    elsif graf == '~' then
      ''

    # Otherwise, wrap in <p> tags
    else
      apply_span_transforms(graf, rs).
        sub( /^[ ]*/, '<p>' ) + '</p>'
    end
  }.join( "\n\n" )

  @log.debug " Formed paragraphs: %p" % rval
  return rval
end
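The three cases handled per chunk (hidden HTML block, "~" separator, ordinary paragraph) can be seen in a stripped-down sketch. form_paragraphs_sketch is a hypothetical name, a plain Hash stands in for rs.html_blocks, and span transforms are skipped, so this shows only the splitting and wrapping.

```ruby
# Split on blank lines and wrap each ordinary chunk in <p> tags.
# html_blocks is a plain Hash standing in for rs.html_blocks (assumption).
def form_paragraphs_sketch(str, html_blocks = {})
  grafs = str.sub(/\A\n+/, '').sub(/\n+\z/, '').split(/\n{2,}/)

  grafs.map { |graf|
    if html_blocks.key?(graf)
      html_blocks[graf]                      # placeholder: restore hidden HTML
    elsif graf == '~'
      ''                                     # block separator: no output
    else
      graf.sub(/^[ ]*/, '<p>') + '</p>'      # ordinary text: wrap it
    end
  }.join("\n\n")
end

puts form_paragraphs_sketch("one\n\n  two\nlines\n\nKEY\n", "KEY" => "<div>x</div>")
```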
Replace all blocks of HTML in str that start in the left margin with tokens.
# File lib/AoBane.rb, line 925
def hide_html_blocks( str, rs )
  @log.debug "Hiding HTML blocks in %p" % str

  # Tokenizer proc to pass to gsub
  tokenize = lambda {|match|
    key = Digest::MD5::hexdigest( match )
    rs.html_blocks[ key ] = match
    @log.debug "Replacing %p with %p" % [ match, key ]
    "\n\n#{key}\n\n"
  }

  rval = str.dup

  @log.debug "Finding blocks with the strict regex..."
  rval.gsub!( StrictBlockRegexp, &tokenize )

  @log.debug "Finding blocks with the loose regex..."
  rval.gsub!( LooseBlockRegexp, &tokenize )

  @log.debug "Finding hrules..."
  rval.gsub!( HruleBlockRegexp ) {|match| $1 + tokenize[$2] }

  return rval
end
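The tokenizing step can be illustrated with a much simpler pattern. In this sketch hide_blocks_sketch is a hypothetical name, the regexp only recognizes a <div> that starts in the left margin (nothing like the real StrictBlockRegexp or LooseBlockRegexp), and a plain Hash stands in for rs.html_blocks.

```ruby
require 'digest/md5'

# Replace each left-margin <div>...</div> block with its MD5 digest and
# remember the original text under that key.
# The pattern is a deliberately simplified assumption.
def hide_blocks_sketch(str, store)
  str.gsub(%r{^<div>.*?</div>[ ]*\n}m) do |match|
    key = Digest::MD5.hexdigest(match)
    store[key] = match
    "\n\n#{key}\n\n"     # blank lines make the key a paragraph of its own
  end
end

store = {}
puts hide_blocks_sketch("para\n\n<div>\n*raw*\n</div>\n\nmore\n", store)
```

Surrounding the key with blank lines is what lets form_paragraphs later find it as a whole chunk and swap the original HTML back in.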
# File lib/AoBane.rb, line 2246
def indent(str)
  str.gsub( /^/, ' ' * TabWidth)
end
Remove one level of line-leading tabs or spaces from a copy of str and return it.
# File lib/AoBane.rb, line 2242
def outdent( str )
  str.gsub( /^(\t|[ ]{1,#{TabWidth}})/, '')
end
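indent and outdent are near inverses, which a short sketch makes visible. The names indent_sketch and outdent_sketch are hypothetical, and the width of 4 is an assumed stand-in for TabWidth.

```ruby
# Add one level of indentation to every line (assumed width: 4).
def indent_sketch(str, tabwidth = 4)
  str.gsub(/^/, ' ' * tabwidth)
end

# Remove one level: a single tab, or up to tabwidth leading spaces.
def outdent_sketch(str, tabwidth = 4)
  str.gsub(/^(\t|[ ]{1,#{tabwidth}})/, '')
end

puts outdent_sketch("        deep\n  shallow")
puts outdent_sketch(indent_sketch("a\nb"))
```

Lines indented by fewer than a full level lose whatever leading spaces they have, and deeper lines keep the remainder.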
# File lib/AoBane.rb, line 693
def parse_document(source, default_enc = EncodingType::UTF8)
  doc = Document.parse(source, default_enc)

  return document_to_html(doc)
end
# File lib/AoBane.rb, line 699
def parse_document_file(path, default_enc = EncodingType::UTF8)
  doc = nil
  open(path){|f|
    doc = Document.parse_io(f, default_enc)
  }

  return document_to_html(doc)
end
Render the Markdown-formatted text in source as HTML and return it. The optional rs argument is the RenderState to use; a new one is created when it is omitted.
# File lib/AoBane.rb, line 506 def parse_text(source, rs = nil) rs ||= RenderState.new # check case rs.header_id_type when HeaderIDType::MD5, HeaderIDType::ESCAPE else rs.warnings << "illegal header id type - #{rs.header_id_type}" end # Create a StringScanner we can reuse for various lexing tasks @scanner = StringScanner::new( '' ) # Make a copy of the string with normalized line endings, tabs turned to # spaces, and a couple of guaranteed newlines at the end text = detab(source.gsub( /\r\n?/, "\n" )) text += "\n\n" @log.debug "Normalized line-endings: %p" % text #text = Utilities::prePaling(text) #Insert by set.minami 2013-04-27 #Insert by set.minami 2013-04-03 text = transform_block_quotes(text, rs) nrange = [] departure = 1 preproc = Marshal.load(Marshal.dump(text)) text.clear stack = [] html_text_number = 0 # Utilities::initNumberStack preproc.lines { |line| html_text_number += 1 begin line.gsub!(/^\{nrange:(.*?)(;\d+)??\}/){ |match| #depNum = $2.delete(';').to_i #departure = if depNum > 0 then depNum else 1 end if /h(\d)\-h(\d)/i =~ $1 nrange.push($1) nrange.push($2) if nrange.size > 2 then nrange.pop nrange.pop raise "Syntax Error!" end end next } @log.debug line #calculate numbering range = nrange[1].to_i - nrange[0].to_i if range == 0 then range = 1 end if range < 0 then p "AoBane Syntax Error:Header range is WRONG!" + "@ l.#{html_text_number}";exit(-1) raise FatalError,"AoBane Syntax Error:Header range is WRONG!" end if line =~ /^(%{1,#{range}})(.*?)\n/ then text << Utilities. calcSectionNo(nrange.min,range,$1.size,departure,$2,stack) + "\n" else text << line end @log.debug nrange.minmax rescue => e @log.warn "AoBane Syntax WARNING l.#{html_text_number}:#{line.chomp} haven't adopted" @log.warn e end } text.gsub!(/\*\[(.*?)\]\((.*?)(\|.*?)*(\/.*?)*\)/){|match| '<font color="' + if $2.nil? then '' else $2 end +'" ' + 'face="' + if $3.nil? then '' else $3.delete('|') end + '" ' + 'size="' + if $4.nil? 
then '' else $4.delete('/') end + '">' + $1 + '</font>' } #Insert by set.minami 2013-04-21 text = Utilities::abbrPreProcess(text) #Insert by set.minami 2013-04-01 text.gsub!(/\\TeX\{(.+?)\\TeX\}/){ begin $1.to_mathml rescue => e puts 'math_ml Error: ' + $1 puts e end } text = Utilities::preProcFence(text,0).join("\n") #Insert by set.minami 2013-04-27 #Insert by set.minami 2013-03-30 #Insert by set.minami # Filter HTML if we're asked to do so if self.filter_html #text.gsub!( "<", "<" ) #text.gsub!( ">", ">" ) @log.debug "Filtered HTML: %p" % text end # Simplify blank lines text.gsub!( /^ +$/, '' ) @log.debug "Tabs -> spaces/blank lines stripped: %p" % text # Replace HTML blocks with placeholders text = hide_html_blocks( text, rs ) @log.debug "Hid HTML blocks: %p" % text @log.debug "Render state: %p" % rs # Strip footnote definitions, store in render state text = strip_footnote_definitions( text, rs ) @log.debug "Stripped footnote definitions: %p" % text @log.debug "Render state: %p" % rs # Strip link definitions, store in render state text = strip_link_definitions( text, rs ) @log.debug "Stripped link definitions: %p" % text @log.debug "Render state: %p" % rs # Escape meta-characters text = escape_special_chars( text ) @log.debug "Escaped special characters: %p" % text # Transform block-level constructs text = apply_block_transforms( text, rs ) @log.debug "After block-level transforms: %p" % text # Now swap back in all the escaped characters text = unescape_special_chars( text ) @log.debug "After unescaping special characters: %p" % text # Extend footnotes unless rs.footnotes.empty? 
then text << %Q|<div class="footnotes"><hr#{EmptyElementSuffix}\n<ol>\n| rs.found_footnote_ids.each do |id| content = rs.footnotes[id] html = apply_block_transforms(content.sub(/\n+\Z/, '') + %Q| <a href="#footnote-ref:#{id}" rev="footnote">↩</a>|, rs) text << %Q|<li id="footnote:#{id}">\n#{html}\n</li>| end text << %Q|</ol>\n</div>\n| end # Display warnings if @display_warnings then unless rs.warnings.empty? then html = %Q|<pre><strong>[WARNINGS]\n| html << rs.warnings.map{|x| Util.escape_html(x)}.join("\n") html << %Q|</strong></pre>| text = html + text end end #Insert by set.minami 2013-04-21 text = Utilities::abbrPostProcess(text) #Insert by set.minami 2013-03-30 text = Utilities::insertTimeStamp(text) text = Utilities::postProcFence(text) #Insert by set.minami 2013-04-27 text = Utilities::transformSpecialChar(text) #Insert by set.minami 2013-04-27 return text end
# File lib/AoBane.rb, line 686
def parse_text_file(path)
  parse_text(File.read(path))
end
Like parse_text, but returns both the HTML and the render state as [html, rs] (mainly for testing).
# File lib/AoBane.rb, line 679
def parse_text_with_render_state(str, rs = nil)
  rs ||= RenderState.new
  html = parse_text(str, rs)

  return [html, rs]
end
# File lib/AoBane.rb, line 1094
def pretransform_block_separators(str, rs)
  str.gsub(/^[ ]{0,#{TabWidth - 1}}[~][ ]*\n/){
    "\n~\n\n"
  }
end
# File lib/AoBane.rb, line 1549
def pretransform_fenced_code_blocks( str, rs )
  @log.debug " Transforming fenced code blocks => standard code blocks"

  str.gsub( FencedCodeBlockRegexp ) {|block|
    "\n~\n\n" + indent($2) + "\n~\n\n"
  }
end
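A fenced block is not rendered directly: it is rewritten as an ordinary indented code block, bracketed by "~" separator paragraphs. The sketch below reproduces that rewrite in isolation. SKETCH_FENCE_RE copies the FencedCodeBlockRegexp pattern shown in the source listing further down this page; fenced_to_indented_sketch is a hypothetical name and the indent width of 4 is an assumption.

```ruby
# Pattern copied from FencedCodeBlockRegexp: a run of three or more tildes,
# the body, then the same run of tildes again.
SKETCH_FENCE_RE = /^(\~{3,})\n((?m:.+?)\n)\1\n/

# Turn each fenced block into an indented block between '~' separators.
def fenced_to_indented_sketch(str, tabwidth = 4)
  str.gsub(SKETCH_FENCE_RE) {
    body = $2
    "\n~\n\n" + body.gsub(/^/, ' ' * tabwidth) + "\n~\n\n"
  }
end

puts fenced_to_indented_sketch("intro\n\n~~~\nputs 1\n~~~\n\nafter\n")
```

The "~" paragraphs produce no output of their own in form_paragraphs; they appear to serve only to keep the indented block separate from neighboring text.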
# File lib/AoBane.rb, line 1009
def strip_footnote_definitions(str, rs)
  str.gsub( FootnoteDefinitionRegexp ) {|match|
    id = $1; content1 = $2; content2 = $3

    unless id =~ FootnoteIdRegexp then
      rs.warnings << "illegal footnote id - #{id} (legal chars: a-zA-Z0-9_-.:)"
    end

    if content2 then
      @log.debug " Stripping multi-line definition %p, %p" % [$2, $3]
      content = content1 + "\n" + outdent(content2.chomp)
      @log.debug " Stripped multi-line definition %p, %p" % [id, content]
      rs.footnotes[id] = content
    else
      content = content1 || ''
      @log.debug " Stripped single-line definition %p, %p" % [id, content]
      rs.footnotes[id] = content
    end

    ""
  }
end
Strip link definitions from str, storing them in the given RenderState rs.
# File lib/AoBane.rb, line 974
def strip_link_definitions( str, rs )
  str.gsub( LinkRegexp ) {|match|
    id, url, title = $1, $2, $3

    rs.urls[ id.downcase ] = encode_html( url )
    unless title.nil?
      rs.titles[ id.downcase ] = title.gsub( /"/, "&quot;" )
    end

    ""
  }
end
Break the HTML source in str
into a series of tokens and return them. The tokens are just 2-element Array tuples with a type and the actual content. If this function is called with a block, the type and text parts of each token will be yielded to it one at a time as they are extracted.
# File lib/AoBane.rb, line 2166 def tokenize_html( str ) depth = 0 tokens = [] @scanner.string = str.dup type, token = nil, nil until @scanner.empty? @log.debug "Scanning from %p" % @scanner.rest # Match comments and PIs without nesting if (( token = @scanner.scan(MetaTag) )) type = :tag # Do nested matching for HTML tags elsif (( token = @scanner.scan(HTMLTagOpenRegexp) )) tagstart = @scanner.pos @log.debug " Found the start of a plain tag at %d" % tagstart # Start the token with the opening angle depth = 1 type = :tag # Scan the rest of the tag, allowing unlimited nested <>s. If # the scanner runs out of text before the tag is closed, raise # an error. while depth.nonzero? # Scan either an opener or a closer chunk = @scanner.scan( HTMLTagPart ) or break # AoBane Fix (refer to spec/code-block.rb) @log.debug " Found another part of the tag at depth %d: %p" % [ depth, chunk ] token += chunk # If the last character of the token so far is a closing # angle bracket, decrement the depth. Otherwise increment # it for a nested tag. depth += ( token[-1, 1] == '>' ? -1 : 1 ) @log.debug " Depth is now #{depth}" end # Match text segments else @log.debug " Looking for a chunk of text" type = :text # Scan forward, always matching at least one character to move # the pointer beyond any non-tag '<'. token = @scanner.scan_until( /[^<]+/m ) end @log.debug " type: %p, token: %p" % [ type, token ] # If a block is given, feed it one token at a time. Add the token to # the token list to be returned regardless. if block_given? yield( type, token ) end tokens << [ type, token ] end return tokens end
Apply Markdown anchor transforms to a copy of the specified str with the given render state rs and return it.
# File lib/AoBane.rb, line 1847
def transform_anchors( str, rs )
  @log.debug " Transforming anchors"
  @scanner.string = str.dup
  text = ''

  # Scan the whole string
  until @scanner.empty?

    if @scanner.scan( /\[/ )
      link = ''; linkid = ''
      depth = 1
      startpos = @scanner.pos
      @log.debug " Found a bracket-open at %d" % startpos

      # Scan the rest of the tag, allowing unlimited nested []s. If
      # the scanner runs out of text before the opening bracket is
      # closed, append the text and return (wasn't a valid anchor).
      while depth.nonzero?
        linktext = @scanner.scan_until( /\]|\[/ )

        if linktext
          @log.debug " Found a bracket at depth %d: %p" % [ depth, linktext ]
          link += linktext

          # Decrement depth for each closing bracket
          depth += ( linktext[-1, 1] == ']' ? -1 : 1 )
          @log.debug " Depth is now #{depth}"

        # If there's no more brackets, it must not be an anchor, so
        # just abort.
        else
          @log.debug " Missing closing brace, assuming non-link."
          link += @scanner.rest
          @scanner.terminate
          return text + '[' + link
        end
      end
      link.slice!( -1 ) # Trim final ']'
      @log.debug " Found leading link %p" % link

      # Markdown Extra: Footnote
      if link =~ /^\^(.+)/ then
        id = $1
        if rs.footnotes[id] then
          rs.found_footnote_ids << id
          label = "[#{rs.found_footnote_ids.size}]"
        else
          rs.warnings << "undefined footnote id - #{id}"
          label = '[?]'
        end

        text += %Q|<sup id="footnote-ref:#{id}"><a href="#footnote:#{id}" rel="footnote">#{label}</a></sup>|

      # Look for a reference-style second part
      elsif @scanner.scan( RefLinkIdRegexp )
        linkid = @scanner[1]
        linkid = link.dup if linkid.empty?
        linkid.downcase!
        @log.debug " Found a linkid: %p" % linkid

        # If there's a matching link in the link table, build an
        # anchor tag for it.
        if rs.urls.key?( linkid )
          @log.debug " Found link key in the link table: %p" % rs.urls[linkid]
          url = escape_md( rs.urls[linkid] )

          text += %{<a href="#{url}"}
          if rs.titles.key?(linkid)
            text += %{ title="%s"} % escape_md( rs.titles[linkid] )
          end
          text += %{>#{link}</a>}

        # If the link referred to doesn't exist, just append the raw
        # source to the result
        else
          @log.debug " Linkid %p not found in link table" % linkid
          @log.debug " Appending original string instead: "
          @log.debug "%p" % @scanner.string[ startpos-1 .. @scanner.pos-1 ]

          rs.warnings << "link-id not found - #{linkid}"
          text += @scanner.string[ startpos-1 .. @scanner.pos-1 ]
        end

      # ...or for an inline style second part
      elsif @scanner.scan( InlineLinkRegexp )
        url = @scanner[1]
        title = @scanner[3]
        @log.debug " Found an inline link to %p" % url

        url = "##{link}" if url == '#' # target anchor briefing (since AoBane 0.40)

        text += %{<a href="%s"} % escape_md( url )
        if title
          title.gsub!( /"/, "&quot;" )
          text += %{ title="%s"} % escape_md( title )
        end
        text += %{>#{link}</a>}

      # No linkid part: just append the first part as-is.
      else
        @log.debug "No linkid, so no anchor. Appending literal text."
        text += @scanner.string[ startpos-1 .. @scanner.pos-1 ]
      end # if linkid

    # Plain text
    else
      @log.debug " Scanning to the next link from %p" % @scanner.rest
      text += @scanner.scan( /[^\[]+/ )
    end

  end # until @scanner.empty?

  return text
end
Transform URLs in a copy of the specified str into links and return it.
# File lib/AoBane.rb, line 1609
def transform_auto_links( str, rs )
  @log.debug " Transforming auto-links"
  str.gsub(AutoAnchorURLRegexp){
    %|<a href="#{Util.escape_html($1)}">#{Util.escape_html($1)}</a>|
  }.gsub( AutoAnchorEmailRegexp ) {|addr|
    encode_email_address( unescape_special_chars($1) )
  }
end
Transform Markdown-style blockquotes in a copy of the specified str and return it.
# File lib/AoBane.rb, line 1572
def transform_block_quotes( str, rs )
  @log.debug " Transforming block quotes"

  str.gsub( BlockQuoteRegexp ) {|quote|
    @log.debug "Making blockquote from %p" % quote

    quote.gsub!( /^ *> ?/, '' ) # Trim one level of quoting
    quote.gsub!( /^ +$/, '' )   # Trim whitespace-only lines

    indent = " " * TabWidth
    quoted = %{<blockquote>\n%s\n</blockquote>\n\n} %
      apply_block_transforms( quote, rs ).
      gsub( /^/, indent ).
      gsub( PreChunk ) {|m| m.gsub(/^#{indent}/o, '') }
    @log.debug "Blockquoted chunk is: %p" % quoted
    quoted
  }
end
Transform Markdown-style codeblocks in a copy of the specified str and return it.
# File lib/AoBane.rb, line 1520
def transform_code_blocks( str, rs )
  @log.debug " Transforming code blocks"

  str.gsub( CodeBlockRegexp ) {|block|
    codeblock = $1
    remainder = $2

    tmpl = %{\n\n<pre><code>%s\n</code></pre>\n\n%s}

    # patch for ruby 1.9.1 bug
    if tmpl.respond_to?(:force_encoding) then
      tmpl.force_encoding(str.encoding)
    end
    args = [ encode_code( outdent(codeblock), rs ).rstrip, remainder ]

    # recover all backslash escaped to original form
    EscapeTable.each {|char, hash|
      args[0].gsub!( hash[:md5re]){char}
    }

    # Generate the codeblock
    tmpl % args
  }
end
Transform backticked spans into <code> spans.
# File lib/AoBane.rb, line 1983 def transform_code_spans( str, rs ) @log.debug " Transforming code spans" # Set up the string scanner and just return the string unless there's at # least one backtick. @scanner.string = str.dup unless @scanner.exist?( /`/ ) @scanner.terminate @log.warn "No backticks found for code span in %p" % str return str end @log.debug "Transforming code spans in %p" % str # Build the transformed text anew text = '' # Scan to the end of the string until @scanner.empty? # Scan up to an opening backtick if pre = @scanner.scan_until( /.??(?=`)/m ) text += pre @log.debug "Found backtick at %d after '...%s'" % [ @scanner.pos, text[-20, 20] ] # Make a pattern to find the end of the span opener = @scanner.scan( /`+/ ) len = opener.length closer = Regexp::new( opener ) @log.debug "Scanning for end of code span with %p" % closer # Scan until the end of the closing backtick sequence. Chop the # backticks off the resultant string, strip leading and trailing # whitespace, and encode any enitites contained in it. codespan = @scanner.scan_until( closer ) or raise FormatError::new( @scanner.rest[0,20], "No %p found before end" % opener ) @log.debug "Found close of code span at %d: %p" % [ @scanner.pos - len, codespan ] #p codespan.strip codespan.slice!( -len, len ) text += "<code>%s</code>" % encode_code( codespan.strip, rs ) # If there's no more backticks, just append the rest of the string # and move the scan pointer to the end else text += @scanner.rest @scanner.terminate end end return text end
# File lib/AoBane.rb, line 1414 def transform_definition_list_items(str, rs) buf = Util.generate_blank_string_io(str) buf.puts %Q|<dl>| lines = str.split("\n") until lines.empty? do dts = [] # get dt items while lines.first =~ /^(?!\:).+$/ do dts << lines.shift end dd_as_block = false # skip blank lines while not lines.empty? and lines.first.empty? do lines.shift dd_as_block = true end dds = [] while lines.first =~ DDLineRegexp do dd_buf = [] # dd first line unless (line = lines.shift).empty? then dd_buf << $1 << "\n" end # dd second and more lines (sequential with 1st-line) until lines.empty? or # stop if read all lines.first =~ /^[ ]{0,#{TabWidth - 1}}$/ or # stop if blank line lines.first =~ DDLineRegexp do # stop if new dd found dd_buf << outdent(lines.shift) << "\n" end # dd second and more lines (separated with 1st-line) until lines.empty? do # stop if all was read if lines.first.empty? then # blank line (skip) lines.shift dd_buf << "\n" elsif lines.first =~ /^[ ]{#{TabWidth},}/ then # indented body dd_buf << outdent(lines.shift) << "\n" else # not indented body break end end dds << dd_buf.join # skip blank lines unless lines.empty? then while lines.first.empty? do lines.shift end end end # html output dts.each do |dt| buf.puts %Q| <dt>#{apply_span_transforms(dt, rs)}</dt>| end dds.each do |dd| if dd_as_block then buf.puts %Q| <dd>#{apply_block_transforms(dd, rs)}</dd>| else dd.gsub!(/\n+\z/, '') # chomp linefeeds buf.puts %Q| <dd>#{apply_span_transforms(dd.chomp, rs)}</dd>| end end end buf.puts %Q|</dl>| return(buf.string) end # old # Pattern for matching codeblocks CodeBlockRegexp = %r{ (?:\n\n|\A|\A\n) ( # $1 = the code block (?: (?:[ ]{#{TabWidth}} | \t) # a tab or tab-width of spaces .*\n+ )+ ) (^[ ]{0,#{TabWidth - 1}}\S|\Z) # Lookahead for non-space at # line-start, or end of doc }x ### Transform Markdown-style codeblocks in a copy of the specified +str+ and ### return it. 
def transform_code_blocks( str, rs ) @log.debug " Transforming code blocks" str.gsub( CodeBlockRegexp ) {|block| codeblock = $1 remainder = $2 tmpl = %{\n\n<pre><code>%s\n</code></pre>\n\n%s} # patch for ruby 1.9.1 bug if tmpl.respond_to?(:force_encoding) then tmpl.force_encoding(str.encoding) end args = [ encode_code( outdent(codeblock), rs ).rstrip, remainder ] # recover all backslash escaped to original form EscapeTable.each {|char, hash| args[0].gsub!( hash[:md5re]){char} } # Generate the codeblock tmpl % args } end FencedCodeBlockRegexp = /^(\~{3,})\n((?m:.+?)\n)\1\n/ def pretransform_fenced_code_blocks( str, rs ) @log.debug " Transforming fenced code blocks => standard code blocks" str.gsub( FencedCodeBlockRegexp ) {|block| "\n~\n\n" + indent($2) + "\n~\n\n" } end # Pattern for matching Markdown blockquote blocks BlockQuoteRegexp = %r{ (?: ^[ ]*>[ ]? # '>' at the start of a line .+\n # rest of the first line (?:.+\n)* # subsequent consecutive lines \n* # blanks )+ }x PreChunk = %r{ ( ^ \s* <pre> .+? </pre> ) }xm ### Transform Markdown-style blockquotes in a copy of the specified +str+ ### and return it. def transform_block_quotes( str, rs ) @log.debug " Transforming block quotes" str.gsub( BlockQuoteRegexp ) {|quote| @log.debug "Making blockquote from %p" % quote quote.gsub!( /^ *> ?/, '' ) # Trim one level of quoting quote.gsub!( /^ +$/, '' ) # Trim whitespace-only lines indent = " " * TabWidth quoted = %{<blockquote>\n%s\n</blockquote>\n\n} % apply_block_transforms( quote, rs ). gsub( /^/, indent ). 
gsub( PreChunk ) {|m| m.gsub(/^#{indent}/o, '') } @log.debug "Blockquoted chunk is: %p" % quoted quoted } end # AoBane change: # allow loosely urls and addresses (BlueCloth is very strict) # # loose examples: # <skype:tetra-dice> (other protocol) # <ema+il@example.com> (ex: gmail alias) # # not adapted addresses: # <"Abc@def"@example.com> (refer to quoted-string of RFC 5321) AutoAnchorURLRegexp = /<(#{URI.regexp})>/ # $1 = url AutoAnchorEmailRegexp = /<([^'">\s]+?\@[^'">\s]+[.][a-zA-Z]+)>/ # $2 = address ### Transform URLs in a copy of the specified +str+ into links and return ### it. def transform_auto_links( str, rs ) @log.debug " Transforming auto-links" str.gsub(AutoAnchorURLRegexp){ %|<a href="#{Util.escape_html($1)}">#{Util.escape_html($1)}</a>| }.gsub( AutoAnchorEmailRegexp ) {|addr| encode_email_address( unescape_special_chars($1) ) } end # Encoder functions to turn characters of an email address into encoded # entities. Encoders = [ lambda {|char| "&#%03d;" % char}, lambda {|char| "&#x%X;" % char}, lambda {|char| char.chr }, ] ### Transform a copy of the given email +addr+ into an escaped version safer ### for posting publicly. def encode_email_address( addr ) rval = '' ("mailto:" + addr).each_byte {|b| case b when ?: rval += ":" when ?@ rval += Encoders[ rand(2) ][ b ] else r = rand(100) rval += ( r > 90 ? Encoders[2][ b ] : r < 45 ? Encoders[1][ b ] : Encoders[0][ b ] ) end } return %{<a href="%s">%s</a>} % [ rval, rval.sub(/.+?:/, '') ] end # Regexp for matching Setext-style headers SetextHeaderRegexp = %r{ (.+?) # The title text ($1) (?: # Markdown Extra: Header Id Attribute (optional) [ ]* # space after closing #'s \{\# (\S+?) # $2 = Id \} [ \t]* # allowed lazy spaces )? \n ([\-=])+ # Match a line of = or -. Save only one in $3. [ ]*\n+ }x # Regexp for matching ATX-style headers AtxHeaderRegexp = %r{ ^(\#+) # $1 = string of #'s [ ]* (.+?) 
# $2 = Header text [ ]* \#* # optional closing #'s (not counted) (?: # Markdown Extra: Header Id Attribute (optional) [ ]* # space after closing #'s \{\# (\S+?) # $3 = Id \} [ \t]* # allowed lazy spaces )? \n+ }x HeaderRegexp = Regexp.union(SetextHeaderRegexp, AtxHeaderRegexp) IdRegexp = /^[a-zA-Z][a-zA-Z0-9\:\._-]*$/ ### Apply Markdown header transforms to a copy of the given +str+ amd render ### state +rs+ and return the result. def transform_headers( str, rs ) @log.debug " Transforming headers" # Setext-style headers: # Header 1 # ======== # # Header 2 # -------- # section_numbers = [nil, nil, nil, nil, nil] str. gsub( HeaderRegexp ) {|m| if $1 then @log.debug "Found setext-style header" title, id, hdrchar = $1, $2, $3 case hdrchar when '=' level = 1 when '-' level = 2 end else @log.debug "Found ATX-style header" hdrchars, title, id = $4, $5, $6 level = hdrchars.length if level >= 7 then rs.warnings << "illegal header level - h#{level} ('#' symbols are too many)" end end prefix = '' if rs.numbering? then if level >= rs.numbering_start_level and level <= 6 then depth = level - rs.numbering_start_level section_numbers.each_index do |i| if i == depth and section_numbers[depth] then # increment a deepest number if current header's level equals last header's section_numbers[i] += 1 elsif i <= depth then # set default number if nil section_numbers[i] ||= 1 else # clear discardeds section_numbers[i] = nil end end no = '' (0..depth).each do |i| no << "#{section_numbers[i]}." 
end prefix = "#{no} " end end title_html = apply_span_transforms( title, rs ) unless id then case rs.header_id_type when HeaderIDType::ESCAPE id = escape_to_header_id(title_html) if rs.headers.find{|h| h.id == id} then rs.warnings << "header id collision - #{id}" id = "bfheader-#{Digest::MD5.hexdigest(title)}" end else id = "bfheader-#{Digest::MD5.hexdigest(title)}" end end title = "#{prefix}#{title}" title_html = "#{prefix}#{title_html}" unless id =~ IdRegexp then rs.warnings << "illegal header id - #{id} (legal chars: [a-zA-Z0-9_-.] | 1st: [a-zA-Z])" end if rs.block_transform_depth == 1 then rs.headers << RenderState::Header.new(id, level, title, title_html) end if @use_header_id then %{<h%d id="%s">%s</h%d>\n\n} % [ level, id, title_html, level ] else %{<h%d>%s</h%d>\n\n} % [ level, title_html, level ] end } end ### Wrap all remaining paragraph-looking text in a copy of +str+ inside <p> ### tags and return it. def form_paragraphs( str, rs ) @log.debug " Forming paragraphs" grafs = str. sub( /\A\n+/, '' ). sub( /\n+\z/, '' ). split( /\n{2,}/ ) rval = grafs.collect {|graf| # Unhashify HTML blocks if this is a placeholder if rs.html_blocks.key?( graf ) rs.html_blocks[ graf ] # no output if this is block separater elsif graf == '~' then '' # Otherwise, wrap in <p> tags else apply_span_transforms(graf, rs). sub( /^[ ]*/, '<p>' ) + '</p>' end }.join( "\n\n" ) @log.debug " Formed paragraphs: %p" % rval return rval end # Pattern to match the linkid part of an anchor tag for reference-style # links. RefLinkIdRegexp = %r{ [ ]? # Optional leading space (?:\n[ ]*)? # Optional newline + spaces \[ (.*?) # Id = $1 \] }x InlineLinkRegexp = %r{ \( # Literal paren [ ]* # Zero or more spaces <?(.+?)>? # URI = $1 [ ]* # Zero or more spaces (?: # ([\"\']) # Opening quote char = $2 (.*?) # Title = $3 \2 # Matching quote char )? # Title is optional \) }x ### Apply Markdown anchor transforms to a copy of the specified +str+ with ### the given render state +rs+ and return it. 
def transform_anchors( str, rs ) @log.debug " Transforming anchors" @scanner.string = str.dup text = '' # Scan the whole string until @scanner.empty? if @scanner.scan( /\[/ ) link = ''; linkid = '' depth = 1 startpos = @scanner.pos @log.debug " Found a bracket-open at %d" % startpos # Scan the rest of the tag, allowing unlimited nested []s. If # the scanner runs out of text before the opening bracket is # closed, append the text and return (wasn't a valid anchor). while depth.nonzero? linktext = @scanner.scan_until( /\]|\[/ ) if linktext @log.debug " Found a bracket at depth %d: %p" % [ depth, linktext ] link += linktext # Decrement depth for each closing bracket depth += ( linktext[-1, 1] == ']' ? -1 : 1 ) @log.debug " Depth is now #{depth}" # If there's no more brackets, it must not be an anchor, so # just abort. else @log.debug " Missing closing brace, assuming non-link." link += @scanner.rest @scanner.terminate return text + '[' + link end end link.slice!( -1 ) # Trim final ']' @log.debug " Found leading link %p" % link # Markdown Extra: Footnote if link =~ /^\^(.+)/ then id = $1 if rs.footnotes[id] then rs.found_footnote_ids << id label = "[#{rs.found_footnote_ids.size}]" else rs.warnings << "undefined footnote id - #{id}" label = '[?]' end text += %Q|<sup id="footnote-ref:#{id}"><a href="#footnote:#{id}" rel="footnote">#{label}</a></sup>| # Look for a reference-style second part elsif @scanner.scan( RefLinkIdRegexp ) linkid = @scanner[1] linkid = link.dup if linkid.empty? linkid.downcase! @log.debug " Found a linkid: %p" % linkid # If there's a matching link in the link table, build an # anchor tag for it. 
if rs.urls.key?( linkid ) @log.debug " Found link key in the link table: %p" % rs.urls[linkid] url = escape_md( rs.urls[linkid] ) text += %{<a href="#{url}"} if rs.titles.key?(linkid) text += %{ title="%s"} % escape_md( rs.titles[linkid] ) end text += %{>#{link}</a>} # If the link referred to doesn't exist, just append the raw # source to the result else @log.debug " Linkid %p not found in link table" % linkid @log.debug " Appending original string instead: " @log.debug "%p" % @scanner.string[ startpos-1 .. @scanner.pos-1 ] rs.warnings << "link-id not found - #{linkid}" text += @scanner.string[ startpos-1 .. @scanner.pos-1 ] end # ...or for an inline style second part elsif @scanner.scan( InlineLinkRegexp ) url = @scanner[1] title = @scanner[3] @log.debug " Found an inline link to %p" % url url = "##{link}" if url == '#' # target anchor briefing (since AoBane 0.40) text += %{<a href="%s"} % escape_md( url ) if title title.gsub!( /"/, """ ) text += %{ title="%s"} % escape_md( title ) end text += %{>#{link}</a>} # No linkid part: just append the first part as-is. else @log.debug "No linkid, so no anchor. Appending literal text." text += @scanner.string[ startpos-1 .. @scanner.pos-1 ] end # if linkid # Plain text else @log.debug " Scanning to the next link from %p" % @scanner.rest text += @scanner.scan( /[^\[]+/ ) end end # until @scanner.empty? return text end # Pattern to match strong emphasis in Markdown text BoldRegexp = %r{ (\*\*|__) (\S|\S.*?\S) \1 }x # Pattern to match normal emphasis in Markdown text ItalicRegexp = %r{ (\*|_) (\S|\S.*?\S) \1 }x ### Transform italic- and bold-encoded text in a copy of the specified +str+ ### and return it. def transform_italic_and_bold( str, rs ) @log.debug " Transforming italic and bold" str. gsub( BoldRegexp, %{<strong>\\2</strong>} ). gsub( ItalicRegexp, %{<em>\\2</em>} ) end ### Transform backticked spans into <code> spans. 
def transform_code_spans( str, rs ) @log.debug " Transforming code spans" # Set up the string scanner and just return the string unless there's at # least one backtick. @scanner.string = str.dup unless @scanner.exist?( /`/ ) @scanner.terminate @log.warn "No backticks found for code span in %p" % str return str end @log.debug "Transforming code spans in %p" % str # Build the transformed text anew text = '' # Scan to the end of the string until @scanner.empty? # Scan up to an opening backtick if pre = @scanner.scan_until( /.??(?=`)/m ) text += pre @log.debug "Found backtick at %d after '...%s'" % [ @scanner.pos, text[-20, 20] ] # Make a pattern to find the end of the span opener = @scanner.scan( /`+/ ) len = opener.length closer = Regexp::new( opener ) @log.debug "Scanning for end of code span with %p" % closer # Scan until the end of the closing backtick sequence. Chop the # backticks off the resultant string, strip leading and trailing # whitespace, and encode any enitites contained in it. codespan = @scanner.scan_until( closer ) or raise FormatError::new( @scanner.rest[0,20], "No %p found before end" % opener ) @log.debug "Found close of code span at %d: %p" % [ @scanner.pos - len, codespan ] #p codespan.strip codespan.slice!( -len, len ) text += "<code>%s</code>" % encode_code( codespan.strip, rs ) # If there's no more backticks, just append the rest of the string # and move the scan pointer to the end else text += @scanner.rest @scanner.terminate end end return text end # Next, handle inline images:  # Don't forget: encode * and _ InlineImageRegexp = %r{ ( # Whole match = $1 !\[ (.*?) \] # alt text = $2 \([ ]* <?(\S+?)>? # source url = $3 [ ]* (?: # (["']) # quote char = $4 (.*?) # title = $5 \4 # matching quote [ ]* )? # title is optional \) ) }x #" # Reference-style images ReferenceImageRegexp = %r{ ( # Whole match = $1 !\[ (.*?) \] # Alt text = $2 [ ]? # Optional space (?:\n[ ]*)? # One optional newline + spaces \[ (.*?) 
\] # id = $3 ) }x ### Turn image markup into image tags. def transform_images( str, rs ) @log.debug " Transforming images %p" % str # Handle reference-style labeled images: ![alt text][id] str. gsub( ReferenceImageRegexp ) {|match| whole, alt, linkid = $1, $2, $3.downcase @log.debug "Matched %p" % match res = nil alt.gsub!( /"/, '"' ) # for shortcut links like ![this][]. linkid = alt.downcase if linkid.empty? if rs.urls.key?( linkid ) url = escape_md( rs.urls[linkid] ) @log.debug "Found url '%s' for linkid '%s' " % [ url, linkid ] # Build the tag result = %{<img src="%s" alt="%s"} % [ url, alt ] if rs.titles.key?( linkid ) result += %{ title="%s"} % escape_md( rs.titles[linkid] ) end result += EmptyElementSuffix else result = whole end @log.debug "Replacing %p with %p" % [ match, result ] result }. # Inline image style gsub( InlineImageRegexp ) {|match| @log.debug "Found inline image %p" % match whole, alt, title = $1, $2, $5 url = escape_md( $3 ) alt.gsub!( /"/, '"' ) # Build the tag result = %{<img src="%s" alt="%s"} % [ url, alt ] unless title.nil? title.gsub!( /"/, '"' ) result += %{ title="%s"} % escape_md( title ) end result += EmptyElementSuffix @log.debug "Replacing %p with %p" % [ match, result ] result } end # Regexp to match special characters in a code block CodeEscapeRegexp = %r{( \* | _ | \{ | \} | \[ | \] | \\ )}x ### Escape any characters special to HTML and encode any characters special ### to Markdown in a copy of the given +str+ and return it. def encode_code( str, rs ) #str.gsub( %r{&}, '&' ). #gsub( %r{<}, '<' ). #gsub( %r{>}, '>' ). 
#gsub( CodeEscapeRegexp ) {|match| EscapeTable[match][:md5]} return str end def escape_to_header_id(str) URI.escape(escape_md(str.gsub(/<\/?[^>]*>/, "").gsub(/\s/, "_")).gsub("/", ".2F")).gsub("%", ".") end ################################################################# ### U T I L I T Y F U N C T I O N S ################################################################# ### Escape any markdown characters in a copy of the given +str+ and return ### it. def escape_md( str ) str. gsub( /\*|_/ ){|symbol| EscapeTable[symbol][:md5]} end # Matching constructs for tokenizing X/HTML HTMLCommentRegexp = %r{ <! ( -- .*? -- \s* )+ > }mx XMLProcInstRegexp = %r{ <\? .*? \?> }mx MetaTag = Regexp::union( HTMLCommentRegexp, XMLProcInstRegexp ) HTMLTagOpenRegexp = %r{ < [a-z/!$] [^<>]* }imx HTMLTagCloseRegexp = %r{ > }x HTMLTagPart = Regexp::union( HTMLTagOpenRegexp, HTMLTagCloseRegexp ) ### Break the HTML source in +str+ into a series of tokens and return ### them. The tokens are just 2-element Array tuples with a type and the ### actual content. If this function is called with a block, the type and ### text parts of each token will be yielded to it one at a time as they are ### extracted. def tokenize_html( str ) depth = 0 tokens = [] @scanner.string = str.dup type, token = nil, nil until @scanner.empty? @log.debug "Scanning from %p" % @scanner.rest # Match comments and PIs without nesting if (( token = @scanner.scan(MetaTag) )) type = :tag # Do nested matching for HTML tags elsif (( token = @scanner.scan(HTMLTagOpenRegexp) )) tagstart = @scanner.pos @log.debug " Found the start of a plain tag at %d" % tagstart # Start the token with the opening angle depth = 1 type = :tag # Scan the rest of the tag, allowing unlimited nested <>s. If # the scanner runs out of text before the tag is closed, raise # an error. while depth.nonzero? 
# Scan either an opener or a closer chunk = @scanner.scan( HTMLTagPart ) or break # AoBane Fix (refer to spec/code-block.rb) @log.debug " Found another part of the tag at depth %d: %p" % [ depth, chunk ] token += chunk # If the last character of the token so far is a closing # angle bracket, decrement the depth. Otherwise increment # it for a nested tag. depth += ( token[-1, 1] == '>' ? -1 : 1 ) @log.debug " Depth is now #{depth}" end # Match text segments else @log.debug " Looking for a chunk of text" type = :text # Scan forward, always matching at least one character to move # the pointer beyond any non-tag '<'. token = @scanner.scan_until( /[^<]+/m ) end @log.debug " type: %p, token: %p" % [ type, token ] # If a block is given, feed it one token at a time. Add the token to # the token list to be returned regardless. if block_given? yield( type, token ) end tokens << [ type, token ] end return tokens end ### Return a copy of +str+ with angle brackets and ampersands HTML-encoded. def encode_html( str ) #str.gsub( /&(?!#?[x]?(?:[0-9a-f]+|\w+);)/i, "&" ). #gsub( %r{<(?![a-z/?\$!])}i, "<" ) return str end ### Return one level of line-leading tabs or spaces from a copy of +str+ and ### return it. def outdent( str ) str.gsub( /^(\t|[ ]{1,#{TabWidth}})/, '') end def indent(str) str.gsub( /^/, ' ' * TabWidth) end end
# File lib/AoBane.rb, line 1403
def transform_definition_lists(str, rs)
  @log.debug " Transforming definition lists at %p" % (str[0,100] + '...')
  str.gsub( DefinitionListRegexp ) {|list|
    @log.debug " Found definition list %p (captures=%p)" % [list, $~.captures]
    transform_definition_list_items(list, rs)
  }
end
Apply Markdown header transforms to a copy of the given str
and render state rs
and return the result.
# File lib/AoBane.rb, line 1693
def transform_headers( str, rs )
  @log.debug " Transforming headers"

  # Setext-style headers:
  #   Header 1
  #   ========
  #
  #   Header 2
  #   --------
  #

  section_numbers = [nil, nil, nil, nil, nil]

  str.
    gsub( HeaderRegexp ) {|m|
      if $1 then
        @log.debug "Found setext-style header"
        title, id, hdrchar = $1, $2, $3

        case hdrchar
        when '='
          level = 1
        when '-'
          level = 2
        end
      else
        @log.debug "Found ATX-style header"
        hdrchars, title, id = $4, $5, $6
        level = hdrchars.length

        if level >= 7 then
          rs.warnings << "illegal header level - h#{level} ('#' symbols are too many)"
        end
      end

      prefix = ''
      if rs.numbering? then
        if level >= rs.numbering_start_level and level <= 6 then
          depth = level - rs.numbering_start_level

          section_numbers.each_index do |i|
            if i == depth and section_numbers[depth] then
              # increment a deepest number if current header's level equals last header's
              section_numbers[i] += 1
            elsif i <= depth then
              # set default number if nil
              section_numbers[i] ||= 1
            else
              # clear discardeds
              section_numbers[i] = nil
            end
          end

          no = ''
          (0..depth).each do |i|
            no << "#{section_numbers[i]}."
          end

          prefix = "#{no} "
        end
      end

      title_html = apply_span_transforms( title, rs )

      unless id then
        case rs.header_id_type
        when HeaderIDType::ESCAPE
          id = escape_to_header_id(title_html)
          if rs.headers.find{|h| h.id == id} then
            rs.warnings << "header id collision - #{id}"
            id = "bfheader-#{Digest::MD5.hexdigest(title)}"
          end
        else
          id = "bfheader-#{Digest::MD5.hexdigest(title)}"
        end
      end

      title = "#{prefix}#{title}"
      title_html = "#{prefix}#{title_html}"

      unless id =~ IdRegexp then
        rs.warnings << "illegal header id - #{id} (legal chars: [a-zA-Z0-9_-.] | 1st: [a-zA-Z])"
      end

      if rs.block_transform_depth == 1 then
        rs.headers << RenderState::Header.new(id, level, title, title_html)
      end

      if @use_header_id then
        %{<h%d id="%s">%s</h%d>\n\n} % [ level, id, title_html, level ]
      else
        %{<h%d>%s</h%d>\n\n} % [ level, title_html, level ]
      end
    }
end
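The section-numbering branch of transform_headers is easier to follow in isolation. The sketch below pulls that loop out into a standalone helper; the name next_section_prefix is hypothetical and not part of AoBane, and depth stands for level - rs.numbering_start_level as computed in the method.

```ruby
# Hypothetical helper (not part of AoBane) isolating the numbering loop
# from transform_headers. +section_numbers+ is mutated in place, exactly
# like the local array in the original method.
def next_section_prefix(section_numbers, depth)
  section_numbers.each_index do |i|
    if i == depth && section_numbers[depth]
      # Same depth as an earlier header: bump the deepest counter.
      section_numbers[i] += 1
    elsif i <= depth
      # Counter at or above the current depth: start at 1 if unset.
      section_numbers[i] ||= 1
    else
      # Deeper counters are discarded when a shallower header appears.
      section_numbers[i] = nil
    end
  end

  (0..depth).map { |i| "#{section_numbers[i]}." }.join + ' '
end

numbers = [nil, nil, nil, nil, nil]
prefixes = [0, 1, 1, 0, 1].map { |depth| next_section_prefix(numbers, depth) }
# prefixes => ["1. ", "1.1. ", "1.2. ", "2. ", "2.1. "]
```

Returning to a shallower header resets every deeper counter, which is why the last header is numbered 2.1 and not 2.3.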
Transform any Markdown-style horizontal rules in a copy of the specified str
and return it.
# File lib/AoBane.rb, line 1301
def transform_hrules( str, rs )
  @log.debug " Transforming horizontal rules"
  str.gsub( /^( ?[\-\*_] ?){3,}$/, "\n<hr#{EmptyElementSuffix}\n" )
end
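To see which lines the rule pattern accepts, the regular expression can be tried on its own. This is a minimal sketch outside the parser: the pattern is copied from transform_hrules, while the constant names and the suffix value ' />' are assumptions made for the example (the real EmptyElementSuffix may differ).

```ruby
# Pattern copied from transform_hrules; the names and the suffix value
# below are assumptions made for this standalone example.
HRULE_PATTERN = /^( ?[\-\*_] ?){3,}$/
EMPTY_ELEMENT_SUFFIX = ' />'

def hrules(str)
  str.gsub(HRULE_PATTERN, "\n<hr#{EMPTY_ELEMENT_SUFFIX}\n")
end

hrules("---")      # three or more markers form a rule
hrules("* * *")    # a single space between markers is allowed
hrules("--")       # only two markers: left unchanged
```

The character class accepts any mix of -, * and _, so the pattern does not require all three markers on a line to be the same character.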
Turn image markup into image tags.
# File lib/AoBane.rb, line 2069
def transform_images( str, rs )
  @log.debug " Transforming images %p" % str

  # Handle reference-style labeled images: ![alt text][id]
  str.
    gsub( ReferenceImageRegexp ) {|match|
      whole, alt, linkid = $1, $2, $3.downcase
      @log.debug "Matched %p" % match
      res = nil
      alt.gsub!( /"/, '&quot;' )

      # for shortcut links like ![this][].
      linkid = alt.downcase if linkid.empty?

      if rs.urls.key?( linkid )
        url = escape_md( rs.urls[linkid] )
        @log.debug "Found url '%s' for linkid '%s' " % [ url, linkid ]

        # Build the tag
        result = %{<img src="%s" alt="%s"} % [ url, alt ]
        if rs.titles.key?( linkid )
          result += %{ title="%s"} % escape_md( rs.titles[linkid] )
        end
        result += EmptyElementSuffix
      else
        result = whole
      end

      @log.debug "Replacing %p with %p" % [ match, result ]
      result
    }.

    # Inline image style
    gsub( InlineImageRegexp ) {|match|
      @log.debug "Found inline image %p" % match
      whole, alt, title = $1, $2, $5
      url = escape_md( $3 )
      alt.gsub!( /"/, '&quot;' )

      # Build the tag
      result = %{<img src="%s" alt="%s"} % [ url, alt ]
      unless title.nil?
        title.gsub!( /"/, '&quot;' )
        result += %{ title="%s"} % escape_md( title )
      end
      result += EmptyElementSuffix

      @log.debug "Replacing %p with %p" % [ match, result ]
      result
    }
end
Transform italic- and bold-encoded text in a copy of the specified str
and return it.
# File lib/AoBane.rb, line 1973
def transform_italic_and_bold( str, rs )
  @log.debug " Transforming italic and bold"

  str.
    gsub( BoldRegexp, %{<strong>\\2</strong>} ).
    gsub( ItalicRegexp, %{<em>\\2</em>} )
end
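The two substitutions can be exercised without the rest of the parser. The sketch below copies BoldRegexp and ItalicRegexp as they appear in the class source under local names; the method name emphasis is made up for the example.

```ruby
# Patterns copied from BoldRegexp and ItalicRegexp; the constant and
# method names here are local to this example.
BOLD_PATTERN   = %r{ (\*\*|__) (\S|\S.*?\S) \1 }x
ITALIC_PATTERN = %r{ (\*|_)    (\S|\S.*?\S) \1 }x

def emphasis(str)
  str.
    gsub(BOLD_PATTERN,   %{<strong>\\2</strong>}).
    gsub(ITALIC_PATTERN, %{<em>\\2</em>})
end

emphasis("**bold** and _it_")
# => "<strong>bold</strong> and <em>it</em>"
```

The bold pass runs first because the italic pattern would otherwise match one marker of each doubled pair. The \S anchors on both ends also mean that markers surrounded by spaces, as in "2 * 3 * 4", are left alone.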
Transform list items in a copy of the given str
and return it.
# File lib/AoBane.rb, line 1360
def transform_list_items( str, rs )
  @log.debug " Transforming list items"

  # Trim trailing blank lines
  str = str.sub( /\n{2,}\z/, "\n" )
  str.gsub( ListItemRegexp ) {|line|
    @log.debug " Found item line %p" % line
    leading_line, item = $1, $4
    separating_lines = $5

    if leading_line or /\n{2,}/.match(item) or not separating_lines.empty? then
      @log.debug " Found leading line or item has a blank"
      item = apply_block_transforms( outdent(item), rs )
    else
      # Recursion for sub-lists
      @log.debug " Recursing for sublist"
      item = transform_lists( outdent(item), rs ).chomp
      item = apply_span_transforms( item, rs )
    end

    %{<li>%s</li>\n} % item
  }
end
Transform Markdown-style lists in a copy of the specified str
and return it.
# File lib/AoBane.rb, line 1333
def transform_lists( str, rs )
  @log.debug " Transforming lists at %p" % (str[0,100] + '...')

  str.gsub( ListRegexp ) {|list|
    @log.debug " Found list %p" % list
    bullet = $1
    list_type = (ListMarkerUl.match(bullet) ? "ul" : "ol")

    %{<%s>\n%s</%s>\n} % [
      list_type,
      transform_list_items( list, rs ),
      list_type,
    ]
  }
end
# File lib/AoBane.rb, line 1218
def transform_table_rows(str, rs)
  # split cells to 2-d array
  data = str.split("\n").map{|x| x.split('|')}

  caption = '' #Inserted by set.minami 2013-04-20
  captionName = ''
  if /#{CaptionRegexp}/ =~ data[0].first then
    caption = if $1.nil? then '' else $1 end
    captionName = if $3.nil? then '' else $3 end
    data.shift
  end
  #Inserted by set.minami 2013-04-20

  data.each do |row|
    if row.first.nil? then next end
    # cut left space
    row.first.lstrip!

    # cut when optional side-borders is included
    row.shift if row.first.empty?
  end

  column_attrs = []

  re = ''
  re << if captionName == '' then
    "<table>\n"
  else
    "<table id=\"#{captionName}\">\n"
  end

  re << "<caption>#{caption}</caption>\n"
  #Insert by set.minami 2013-04-20

  # head is exist?
  #if !data[1].nil? && data[1].last =~ /\s+/ then
  ### p data
  #  data.each{|d|
  #    d.pop
  #  }
  #end
  #Insert by set.minami @ 2013-04-29

  if data.size >= 3 and data[1].all?{|x| x =~ TableSeparatorCellRegexp} then
    head_row = data.shift
    separator_row = data.shift

    separator_row.each do |cell|
      cell.match TableSeparatorCellRegexp
      left = $1; right = $2

      if left and right then
        column_attrs << ' style="text-align: center"'
      elsif right then
        column_attrs << ' style="text-align: right"'
      elsif left then
        column_attrs << ' style="text-align: left"'
      else
        column_attrs << ''
      end
    end

    re << "\t<thead><tr>\n"
    head_row.each_with_index do |cell, i|
      re << "\t\t<th#{column_attrs[i]}>#{apply_span_transforms(cell.strip, rs)}</th>\n"
    end
    re << "\t</tr></thead>\n"
  end

  # data row
  re << "\t<tbody>\n"
  data.each do |row|
    re << "\t\t<tr>\n"
    row.each_with_index do |cell, i|
      re << "\t\t\t<td#{column_attrs[i]}>#{apply_span_transforms(cell.strip, rs)}</td>\n"
    end
    re << "\t\t</tr>\n"
  end
  re << "\t</tbody>\n"

  re << "</table>\n"
  re
end
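The alignment handling in transform_table_rows depends on which side of a separator cell carries a colon. The sketch below illustrates that mapping only. TableSeparatorCellRegexp is not shown in this section, so SEPARATOR_CELL is a hypothetical stand-in that assumes the usual Markdown Extra form (optional colon, dashes, optional colon), and column_attr is a name invented for the example.

```ruby
# Hypothetical stand-in for TableSeparatorCellRegexp (assumed form:
# optional leading colon, one or more dashes, optional trailing colon).
SEPARATOR_CELL = /\A[ \t]*(:)?-+(:)?[ \t]*\z/

# Returns the attribute string used for <th>/<td> in a column, or nil
# when the cell is not a separator cell at all.
def column_attr(cell)
  m = SEPARATOR_CELL.match(cell) or return nil
  left, right = m[1], m[2]

  if left && right
    ' style="text-align: center"'
  elsif right
    ' style="text-align: right"'
  elsif left
    ' style="text-align: left"'
  else
    ''
  end
end

attrs = [':--', ':-:', '--:', '---'].map { |cell| column_attr(cell) }
```

A separator cell without colons yields an empty attribute string, so the column keeps the browser's default alignment.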
Transform tables.
# File lib/AoBane.rb, line 1200
def transform_tables(str, rs)
  str.gsub(TableRegexp){
    transform_table_rows($~[0], rs)
  }
end
Transform table-of-contents markers in a copy of the specified str into a Markdown list of links to the collected headers
and return it.
# File lib/AoBane.rb, line 1139
def transform_toc( str, rs )
  @log.debug " Transforming tables of contents"
  str.gsub(TOCRegexp){
    start_level = 2 # default
    end_level = 6

    param = $1
    if param then
      if param =~ TOCStartLevelRegexp then
        if !($1) and !($2) then
          rs.warnings << "illegal TOC parameter - #{param} (valid example: 'h2..h4')"
        else
          start_level = ($1 ? $1.to_i : 2)
          end_level = ($2 ? $2.to_i : 6)
        end
      else
        rs.warnings << "illegal TOC parameter - #{param} (valid example: 'h2..h4')"
      end
    end

    if rs.headers.first and rs.headers.first.level >= (start_level + 1) then
      rs.warnings << "illegal structure of headers - h#{start_level} should be set before h#{rs.headers.first.level}"
    end

    ul_text = "\n\n"
    rs.headers.each do |header|
      if header.level >= start_level and header.level <= end_level then
        ul_text << ' ' * TabWidth * (header.level - start_level)
        ul_text << '* '
        ul_text << %Q|<a href="##{header.id}" rel="toc">#{header.content_html}</a>|
        ul_text << "\n"
      end
    end
    ul_text << "\n"

    ul_text # output
  }
end
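The list-building part of transform_toc can be sketched on its own. In the example below, Header is a plain Struct standing in for RenderState::Header, TAB_WIDTH = 4 is an assumed value for TabWidth, and toc_markdown is a hypothetical name; only the indentation and link format are taken from the method above.

```ruby
# Stand-ins for this example only: the real code uses RenderState::Header
# and the TabWidth constant (assumed here to be 4).
Header = Struct.new(:id, :level, :content_html)
TAB_WIDTH = 4

# Build the Markdown bullet list that replaces a TOC marker. Headers
# outside start_level..end_level are skipped; deeper headers are
# indented one tab width per level.
def toc_markdown(headers, start_level = 2, end_level = 6)
  ul_text = "\n\n"
  headers.each do |header|
    next unless header.level.between?(start_level, end_level)

    ul_text << ' ' * TAB_WIDTH * (header.level - start_level)
    ul_text << %Q|* <a href="##{header.id}" rel="toc">#{header.content_html}</a>\n|
  end
  ul_text << "\n"
end

headers = [
  Header.new('top',    1, 'Top'),
  Header.new('intro',  2, 'Intro'),
  Header.new('detail', 3, 'Detail'),
]
toc = toc_markdown(headers)
```

The result is still Markdown, not HTML: the indented bullets rely on the list transforms running afterwards to become nested ul elements.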
Swap escaped special characters in a copy of the given str
and return it.
# File lib/AoBane.rb, line 1068
def unescape_special_chars( str )
  EscapeTable.each {|char, hash|
    @log.debug "Unescaping escaped %p with %p" % [ char, hash[:md5re] ]
    str.gsub!( hash[:md5re], hash[:unescape] )
  }

  return str
end
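The escape and unescape steps form a round trip: special characters are swapped for their MD5 digests so later transforms cannot misread them, and unescape_special_chars swaps them back at the end. The sketch below shows that round trip with a reduced, hypothetical table covering only * and _ (the real EscapeTable covers more characters). The entry keys :md5, :md5re and :unescape follow the usage visible in the source above.

```ruby
require 'digest/md5'

# Reduced, hypothetical stand-in for EscapeTable: only '*' and '_'.
ESCAPE_TABLE = {}
%w[* _].each do |char|
  hash = Digest::MD5.hexdigest(char)
  ESCAPE_TABLE[char] = { md5: hash, md5re: Regexp.new(hash), unescape: char }
end

# Replace each special character with its MD5 digest (as escape_md does).
def escape_md(str)
  str.gsub(/\*|_/) { |symbol| ESCAPE_TABLE[symbol][:md5] }
end

# Swap the digests back. Unlike the original, which mutates its argument
# with gsub!, this sketch works on a copy.
def unescape_special_chars(str)
  result = str.dup
  ESCAPE_TABLE.each_value do |entry|
    result.gsub!(entry[:md5re], entry[:unescape])
  end
  result
end

escaped  = escape_md('a_b*c')
restored = unescape_special_chars(escaped)
```

Because an MD5 hex digest contains only 0-9 and a-f, the escaped text carries no characters that the emphasis or link patterns could match.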