class AoBane::Parser

Constants

AtxHeaderRegexp

Regexp for matching ATX-style headers

AutoAnchorEmailRegexp
AutoAnchorURLRegexp

AoBane change:

Accepts URLs and email addresses more loosely than BlueCloth, which is very strict.

Examples of loosely accepted forms:

<skype:tetra-dice>     (another protocol)
<ema+il@example.com>     (e.g. a Gmail alias)

Addresses that are not supported:

<"Abc@def"@example.com>  (the quoted-string local part described in RFC 5321)

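To make "loose" concrete, here is a minimal sketch using two hypothetical patterns, LOOSE_URL and LOOSE_EMAIL. They are not AoBane's actual AutoAnchorURLRegexp and AutoAnchorEmailRegexp; they only illustrate the idea that any scheme followed by a colon counts as a URL, and any unquoted local part (including + aliases) counts as an email address.

```ruby
# Hypothetical loose autolink patterns; NOT the real AutoAnchor*Regexp values.
# URL: a scheme (a letter, then letters, digits, +, . or -), a colon, then
# anything up to the closing bracket that is not whitespace.
LOOSE_URL   = /<([a-z][a-z0-9+.\-]*:[^\s<>]+)>/i
# Email: an unquoted local part (so "+" aliases pass), "@", then a domain.
# The local part may not contain '"', so RFC 5321 quoted strings are rejected.
LOOSE_EMAIL = /<([^\s@<>"]+@[^\s@<>]+)>/

# Classify a bracketed autolink candidate; returns nil if neither matches.
def autolink_kind(text)
  if text =~ LOOSE_URL then [:url, $1]
  elsif text =~ LOOSE_EMAIL then [:email, $1]
  end
end

p autolink_kind('<skype:tetra-dice>')       # prints [:url, "skype:tetra-dice"]
p autolink_kind('<ema+il@example.com>')     # prints [:email, "ema+il@example.com"]
p autolink_kind('<"Abc@def"@example.com>')  # prints nil
```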
BlockQuoteRegexp

Pattern for matching Markdown blockquote blocks

BoldRegexp

Pattern to match strong emphasis in Markdown text

CaptionRegexp
CodeBlockRegexp

Pattern for matching codeblocks

CodeEscapeRegexp

Regexp to match special characters in a code block

DDLineRegexp
DefinitionListRegexp
EmptyElementSuffix

The tag-closing string – set to '>' for HTML

Encoders

Encoder functions to turn characters of an email address into encoded entities.

EscapeTable

Table of MD5 sums for escaped characters

FencedCodeBlockRegexp
FootnoteDefinitionRegexp

Footnote definitions are in the form: [^id]: footnote contents.

FootnoteIdRegexp
HTMLCommentRegexp

Matching constructs for tokenizing X/HTML

HTMLTagCloseRegexp
HTMLTagOpenRegexp
HTMLTagPart
HeaderRegexp
HruleBlockRegexp

Special case for <hr />.

IdRegexp
InlineImageRegexp

Pattern for inline images: ![alt text](url "optional title"). Note that * and _ inside them must be encoded.

InlineLinkRegexp
ItalicRegexp

Pattern to match normal emphasis in Markdown text

LinkRegexp

Link definitions are in the form: [id]: url "optional title" (at the start of a line).
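A rough sketch of collecting such definitions, assuming a simplified pattern. LINK_DEF and collect_link_defs are hypothetical and only approximate LinkRegexp. Ids are lowercased, mirroring how transform_anchors downcases the link id before looking it up.

```ruby
# Hypothetical simplified pattern for:  [id]: url "optional title"
# Allows up to three leading spaces, an optional <...> around the URL, and a
# title in double quotes or parentheses.
LINK_DEF = /^[ ]{0,3}\[(.+?)\]:[ \t]*<?(\S+?)>?(?:[ \t]+["(](.+?)[")])?[ \t]*$/

# Build url and title tables keyed by the lowercased link id.
def collect_link_defs(text)
  urls, titles = {}, {}
  text.scan(LINK_DEF) do |id, url, title|
    key = id.downcase
    urls[key] = url
    titles[key] = title if title
  end
  [urls, titles]
end

urls, titles = collect_link_defs(<<~MD)
  [Foo]: http://example.com/  "Example"
  [bar]: <http://example.org/>
MD
# urls["foo"]   => "http://example.com/"
# titles["foo"] => "Example"
```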

ListItemRegexp

Pattern for transforming list items

ListMarkerAny
ListMarkerOl

Patterns to match and transform lists

ListMarkerUl
ListRegexp
LooseBlockRegexp

More liberal block matching (applied by hide_html_blocks after StrictBlockRegexp)

LooseBlockTags
LooseTagPattern
MetaTag
PreChunk
RefLinkIdRegexp

Pattern to match the linkid part of an anchor tag for reference-style links.

ReferenceImageRegexp

Reference-style images

SetextHeaderRegexp

Regexp for matching Setext-style headers

StrictBlockRegexp

Matches nested HTML blocks. The tags of the inner block must be indented:

<div>
        <div>
        tags for inner block must be indented.
        </div>
</div>
StrictBlockTags

The list of tags that are treated as block-level constructs (StrictBlockTags), and an alternation pattern built from that list for use in regexps (StrictTagPattern).

StrictTagPattern
TOCRegexp
TOCStartLevelRegexp
TabWidth

Tab width used by detab if none is specified

TableRegexp
TableSeparatorCellRegexp
XMLProcInstRegexp

Attributes

display_warnings[RW]

AoBane Extension: display warnings at the top of the output HTML (default: true)

filter_html[RW]

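Filters for controlling what gets output for untrusted input. (But really, you're filtering bad stuff out of untrusted input at submission-time via untainting, aren't you?) Note: in this version the escaping code in parse_text is commented out, so setting filter_html only produces a debug log message.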

filter_styles[RW]

Filters for controlling what gets output for untrusted input. (But really, you're filtering bad stuff out of untrusted input at submission-time via untainting, aren't you?)

fold_lines[RW]

RedCloth-compatibility accessor. Line-folding is part of Markdown syntax, so this isn't used by anything.

use_header_id[RW]

AoBane Extension: add an id to each header, for the table of contents and anchors. (default: true)

Public Class Methods

new(*restrictions)

Create a new AoBane parser.

# File lib/AoBane.rb, line 460
def initialize(*restrictions)
        @log = Logger::new( $stderr )  # $deferr is an obsolete alias for $stderr
        @log.level = $DEBUG ?
                Logger::DEBUG :
                ($VERBOSE ? Logger::INFO : Logger::WARN)
        @scanner = nil

        # Add any restrictions, and set the line-folding attribute to reflect
        # what happens by default.
        @filter_html = nil
        @filter_styles = nil
        restrictions.flatten.each {|r| __send__("#{r}=", true) }
        @fold_lines = true

        @use_header_id = true
        @display_warnings = true

        @log.debug "Parser is: %p" % self
end
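Restrictions are passed as setter names. A stripped-down sketch (RestrictedParser is hypothetical and keeps only the restriction handling from initialize) shows how, for example, AoBane::Parser.new(:filter_html) ends up with filter_html set to true:

```ruby
# Hypothetical stand-in for AoBane::Parser with only the restriction logic.
class RestrictedParser
  attr_accessor :filter_html, :filter_styles

  def initialize(*restrictions)
    @filter_html = nil
    @filter_styles = nil
    # Each restriction name becomes a setter call, e.g.
    # :filter_html turns into self.filter_html = true.
    restrictions.flatten.each { |r| __send__("#{r}=", true) }
  end
end

parser = RestrictedParser.new(:filter_html)
parser.filter_html    # => true
parser.filter_styles  # => nil
```

Because each name is sent as a setter, a name with no matching setter raises NoMethodError instead of being silently ignored.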

Public Instance Methods

apply_block_transforms( str, rs )

Do block-level transforms on a copy of str using the specified render state rs and return the results.

# File lib/AoBane.rb, line 820
def apply_block_transforms( str, rs )
        rs.block_transform_depth += 1

        # Port: This was called '_runBlockGamut' in the original

        @log.debug "Applying block transforms to:\n  %p" % str
        text = str
        
        text = pretransform_fenced_code_blocks( text, rs )
        text = pretransform_block_separators(text, rs)

        text = transform_headers( text, rs )
        text = transform_toc(text, rs)

        text = transform_hrules( text, rs )
        text = transform_lists( text, rs )
        text = transform_definition_lists( text, rs ) # AoBane Extension
        text = transform_code_blocks( text, rs )
        text = transform_block_quotes( text, rs )
        text = transform_tables(text, rs)
        text = hide_html_blocks( text, rs )

        text = form_paragraphs( text, rs )

        rs.block_transform_depth -= 1
        @log.debug "Done with block transforms:\n  %p" % text
        return text
end
apply_span_transforms( str, rs )

Apply Markdown span transforms to a copy of the specified str with the given render state rs and return it.

# File lib/AoBane.rb, line 852
def apply_span_transforms( str, rs )
        @log.debug "Applying span transforms to:\n  %p" % str

        str = transform_code_spans( str, rs )
        str = transform_auto_links( str, rs )
        str = encode_html( str )
        str = transform_images( str, rs )
        str = transform_anchors( str, rs )
        str = transform_italic_and_bold( str, rs )

        # Hard breaks
        str.gsub!( / {2,}\n/, "<br#{EmptyElementSuffix}\n" )

        @log.debug "Done with span transforms:\n  %p" % str
        return str
end
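The hard-break rule at the end of apply_span_transforms can be tried on its own. This sketch assumes EmptyElementSuffix is " />" (XHTML style); hard_breaks is a hypothetical name.

```ruby
# Assumption: AoBane's EmptyElementSuffix is " />"; check the real constant.
EMPTY_ELEMENT_SUFFIX = ' />'

# Two or more spaces at the end of a line become a hard <br /> break,
# using the same substitution as apply_span_transforms.
def hard_breaks(str)
  str.gsub(/ {2,}\n/, "<br#{EMPTY_ELEMENT_SUFFIX}\n")
end

hard_breaks("roses are red  \nviolets are blue\n")
# => "roses are red<br />\nviolets are blue\n"
```

A single trailing space is left alone, so ordinary line wrapping does not produce breaks.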
detab( str, tabwidth=TabWidth )

Convert tabs in str to spaces and return the result. (Unlike detab! in the original BlueCloth, this version returns a new string instead of modifying its argument.)

# File lib/AoBane.rb, line 805
def detab( str, tabwidth=TabWidth )
        re = str.split( /\n/ ).collect {|line|
                line.gsub( /(.*?)\t/ ) do
                        $1 + ' ' * (tabwidth - $1.length % tabwidth)
                end
        }.join("\n")

        re
end
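A standalone copy of detab, with TAB_WIDTH = 4 assumed for TabWidth, shows how tab stops are computed per line: each tab expands to the next multiple of tabwidth, counted from the start of the line.

```ruby
TAB_WIDTH = 4 # assumed value of TabWidth

# Same algorithm as detab above. Each gsub match ends on a tab stop, so the
# length of the text before the next tab gives its offset from that stop.
def detab(str, tabwidth = TAB_WIDTH)
  str.split(/\n/).collect { |line|
    line.gsub(/(.*?)\t/) { $1 + ' ' * (tabwidth - $1.length % tabwidth) }
  }.join("\n")
end

detab("ab\tc\td")   # => "ab  c   d"
detab("\tcode", 2)  # => "  code"
detab("x\n")        # => "x"
```

Because of split/join, trailing newlines are not preserved; parse_text appends "\n\n" right after calling detab, so this does not matter there.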
doc2html(doc)
Alias for: document_to_html
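document_to_html(doc)

Render a parsed Document as a complete XHTML page and return it as a string. The body is the output of parse_text; the head carries, where available, the charset, the title (the document title, else the first level-1 header, else a fixed fallback), description and keywords meta tags, a stylesheet link, and RDF/RSS/Atom feed links.
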
# File lib/AoBane.rb, line 709
def document_to_html(doc)
        rs = RenderState.new
        if doc.numbering? then
                rs.numbering = true
        end
        rs.numbering_start_level = doc.numbering_start_level
        rs.header_id_type = doc.header_id_type

        body_html = nil

        if doc.encoding_type then
                Util.change_kcode(doc.kcode){
                        body_html = parse_text(doc.body, rs)
                }
        else
                body_html = parse_text(doc.body, rs)
        end

        out = Util.generate_blank_string_io(doc.body)

        # XHTML DOCTYPE declaration
        out.puts %Q|<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">|

        # html start
        out.puts %Q|<html>|

        # head
        out.puts %Q|<head>|

        if doc.encoding_type and (charset = EncodingType.convert_to_charset(doc.encoding_type)) then
                out.puts %Q|<meta http-equiv="Content-Type" content="text/html; charset=#{charset}" />|
        end

        h1 = rs.headers.find{|x| x.level == 1}
        h1_content = (h1 ? h1.content : nil)
        title = Util.escape_html(doc.title || h1_content || 'no title (Generated by AoBane)')
        out.puts %Q|<title>#{title}</title>|

        %w(description keywords).each do |name|
                if doc[name] then
                        content = Util.escape_html(doc[name])
                        out.puts %Q|<meta name="#{name}" content="#{content}" />|
                end
        end


        if doc['css'] then
                href = Util.escape_html(doc.css)
                out.puts %Q|<link rel="stylesheet" type="text/css" href="#{href}" />|

        end

        if doc['rdf-feed'] then
                href = Util.escape_html(doc['rdf-feed'])
                out.puts %Q|<link rel="alternate" type="application/rdf+xml" href="#{href}" />|
        end



        if doc['rss-feed'] then
                href = Util.escape_html(doc['rss-feed'])
                out.puts %Q|<link rel="alternate" type="application/rss+xml" href="#{href}" />|
        end

        if doc['atom-feed'] then
                href = Util.escape_html(doc['atom-feed'])
                out.puts %Q|<link rel="alternate" type="application/atom+xml" href="#{href}" />|
        end

        out.puts %Q|</head>|

        # body
        out.puts %Q|<body>|
        out.puts
        out.puts body_html
        out.puts
        out.puts %Q|</body>|

        # html end
        out.puts %Q|</html>|


        return out.string
end
Also aliased as: doc2html
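The title logic above picks the document's own title, then the first level-1 header, then a fixed fallback, and escapes the result. A minimal sketch of that precedence, with a hypothetical Header struct and ERB::Util.html_escape standing in for AoBane's Util.escape_html:

```ruby
require 'erb'

# Hypothetical header record; document_to_html only needs #level and #content.
Header = Struct.new(:level, :content)

# Explicit title first, then the first level-1 header, then the fallback.
def page_title(doc_title, headers)
  h1 = headers.find { |h| h.level == 1 }
  ERB::Util.html_escape(doc_title || (h1 && h1.content) || 'no title (Generated by AoBane)')
end

headers = [Header.new(2, 'Intro'), Header.new(1, 'Tom & Jerry')]
page_title(nil, headers)      # => "Tom &amp; Jerry"
page_title('Manual', headers) # => "Manual"
page_title(nil, [])           # => "no title (Generated by AoBane)"
```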
encode_backslash_escapes( str )

Return a copy of the given str with every backslash-escaped special character replaced by its MD5 placeholder.

# File lib/AoBane.rb, line 1080
def encode_backslash_escapes( str )
        # Make a copy with any double-escaped backslashes encoded
        text = str.gsub( /\\\\/, EscapeTable['\\\\'][:md5] )

        EscapeTable.each_pair {|char, esc|
                next if char == '\\\\'
                next unless char =~ /\\./
                text.gsub!( esc[:re], esc[:md5] )
        }

        return text
end
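The MD5-placeholder technique can be sketched in isolation. SPECIAL, ESCAPES, and unescape_placeholders are hypothetical (AoBane's EscapeTable layout and character list may differ): each backslash escape is replaced by the MD5 digest of the escape sequence so later regex passes cannot see it, and at the end the digest is turned back into the literal character.

```ruby
require 'digest/md5'

# Characters that may be backslash-escaped (an assumed list).
SPECIAL = '\\`*_{}[]()#.!'.chars

# Hypothetical table keyed by the two-character sequence backslash + char.
ESCAPES = SPECIAL.each_with_object({}) do |char, table|
  key = "\\" + char
  table[key] = { md5: Digest::MD5.hexdigest(key), re: Regexp.new(Regexp.escape(key)) }
end

def encode_backslash_escapes(str)
  # Escaped backslashes go first, so the input \\* is read as an escaped
  # backslash followed by a plain *, not as \ followed by \*.
  text = str.gsub(ESCAPES["\\\\"][:re], ESCAPES["\\\\"][:md5])
  ESCAPES.each do |key, esc|
    next if key == "\\\\"
    text = text.gsub(esc[:re], esc[:md5])
  end
  text
end

# Swap every placeholder back for the literal (unescaped) character.
def unescape_placeholders(str)
  ESCAPES.each do |key, esc|
    str = str.gsub(esc[:md5]) { key[1] }
  end
  str
end
```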
encode_code( str, rs )

Escape any characters special to HTML and encode any characters special to Markdown in a copy of the given str and return it. In this version the escaping code is commented out, so str is returned unchanged.

# File lib/AoBane.rb, line 2128
def encode_code( str, rs )
        #str.gsub( %r{&}, '&amp;' ).
                #gsub( %r{<}, '&lt;' ).
                #gsub( %r{>}, '&gt;' ).
                #gsub( CodeEscapeRegexp ) {|match| EscapeTable[match][:md5]}
  return str
end
encode_email_address( addr )

Transform a copy of the given email addr into an escaped version safer for posting publicly.

# File lib/AoBane.rb, line 1629
def encode_email_address( addr )

        rval = ''
        ("mailto:" + addr).each_byte {|b|
                case b
                when ':'.ord   # ?: is a one-character String in Ruby 1.9+, never equal to a byte
                        rval += ":"
                when '@'.ord
                        rval += Encoders[ rand(2) ][ b ]
                else
                        r = rand(100)
                        rval += (
                                r > 90 ? Encoders[2][ b ] :
                                r < 45 ? Encoders[1][ b ] :
                                                 Encoders[0][ b ]
                        )
                end
        }

        return %{<a href="%s">%s</a>} % [ rval, rval.sub(/.+?:/, '') ]
end
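A reproducible sketch of the same obfuscation. The three ENCODERS (decimal entity, hex entity, raw character) are an assumption about the contents of Encoders, the random source is injectable for testing, and obfuscate_email is a hypothetical name. Bytes are compared with ':'.ord and '@'.ord because each_byte yields Integers.

```ruby
# Assumed encoders: decimal entity, hex entity, or the raw character.
ENCODERS = [
  ->(b) { format('&#%03d;', b) },
  ->(b) { format('&#x%X;', b) },
  ->(b) { b.chr }
].freeze

def obfuscate_email(addr, rng = Random.new)
  encoded = String.new
  ("mailto:" + addr).each_byte do |b|
    piece = case b
            when ':'.ord then ':'                           # keep the scheme separator literal
            when '@'.ord then ENCODERS[rng.rand(2)].call(b) # "@" is always an entity
            else
              r = rng.rand(100)
              ENCODERS[r > 90 ? 2 : (r < 45 ? 1 : 0)].call(b)
            end
    encoded << piece
  end
  # Link text is the encoded address with the "mailto:" part removed.
  %(<a href="#{encoded}">#{encoded.sub(/.+?:/, '')}</a>)
end

html = obfuscate_email('foo@example.com', Random.new(42))
```

Browsers decode the entities, so the link still works, while the @ never appears literally in the page source.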
encode_html( str )

Return a copy of str with angle brackets and ampersands HTML-encoded. In this version the encoding lines are commented out, so str itself is returned unchanged.

# File lib/AoBane.rb, line 2233
def encode_html( str )
        #str.gsub( /&(?!#?[x]?(?:[0-9a-f]+|\w+);)/i, "&amp;" ).
                #gsub( %r{<(?![a-z/?\$!])}i, "&lt;" )
        return str
end
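For reference, the two commented-out substitutions behave like the following standalone function (html_encode_sketch is a hypothetical name): a & that does not already begin an entity becomes &amp;, and a < that cannot begin a tag, comment, or processing instruction becomes &lt;.

```ruby
# The two substitutions that are commented out in encode_html.
def html_encode_sketch(str)
  str.gsub(/&(?!#?[x]?(?:[0-9a-f]+|\w+);)/i, '&amp;')
     .gsub(%r{<(?![a-z/?\$!])}i, '&lt;')
end

html_encode_sketch('AT&T <3 &amp; <b>bold</b>')
# => "AT&amp;T &lt;3 &amp; <b>bold</b>"
```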
escape_md( str )

Escape any markdown characters in a copy of the given str and return it.

# File lib/AoBane.rb, line 2146
def escape_md( str )
        str.
                gsub( /\*|_/ ){|symbol| EscapeTable[symbol][:md5]}
end
escape_special_chars( str )

Escape special characters in the given str

# File lib/AoBane.rb, line 1036
def escape_special_chars( str )
        @log.debug "  Escaping special characters"
        text = ''

        # The original Markdown source has something called '$tags_to_skip'
        # declared here, but it's never used, so I don't define it.

        tokenize_html( str ) {|type, token|
                @log.debug "   Adding %p token %p" % [ type, token ]
                case type

                # Within tags, encode * and _
                when :tag
                        text += token.
                                gsub( /\*/, EscapeTable['*'][:md5] ).
                                gsub( /_/, EscapeTable['_'][:md5] )

                # Encode backslashed stuff in regular text
                when :text
                        text += encode_backslash_escapes( token )
                else
                        raise TypeError, "Unknown token type %p" % type
                end
        }

        @log.debug "  Text with escapes is now: %p" % text
        return text
end
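escape_to_header_id(str)

Turn header text into an id: strip HTML tags, replace whitespace with _, escape Markdown * and _ with escape_md, replace / with .2F, percent-encode the result with URI.escape, and finally replace every % with a dot. Note that URI.escape was removed in Ruby 3.0, so this method needs an older Ruby.
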
# File lib/AoBane.rb, line 2136
def escape_to_header_id(str)
        URI.escape(escape_md(str.gsub(/<\/?[^>]*>/, "").gsub(/\s/, "_")).gsub("/", ".2F")).gsub("%", ".")
end
form_paragraphs( str, rs )

Wrap all remaining paragraph-looking text in a copy of str inside <p> tags and return it.

# File lib/AoBane.rb, line 1793
def form_paragraphs( str, rs )
        @log.debug " Forming paragraphs"
        grafs = str.
                sub( /\A\n+/, '' ).
                sub( /\n+\z/, '' ).
                split( /\n{2,}/ )

        rval = grafs.collect {|graf|

                # Unhashify HTML blocks if this is a placeholder
                if rs.html_blocks.key?( graf )
                        rs.html_blocks[ graf ]

                # no output if this is a block separator
                elsif graf == '~' then
                        ''

                # Otherwise, wrap in <p> tags
                else
                        apply_span_transforms(graf, rs).
                                sub( /^[ ]*/, '<p>' ) + '</p>'
                end
        }.join( "\n\n" )

        @log.debug " Formed paragraphs: %p" % rval
        return rval
end
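A simplified sketch of the paragraph pass, assuming span transforms are skipped and html_blocks is a plain Hash from placeholder to HTML (form_paragraphs_sketch is hypothetical). It covers the three cases: a restored HTML block, a dropped ~ separator, and a wrapped paragraph.

```ruby
def form_paragraphs_sketch(str, html_blocks = {})
  # Trim leading/trailing newlines, then split on blank lines.
  grafs = str.sub(/\A\n+/, '').sub(/\n+\z/, '').split(/\n{2,}/)
  grafs.collect { |graf|
    if html_blocks.key?(graf)
      html_blocks[graf]                  # restore a hidden HTML block
    elsif graf == '~'
      ''                                 # block separator: no output
    else
      graf.sub(/^[ ]*/, '<p>') + '</p>'  # ordinary paragraph
    end
  }.join("\n\n")
end

form_paragraphs_sketch("\nOne\nline\n\n~\n\nTwo\n")
# => "<p>One\nline</p>\n\n\n\n<p>Two</p>"
```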
hide_html_blocks( str, rs )

Replace all blocks of HTML in str that start in the left margin with tokens.

# File lib/AoBane.rb, line 925
def hide_html_blocks( str, rs )
        @log.debug "Hiding HTML blocks in %p" % str

        # Tokenizer proc to pass to gsub
        tokenize = lambda {|match|
                key = Digest::MD5::hexdigest( match )
                rs.html_blocks[ key ] = match
                @log.debug "Replacing %p with %p" % [ match, key ]
                "\n\n#{key}\n\n"
        }

        rval = str.dup

        @log.debug "Finding blocks with the strict regex..."
        rval.gsub!( StrictBlockRegexp, &tokenize )

        @log.debug "Finding blocks with the loose regex..."
        rval.gsub!( LooseBlockRegexp, &tokenize )

        @log.debug "Finding hrules..."
        rval.gsub!( HruleBlockRegexp ) {|match| $1 + tokenize[$2] }

        return rval
end
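The placeholder technique can be sketched with a deliberately narrow, hypothetical pattern: DIV_BLOCK stands in for StrictBlockRegexp, and hide_blocks is not an AoBane method.

```ruby
require 'digest/md5'

# Hypothetical stand-in for StrictBlockRegexp: a <div> at the left margin,
# up to the first line that is exactly </div>.
DIV_BLOCK = %r{^<div>\n.*?^</div>\n}m

# Replace each matching block with its MD5 digest on a paragraph of its own,
# remembering the original HTML under that key.
def hide_blocks(str, pattern, store)
  str.gsub(pattern) do |match|
    key = Digest::MD5.hexdigest(match)
    store[key] = match
    "\n\n#{key}\n\n"
  end
end

store = {}
out = hide_blocks("Intro\n\n<div>\n*not* emphasis\n</div>\n\nOutro\n", DIV_BLOCK, store)
```

Because each placeholder ends up in a paragraph of its own, form_paragraphs can find it in rs.html_blocks and restore the HTML as-is, so Markdown inside the block (such as *not* above) is not run through the transforms.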
indent(str)

Indent every line of str by one level (TabWidth spaces) and return the result.

# File lib/AoBane.rb, line 2246
def indent(str)
        str.gsub( /^/, ' ' * TabWidth)
end
outdent( str )

Remove one level of line-leading tabs or spaces from a copy of str and return it.

# File lib/AoBane.rb, line 2242
def outdent( str )
        str.gsub( /^(\t|[ ]{1,#{TabWidth}})/, '')
end
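A quick check of indent and outdent, using standalone copies with TAB_WIDTH = 4 assumed for TabWidth:

```ruby
TAB_WIDTH = 4 # assumed value of TabWidth

# Add one indentation level (TAB_WIDTH spaces) to every line.
def indent(str)
  str.gsub(/^/, ' ' * TAB_WIDTH)
end

# Remove one indentation level: a single tab or up to TAB_WIDTH spaces.
def outdent(str)
  str.gsub(/^(\t|[ ]{1,#{TAB_WIDTH}})/, '')
end

indent("a\nb")                # => "    a\n    b"
outdent("      a\n\tb\n  c")  # => "  a\nb\nc"
```

In this class, indent is used by pretransform_fenced_code_blocks to turn fenced code into an indented code block, and outdent by strip_footnote_definitions on continuation lines.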
parse(source, rs = nil)
Alias for: parse_text
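parse_document(source, default_enc = EncodingType::UTF8)

Parse source into a Document (with default_enc as the default encoding) and render it as a complete HTML page with document_to_html.
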
# File lib/AoBane.rb, line 693
def parse_document(source, default_enc = EncodingType::UTF8)
        doc = Document.parse(source, default_enc)

        return document_to_html(doc)
end
parse_document_file(path, default_enc = EncodingType::UTF8)

Like parse_document, but reads the document from the file at path.

# File lib/AoBane.rb, line 699
def parse_document_file(path, default_enc = EncodingType::UTF8)
        doc = nil
        # File.open, not Kernel#open: open("|cmd") would run a shell command
        File.open(path){|f|
                doc = Document.parse_io(f, default_enc)
        }

        return document_to_html(doc)
end
parse_file(path)
Alias for: parse_text_file
parse_text(source, rs = nil)

Render the Markdown-formatted text in source as HTML and return it. rs is an optional RenderState; a new one is created if none is given.

# File lib/AoBane.rb, line 506
                def parse_text(source, rs = nil)
                  rs ||= RenderState.new
                  
                  # check
                  case rs.header_id_type
                        when HeaderIDType::MD5, HeaderIDType::ESCAPE
                        else
                          rs.warnings << "illegal header id type - #{rs.header_id_type}"
                        end
                  
                  # Create a StringScanner we can reuse for various lexing tasks
                  @scanner = StringScanner::new( '' )
                  
                  # Make a copy of the string with normalized line endings, tabs turned to
                  # spaces, and a couple of guaranteed newlines at the end
                  
                  text = detab(source.gsub( /\r\n?/, "\n" ))
                  text += "\n\n"
                  @log.debug "Normalized line-endings: %p" % text
                  
                  #text = Utilities::prePaling(text) #Insert by set.minami 2013-04-27
                  #Insert by set.minami 2013-04-03
                  text = transform_block_quotes(text, rs)
                  nrange = []
                  departure = 1
                  preproc = Marshal.load(Marshal.dump(text))
                  text.clear
                  stack = []
                  html_text_number = 0
                  # Utilities::initNumberStack

                  preproc.lines { |line|
                    html_text_number += 1
                    begin
                      line.gsub!(/^\{nrange:(.*?)(;\d+)??\}/){ |match|
                        #depNum = $2.delete(';').to_i
                        #departure = if depNum > 0 then depNum else 1 end
                        if /h(\d)\-h(\d)/i =~ $1
                          nrange.push($1)
                          nrange.push($2)
                          if nrange.size > 2 then
                            nrange.pop
                            nrange.pop
                            raise "Syntax Error!" 
                          end
                        end
                        next
                      }
                      @log.debug line                          
                      #calculate numbering
                      range = nrange[1].to_i - nrange[0].to_i
                      if range == 0 then range = 1 end
                      if range < 0 then 
                        p "AoBane Syntax Error:Header range is WRONG!" +
                          "@ l.#{html_text_number}";exit(-1)
                        raise FatalError,"AoBane Syntax Error:Header range is WRONG!"
                      end
                      if line =~ /^(%{1,#{range}})(.*?)\n/ then
                        text << Utilities.
                          calcSectionNo(nrange.min,range,$1.size,departure,$2,stack) +
                          "\n"
                      else
                        text << line
                      end
                      @log.debug nrange.minmax
                    rescue => e
                      @log.warn "AoBane Syntax WARNING l.#{html_text_number}:#{line.chomp} haven't adopted" 
                      @log.warn e                          
                    end
                  }

                  text.gsub!(/\*\[(.*?)\]\((.*?)(\|.*?)*(\/.*?)*\)/){|match|
                    '<font color="' +
                    if $2.nil? then '' else $2 end      +'" ' +
                    'face="' +
                    if $3.nil? then '' else $3.delete('|') end + '" ' +
                    'size="' +
                    if $4.nil? then '' else $4.delete('/') end + '">' +
                    $1 + '</font>'
                  }
                  #Insert by set.minami 2013-04-21
                  text = Utilities::abbrPreProcess(text)
                  #Insert by set.minami 2013-04-01
                  text.gsub!(/\\TeX\{(.+?)\\TeX\}/){
                    begin
                      $1.to_mathml
                    rescue => e
                      puts 'math_ml Error: ' + $1
                      puts e
                    end
                  }

                  text = Utilities::preProcFence(text,0).join("\n") #Insert by set.minami 2013-04-27
                  #Insert by set.minami 2013-03-30
                  #Insert by set.minami

                        # Filter HTML if we're asked to do so (the escaping itself is currently commented out)
                        if self.filter_html
                                #text.gsub!( "<", "&lt;" )
                                #text.gsub!( ">", "&gt;" )
                                @log.debug "Filtered HTML: %p" % text
                        end

                        # Simplify blank lines
                        text.gsub!( /^ +$/, '' )
                        @log.debug "Tabs -> spaces/blank lines stripped: %p" % text


                        # Replace HTML blocks with placeholders
                        text = hide_html_blocks( text, rs )
                        @log.debug "Hid HTML blocks: %p" % text
                        @log.debug "Render state: %p" % rs


                        # Strip footnote definitions, store in render state
                        text = strip_footnote_definitions( text, rs )
                        @log.debug "Stripped footnote definitions: %p" % text
                        @log.debug "Render state: %p" % rs


                        # Strip link definitions, store in render state
                        text = strip_link_definitions( text, rs )
                        @log.debug "Stripped link definitions: %p" % text
                        @log.debug "Render state: %p" % rs

                        # Escape meta-characters
                        text = escape_special_chars( text )
                        @log.debug "Escaped special characters: %p" % text

                        # Transform block-level constructs
                        text = apply_block_transforms( text, rs )
                        @log.debug "After block-level transforms: %p" % text

                        # Now swap back in all the escaped characters
                        text = unescape_special_chars( text )
                        @log.debug "After unescaping special characters: %p" % text

                        # Extend footnotes
                        unless rs.footnotes.empty? then
                                text << %Q|<div class="footnotes"><hr#{EmptyElementSuffix}\n<ol>\n|
                                rs.found_footnote_ids.each do |id|
                                        content = rs.footnotes[id]
                                        html = apply_block_transforms(content.sub(/\n+\Z/, '') + %Q| <a href="#footnote-ref:#{id}" rev="footnote">&#8617;</a>|, rs)
                                        text << %Q|<li id="footnote:#{id}">\n#{html}\n</li>|
                                end
                                text << %Q|</ol>\n</div>\n|
                        end

                        # Display warnings
                        if @display_warnings then
                                unless rs.warnings.empty? then
                                        html = %Q|<pre><strong>[WARNINGS]\n|
                                        html << rs.warnings.map{|x| Util.escape_html(x)}.join("\n")
                                        html << %Q|</strong></pre>|

                                        text = html + text
                                end
                        end

                  #Insert by set.minami 2013-04-21
                  text = Utilities::abbrPostProcess(text)
                  #Insert by set.minami 2013-03-30
                  text = Utilities::insertTimeStamp(text)
                  text = Utilities::postProcFence(text) #Insert by set.minami 2013-04-27

                  text = Utilities::transformSpecialChar(text) #Insert by set.minami 2013-04-27

                  return text
              end
Also aliased as: parse
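Early in parse_text, AoBane's colored-text syntax *[text](color|face/size) is rewritten into a <font> tag. The sketch below isolates that one gsub!, with the pattern copied from parse_text; expand_font_syntax is a hypothetical name. Missing face or size parts produce empty attributes.

```ruby
# Pattern copied from parse_text: *[text](color|face/size),
# where the |face and /size parts are optional.
FONT_SYNTAX = /\*\[(.*?)\]\((.*?)(\|.*?)*(\/.*?)*\)/

def expand_font_syntax(text)
  text.gsub(FONT_SYNTAX) do
    # $2 is the color, $3 is "|face", $4 is "/size" (nil when absent).
    %(<font color="#{$2}" face="#{$3.to_s.delete('|')}" size="#{$4.to_s.delete('/')}">#{$1}</font>)
  end
end

expand_font_syntax('*[Hello](red|Arial/3)')
# => "<font color=\"red\" face=\"Arial\" size=\"3\">Hello</font>"
```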
parse_text_file(path)

Read the file at path and render its contents with parse_text.

# File lib/AoBane.rb, line 686
def parse_text_file(path)
        parse_text(File.read(path))
end
Also aliased as: parse_file
parse_text_with_render_state(str, rs = nil)

Like parse_text, but returns both the HTML and the RenderState used to render it, as [html, rs]. (Mainly for testing.)

# File lib/AoBane.rb, line 679
def parse_text_with_render_state(str, rs = nil)
        rs ||= RenderState.new
        html = parse_text(str, rs)

        return [html, rs]
end
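pretransform_block_separators(str, rs)

Turn each line containing only ~ (indented by fewer than TabWidth spaces) into a standalone ~ paragraph. form_paragraphs later drops such paragraphs, so ~ acts as an invisible block separator.
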
# File lib/AoBane.rb, line 1094
def pretransform_block_separators(str, rs)
        str.gsub(/^[ ]{0,#{TabWidth - 1}}[~][ ]*\n/){
                "\n~\n\n"
        }
end
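A standalone sketch of the separator rule (TAB_WIDTH = 4 is assumed for TabWidth; the method name is hypothetical). Without the ~ line, "first" and "second" would form a single paragraph; with it, form_paragraphs sees three paragraphs and drops the middle one.

```ruby
TAB_WIDTH = 4 # assumed value of TabWidth

# A line holding only "~" (at most TAB_WIDTH - 1 leading spaces) is turned
# into a paragraph of its own.
def pretransform_block_separators_sketch(str)
  str.gsub(/^[ ]{0,#{TAB_WIDTH - 1}}[~][ ]*\n/) { "\n~\n\n" }
end

pretransform_block_separators_sketch("first\n~\nsecond\n")
# => "first\n\n~\n\nsecond\n"
```

A ~ indented by a full tab width is left alone, since it belongs to an indented code block.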
pretransform_fenced_code_blocks( str, rs )

Convert fenced code blocks into standard indented code blocks, surrounded by ~ block separators.

# File lib/AoBane.rb, line 1549
def pretransform_fenced_code_blocks( str, rs )
        @log.debug " Transforming fenced code blocks => standard code blocks"

        str.gsub( FencedCodeBlockRegexp ) {|block|
                "\n~\n\n" + indent($2) + "\n~\n\n"
        }
end
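strip_footnote_definitions(str, rs)

Remove footnote definitions from str, store their contents in rs.footnotes keyed by id, and record a warning for any id containing illegal characters. Continuation lines of multi-line definitions are outdented.
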
# File lib/AoBane.rb, line 1009
def strip_footnote_definitions(str, rs)
        str.gsub( FootnoteDefinitionRegexp ) {|match|
                id = $1; content1 = $2; content2 = $3

                unless id =~ FootnoteIdRegexp then
                        rs.warnings << "illegal footnote id - #{id} (legal chars: a-zA-Z0-9_-.:)"
                end

                if content2 then
                        @log.debug "   Stripping multi-line definition %p, %p" % [$2, $3]
                        content = content1 + "\n" + outdent(content2.chomp)
                        @log.debug "   Stripped multi-line definition %p, %p" % [id, content]
                        rs.footnotes[id] = content
                else
                        content = content1 || ''
                        @log.debug "   Stripped single-line definition %p, %p" % [id, content]
                        rs.footnotes[id] = content
                end



                ""
        }
end
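A sketch of the single-line case only. FOOTNOTE_DEF and LEGAL_ID are hypothetical simplifications of FootnoteDefinitionRegexp and FootnoteIdRegexp, the latter based on the legal characters named in the warning message; strip_footnotes is not an AoBane method.

```ruby
# Hypothetical, single-line only:  [^id]: contents
FOOTNOTE_DEF = /^\[\^([^\]]+)\]:[ \t]*(.*)\n?/
# Legal id characters as listed in the warning: a-zA-Z0-9_-.:
LEGAL_ID     = /\A[a-zA-Z0-9_\-.:]+\z/

# Remove definitions from str, storing contents and collecting warnings.
def strip_footnotes(str, footnotes, warnings)
  str.gsub(FOOTNOTE_DEF) do
    id, content = $1, $2 # capture before the next match overwrites $~
    warnings << "illegal footnote id - #{id}" unless id =~ LEGAL_ID
    footnotes[id] = content
    ''
  end
end

notes, warns = {}, []
body = strip_footnotes("Text[^1].\n\n[^1]: The note.\n", notes, warns)
# body      => "Text[^1].\n\n"
# notes["1"] => "The note."
```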
tokenize_html( str ) { |type, token| ... }

Break the HTML source in str into a series of tokens and return them. The tokens are just 2-element Array tuples with a type and the actual content. If this function is called with a block, the type and text parts of each token will be yielded to it one at a time as they are extracted.

# File lib/AoBane.rb, line 2166
def tokenize_html( str )
        depth = 0
        tokens = []
        @scanner.string = str.dup
        type, token = nil, nil

        until @scanner.eos?
                @log.debug "Scanning from %p" % @scanner.rest

                # Match comments and PIs without nesting
                if (( token = @scanner.scan(MetaTag) ))
                        type = :tag

                # Do nested matching for HTML tags
                elsif (( token = @scanner.scan(HTMLTagOpenRegexp) ))
                        tagstart = @scanner.pos
                        @log.debug " Found the start of a plain tag at %d" % tagstart

                        # Start the token with the opening angle
                        depth = 1
                        type = :tag

                        # Scan the rest of the tag, allowing unlimited nested <>s. If
                        # the scanner runs out of text before the tag is closed, raise
                        # an error.
                        while depth.nonzero?

                                # Scan either an opener or a closer
                                chunk = @scanner.scan( HTMLTagPart ) or
                                        break # AoBane Fix (refer to spec/code-block.rb)

                                @log.debug "  Found another part of the tag at depth %d: %p" % [ depth, chunk ]

                                token += chunk

                                # If the last character of the token so far is a closing
                                # angle bracket, decrement the depth. Otherwise increment
                                # it for a nested tag.
                                depth += ( token[-1, 1] == '>' ? -1 : 1 )
                                @log.debug "  Depth is now #{depth}"
                        end

                # Match text segments
                else
                        @log.debug " Looking for a chunk of text"
                        type = :text

                        # Scan forward, always matching at least one character to move
                        # the pointer beyond any non-tag '<'.
                        token = @scanner.scan_until( /[^<]+/m )
                end

                @log.debug " type: %p, token: %p" % [ type, token ]

                # If a block is given, feed it one token at a time. Add the token to
                # the token list to be returned regardless.
                if block_given?
                        yield( type, token )
                end
                tokens << [ type, token ]
        end

        return tokens
end
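A much-simplified tokenizer sketch using StringScanner, without the nested <> handling of tokenize_html (simple_tokenize is hypothetical). It produces the same kind of [type, content] pairs.

```ruby
require 'strscan'

def simple_tokenize(str)
  scanner = StringScanner.new(str)
  tokens = []
  until scanner.eos?
    if (tag = scanner.scan(/<[^>]*>/))
      tokens << [:tag, tag]
    else
      # Take at least one character so a stray "<" cannot stall the loop.
      tokens << [:text, scanner.scan(/.[^<]*/m)]
    end
  end
  tokens
end

simple_tokenize("a <b>bold</b> c")
# => [[:text, "a "], [:tag, "<b>"], [:text, "bold"], [:tag, "</b>"], [:text, " c"]]
```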
transform_anchors( str, rs )

Apply Markdown anchor transforms to a copy of the specified str with the given render state rs and return it.

# File lib/AoBane.rb, line 1847
def transform_anchors( str, rs )
        @log.debug " Transforming anchors"
        @scanner.string = str.dup
        text = ''

        # Scan the whole string
        until @scanner.eos?

                if @scanner.scan( /\[/ )
                        link = ''; linkid = ''
                        depth = 1
                        startpos = @scanner.pos
                        @log.debug " Found a bracket-open at %d" % startpos

                        # Scan the rest of the tag, allowing unlimited nested []s. If
                        # the scanner runs out of text before the opening bracket is
                        # closed, append the text and return (wasn't a valid anchor).
                        while depth.nonzero?
                                linktext = @scanner.scan_until( /\]|\[/ )

                                if linktext
                                        @log.debug "  Found a bracket at depth %d: %p" % [ depth, linktext ]
                                        link += linktext

                                        # Decrement depth for each closing bracket
                                        depth += ( linktext[-1, 1] == ']' ? -1 : 1 )
                                        @log.debug "  Depth is now #{depth}"

                                # If there's no more brackets, it must not be an anchor, so
                                # just abort.
                                else
                                        @log.debug "  Missing closing bracket, assuming non-link."
                                        link += @scanner.rest
                                        @scanner.terminate
                                        return text + '[' + link
                                end
                        end
                        link.slice!( -1 ) # Trim final ']'
                        @log.debug " Found leading link %p" % link



                        # Markdown Extra: Footnote
                        if link =~ /^\^(.+)/ then
                                id = $1
                                if rs.footnotes[id] then
                                        rs.found_footnote_ids << id
                                        label = "[#{rs.found_footnote_ids.size}]"
                                else
                                        rs.warnings << "undefined footnote id - #{id}"
                                        label = '[?]'
                                end

                                text += %Q|<sup id="footnote-ref:#{id}"><a href="#footnote:#{id}" rel="footnote">#{label}</a></sup>|

                        # Look for a reference-style second part
                        elsif @scanner.scan( RefLinkIdRegexp )
                                linkid = @scanner[1]
                                linkid = link.dup if linkid.empty?
                                linkid.downcase!
                                @log.debug "  Found a linkid: %p" % linkid

                                # If there's a matching link in the link table, build an
                                # anchor tag for it.
                                if rs.urls.key?( linkid )
                                        @log.debug "   Found link key in the link table: %p" % rs.urls[linkid]
                                        url = escape_md( rs.urls[linkid] )

                                        text += %{<a href="#{url}"}
                                        if rs.titles.key?(linkid)
                                                text += %{ title="%s"} % escape_md( rs.titles[linkid] )
                                        end
                                        text += %{>#{link}</a>}

                                # If the link referred to doesn't exist, just append the raw
                                # source to the result
                                else
                                        @log.debug "  Linkid %p not found in link table" % linkid
                                        @log.debug "  Appending original string instead: "
                                        @log.debug "%p" % @scanner.string[ startpos-1 .. @scanner.pos-1 ]

                                        rs.warnings << "link-id not found - #{linkid}"
                                        text += @scanner.string[ startpos-1 .. @scanner.pos-1 ]
                                end

                        # ...or for an inline style second part
                        elsif @scanner.scan( InlineLinkRegexp )
                                url = @scanner[1]
                                title = @scanner[3]
                                @log.debug "  Found an inline link to %p" % url

                                url = "##{link}" if url == '#' # shorthand: [text](#) links to #text (since AoBane 0.40)

                                text += %{<a href="%s"} % escape_md( url )
                                if title
                                        title.gsub!( /"/, "&quot;" )
                                        text += %{ title="%s"} % escape_md( title )
                                end
                                text += %{>#{link}</a>}

                        # No linkid part: just append the first part as-is.
                        else
                                @log.debug "No linkid, so no anchor. Appending literal text."
                                text += @scanner.string[ startpos-1 .. @scanner.pos-1 ]
                        end # if linkid

                # Plain text
                else
                        @log.debug " Scanning to the next link from %p" % @scanner.rest
                        text += @scanner.scan( /[^\[]+/ )
                end

        end # until @scanner.empty?

        return text
end
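The nested-bracket handling above can be tried in isolation. The following is a minimal sketch, not part of AoBane's API: `scan_bracketed` is a hypothetical helper that applies the same depth counting with Ruby's standard `StringScanner`. It returns the link text plus the unscanned remainder, or `nil` when the opening bracket is never closed.

```ruby
require 'strscan'

# Hypothetical helper (not AoBane's API): read a "[...]" link text that may
# contain nested brackets, counting depth the same way transform_anchors does.
# Returns [link_text, rest], or nil if the brackets never balance.
def scan_bracketed(str)
  scanner = StringScanner.new(str)
  return nil unless scanner.scan(/\[/)

  link = +''
  depth = 1
  while depth.nonzero?
    chunk = scanner.scan_until(/\]|\[/)
    return nil unless chunk               # ran out of text: not a link
    link << chunk
    depth += (chunk[-1] == ']' ? -1 : 1)  # ']' closes a level, '[' opens one
  end
  link.slice!(-1)                         # drop the final ']'
  [link, scanner.rest]
end

p scan_bracketed('[a [b] c](http://example.com)')
# => ["a [b] c", "(http://example.com)"]
```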
transform_block_quotes( str, rs )

Transform Markdown-style blockquotes in a copy of the specified str and return it.

# File lib/AoBane.rb, line 1572
def transform_block_quotes( str, rs )
        @log.debug " Transforming block quotes"

        str.gsub( BlockQuoteRegexp ) {|quote|
                @log.debug "Making blockquote from %p" % quote

                quote.gsub!( /^ *> ?/, '' ) # Trim one level of quoting
                quote.gsub!( /^ +$/, '' )   # Trim whitespace-only lines

                indent = " " * TabWidth
                quoted = %{<blockquote>\n%s\n</blockquote>\n\n} %
                        apply_block_transforms( quote, rs ).
                        gsub( /^/, indent ).
                        gsub( PreChunk ) {|m| m.gsub(/^#{indent}/o, '') }
                @log.debug "Blockquoted chunk is: %p" % quoted
                quoted
        }
end
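The two substitutions at the top of the block strip exactly one level of quoting, so nested quotes are left for the recursive `apply_block_transforms` call to handle. Here is a minimal standalone sketch; the helper name `strip_one_quote_level` is hypothetical.

```ruby
# Hypothetical helper reproducing the two gsub calls in transform_block_quotes.
def strip_one_quote_level(quote)
  unquoted = quote.gsub(/^ *> ?/, '')  # trim one '>' (and one following space) per line
  unquoted.gsub(/^ +$/, '')            # blank out whitespace-only lines
end

p strip_one_quote_level("> outer\n> > inner\n>\n")
# => "outer\n> inner\n\n"
```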
transform_code_blocks( str, rs )

Transform Markdown-style codeblocks in a copy of the specified str and return it.

# File lib/AoBane.rb, line 1520
def transform_code_blocks( str, rs )
        @log.debug " Transforming code blocks"

        str.gsub( CodeBlockRegexp ) {|block|
                codeblock = $1
                remainder = $2


                tmpl = %{\n\n<pre><code>%s\n</code></pre>\n\n%s}

                # patch for ruby 1.9.1 bug
                if tmpl.respond_to?(:force_encoding) then
                        tmpl.force_encoding(str.encoding)
                end
                args = [ encode_code( outdent(codeblock), rs ).rstrip, remainder ]

                # restore all backslash-escaped characters to their original form
                EscapeTable.each {|char, hash|
                        args[0].gsub!( hash[:md5re]){char}
                }

                # Generate the codeblock
                tmpl % args
        }
end
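The following is a simplified, self-contained sketch of what each matched block becomes. It assumes a tab width of 4, the usual Markdown value; AoBane's `TabWidth` is not shown in this section. The small `escape_html` helper only stands in for `encode_code`, which also handles Markdown escaping.

```ruby
TAB_WIDTH = 4 # assumption: matches AoBane's TabWidth

HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Simplified stand-in for encode_code: escape HTML-significant characters only.
def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Remove one tab or up to TAB_WIDTH spaces from the start of every line.
def outdent_block(text)
  text.gsub(/^(?:\t|[ ]{1,#{TAB_WIDTH}})/, '')
end

# Wrap an indented code block in <pre><code>, using the same template as above.
def render_code_block(codeblock)
  body = escape_html(outdent_block(codeblock)).rstrip
  "\n\n<pre><code>#{body}\n</code></pre>\n\n"
end

puts render_code_block("    if a < b\n        swap(a, b)\n")
```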
transform_code_spans( str, rs )

Transform backticked spans into <code> spans.

# File lib/AoBane.rb, line 1983
def transform_code_spans( str, rs )
        @log.debug " Transforming code spans"

        # Set up the string scanner and just return the string unless there's at
        # least one backtick.

        @scanner.string = str.dup
        unless @scanner.exist?( /`/ )
                @scanner.terminate
                @log.warn "No backticks found for code span in %p" % str
                return str
        end

        @log.debug "Transforming code spans in %p" % str
        # Build the transformed text anew
        text = ''
        
        # Scan to the end of the string
        until @scanner.empty?

                # Scan up to an opening backtick
                if pre = @scanner.scan_until( /.??(?=`)/m )
                        text += pre
                        @log.debug "Found backtick at %d after '...%s'" % [ @scanner.pos, text[-20, 20] ]

                        # Make a pattern to find the end of the span
                        opener = @scanner.scan( /`+/ )
                        len = opener.length
                        closer = Regexp::new( opener )
                        @log.debug "Scanning for end of code span with %p" % closer

                        # Scan until the end of the closing backtick sequence. Chop the
                        # backticks off the resultant string, strip leading and trailing
                        # whitespace, and encode any entities contained in it.
                        codespan = @scanner.scan_until( closer ) or
                                raise FormatError::new( @scanner.rest[0,20],
                                        "No %p found before end" % opener )

                        @log.debug "Found close of code span at %d: %p" % [ @scanner.pos - len, codespan ]
                        #p codespan.strip
                        codespan.slice!( -len, len )
                        text += "<code>%s</code>" %
                                encode_code( codespan.strip, rs )
                       
                # If there's no more backticks, just append the rest of the string
                # and move the scan pointer to the end
                else
                        text += @scanner.rest
                        @scanner.terminate
                end
        end

        return text
end
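With the opener/closer logic above, a run of N backticks is closed by the next occurrence of N backticks. This lets a longer run wrap text that contains single backticks. The following is a minimal sketch of the same scan. It assumes simple HTML escaping in place of `encode_code`, and it raises `ArgumentError` where AoBane raises its own `FormatError`.

```ruby
require 'strscan'

# Simplified stand-in for encode_code.
def escape_code(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Sketch of transform_code_spans: a run of N backticks opens a span that is
# closed by the next match of Regexp.new(opener).
def code_spans(str)
  scanner = StringScanner.new(str)
  text = +''
  until scanner.eos?
    if (pre = scanner.scan_until(/(?=`)/))  # text up to the next backtick
      text << pre
      opener = scanner.scan(/`+/)
      codespan = scanner.scan_until(Regexp.new(opener)) or
        raise ArgumentError, "No #{opener} found before end"
      codespan = codespan[0...-opener.length].strip  # chop the closer, trim spaces
      text << "<code>#{escape_code(codespan)}</code>"
    else
      text << scanner.rest                  # no more backticks
      scanner.terminate
    end
  end
  text
end

puts code_spans("Use `a < b` or ``x ` y``.")
```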
transform_definition_list_items(str, rs)
# File lib/AoBane.rb, line 1414
        def transform_definition_list_items(str, rs)
                buf = Util.generate_blank_string_io(str)
                buf.puts %Q|<dl>|

                lines = str.split("\n")
                until lines.empty? do

                        dts = []

                        # get dt items
                        while lines.first =~ /^(?!\:).+$/ do
                                dts << lines.shift
                        end


                        dd_as_block = false

                        # skip blank lines
                        while not lines.empty? and lines.first.empty? do
                                lines.shift
                                dd_as_block = true
                        end


                        dds = []
                        while lines.first =~ DDLineRegexp do
                                dd_buf = []

                                # dd first line
                                unless (line = lines.shift).empty? then
                                        dd_buf << $1 << "\n"
                                end

                                # dd second and more lines (sequential with 1st-line)
                                until lines.empty? or                         # stop if read all
                                lines.first =~ /^[ ]{0,#{TabWidth - 1}}$/ or # stop if blank line
                                lines.first =~ DDLineRegexp do                # stop if new dd found
                                        dd_buf << outdent(lines.shift) << "\n"
                                end

                                # dd second and more lines (separated with 1st-line)
                                until lines.empty? do  # stop if all was read
                                        if lines.first.empty? then
                                                # blank line (skip)
                                                lines.shift
                                                dd_buf << "\n"
                                        elsif lines.first =~ /^[ ]{#{TabWidth},}/ then
                                                # indented body
                                                dd_buf << outdent(lines.shift) << "\n"
                                        else
                                                # not indented body
                                                break
                                        end

                                end


                                dds << dd_buf.join

                                # skip blank lines
                                unless lines.empty? then
                                        while lines.first.empty? do
                                                lines.shift
                                        end
                                end
                        end

                        # html output
                        dts.each do |dt|
                                buf.puts %Q|  <dt>#{apply_span_transforms(dt, rs)}</dt>|
                        end

                        dds.each do |dd|
                                if dd_as_block then
                                        buf.puts %Q|  <dd>#{apply_block_transforms(dd, rs)}</dd>|
                                else
                                        dd.gsub!(/\n+\z/, '') # chomp linefeeds
                                        buf.puts %Q|  <dd>#{apply_span_transforms(dd.chomp, rs)}</dd>|
                                end
                        end
                end

                buf.puts %Q|</dl>|

                return(buf.string)
        end
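The grouping rules above can be sketched on their own: terms come first, then `:`-prefixed definitions, and a blank line before the definition switches to block-level output. This is a simplified illustration, not AoBane's code. `DD_LINE` is a hypothetical stand-in for `DDLineRegexp`, whose definition is not shown here. Continuation lines are not handled, and block-level output is represented by a literal `<p>` instead of a call to `apply_block_transforms`.

```ruby
# Hypothetical stand-in for DDLineRegexp: a line starting with ':'.
DD_LINE = /\A:[ \t]*(.*)\z/

def definition_list(str)
  out = ['<dl>']
  lines = str.split("\n")
  until lines.empty?
    # terms: non-blank lines that do not start with ':'
    dts = []
    dts << lines.shift while lines.first =~ /\A(?!:).+\z/

    # a blank line between the terms and the definition means "block" output
    dd_as_block = false
    while lines.first == ''
      lines.shift
      dd_as_block = true
    end

    # definitions: one per ':' line (continuation lines omitted in this sketch)
    dds = []
    while (m = DD_LINE.match(lines.first.to_s))
      lines.shift
      dds << m[1]
      lines.shift while lines.first == ''  # skip blank lines between items
    end

    dts.each { |dt| out << "  <dt>#{dt}</dt>" }
    dds.each { |dd| out << (dd_as_block ? "  <dd><p>#{dd}</p></dd>" : "  <dd>#{dd}</dd>") }
  end
  out << '</dl>'
  out.join("\n") + "\n"
end

puts definition_list("Apple\n: A fruit.\n\nOrange\n\n: Also a fruit.\n")
```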

        # old


        # Pattern for matching codeblocks
        CodeBlockRegexp = %r{
                (?:\n\n|\A|\A\n)
                (                                                                    # $1 = the code block
                  (?:
                        (?:[ ]{#{TabWidth}} | \t)           # a tab or tab-width of spaces
                        .*\n+
                  )+
                )
                (^[ ]{0,#{TabWidth - 1}}\S|\Z)               # Lookahead for non-space at
                                                                                        # line-start, or end of doc
          }x




        FencedCodeBlockRegexp = /^(\~{3,})\n((?m:.+?)\n)\1\n/

        def pretransform_fenced_code_blocks( str, rs )
                @log.debug " Transforming fenced code blocks => standard code blocks"

                str.gsub( FencedCodeBlockRegexp ) {|block|
                        "\n~\n\n" + indent($2) + "\n~\n\n"
                }
        end
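Fenced blocks are not rendered directly. Instead, they are rewritten into ordinary indented code blocks surrounded by `~` separator paragraphs, which `form_paragraphs` turns into empty output. The following is a minimal sketch of that rewrite. `indent_lines` is a hypothetical stand-in for AoBane's `indent` helper, and a tab width of 4 is assumed.

```ruby
FENCED = /^(\~{3,})\n((?m:.+?)\n)\1\n/  # same pattern as FencedCodeBlockRegexp

# Hypothetical stand-in for AoBane's indent helper (tab width assumed to be 4).
def indent_lines(text, width = 4)
  text.each_line.map { |line| (' ' * width) + line }.join
end

# Replace each fenced block with "~"-separated, indented code.
def convert_fences(str)
  str.gsub(FENCED) { "\n~\n\n" + indent_lines($2) + "\n~\n\n" }
end

p convert_fences("~~~\nx = 1\n~~~\n")
```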



        # Pattern for matching Markdown blockquote blocks
        BlockQuoteRegexp = %r{
                  (?:
                        ^[ ]*>[ ]?          # '>' at the start of a line
                          .+\n                      # rest of the first line
                        (?:.+\n)*           # subsequent consecutive lines
                        \n*                         # blanks
                  )+
          }x
        PreChunk = %r{ ( ^ \s* <pre> .+? </pre> ) }xm



        # AoBane change:
        #   accept looser URLs and addresses (BlueCloth is very strict)
        #
        # examples now accepted:
        #  <skype:tetra-dice>     (a non-HTTP protocol)
        #  <ema+il@example.com>   (e.g. a Gmail alias)
        #
        # addresses still not supported:
        #  <"Abc@def"@example.com>  (quoted-string local part, see RFC 5321)


        AutoAnchorURLRegexp = /<(#{URI.regexp})>/ # $1 = url

        AutoAnchorEmailRegexp = /<([^'">\s]+?\@[^'">\s]+[.][a-zA-Z]+)>/ # $1 = address

        ### Transform URLs in a copy of the specified +str+ into links and return
        ### it.
        def transform_auto_links( str, rs )
                @log.debug " Transforming auto-links"
                str.gsub(AutoAnchorURLRegexp){
                        %|<a href="#{Util.escape_html($1)}">#{Util.escape_html($1)}</a>|
                }.gsub( AutoAnchorEmailRegexp ) {|addr|
                        encode_email_address( unescape_special_chars($1) )
                }
        end
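The e-mail pattern can be run on its own to see which of the loose forms listed above it accepts. The constant name `AUTO_EMAIL` is just a local copy for this sketch.

```ruby
# Local copy of AutoAnchorEmailRegexp; its only capture group is $1.
AUTO_EMAIL = /<([^'">\s]+?\@[^'">\s]+[.][a-zA-Z]+)>/

['<ema+il@example.com>', '<"Abc@def"@example.com>', '<user@localhost>'].each do |s|
  m = AUTO_EMAIL.match(s)
  puts "#{s} -> #{m ? m[1] : 'no match'}"
end
```

The quoted local part is rejected because `"` is excluded from both character classes. `user@localhost` is rejected because the pattern requires a dot followed by letters before the closing `>`.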


        # Encoder functions to turn characters of an email address into encoded
        # entities.
        Encoders = [
                lambda {|char| "&#%03d;" % char},
                lambda {|char| "&#x%X;" % char},
                lambda {|char| char.chr },
        ]

        ### Transform a copy of the given email +addr+ into an escaped version safer
        ### for posting publicly.
        def encode_email_address( addr )

                rval = ''
                ("mailto:" + addr).each_byte {|b|
                        case b
                        when ':'.ord  # each_byte yields Integers; ?: is a String in Ruby 1.9+
                                rval += ":"
                        when '@'.ord
                                rval += Encoders[ rand(2) ][ b ]
                        else
                                r = rand(100)
                                rval += (
                                        r > 90 ? Encoders[2][ b ] :
                                        r < 45 ? Encoders[1][ b ] :
                                                         Encoders[0][ b ]
                                )
                        end
                }

                return %{<a href="%s">%s</a>} % [ rval, rval.sub(/.+?:/, '') ]
        end
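The encoding is randomized per byte, but every variant decodes to the same address. The sketch below mirrors `encode_email_address` with an injectable `Random` so the output is reproducible. It compares bytes against `':'.ord` and `'@'.ord`, since `each_byte` yields Integers. A small hypothetical helper, `decode_entities`, checks the round trip.

```ruby
# Same three encoders as the Encoders table: decimal entity, hex entity, literal.
ENCODERS = [
  ->(b) { format('&#%03d;', b) },
  ->(b) { format('&#x%X;', b) },
  ->(b) { b.chr },
].freeze

def encode_email(addr, rng = Random.new)
  rval = +''
  ('mailto:' + addr).each_byte do |b|
    piece =
      case b
      when ':'.ord then ':'                            # keep the scheme separator readable
      when '@'.ord then ENCODERS[rng.rand(2)].call(b)  # '@' is always an entity
      else
        r = rng.rand(100)
        if r > 90 then ENCODERS[2].call(b)             # 9%: literal
        elsif r < 45 then ENCODERS[1].call(b)          # 45%: hex entity
        else ENCODERS[0].call(b)                       # otherwise: decimal entity
        end
      end
    rval << piece
  end
  format('<a href="%s">%s</a>', rval, rval.sub(/.+?:/, ''))
end

# Hypothetical helper: turn numeric character references back into characters.
def decode_entities(str)
  str.gsub(/&#x([0-9A-Fa-f]+);/) { $1.hex.chr }.gsub(/&#(\d+);/) { $1.to_i.chr }
end

puts encode_email('user@example.com', Random.new(42))
```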


        # Regexp for matching Setext-style headers
        SetextHeaderRegexp = %r{
                (.+?)                        # The title text ($1)

                (?: # Markdown Extra: Header Id Attribute (optional)
                        [ ]* # space after the title text
                        \{\#
                                (\S+?) # $2 = Id
                        \}
                        [ \t]* # allowed lazy spaces
                )?
                \n
                ([\-=])+             # Match a line of = or -. Save only one in $3.
                [ ]*\n+
           }x

        # Regexp for matching ATX-style headers
        AtxHeaderRegexp = %r{
                ^(\#+)       # $1 = string of #'s
                [ ]*
                (.+?)                # $2 = Header text
                [ ]*
                \#*                  # optional closing #'s (not counted)

                (?: # Markdown Extra: Header Id Attribute (optional)
                        [ ]* # space after closing #'s
                        \{\#
                                (\S+?) # $3 = Id
                        \}
                        [ \t]* # allowed lazy spaces
                )?

                \n+
          }x

        HeaderRegexp = Regexp.union(SetextHeaderRegexp, AtxHeaderRegexp)

        IdRegexp = /^[a-zA-Z][a-zA-Z0-9\:\._-]*$/

        ### Apply Markdown header transforms to a copy of the given +str+ and render
        ### state +rs+ and return the result.
        def transform_headers( str, rs )
                @log.debug " Transforming headers"

                # Setext-style headers:
                #      Header 1
                #      ========
                #
                #      Header 2
                #      --------
                #

                section_numbers = [nil, nil, nil, nil, nil, nil] # one slot per depth (h1..h6)

                str.
                        gsub( HeaderRegexp ) {|m|
                                if $1 then
                                        @log.debug "Found setext-style header"
                                        title, id, hdrchar = $1, $2, $3

                                        case hdrchar
                                        when '='
                                                level = 1
                                        when '-'
                                                level = 2
                                        end
                                else
                                        @log.debug "Found ATX-style header"
                                        hdrchars, title, id = $4, $5, $6
                                        level = hdrchars.length

                                        if level >= 7 then
                                                rs.warnings << "illegal header level - h#{level} ('#' symbols are too many)"
                                        end
                                end

                                prefix = ''
                                if rs.numbering? then
                                        if level >= rs.numbering_start_level and level <= 6 then
                                                depth = level - rs.numbering_start_level

                                                section_numbers.each_index do |i|
                                                        if i == depth and section_numbers[depth] then
                                                                # increment a deepest number if current header's level equals last header's
                                                                section_numbers[i] += 1
                                                        elsif i <= depth then
                                                                # set default number if nil
                                                                section_numbers[i] ||= 1
                                                        else
                                                                # clear numbers of deeper levels
                                                                section_numbers[i] = nil
                                                        end
                                                end

                                                no = ''
                                                (0..depth).each do |i|
                                                        no << "#{section_numbers[i]}."
                                                end

                                                prefix = "#{no} "
                                        end
                                end

                                title_html = apply_span_transforms( title, rs )

                                unless id then
                                        case rs.header_id_type
                                        when HeaderIDType::ESCAPE
                                                id = escape_to_header_id(title_html)
                                                if rs.headers.find{|h| h.id == id} then
                                                        rs.warnings << "header id collision - #{id}"
                                                        id = "bfheader-#{Digest::MD5.hexdigest(title)}"
                                                end
                                        else
                                                id = "bfheader-#{Digest::MD5.hexdigest(title)}"
                                        end
                                end

                                title = "#{prefix}#{title}"
                                title_html = "#{prefix}#{title_html}"


                                unless id =~ IdRegexp then
                                        rs.warnings << "illegal header id - #{id} (legal chars: [a-zA-Z0-9_.:-] | 1st: [a-zA-Z])"
                                end

                                if rs.block_transform_depth == 1 then
                                        rs.headers << RenderState::Header.new(id, level, title, title_html)
                                end

                                if @use_header_id then
                                        %{<h%d id="%s">%s</h%d>\n\n} % [ level, id, title_html, level ]
                                else
                                        %{<h%d>%s</h%d>\n\n} % [ level, title_html, level ]
                                end
                        }
        end
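The numbering loop is easier to follow with concrete levels. The sketch below extracts it into a hypothetical standalone function, `section_prefixes`, which maps header levels to the prefixes `transform_headers` would prepend. It uses six slots so that every depth from 0 to 5 (an `h6` counted from `h1`) has room.

```ruby
# Hypothetical standalone version of the numbering loop in transform_headers.
# levels: header levels in document order; start_level: first level to number.
def section_prefixes(levels, start_level)
  section_numbers = Array.new(6) # one slot per depth 0..5
  levels.map do |level|
    next '' unless level >= start_level && level <= 6

    depth = level - start_level
    section_numbers.each_index do |i|
      if i == depth && section_numbers[depth]
        section_numbers[i] += 1     # slot already in use: next number at this level
      elsif i <= depth
        section_numbers[i] ||= 1    # open a new (sub)section at 1
      else
        section_numbers[i] = nil    # forget deeper levels
      end
    end
    (0..depth).map { |i| "#{section_numbers[i]}." }.join + ' '
  end
end

p section_prefixes([1, 2, 2, 1, 2], 1)
```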


        ### Wrap all remaining paragraph-looking text in a copy of +str+ inside <p>
        ### tags and return it.
        def form_paragraphs( str, rs )
                @log.debug " Forming paragraphs"
                grafs = str.
                        sub( /\A\n+/, '' ).
                        sub( /\n+\z/, '' ).
                        split( /\n{2,}/ )

                rval = grafs.collect {|graf|

                        # Unhashify HTML blocks if this is a placeholder
                        if rs.html_blocks.key?( graf )
                                rs.html_blocks[ graf ]

                        # no output if this is a block separator
                        elsif graf == '~' then
                                ''

                        # Otherwise, wrap in <p> tags
                        else
                                apply_span_transforms(graf, rs).
                                        sub( /^[ ]*/, '<p>' ) + '</p>'
                        end
                }.join( "\n\n" )

                @log.debug " Formed paragraphs: %p" % rval
                return rval
        end
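The following is a simplified sketch of the same chunking. Span transforms are omitted, and a plain Hash (the hypothetical `html_blocks` argument) plays the role of `rs.html_blocks`.

```ruby
# Simplified sketch of form_paragraphs without span transforms.
def wrap_paragraphs(str, html_blocks = {})
  grafs = str.sub(/\A\n+/, '').sub(/\n+\z/, '').split(/\n{2,}/)
  grafs.map { |graf|
    if html_blocks.key?(graf)
      html_blocks[graf]                  # restore a stashed HTML block
    elsif graf == '~'
      ''                                 # block separator: no output
    else
      graf.sub(/^[ ]*/, '<p>') + '</p>'  # ordinary paragraph
    end
  }.join("\n\n")
end

p wrap_paragraphs("\nHello\nworld\n\n~\n\nKEY1\n\n  Indented\n", { 'KEY1' => '<div>x</div>' })
```

The `~` chunk still contributes an empty entry to the join, so the output has a run of four newlines where the separator was.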


        # Pattern to match the linkid part of an anchor tag for reference-style
        # links.
        RefLinkIdRegexp = %r{
                [ ]?                                 # Optional leading space
                (?:\n[ ]*)?                          # Optional newline + spaces
                \[
                        (.*?)                               # Id = $1
                \]
          }x

        InlineLinkRegexp = %r{
                \(                                           # Literal paren
                        [ ]*                                # Zero or more spaces
                        <?(.+?)>?                   # URI = $1
                        [ ]*                                # Zero or more spaces
                        (?:                                 #
                                ([\"\'])           # Opening quote char = $2
                                (.*?)                      # Title = $3
                                \2                         # Matching quote char
                        )?                                  # Title is optional
                \)
          }x
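The two patterns above are tried right after the closing `]` of a link. The sketch below copies them and applies each with `StringScanner#scan`, which anchors at the current position just as `transform_anchors` does. The helpers `parse_inline` and `resolve_reference` are hypothetical, and the `urls` hash stands in for `rs.urls`.

```ruby
require 'strscan'

# Local copies of RefLinkIdRegexp and InlineLinkRegexp.
REF_LINK_ID = %r{
  [ ]?          # optional leading space
  (?:\n[ ]*)?   # optional newline + spaces
  \[
    (.*?)       # id = $1
  \]
}x

INLINE_LINK = %r{
  \(            # literal paren
    [ ]*
    <?(.+?)>?   # URI = $1
    [ ]*
    (?:
      (["'])    # opening quote char = $2
      (.*?)     # title = $3
      \2        # matching quote char
    )?          # title is optional
  \)
}x

# Inline style: (url "title") right after the closing ']'.
def parse_inline(rest)
  scanner = StringScanner.new(rest)
  return nil unless scanner.scan(INLINE_LINK)
  { url: scanner[1], title: scanner[3] }
end

# Reference style: [id], or [] to reuse the link text, looked up lowercased.
def resolve_reference(link_text, rest, urls)
  scanner = StringScanner.new(rest)
  return nil unless scanner.scan(REF_LINK_ID)
  linkid = scanner[1].empty? ? link_text.dup : scanner[1]
  urls[linkid.downcase]
end

p parse_inline('(http://example.com "Home")')
p resolve_reference('AoBane', ' []', { 'aobane' => 'http://example.com/aobane' })
```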



        # Pattern to match strong emphasis in Markdown text
        BoldRegexp = %r{ (\*\*|__) (\S|\S.*?\S) \1 }x

        # Pattern to match normal emphasis in Markdown text
        ItalicRegexp = %r{ (\*|_) (\S|\S.*?\S) \1 }x

        ### Transform italic- and bold-encoded text in a copy of the specified +str+
        ### and return it.
        def transform_italic_and_bold( str, rs )
                @log.debug " Transforming italic and bold"

                str.
                        gsub( BoldRegexp, %{<strong>\\2</strong>} ).
                        gsub( ItalicRegexp, %{<em>\\2</em>} )
        end


        ### Transform backticked spans into <code> spans.
        def transform_code_spans( str, rs )
                @log.debug " Transforming code spans"

                # Set up the string scanner and just return the string unless there's at
                # least one backtick.

                @scanner.string = str.dup
                unless @scanner.exist?( /`/ )
                        @scanner.terminate
                        @log.warn "No backticks found for code span in %p" % str
                        return str
                end

                @log.debug "Transforming code spans in %p" % str
                # Build the transformed text anew
                text = ''
                
                # Scan to the end of the string
                until @scanner.empty?

                        # Scan up to an opening backtick
                        if pre = @scanner.scan_until( /.??(?=`)/m )
                                text += pre
                                @log.debug "Found backtick at %d after '...%s'" % [ @scanner.pos, text[-20, 20] ]

                                # Make a pattern to find the end of the span
                                opener = @scanner.scan( /`+/ )
                                len = opener.length
                                closer = Regexp::new( opener )
                                @log.debug "Scanning for end of code span with %p" % closer

                                # Scan until the end of the closing backtick sequence, chop the
                                # backticks off the resulting string, strip leading and trailing
                                # whitespace, and pass it through encode_code.
                                codespan = @scanner.scan_until( closer ) or
                                        raise FormatError::new( @scanner.rest[0,20],
                                                "No %p found before end" % opener )

                                @log.debug "Found close of code span at %d: %p" % [ @scanner.pos - len, codespan ]
                                #p codespan.strip
                                codespan.slice!( -len, len )
                                text += "<code>%s</code>" %
                                        encode_code( codespan.strip, rs )
                               
                        # If there's no more backticks, just append the rest of the string
                        # and move the scan pointer to the end
                        else
                                text += @scanner.rest
                                @scanner.terminate
                        end
                end

                return text
        end


        # Next, handle inline images:  ![alt text](url "optional title")
        # Don't forget: encode * and _
        InlineImageRegexp = %r{
                (                                    # Whole match = $1
                        !\[ (.*?) \]        # alt text = $2
                  \([ ]*
                        <?(\S+?)>?          # source url = $3
                    [ ]*
                        (?:                         #
                          (["'])            # quote char = $4
                          (.*?)             # title = $5
                          \4                        # matching quote
                          [ ]*
                        )?                          # title is optional
                  \)
                )
          }x #"


        # Reference-style images
        ReferenceImageRegexp = %r{
                (                                    # Whole match = $1
                        !\[ (.*?) \]        # Alt text = $2
                        [ ]?                        # Optional space
                        (?:\n[ ]*)?         # One optional newline + spaces
                        \[ (.*?) \]         # id = $3
                )
          }x

        ### Turn image markup into image tags.
        def transform_images( str, rs )
                @log.debug " Transforming images %p" % str

                # Handle reference-style labeled images: ![alt text][id]
                str.
                        gsub( ReferenceImageRegexp ) {|match|
                                whole, alt, linkid = $1, $2, $3.downcase
                                @log.debug "Matched %p" % match
                                res = nil
                                alt.gsub!( /"/, '&quot;' )

                                # for shortcut links like ![this][].
                                linkid = alt.downcase if linkid.empty?

                                if rs.urls.key?( linkid )
                                        url = escape_md( rs.urls[linkid] )
                                        @log.debug "Found url '%s' for linkid '%s' " % [ url, linkid ]

                                        # Build the tag
                                        result = %{<img src="%s" alt="%s"} % [ url, alt ]
                                        if rs.titles.key?( linkid )
                                                result += %{ title="%s"} % escape_md( rs.titles[linkid] )
                                        end
                                        result += EmptyElementSuffix

                                else
                                        result = whole
                                end

                                @log.debug "Replacing %p with %p" % [ match, result ]
                                result
                        }.

                        # Inline image style
                        gsub( InlineImageRegexp ) {|match|
                                @log.debug "Found inline image %p" % match
                                whole, alt, title = $1, $2, $5
                                url = escape_md( $3 )
                                alt.gsub!( /"/, '&quot;' )

                                # Build the tag
                                result = %{<img src="%s" alt="%s"} % [ url, alt ]
                                unless title.nil?
                                        title.gsub!( /"/, '&quot;' )
                                        result += %{ title="%s"} % escape_md( title )
                                end
                                result += EmptyElementSuffix

                                @log.debug "Replacing %p with %p" % [ match, result ]
                                result
                        }
        end


        # Regexp to match special characters in a code block
        CodeEscapeRegexp = %r{( \* | _ | \{ | \} | \[ | \] | \\ )}x

        ### Originally escaped characters special to HTML and encoded characters
        ### special to Markdown; AoBane has commented this out, so +str+ is
        ### returned unchanged.
        def encode_code( str, rs )
                #str.gsub( %r{&}, '&amp;' ).
                        #gsub( %r{<}, '&lt;' ).
                        #gsub( %r{>}, '&gt;' ).
                        #gsub( CodeEscapeRegexp ) {|match| EscapeTable[match][:md5]}
          return str
        end

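        def escape_to_header_id(str)
                # URI.escape was removed in Ruby 3.0; URI::RFC2396_Parser#escape is the
                # method it delegated to, so the generated ids stay the same.
                parser = URI::RFC2396_Parser.new
                parser.escape(escape_md(str.gsub(/<\/?[^>]*>/, "").gsub(/\s/, "_")).gsub("/", ".2F")).gsub("%", ".")
        end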

        #################################################################
        ###   U T I L I T Y     F U N C T I O N S
        #################################################################

        ### Escape any markdown characters in a copy of the given +str+ and return
        ### it.
        def escape_md( str )
                str.
                        gsub( /\*|_/ ){|symbol| EscapeTable[symbol][:md5]}
        end


        # Matching constructs for tokenizing X/HTML
        HTMLCommentRegexp  = %r{ <! ( -- .*? -- \s* )+ > }mx
        XMLProcInstRegexp  = %r{ <\? .*? \?> }mx
        MetaTag = Regexp::union( HTMLCommentRegexp, XMLProcInstRegexp )

        HTMLTagOpenRegexp  = %r{ < [a-z/!$] [^<>]* }imx
        HTMLTagCloseRegexp = %r{ > }x
        HTMLTagPart = Regexp::union( HTMLTagOpenRegexp, HTMLTagCloseRegexp )

        ### Break the HTML source in +str+ into a series of tokens and return
        ### them. The tokens are just 2-element Array tuples with a type and the
        ### actual content. If this function is called with a block, the type and
        ### text parts of each token will be yielded to it one at a time as they are
        ### extracted.
        def tokenize_html( str )
                depth = 0
                tokens = []
                @scanner.string = str.dup
                type, token = nil, nil

                until @scanner.empty?
                        @log.debug "Scanning from %p" % @scanner.rest

                        # Match comments and PIs without nesting
                        if (( token = @scanner.scan(MetaTag) ))
                                type = :tag

                        # Do nested matching for HTML tags
                        elsif (( token = @scanner.scan(HTMLTagOpenRegexp) ))
                                tagstart = @scanner.pos
                                @log.debug " Found the start of a plain tag at %d" % tagstart

                                # Start the token with the opening angle
                                depth = 1
                                type = :tag

                                # Scan the rest of the tag, allowing unlimited nested <>s. If
                                # the scanner runs out of text before the tag is closed, stop
                                # scanning instead of raising an error (AoBane fix).
                                while depth.nonzero?

                                        # Scan either an opener or a closer
                                        chunk = @scanner.scan( HTMLTagPart ) or
                                                break # AoBane Fix (refer to spec/code-block.rb)

                                        @log.debug "  Found another part of the tag at depth %d: %p" % [ depth, chunk ]

                                        token += chunk

                                        # If the last character of the token so far is a closing
                                        # angle bracket, decrement the depth. Otherwise increment
                                        # it for a nested tag.
                                        depth += ( token[-1, 1] == '>' ? -1 : 1 )
                                        @log.debug "  Depth is now #{depth}"
                                end

                        # Match text segments
                        else
                                @log.debug " Looking for a chunk of text"
                                type = :text

                                # Scan forward, always matching at least one character to move
                                # the pointer beyond any non-tag '<'.
                                token = @scanner.scan_until( /[^<]+/m )
                        end

                        @log.debug " type: %p, token: %p" % [ type, token ]

                        # If a block is given, feed it one token at a time. Add the token to
                        # the token list to be returned regardless.
                        if block_given?
                                yield( type, token )
                        end
                        tokens << [ type, token ]
                end

                return tokens
        end


        ### Originally returned a copy of +str+ with angle brackets and ampersands
        ### HTML-encoded; AoBane has commented the encoding out, so +str+ is
        ### returned unchanged.
        def encode_html( str )
                #str.gsub( /&(?!#?[x]?(?:[0-9a-f]+|\w+);)/i, "&amp;" ).
                        #gsub( %r{<(?![a-z/?\$!])}i, "&lt;" )
                return str
        end


        ### Remove one level of line-leading tabs or spaces from a copy of +str+ and
        ### return it.
        def outdent( str )
                str.gsub( /^(\t|[ ]{1,#{TabWidth}})/, '')
        end

        ### Indent every line in a copy of +str+ by one level (TabWidth spaces) and
        ### return it.
        def indent(str)
                str.gsub( /^/, ' ' * TabWidth)
        end

end
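The backtick handling in transform_code_spans can be condensed into a small standalone function. This is a minimal sketch, not part of AoBane: `code_spans_sketch` is a hypothetical name, logging is dropped, and encode_code is skipped (it currently returns its input unchanged anyway). It shows the key idea: a span ends at the next occurrence of the same backtick run that opened it, so a shorter run can appear inside a span delimited by a longer one.

```ruby
require 'strscan'

# Hypothetical standalone version of the code-span loop in transform_code_spans.
def code_spans_sketch(str)
  scanner = StringScanner.new(str)
  text = +''
  until scanner.eos?
    # Copy everything up to (not including) the next backtick.
    if (pre = scanner.scan_until(/(?=`)/))
      text << pre
      opener = scanner.scan(/`+/)                   # e.g. "`" or "``"
      closer = Regexp.new(Regexp.escape(opener))    # the same run ends the span
      codespan = scanner.scan_until(closer) or
        raise ArgumentError, "No #{opener.inspect} found before end"
      codespan = codespan[0...-opener.length]       # chop the closing backticks
      text << "<code>#{codespan.strip}</code>"
    else
      # No backticks left: keep the rest verbatim.
      text << scanner.rest
      scanner.terminate
    end
  end
  text
end
```

For example, `code_spans_sketch("use `foo` and ``a ` b``")` returns `use <code>foo</code> and <code>a ` b</code>`. Like the original, an unclosed span raises an error rather than being passed through.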
transform_definition_lists(str, rs)
# File lib/AoBane.rb, line 1403
def transform_definition_lists(str, rs)
        @log.debug " Transforming definition lists at %p" % (str[0,100] + '...')
        str.gsub( DefinitionListRegexp ) {|list|
                @log.debug "  Found definition list %p (captures=%p)" % [list, $~.captures]
                transform_definition_list_items(list, rs)
        }
end
transform_headers( str, rs )

Apply Markdown header transforms to a copy of the given str, using the render state rs, and return the result.

# File lib/AoBane.rb, line 1693
def transform_headers( str, rs )
        @log.debug " Transforming headers"

        # Setext-style headers:
        #      Header 1
        #      ========
        #
        #      Header 2
        #      --------
        #

        section_numbers = [nil, nil, nil, nil, nil]

        str.
                gsub( HeaderRegexp ) {|m|
                        if $1 then
                                @log.debug "Found setext-style header"
                                title, id, hdrchar = $1, $2, $3

                                case hdrchar
                                when '='
                                        level = 1
                                when '-'
                                        level = 2
                                end
                        else
                                @log.debug "Found ATX-style header"
                                hdrchars, title, id = $4, $5, $6
                                level = hdrchars.length

                                if level >= 7 then
                                        rs.warnings << "illegal header level - h#{level} ('#' symbols are too many)"
                                end
                        end

                        prefix = ''
                        if rs.numbering? then
                                if level >= rs.numbering_start_level and level <= 6 then
                                        depth = level - rs.numbering_start_level

                                        section_numbers.each_index do |i|
                                                if i == depth and section_numbers[depth] then
                                                        # a previous header already set this depth: increment its number
                                                        section_numbers[i] += 1
                                                elsif i <= depth then
                                                        # set default number if nil
                                                        section_numbers[i] ||= 1
                                                else
                                                        # clear numbers for levels deeper than the current header
                                                        section_numbers[i] = nil
                                                end
                                        end

                                        no = ''
                                        (0..depth).each do |i|
                                                no << "#{section_numbers[i]}."
                                        end

                                        prefix = "#{no} "
                                end
                        end

                        title_html = apply_span_transforms( title, rs )

                        unless id then
                                case rs.header_id_type
                                when HeaderIDType::ESCAPE
                                        id = escape_to_header_id(title_html)
                                        if rs.headers.find{|h| h.id == id} then
                                                rs.warnings << "header id collision - #{id}"
                                                id = "bfheader-#{Digest::MD5.hexdigest(title)}"
                                        end
                                else
                                        id = "bfheader-#{Digest::MD5.hexdigest(title)}"
                                end
                        end

                        title = "#{prefix}#{title}"
                        title_html = "#{prefix}#{title_html}"


                        unless id =~ IdRegexp then
                                rs.warnings << "illegal header id - #{id} (legal chars: [a-zA-Z0-9_-.] | 1st: [a-zA-Z])"
                        end

                        if rs.block_transform_depth == 1 then
                                rs.headers << RenderState::Header.new(id, level, title, title_html)
                        end

                        if @use_header_id then
                                %{<h%d id="%s">%s</h%d>\n\n} % [ level, id, title_html, level ]
                        else
                                %{<h%d>%s</h%d>\n\n} % [ level, title_html, level ]
                        end
                }
end
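The section-numbering loop in transform_headers is easier to follow in isolation. The sketch below is a hypothetical standalone helper (`section_prefixes` is not part of AoBane); `start_level` stands in for rs.numbering_start_level, and only the prefixes are computed, leaving ids and HTML aside. A header at the same depth as the previous one bumps that counter, a deeper header starts a new counter at 1, and a shallower header clears the deeper counters.

```ruby
# Hypothetical standalone version of the numbering loop in transform_headers.
# start_level plays the role of rs.numbering_start_level.
def section_prefixes(levels, start_level)
  section_numbers = [nil, nil, nil, nil, nil]
  levels.map do |level|
    next '' unless level >= start_level && level <= 6   # unnumbered header
    depth = level - start_level
    section_numbers.each_index do |i|
      if i == depth && section_numbers[depth]
        section_numbers[i] += 1      # another header at this depth
      elsif i <= depth
        section_numbers[i] ||= 1     # open a new counter (or keep an ancestor's)
      else
        section_numbers[i] = nil     # reset counters deeper than this header
      end
    end
    (0..depth).map { |i| "#{section_numbers[i]}." }.join + ' '
  end
end
```

With `start_level = 2`, the header levels `[2, 3, 3, 2, 3]` get the prefixes `"1. "`, `"1.1. "`, `"1.2. "`, `"2. "`, `"2.1. "`.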
transform_hrules( str, rs )

Transform any Markdown-style horizontal rules in a copy of the specified str and return it.

# File lib/AoBane.rb, line 1301
def transform_hrules( str, rs )
        @log.debug " Transforming horizontal rules"
        str.gsub( /^( ?[\-\*_] ?){3,}$/, "\n<hr#{EmptyElementSuffix}\n" )
end
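The horizontal-rule pattern can be tried out on its own. This sketch reuses the regexp from transform_hrules verbatim; `hrules_sketch` is a hypothetical helper, and its `suffix` parameter stands in for EmptyElementSuffix, assumed here to be `' />'` since the constant's value is not shown in this section.

```ruby
# Same pattern transform_hrules uses.
HRULE_RE = /^( ?[\-\*_] ?){3,}$/

# Hypothetical helper; suffix stands in for EmptyElementSuffix (assumed ' />').
def hrules_sketch(str, suffix = ' />')
  str.gsub(HRULE_RE, "\n<hr#{suffix}\n")
end
```

Because each repetition of the group chooses its character independently, a mixed line such as `-*-` also becomes a rule, whereas two characters (`--`) are not enough.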
transform_images( str, rs )

Turn image markup into image tags.

# File lib/AoBane.rb, line 2069
def transform_images( str, rs )
        @log.debug " Transforming images %p" % str

        # Handle reference-style labeled images: ![alt text][id]
        str.
                gsub( ReferenceImageRegexp ) {|match|
                        whole, alt, linkid = $1, $2, $3.downcase
                        @log.debug "Matched %p" % match
                        res = nil
                        alt.gsub!( /"/, '&quot;' )

                        # for shortcut links like ![this][].
                        linkid = alt.downcase if linkid.empty?

                        if rs.urls.key?( linkid )
                                url = escape_md( rs.urls[linkid] )
                                @log.debug "Found url '%s' for linkid '%s' " % [ url, linkid ]

                                # Build the tag
                                result = %{<img src="%s" alt="%s"} % [ url, alt ]
                                if rs.titles.key?( linkid )
                                        result += %{ title="%s"} % escape_md( rs.titles[linkid] )
                                end
                                result += EmptyElementSuffix

                        else
                                result = whole
                        end

                        @log.debug "Replacing %p with %p" % [ match, result ]
                        result
                }.

                # Inline image style
                gsub( InlineImageRegexp ) {|match|
                        @log.debug "Found inline image %p" % match
                        whole, alt, title = $1, $2, $5
                        url = escape_md( $3 )
                        alt.gsub!( /"/, '&quot;' )

                        # Build the tag
                        result = %{<img src="%s" alt="%s"} % [ url, alt ]
                        unless title.nil?
                                title.gsub!( /"/, '&quot;' )
                                result += %{ title="%s"} % escape_md( title )
                        end
                        result += EmptyElementSuffix

                        @log.debug "Replacing %p with %p" % [ match, result ]
                        result
                }
end
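The inline-image branch depends on the capture groups of InlineImageRegexp: `$2` is the alt text, `$3` the URL, and `$5` the optional title. A minimal sketch using a copy of that regexp; `inline_images_sketch` is hypothetical, escape_md is left out, and the suffix is assumed to be `' />'`.

```ruby
# Copy of InlineImageRegexp from the listing above.
INLINE_IMAGE_RE = %r{
  (                     # whole match = $1
    !\[ (.*?) \]        # alt text = $2
    \([ ]*
      <?(\S+?)>?        # source url = $3
      [ ]*
      (?:
        (["'])          # quote char = $4
        (.*?)           # title = $5
        \4              # matching quote
        [ ]*
      )?                # title is optional
    \)
  )
}x

# Hypothetical helper building the <img> tag like the inline branch of
# transform_images, minus escape_md.
def inline_images_sketch(str, suffix = ' />')
  str.gsub(INLINE_IMAGE_RE) do
    alt, url, title = $2, $3, $5
    result = %{<img src="#{url}" alt="#{alt.gsub('"', '&quot;')}"}
    result += %{ title="#{title.gsub('"', '&quot;')}"} unless title.nil?
    result + suffix
  end
end
```

For example, `![Logo](/img/logo.png "Our logo")` becomes `<img src="/img/logo.png" alt="Logo" title="Our logo" />`.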
transform_italic_and_bold( str, rs )

Transform italic- and bold-encoded text in a copy of the specified str and return it.

# File lib/AoBane.rb, line 1973
def transform_italic_and_bold( str, rs )
        @log.debug " Transforming italic and bold"

        str.
                gsub( BoldRegexp, %{<strong>\\2</strong>} ).
                gsub( ItalicRegexp, %{<em>\\2</em>} )
end
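The order of the two substitutions matters. This sketch copies BoldRegexp and ItalicRegexp and applies them in the same order as transform_italic_and_bold; `emphasis_sketch` is a hypothetical name.

```ruby
# Copies of BoldRegexp and ItalicRegexp.
BOLD_RE   = %r{ (\*\*|__) (\S|\S.*?\S) \1 }x
ITALIC_RE = %r{ (\*|_) (\S|\S.*?\S) \1 }x

# Hypothetical helper: bold first, then italic, as in transform_italic_and_bold.
def emphasis_sketch(str)
  str.gsub(BOLD_RE, %{<strong>\\2</strong>}).gsub(ITALIC_RE, %{<em>\\2</em>})
end
```

`emphasis_sketch("**bold** and *it*")` yields `<strong>bold</strong> and <em>it</em>`. Running the italic pass first would instead turn `**bold**` into `<em>*bold</em>*`, because a single `*` delimiter would match the first `*` of each pair.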
transform_list_items( str, rs )

Transform list items in a copy of the given str and return it.

# File lib/AoBane.rb, line 1360
def transform_list_items( str, rs )
        @log.debug " Transforming list items"

        # Trim trailing blank lines
        str = str.sub( /\n{2,}\z/, "\n" )
        str.gsub( ListItemRegexp ) {|line|
                @log.debug "  Found item line %p" % line
                leading_line, item = $1, $4
                separating_lines = $5

                if leading_line or /\n{2,}/.match(item) or not separating_lines.empty? then
                        @log.debug "   Found leading line or item has a blank"
                        item = apply_block_transforms( outdent(item), rs )
                else
                        # Recursion for sub-lists
                        @log.debug "   Recursing for sublist"
                        item = transform_lists( outdent(item), rs ).chomp
                        item = apply_span_transforms( item, rs )
                end

                %{<li>%s</li>\n} % item
        }
end
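transform_list_items strips one level of indentation with outdent before handing an item to apply_block_transforms or recursing into transform_lists. Here is a minimal sketch of outdent and its counterpart indent, assuming TabWidth is 4 (its value is not shown in this section; 4 is the Markdown.pl default). The helper names are hypothetical.

```ruby
TAB_WIDTH = 4 # assumption: stands in for AoBane's TabWidth

# Remove one leading tab, or up to TAB_WIDTH leading spaces, from every line.
def outdent_sketch(str)
  str.gsub(/^(\t|[ ]{1,#{TAB_WIDTH}})/, '')
end

# Prefix every line with TAB_WIDTH spaces.
def indent_sketch(str)
  str.gsub(/^/, ' ' * TAB_WIDTH)
end
```

Only one level is removed per call, so a line indented by six spaces keeps two of them: `outdent_sketch("      c")` returns `"  c"`.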
transform_lists( str, rs )

Transform Markdown-style lists in a copy of the specified str and return it.

# File lib/AoBane.rb, line 1333
def transform_lists( str, rs )
        @log.debug " Transforming lists at %p" % (str[0,100] + '...')

        str.gsub( ListRegexp ) {|list|
                @log.debug "  Found list %p" % list
                bullet = $1
                list_type = (ListMarkerUl.match(bullet) ? "ul" : "ol")

                %{<%s>\n%s</%s>\n} % [
                        list_type,
                        transform_list_items( list, rs ),
                        list_type,
                ]
        }
end
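transform_lists chooses `ul` or `ol` from the first bullet alone and wraps the already-transformed items. In this sketch, ListMarkerUl is assumed to be `/[*+-]/` as in Markdown.pl, because AoBane's own definition is not shown here. `wrap_list_sketch` is a hypothetical helper.

```ruby
# Assumption: mirrors Markdown.pl's unordered-list markers; AoBane's
# ListMarkerUl is not shown in this section.
LIST_MARKER_UL = /[*+-]/

# Hypothetical helper mirroring the wrapper transform_lists builds.
def wrap_list_sketch(bullet, items_html)
  list_type = LIST_MARKER_UL.match(bullet) ? 'ul' : 'ol'
  %{<%s>\n%s</%s>\n} % [list_type, items_html, list_type]
end
```

Any bullet that is not an unordered-list marker, such as `1.`, falls through to `ol`.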
transform_table_rows(str, rs)
# File lib/AoBane.rb, line 1218
def transform_table_rows(str, rs)
  # split the table into a 2-D array: one array of cells per row
  data = str.split("\n").map{|x| x.split('|')}
  caption = ''  #Inserted by set.minami 2013-04-20
  captionName = ''
  if /#{CaptionRegexp}/ =~ data[0].first then
    caption = if $1.nil? then '' else $1 end
    captionName = if $3.nil? then '' else $3 end
    data.shift
  end   #Inserted by set.minami 2013-04-20
  data.each do |row|
    if row.first.nil? then next end
    # strip leading whitespace from the first cell
    row.first.lstrip!

    # drop the empty first cell produced by an optional leading border
    row.shift if row.first.empty?
  end
  
  column_attrs = []
  
  re = ''                        
  re << if captionName == '' then 
          "<table>\n" 
        else 
          "<table id=\"#{captionName}\">\n"
        end
  re << "<caption>#{caption}</caption>\n" 
  #Insert by set.minami 2013-04-20
  
  # Header row present? Requires at least three rows, the second made up entirely of separator cells.
  
  #if !data[1].nil? && data[1].last =~ /\s+/ then
  ###  p data
  #  data.each{|d|
  #    d.pop
  #  }
  #end #Insert by set.minami @ 2013-04-29
  if data.size >= 3 and data[1].all?{|x| x =~ TableSeparatorCellRegexp} then
    head_row = data.shift
    separator_row = data.shift
    
    separator_row.each do |cell|
      cell.match TableSeparatorCellRegexp
      left = $1; right = $2
      
      if left and right then
        column_attrs << ' style="text-align: center"'
      elsif right then
        column_attrs << ' style="text-align: right"'
      elsif left then
        column_attrs << ' style="text-align: left"'
      else
        column_attrs << ''
      end
    end
    
    re << "\t<thead><tr>\n"
    head_row.each_with_index do |cell, i|
      re << "\t\t<th#{column_attrs[i]}>#{apply_span_transforms(cell.strip, rs)}</th>\n"
    end
    re << "\t</tr></thead>\n"
  end
  
  # data row
  re << "\t<tbody>\n"
  data.each do |row|
    re << "\t\t<tr>\n"
    row.each_with_index do |cell, i|
      re << "\t\t\t<td#{column_attrs[i]}>#{apply_span_transforms(cell.strip, rs)}</td>\n"
    end
    re << "\t\t</tr>\n"
  end
  re << "\t</tbody>\n"
  
  re << "</table>\n"
  
  re
end
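The alignment logic in transform_table_rows reads two optional captures from each separator cell: a leading colon (`$1`) and a trailing colon (`$2`). The real TableSeparatorCellRegexp is not shown in this section, so the sketch assumes cells of the form `---`, `:---`, `---:` or `:---:`. `column_attrs_sketch` is a hypothetical helper that applies the same trimming as the data rows.

```ruby
# Assumption: separator cells look like ---, :---, ---: or :---:.
# AoBane's real TableSeparatorCellRegexp is not shown in this section.
SEPARATOR_CELL_RE = /\A\s*(:)?-+(:)?\s*\z/

# Hypothetical helper reproducing how transform_table_rows turns the
# separator row into per-column style attributes.
def column_attrs_sketch(separator_line)
  cells = separator_line.split('|')
  cells.first.lstrip! if cells.first                 # same trimming as data rows
  cells.shift if cells.first && cells.first.empty?   # optional leading border
  cells.map do |cell|
    m = SEPARATOR_CELL_RE.match(cell)
    left  = m && m[1]
    right = m && m[2]
    if left && right then ' style="text-align: center"'
    elsif right      then ' style="text-align: right"'
    elsif left       then ' style="text-align: left"'
    else ''
    end
  end
end
```

For `| :--- | :---: | ---: | --- |` this gives left, center, right, and no attribute, in that order. The trailing border needs no special handling because `String#split` drops trailing empty fields.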
transform_tables(str, rs)

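Transform Markdown-style tables (spans matching TableRegexp) in a copy of str into HTML tables and return the result.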

# File lib/AoBane.rb, line 1200
def transform_tables(str, rs)
  str.gsub(TableRegexp){
    transform_table_rows($~[0], rs)
  }
end
transform_toc( str, rs )

Replace each table-of-contents marker (matched by TOCRegexp) in a copy of the specified str with a nested Markdown list of links to the collected headers, and return it.

# File lib/AoBane.rb, line 1139
def transform_toc( str, rs )
        @log.debug " Transforming tables of contents"
        str.gsub(TOCRegexp){
                start_level = 2 # default
                end_level = 6

                param = $1
                if param then
                        if param =~ TOCStartLevelRegexp then
                                if !($1) and !($2) then
                                        rs.warnings << "illegal TOC parameter - #{param} (valid example: 'h2..h4')"
                                else
                                        start_level = ($1 ? $1.to_i : 2)
                                        end_level = ($2 ? $2.to_i : 6)
                                end
                        else
                                rs.warnings << "illegal TOC parameter - #{param} (valid example: 'h2..h4')"
                        end
                end

                if rs.headers.first and rs.headers.first.level >= (start_level + 1) then
                        rs.warnings << "illegal structure of headers - h#{start_level} should be set before h#{rs.headers.first.level}"
                end


                ul_text = "\n\n"
                rs.headers.each do |header|
                        if header.level >= start_level and header.level <= end_level then
                                ul_text << ' ' * TabWidth * (header.level - start_level)
                                ul_text << '* '
                                ul_text << %Q|<a href="##{header.id}" rel="toc">#{header.content_html}</a>|
                                ul_text << "\n"
                        end
                end
                ul_text << "\n"

                ul_text # output

        }
end
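transform_toc emits Markdown list source rather than HTML. Each header between start_level and end_level becomes a `* ` item, indented by TabWidth spaces per level below start_level. A minimal sketch of that list-building step, assuming TabWidth is 4; `TocHeader` is a hypothetical stand-in for RenderState::Header that carries only the fields used here.

```ruby
# Hypothetical stand-in for RenderState::Header (only the fields used here).
TocHeader = Struct.new(:id, :level, :content_html)

# Hypothetical standalone version of the list-building part of transform_toc.
# tab_width stands in for TabWidth (assumed 4).
def toc_markdown_sketch(headers, start_level = 2, end_level = 6, tab_width = 4)
  ul_text = +"\n\n"
  headers.each do |header|
    next unless header.level.between?(start_level, end_level)
    ul_text << ' ' * tab_width * (header.level - start_level)   # nest by level
    ul_text << '* '
    ul_text << %Q|<a href="##{header.id}" rel="toc">#{header.content_html}</a>|
    ul_text << "\n"
  end
  ul_text << "\n"
end
```

With the default range `h2..h6`, an `h1` header is simply left out of the list. The real method additionally warns when the first header is deeper than start_level.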
unescape_special_chars( str )

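Restore escaped special characters in str by swapping each MD5 placeholder back to its unescaped character, and return str. Note that str is modified in place (gsub!) rather than copied.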

# File lib/AoBane.rb, line 1068
def unescape_special_chars( str )
        EscapeTable.each {|char, hash|
                @log.debug "Unescaping escaped %p with %p" % [ char, hash[:md5re] ]
                str.gsub!( hash[:md5re], hash[:unescape] )
        }

        return str
end
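escape_md and unescape_special_chars form a round trip. Markdown-special characters are replaced by MD5 hex digests, which contain only `0-9a-f` and so cannot trigger later span transforms, and they are swapped back at the end. The sketch assumes EscapeTable is built the way BlueCloth builds it (each character mapped to `:md5`, `:md5re` and `:unescape` entries). The table's construction is not shown here, and the helper names are hypothetical. Unlike the original, the sketch passes the replacement as a block so that a backslash is never interpreted as a backreference.

```ruby
require 'digest/md5'

# Assumption: mirrors BlueCloth's EscapeTable layout; AoBane's construction
# of EscapeTable is not shown in this section.
ESCAPE_TABLE = '\\`*_{}[]()#.!'.chars.each_with_object({}) do |char, table|
  md5 = Digest::MD5.hexdigest(char)
  table[char] = { md5: md5, md5re: Regexp.new(md5), unescape: char }
end

# Like escape_md: hide * and _ behind their MD5 digests.
def escape_md_sketch(str)
  str.gsub(/\*|_/) { |symbol| ESCAPE_TABLE[symbol][:md5] }
end

# Like unescape_special_chars, but works on a copy instead of mutating str.
def unescape_sketch(str)
  str = str.dup
  ESCAPE_TABLE.each_value { |entry| str.gsub!(entry[:md5re]) { entry[:unescape] } }
  str
end
```

`unescape_sketch(escape_md_sketch("a_b*c"))` returns `"a_b*c"`, while the intermediate string contains no `_` or `*` at all.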