class Slaw::ActGenerator
Base class for generating Act documents
Attributes
builder[RW]
Slaw::Parse::Builder - builder used by the generator
parser[RW]
Treetop::Runtime::CompiledParser - compiled parser
Public Class Methods
new(grammar)
# File lib/slaw/generator.rb, line 15
def initialize(grammar)
  @grammar = grammar

  @parser = build_parser
  @builder = Slaw::Parse::Builder.new(parser: @parser)
  @parser = @builder.parser
  @cleanser = Slaw::Parse::Cleanser.new
end
Public Instance Methods
build_parser()
# File lib/slaw/generator.rb, line 24
def build_parser
  unless @@parsers[@grammar]
    # load the grammar with polyglot and treetop
    # this will ensure the class below is available
    # see: http://cjheath.github.io/treetop/using_in_ruby.html
    require "slaw/grammars/#{@grammar}/act"
    grammar_class = "Slaw::Grammars::#{@grammar.upcase}::ActParser"
    @@parsers[@grammar] = eval(grammar_class)
  end

  @parser = @@parsers[@grammar].new
  @parser.root = :act

  @parser
end
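The caching pattern in build_parser (resolve the parser class once per grammar, then create a fresh parser instance on every call) can be sketched on its own. This is a minimal illustration and not Slaw's code: the Demo module, its ActParser class, and ParserFactory are hypothetical stand-ins, and Object.const_get is used here in place of eval to look up the class by name.

```ruby
# Hypothetical stand-in for a compiled grammar; not part of Slaw or Treetop.
module Demo
  module Grammars
    module ZA
      class ActParser
        attr_accessor :root
      end
    end
  end
end

# Hypothetical factory mirroring the shape of build_parser.
class ParserFactory
  # Cache of parser classes shared by all instances, keyed by grammar name.
  @@parsers = {}

  def initialize(grammar)
    @grammar = grammar
  end

  def build_parser
    # Resolve the class by name only on the first request for this grammar.
    @@parsers[@grammar] ||=
      Object.const_get("Demo::Grammars::#{@grammar.upcase}::ActParser")

    # Every call still gets its own parser instance, rooted at :act.
    parser = @@parsers[@grammar].new
    parser.root = :act
    parser
  end
end

first = ParserFactory.new("za").build_parser
second = ParserFactory.new("za").build_parser
puts first.class          # the cached class
puts first.equal?(second) # distinct instances
```

Caching the class rather than an instance keeps the lookup cost to one per grammar while leaving each generator with a parser it does not share.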
cleanup(text)
Run basic cleanup on text, such as ensuring clean newlines and removing tabs. This is always automatically done before processing.
# File lib/slaw/generator.rb, line 52
def cleanup(text)
  @cleanser.cleanup(text)
end
generate_from_text(text)
Generate an XML document for an Act from plain text. The text is passed through cleanup before it is parsed.
@param text [String] plain text
@return [Nokogiri::XML::Document] the resulting XML
# File lib/slaw/generator.rb, line 45
def generate_from_text(text)
  @builder.parse_and_process_text(cleanup(text))
end
guess_section_number_after_title(text)
Try to determine if section numbers come after titles, rather than before.
eg:

  Section title
  1. Section content

versus

  1. Section title
  Section content
# File lib/slaw/generator.rb, line 75
def guess_section_number_after_title(text)
  before = text.scan(/^\w{4,}[^\n]+\n\d+\. /).length
  after  = text.scan(/^\s*\n\d+\. \w{4,}/).length

  before > after * 1.25
end
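To see how the heuristic behaves, the method can be copied into a standalone script and run against two small samples. The sample texts below are invented for illustration; the regular expressions and the 1.25 threshold are taken from the listing above.

```ruby
# Standalone copy of the heuristic so it can be tried without the slaw gem.
def guess_section_number_after_title(text)
  # A line starting with a word of 4+ characters, directly followed by a
  # numbered line: the title sits above the section number.
  before = text.scan(/^\w{4,}[^\n]+\n\d+\. /).length
  # A blank line followed by a number and a word of 4+ characters:
  # the section number sits in front of the title.
  after = text.scan(/^\s*\n\d+\. \w{4,}/).length

  before > after * 1.25
end

# Invented sample: titles come first, numbers start the content line.
title_first = <<~TEXT
  Short title
  1. This Act is called the Example Act.

  Commencement
  2. This Act commences on publication.
TEXT

# Invented sample: numbers come first, on the same line as the title.
number_first = <<~TEXT
  Preamble text.

  1. Short title
  This Act is called the Example Act.

  2. Commencement
  This Act commences on publication.
TEXT

puts guess_section_number_after_title(title_first)  # true
puts guess_section_number_after_title(number_first) # false
```

The 1.25 factor means the title-first pattern has to clearly outnumber the number-first pattern before the method returns true, so a few stray matches do not flip the result.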
reformat(text)
Reformat some common errors in text to help make parsing more successful. Optional, and only recommended when processing a document for the first time.
# File lib/slaw/generator.rb, line 59
def reformat(text)
  @cleanser.reformat(text)
end
text_from_act(doc)
Transform an Akoma Ntoso XML document back into a plain-text version suitable for re-parsing back into XML with no loss of structure.
# File lib/slaw/generator.rb, line 84
def text_from_act(doc)
  # look on the load path for an XSL file for this grammar
  filename = "/slaw/grammars/#{@grammar}/act_text.xsl"

  if dir = $LOAD_PATH.find { |p| File.exist?(p + filename) }
    xslt = Nokogiri::XSLT(File.read(dir + filename))
    # collapse runs of two or more blank lines into a single blank line
    xslt.apply_to(doc).gsub(/^( *\n){2,}/, "\n")
  else
    # report the path that was searched for (the original interpolated an undefined name, fragment)
    raise "Unable to find text XSL for grammar #{@grammar}: #{filename}"
  end
end
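Two steps in text_from_act can be tried in isolation without Nokogiri: searching a load path for a file, and collapsing runs of blank lines in the transformed text. The sketch below uses only the Ruby standard library; find_on_load_path and collapse_blank_lines are hypothetical helper names introduced for illustration, and the demo path is invented.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: return the first directory in load_path that
# contains the relative path, or nil if none does.
def find_on_load_path(relative, load_path = $LOAD_PATH)
  load_path.find { |dir| File.exist?(File.join(dir, relative)) }
end

# Hypothetical helper: replace each run of two or more blank (or
# space-only) lines with a single blank line, using the same pattern
# as text_from_act.
def collapse_blank_lines(text)
  text.gsub(/^( *\n){2,}/, "\n")
end

Dir.mktmpdir do |dir|
  # Invented layout standing in for a grammar directory on the load path.
  relative = "demo/grammars/za/act_text.xsl"
  FileUtils.mkdir_p(File.dirname(File.join(dir, relative)))
  File.write(File.join(dir, relative), "<stylesheet/>")

  found = find_on_load_path(relative, ["/nonexistent", dir])
  puts found == dir # true
end

puts collapse_blank_lines("a\n\n\n\nb\n").inspect # "a\n\nb\n"
```

Passing the search path as an argument keeps the lookup easy to exercise in a test, whereas the original method reads $LOAD_PATH directly.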