class PandocBeautifier
This class provides the major functionality.
Note that it is called PandocBeautifier for historical reasons.
It provides methods to process a pandoc file.
Attributes
Public Class Methods
The constructor.
@param [Logger] logger the logger object to use. If none is specified, the global $logger is used when it is set; otherwise a default logger writing to STDOUT at level INFO is created.
# File lib/wortsammler/class.proolib.rb, line 453
def initialize(logger = nil)
  @markdown_output_switches = %w{
    +backtick_code_blocks
    -fenced_code_blocks
    +compact_definition_lists
    +space_in_atx_header
    +yaml_metadata_block
  }.join()

  @markdown_input_switches = %w{
    +smart
    +backtick_code_blocks
    +fenced_code_blocks
    +compact_definition_lists
    -space_in_atx_header
  }.join()

  @view_pattern = /~~ED((\s*(\w+))*)~~/
  # @view_pattern = /<\?ED((\s*(\w+))*)\?>/
  @tempdir = Dir.mktmpdir

  @config = ProoConfig.new()

  @log=logger || $logger || nil

  if @log == nil
    @log = Logger.new(STDOUT)
    @log.level = Logger::INFO
    @log.datetime_format = "%Y-%m-%d %H:%M:%S"
    @log.formatter = proc do |severity, datetime, progname, msg|
      "#{datetime}: #{msg}\n"
    end
  end
end
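The logger fallback in the constructor can be tried in isolation. The following is a minimal sketch using only Ruby's standard Logger; the helper name resolve_logger is hypothetical and not part of wortsammler.

```ruby
require 'logger'

# Hypothetical helper (not part of wortsammler) mirroring the fallback
# chain of the constructor: explicit logger, then global $logger,
# then a fresh logger on standard output.
def resolve_logger(logger = nil)
  log = logger || $logger
  return log unless log.nil?

  log = Logger.new($stdout)
  log.level = Logger::INFO
  # The formatter proc receives a Time object as its second argument
  # and returns the complete line to be written.
  log.formatter = proc do |_severity, datetime, _progname, msg|
    "#{datetime}: #{msg}\n"
  end
  log
end
```

Once a custom formatter proc is assigned, Logger no longer applies datetime_format to the output, so the sketch leaves that setting out.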
Public Instance Methods
Performs the beautification:
- process the file with pandoc
- revoke some quotes introduced by pandoc
@param [String] file the name of the file to be beautified
# File lib/wortsammler/class.proolib.rb, line 521
def beautify(file)
  @log.debug(" Cleaning: \"#{file}\"")

  docfile = File.new(file)
  olddoc = docfile.readlines.join
  docfile.close

  # process the file in pandoc
  cmd = "#{PANDOC_EXE} --standalone #{file.esc} -f markdown#{@markdown_input_switches} -t markdown#{@markdown_output_switches} --atx-headers --id-prefix=#{File.basename(file).esc}_ "
  newdoc = `#{cmd}`
  @log.debug "beautify #{file.esc}: #{$?}"
  @log.debug(" finished: \"#{file}\"")

  # tweak the quoting
  if $?.success? then
    # (RS_Mdc)
    # TODO: fix Table width toggles sometimes
    if (not olddoc == newdoc) then ##only touch the file if it is really changed
      File.open(file, "w") { |f| f.puts(newdoc) }
      File.open(file+".bak", "w") { |f| f.puts(olddoc) } # (RS_Mdc_)
      # remove this if needed
      @log.debug(" cleaned: \"#{file}\"")
    else
      @log.debug("was clean: \"#{file}\"")
    end
    #TODO: error handling here
  else
    @log.error("error calling pandoc - please watch the screen output")
  end
end
Checks whether a suitable pandoc version is installed.
@return [boolean] true if an appropriate version is available
# File lib/wortsammler/class.proolib.rb, line 500
def check_pandoc_version
  required_version_string="2.0.5"
  begin
    pandoc_version=`#{PANDOC_EXE} -v`.split("\n").first.split(" ")[1]
    # note: this is a plain String comparison, not a numeric version comparison
    if pandoc_version < required_version_string then
      @log.error "found pandoc #{pandoc_version} need #{required_version_string}"
      result = false
    else
      result = true
    end
  rescue Exception => e
    @log.error("could not run pandoc: #{e.message}")
    result=false
  end
  result
end
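The check compares two version strings with the String operator <, which orders them character by character. A hedged alternative, not what wortsammler does, is to compare them as versions with Gem::Version from RubyGems; the helper name version_at_least? is hypothetical.

```ruby
require 'rubygems' # provides Gem::Version

# Hypothetical helper (not part of wortsammler): true if the version
# string `found` is at least the version string `required`.
def version_at_least?(found, required)
  Gem::Version.new(found) >= Gem::Version.new(required)
end

# Plain String comparison sorts "10.0" before "2.0.5",
# because "1" sorts before "2".
string_says_too_old = "10.0" < "2.0.5"
version_says_ok     = version_at_least?("10.0", "2.0.5")
```

With these inputs the String comparison reports the newer release as too old, while the version comparison accepts it.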
This compiles the input documents into one single file. It also beautifies the resulting file.
@param [Array of String] input - the input files to be processed in the given sequence
@param [String] output - the name of the output file
# File lib/wortsammler/class.proolib.rb, line 679
def collect_document(input, output)
  inputs =input.map { |xx| xx.esc.to_osPath }.join(" ") # quote and combine the inputs
  inputname=File.basename(input.first)

  #now combine the input files
  @log.debug("combining the input files #{inputname} et al")
  cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} --standalone -t markdown#{@markdown_output_switches} -o #{output} --ascii #{inputs}" # note that inputs is already quoted
  system(cmd)
  if $?.success? then
    PandocBeautifier.new().beautify(output)
  end
end
This filters the document according to the target audience.
@param [String] inputfile name of inputfile
@param [String] outputfile name of outputfile
@param [String] view - name of intended view
# File lib/wortsammler/class.proolib.rb, line 622
def filter_document_variant(inputfile, outputfile, view)

  input_data = File.open(inputfile) { |f| f.readlines }

  output_data = Array.new
  is_active = true
  input_data.each { |l|
    switch=self.get_filter_command(l, view)
    l.gsub!(@view_pattern, "")
    is_active = switch unless switch.nil?
    @log.debug "select edition #{view}: #{is_active}: #{l.strip}"

    output_data << l if is_active
  }

  File.open(outputfile, "w") { |f| f.puts output_data.join }
end
This generates the final document.
It actually does this in two steps:
- process front matter to LaTeX
- process documents
@param [Array of String] input the input files to be processed in the given sequence
@param [String] outdir the output directory
@param [String] outname the base name of the output file. It is a basename in case the output format requires multiple files
@param [Array of String] format list of formats which shall be generated. Supported formats: "pdf", "latex", "html", "docx", "rtf", "txt"
@param [Hash] vars - the variables passed to pandoc
@param [Hash] editions - the editions to process; default nil - no edition processing
@param [Array of String] snippetfiles the list of files containing snippets
@param [String] frontmatter path to the file to be processed as frontmatter
@param [ProoConfig] config - the configuration to be used
# File lib/wortsammler/class.proolib.rb, line 744
def generateDocument(input, outdir, outname, format, vars, editions=nil, snippetfiles=nil, frontmatter=nil, config=nil)

  # combine the input files
  temp_filename = "#{@tempdir}/x.md".to_osPath
  temp_frontmatter = "#{@tempdir}/xfrontmatter.md".to_osPath unless frontmatter.nil?
  collect_document(input, temp_filename)
  collect_document(frontmatter, temp_frontmatter) unless frontmatter.nil?

  # process the snippets
  if not snippetfiles.nil?
    snippets={}
    snippetfiles.each { |f|
      if File.exists?(f)
        type=File.extname(f)
        case type
        when ".yaml"
          x=YAML.load(File.new(f))
        when ".xlsx"
          x=load_snippets_from_xlsx(f)
        else
          @log.error("Unsupported File format for snippets: #{type}")
          x={}
        end
        snippets.merge!(x)
      else
        @log.error("Snippet file not found: #{f}")
      end
    }

    replace_snippets_in_file(temp_filename, snippets)
  end

  vars_frontmatter =vars.clone
  vars_frontmatter[:usetoc] = "nousetoc"

  if editions.nil?
    # there are no editions
    unless frontmatter.nil? then
      render_document(temp_frontmatter, tempdir, temp_frontmatter, ["frontmatter"], vars_frontmatter)
      vars[:frontmatter] = "#{tempdir}/#{temp_frontmatter}.latex"
    end
    render_document(temp_filename, outdir, outname, format, vars, config)
  else
    # process the editions
    editions.each { |edition_name, properties|
      edition_out_filename = "#{outname}_#{properties[:filepart]}"
      edition_temp_frontmatter = "#{@tempdir}/#{edition_out_filename}_frontmatter.md" unless frontmatter.nil?
      edition_temp_filename = "#{@tempdir}/#{edition_out_filename}.md"
      vars[:title] = properties[:title]

      editionformats = properties[:format] || format

      if properties[:debug]
        process_debug_info(temp_frontmatter, edition_temp_frontmatter, edition_name.to_s) unless frontmatter.nil?
        process_debug_info(temp_filename, edition_temp_filename, edition_name.to_s)
        lvars =vars.clone
        lvars[:linenumbers] = "true"
        unless frontmatter.nil? # frontmatter
          lvars[:usetoc] = "nousetoc"
          render_document(edition_temp_frontmatter, @tempdir, "xfrontmatter", ["frontmatter"], lvars)
          lvars[:usetoc] = vars[:usetoc] || "usetoc"
          lvars[:frontmatter] = "#{@tempdir}/xfrontmatter.latex"
        end
        render_document(edition_temp_filename, outdir, edition_out_filename, ["pdf", "latex"], lvars, config)
      else
        unless frontmatter.nil? # frontmatter
          filter_document_variant(temp_frontmatter, edition_temp_frontmatter, edition_name.to_s)
          render_document(edition_temp_frontmatter, @tempdir, "xfrontmatter", ["frontmatter"], vars_frontmatter)
          vars[:frontmatter]="#{@tempdir}/xfrontmatter.latex"
        end
        filter_document_variant(temp_filename, edition_temp_filename, edition_name.to_s)
        render_document(edition_temp_filename, outdir, edition_out_filename, editionformats, vars, config)
      end
    }
  end
end
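The shape of the editions hash is not spelled out in the parameter list, but the method body reads the keys :title, :filepart, :format and :debug from each entry. The sketch below shows, under that reading, which output base name and formats each edition would get. The constant EDITIONS, its values, and the helper edition_plan are hypothetical illustrations.

```ruby
# Hypothetical editions hash. The keys are the ones generateDocument
# reads from each entry; the values are invented for illustration.
EDITIONS = {
  internal: { title: "Manual (internal)", filepart: "internal", debug: true },
  customer: { title: "Manual", filepart: "customer", format: ["pdf", "html"] },
  draft:    { title: "Manual (draft)", filepart: "draft" }
}

# Hypothetical helper: lists the output base name and formats per edition.
# Debug editions are always rendered as pdf and latex; other editions use
# their own :format list or fall back to the formats passed to the method.
def edition_plan(outname, default_formats, editions)
  editions.map do |edition_name, properties|
    formats = if properties[:debug]
                ["pdf", "latex"]
              else
                properties[:format] || default_formats
              end
    {
      edition: edition_name.to_s,
      outname: "#{outname}_#{properties[:filepart]}",
      formats: formats
    }
  end
end

PLAN = edition_plan("manual", ["docx"], EDITIONS)
```

Here the internal edition is rendered as manual_internal in pdf and latex, the customer edition uses its own format list, and the draft edition falls back to docx.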
This determines the view filter.
@param [String] line - the current input line
@param [String] view - the currently selected view
@return [Boolean, nil] true or false if a view command is found in the line, else nil
# File lib/wortsammler/class.proolib.rb, line 602
def get_filter_command(line, view)
  r = line.match(@view_pattern)

  if not r.nil?
    found = r[1].split(" ")
    result = (found & [view, "all"].flatten).any?
  else
    result = nil
  end

  result
end
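The interplay of the view pattern and the line filter can be followed on a few in-memory lines. This is a minimal sketch that reuses the regular expression from the constructor; the helper filter_lines and the sample lines are hypothetical and work on arrays instead of files.

```ruby
# Same regular expression as @view_pattern in the constructor.
VIEW_PATTERN = /~~ED((\s*(\w+))*)~~/

# Hypothetical helper: keeps the lines that are active for `view`.
# A marker such as "~~ED internal customer~~" switches the following
# lines on if it names the view or "all", and off otherwise.
def filter_lines(lines, view)
  active = true
  lines.each_with_object([]) do |line, out|
    match = line.match(VIEW_PATTERN)
    unless match.nil?
      names = match[1].split(" ")
      active = (names & [view, "all"]).any?
    end
    # the marker itself is removed from the line
    cleaned = line.gsub(VIEW_PATTERN, "")
    out << cleaned if active
  end
end

SAMPLE_LINES = [
  "intro\n",
  "~~ED internal~~\n",
  "secret\n",
  "~~ED all~~\n",
  "outro\n"
]

PUBLIC_LINES   = filter_lines(SAMPLE_LINES, "public")
INTERNAL_LINES = filter_lines(SAMPLE_LINES, "internal")
```

A marker line that switches a view on is kept with the marker stripped, so it contributes an empty line to the output; a marker line that switches the view off is dropped together with the lines that follow it.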
This loads snippets from an xlsx file.
@param [String] file name of the xlsx file
@return [Hash] a hash with the snippets
# File lib/wortsammler/class.proolib.rb, line 697
def load_snippets_from_xlsx(file)
  temp_filename = "#{@tempdir}/snippett.xlsx"
  FileUtils::copy(file, temp_filename)
  wb =RubyXL::Parser.parse(temp_filename)
  result={}
  wb.first.each { |row|
    key, the_value = row
    unless key.nil?
      unless the_value.nil?
        result[key.value.to_sym] = resolve_xml_entities(the_value.value) rescue ""
      end
    end
  }
  result
end
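This adds debug information to the document: each view marker is replaced by a coloured rule with a margin note naming the views, and each TODO is flagged in the margin.
@param [String] inputfile name of inputfile
@param [String] outputfile name of outputfile
@param [String] view - name of intended view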
# File lib/wortsammler/class.proolib.rb, line 647
def process_debug_info(inputfile, outputfile, view)

  input_data = File.open(inputfile) { |f| f.readlines }

  output_data = Array.new

  input_data.each { |l|
    l.gsub!(@view_pattern) { |p|
      if $1.strip == "all" then
        color="black"
      else
        color="red"
      end

      "\\color{#{color}}\\rule{2cm}{0.5mm}\\newline\\marginpar{#{$1.strip}}"
    }

    l.gsub!(/todo:|TODO:/) { |p| "#{p}\\marginpar{TODO}" }

    output_data << l
  }

  File.open(outputfile, "w") { |f| f.puts output_data.join }
end
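The effect on a single line can be sketched without touching files. The helper annotate_debug and the constant ED_MARKER are hypothetical names; the regular expression and the LaTeX fragments are taken from the listing above.

```ruby
# Same regular expression as @view_pattern in the constructor.
ED_MARKER = /~~ED((\s*(\w+))*)~~/

# Hypothetical helper: returns a copy of `line` in which view markers
# become a coloured LaTeX rule with a margin note, and TODO markers
# get an additional margin note.
def annotate_debug(line)
  marked = line.gsub(ED_MARKER) do
    views = Regexp.last_match(1).strip
    color = views == "all" ? "black" : "red"
    "\\color{#{color}}\\rule{2cm}{0.5mm}\\newline\\marginpar{#{views}}"
  end
  marked.gsub(/todo:|TODO:/) { |m| "#{m}\\marginpar{TODO}" }
end

ANNOTATED_VIEW = annotate_debug("~~ED internal~~ text")
ANNOTATED_TODO = annotate_debug("TODO: fix")
```

Unlike the method in the listing, the sketch does not modify its argument; it returns a new string.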
This renders the input file in the requested formats.
@param config [ProoConfig] the entire config object (for future extensions)
@return nil
# File lib/wortsammler/class.proolib.rb, line 850
def render_document(input, outdir, outname, format, vars, config=nil)

  #TODO: Clarify the following
  # on Windows, Tempdir contains a drive letter. But drive letter
  # seems not to work in pandoc -> pdf if the path separator is forward
  # slash. There are two options to overcome this
  #
  # 1. set tempdir such that it does not contain a drive letter
  # 2. use Dir.mktempdir but ensure that all provided file names
  #    use the platform specific SEPARATOR
  #
  # for whatever Reason, I decided for 2.

  tempfile = input
  tempfilePdf = "#{@tempdir}/x.TeX.md".to_osPath
  tempfileHtml = "#{@tempdir}/x.html.md".to_osPath
  outfile = "#{outdir}/#{outname}".to_osPath
  outfilePdf = "#{outfile}.pdf"
  outfileDocx = "#{outfile}.docx"
  outfileHtml = "#{outfile}.html"
  outfileRtf = "#{outfile}.rtf"
  outfileLatex = "#{outfile}.latex"
  outfileText = "#{outfile}.txt"
  outfileSlide = "#{outfile}.slide.html"

  ## format handle

  # todo: use this information ...
  format_config = {
    'pdf' => {
      tempfile: :pdf,
      outfile: "#{outfile}.pdf"
    },
    'html' => {
      tempfile: :html,
      outfile: "#{outfile}.html"
    },
    'docx' => {
      tempfile: :html,
      outfile: "#{outfile}.docx"
    },
    'rtf' => {
      tempfile: :html,
      outfile: "#{outfile}.rtf"
    },
    'latex' => {
      tempfile: :pdf,
      outfile: "#{outfile}.latex"
    },
    'text' => {
      tempfile: :html,
      outfile: "#{outfile}.text"
    },
    'dzslides' => {
      tempfile: :html,
      outfile: "#{outfile}.slide.html"
    },
    :beamer => {
      tempfile: :pdf,
      outfile: "#{outfile}.beamer.pdf"
    },
    'markdown' => {
      tempfile: :html,
      outfile: "#{outfile}.slide.html"
    }
  }

  tempfile_config = {
    pdf: "#{@tempdir}/x.TeX.md".to_osPath,
    html: "#{@tempdir}/x.html.md".to_osPath
  }

  if vars.has_key? :frontmatter
    latexTitleInclude = "--include-before-body=#{vars[:frontmatter].esc}"
  else
    latexTitleInclude
  end

  #todo: make config required, so it can be reduced to the else part
  if config.nil? then
    latexStyleFile = File.dirname(File.expand_path(__FILE__))+"/../../resources/default.wortsammler.latex"
    latexStyleFile = File.expand_path(latexStyleFile).to_osPath
    css_style_file = File.dirname(File.expand_path(__FILE__))+"/../../resources/default.wortsammler.css"
    css_style_file = File.expand_path(css_style_file).to_osPath
  else
    latexStyleFile = config.stylefiles[:latex]
    css_style_file = config.stylefiles[:css]
  end

  toc = "--toc"
  toc = "" if vars[:usetoc]=="nousetoc"

  if vars[:documentclass]=="book"
    option_chapters = "--chapters"
  else
    option_chapter = ""
  end

  begin
    vars_string=vars.map.map { |key, value| "-V #{key}=#{value.esc}" }.join(" ")
  rescue
    #todo
    require 'pry'; binding.pry
  end

  @log.info("rendering #{outname} as [#{format.join(', ')}]")

  supported_formats=["pdf", "latex", "frontmatter", "docx", "html", "txt", "rtf", "slidy", "md", "beamer"]
  wrong_format =format - supported_formats
  wrong_format.each { |f| @log.error("format not supported: #{f}") }

  begin
    if format.include?("frontmatter") then
      ReferenceTweaker.new("pdf").prepareFile(tempfile, tempfilePdf)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfilePdf.esc} --pdf-engine xelatex #{vars_string} --ascii -t latex+smart -o #{outfileLatex.esc}"
      `#{cmd}`
    end

    if (format.include?("pdf") | format.include?("latex")) then
      @log.debug("creating #{outfileLatex}")
      ReferenceTweaker.new("pdf").prepareFile(tempfile, tempfilePdf)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfilePdf.esc} #{toc} --standalone #{option_chapters} --pdf-engine xelatex --number-sections #{vars_string}" +
          " --template #{latexStyleFile.esc} --ascii -t latex+smart -o #{outfileLatex.esc} #{latexTitleInclude}"
      `#{cmd}`
    end

    if format.include?("pdf") then
      @log.debug("creating #{outfilePdf}")
      ReferenceTweaker.new("pdf").prepareFile(tempfile, tempfilePdf)
      #cmd="#{PANDOC_EXE} -S #{tempfilePdf.esc} #{toc} --standalone #{option_chapters} --latex-engine xelatex --number-sections #{vars_string}" +
      # " --template #{latexStyleFile.esc} --ascii -o #{outfilePdf.esc} #{latexTitleInclude}"

      cmd ="#{LATEX_EXE} -halt-on-error -interaction nonstopmode -output-directory=#{outdir.esc} #{outfileLatex.esc}"
      #cmdmkindex = "makeindex \"#{outfile.esc}.idx\""

      latex=LatexHelper.new.set_latex_command(cmd).setlogger(@log)
      latex.run(outfileLatex)

      messages=latex.log_analyze("#{outdir}/#{outname}.log")

      removeables = ["toc", "aux", "bak", "idx", "ilg", "ind"]
      removeables << "log" unless messages > 0
      removeables << "latex" unless format.include?("latex")
      removeables = removeables.map { |e| "#{outdir}/#{outname}.#{e}" }.select { |f| File.exists?(f) }
      removeables.each { |e|
        @log.debug "removing file: #{e}"
        FileUtils.rm e
      }
    end

    if format.include?("html") then
      #todo: handle css
      @log.debug("creating #{outfileHtml}")

      ReferenceTweaker.new("html").prepareFile(tempfile, tempfileHtml)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfileHtml.esc} --toc --standalone --self-contained --ascii --number-sections #{vars_string}" +
          " -t html+smart -o #{outfileHtml.esc}"
      `#{cmd}`
    end

    if format.include?("docx") then
      #todo: handle style file
      @log.debug("creating #{outfileDocx}")

      ReferenceTweaker.new("html").prepareFile(tempfile, tempfileHtml)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfileHtml.esc} #{toc} --standalone --self-contained --ascii --number-sections #{vars_string}" +
          " -f docx+smart -o #{outfileDocx.esc}"
      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfileHtml.esc} --toc --standalone --self-contained --ascii --number-sections #{vars_string}" +
          " -t docx+smart -o #{outfileDocx.esc}"
      `#{cmd}`
    end

    if format.include?("rtf") then
      @log.debug("creating #{outfileRtf}")
      ReferenceTweaker.new("html").prepareFile(tempfile, tempfileHtml)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfileHtml.esc} --toc --standalone --self-contained --ascii --number-sections #{vars_string}" +
          " -t rtf+smart -o #{outfileRtf.esc}"
      `#{cmd}`
    end

    if format.include?("txt") then
      @log.debug("creating #{outfileText}")

      ReferenceTweaker.new("pdf").prepareFile(tempfile, tempfileHtml)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfileHtml.esc} --toc --standalone --self-contained --ascii --number-sections #{vars_string}" +
          " -t plain+smart -o #{outfileText.esc}"
      `#{cmd}`
    end

    if format.include?("slidy") then
      @log.debug("creating #{outfileSlide}")

      ReferenceTweaker.new("html").prepareFile(tempfile, tempfileHtml)
      #todo: handle stylefile
      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfileHtml.esc} --toc --standalone --self-contained #{vars_string}" +
          " --ascii -t s5+smart --slide-level 1 -o #{outfileSlide.esc}"
      `#{cmd}`
    end

    if format.include?("beamer") then
      outfile = format_config[:beamer][:outfile]
      tempformat = format_config[:beamer][:tempfile]
      tempfile_out = tempfile_config[tempformat]
      @log.debug("creating #{outfile}")
      ReferenceTweaker.new(tempformat).prepareFile(tempfile, tempfile_out)

      cmd = %Q{#{PANDOC_EXE} -t beamer #{tempfile_out.esc} -V theme:Warsaw -o #{outfile.esc}}
      `#{cmd}`

      #messages=latex.log_analyze("#{outdir}/#{outname}.log")
      messages = 0

      removeables = ["toc", "aux", "bak", "idx", "ilg", "ind"]
      removeables << "log" unless messages > 0
      removeables << "latex" unless format.include?("latex")
      removeables = removeables.map { |e| "#{outdir}/#{outname}.#{e}" }.select { |f| File.exists?(f) }
      removeables.each { |e|
        @log.debug "removing file: #{e}"
        FileUtils.rm e
      }
    end
  rescue Exception => e
    @log.error "failed to perform #{cmd}, \n#{e.message}"
    @log.error e.backtrace.join("\n")
    #TODO make a try catch block here
  end
  nil
end
Renders a single file.
@param input [String] path to the inputfile
@param outdir [String] path to the output directory
@param format [Array of String] formats
@return [nil] no useful return value
# File lib/wortsammler/class.proolib.rb, line 832
def render_single_document(input, outdir, format)
  outname=File.basename(input, ".*")
  render_document(input, outdir, outname, format, { :geometry => "a4paper" })
end
This replaces the text snippets in a file. The file is only rewritten if a replacement changed its content.
# File lib/wortsammler/class.proolib.rb, line 555
def replace_snippets_in_file(infile, snippets)
  input_data = File.open(infile) { |f| f.readlines.join }
  output_data=input_data.clone

  @log.debug("replacing snippets in #{infile}")

  replace_snippets_in_text(output_data, snippets)

  if (not input_data == output_data)
    File.open(infile, "w") { |f| f.puts output_data }
  end
end
This replaces the snippets in a text. Nested snippets are resolved by processing the text again until no further replacement occurs.
# File lib/wortsammler/class.proolib.rb, line 569
def replace_snippets_in_text(text, snippets)
  changed=false
  text.gsub!(SNIPPET_PATTERN) { |m|
    replacetext_raw=snippets[$2.to_sym]

    if replacetext_raw
      changed=true
      unless $1.nil? then
        leading_whitespace=$1.split("\n", 100)
        leading_lines =leading_whitespace[0..-1].join("\n")
        leading_spaces =leading_whitespace.last || ""
        replacetext =leading_lines+replacetext_raw.gsub("\n", "\n#{leading_spaces}")
      end
      @log.debug("replaced snippet #{$2} with #{replacetext}")
    else
      replacetext=m
      @log.warn("Snippet not found: #{$2}")
    end
    replacetext
  }
  #recursively process nested snippets
  #todo: this approach might raise undefined snippets twice if there are defined and undefined ones
  replace_snippets_in_text(text, snippets) if changed==true
end
This resolves XML entities in text (lt, gt, amp).
@param [String] text with entities
@return [String] text with replaced entities
# File lib/wortsammler/class.proolib.rb, line 717
def resolve_xml_entities(text)
  result=text
  result.gsub!("&lt;", "<")
  result.gsub!("&gt;", ">")
  result.gsub!("&amp;", "&")
  result
end
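The order of the three replacements named in the description (lt, gt, amp) matters: resolving amp last keeps a doubly escaped entity from being unescaped twice. The following minimal sketch uses a hypothetical, non-mutating helper resolve_entities to show this.

```ruby
# Hypothetical helper (not part of wortsammler): resolves the entities
# lt, gt and amp, in that order, and returns a new string.
def resolve_entities(text)
  text.gsub("&lt;", "<").gsub("&gt;", ">").gsub("&amp;", "&")
end

RESOLVED_SIMPLE = resolve_entities("a &lt; b &amp;&amp; c &gt; d")
RESOLVED_NESTED = resolve_entities("&amp;lt;")
```

RESOLVED_SIMPLE is "a < b && c > d". RESOLVED_NESTED is "&lt;" and not "<", because the ampersand is resolved only after the lt step has already run. The sketch uses gsub instead of gsub! so that the caller's string is left unchanged.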