class RLTK::CFG
The CFG
class is used to represent context-free grammars. It is used by the RLTK::Parser
class to represent the parser’s grammar, but can also be used to manipulate arbitrary CFGs.
Attributes
This is used by the {CFG#production} method to wrap {CFG#clause} calls.
@return [Symbol] The current left-hand side symbol.
@return [Symbol] The grammar’s starting symbol.
Public Class Methods
Tests to see if a symbol is a non-terminal symbol, as used by the CFG
class.
@param [Symbol] sym The symbol to test.
@return [Boolean]
# File lib/rltk/cfg.rb, line 62 def self.is_nonterminal?(sym) sym and (s = sym.to_s) == s.downcase end
Tests to see if a symbol is a terminal symbol, as used by the CFG
class.
@param [Symbol] sym The symbol to test.
@return [Boolean]
# File lib/rltk/cfg.rb, line 52 def self.is_terminal?(sym) sym and (s = sym.to_s) == s.upcase end
Instantiates a new CFG
object that uses callback to inform the programmer of the generation of new productions due to EBNF operators.
@param [Proc] callback A Proc object to be called when EBNF operators are expanded.
# File lib/rltk/cfg.rb, line 75 def initialize(&callback) @curr_lhs = nil @callback = callback || Proc.new {} @lexer = Lexers::EBNF.new @production_counter = -1 @start_symbol = nil @wrapper_symbol = nil @productions_id = Hash.new @productions_sym = Hash.new { |h, k| h[k] = [] } @production_buffer = Array.new @terms = Set.new([:EOS]) @nonterms = Set.new @firsts = Hash.new @follows = Hash.new { |h,k| h[k] = Array.new } end
Public Instance Methods
Adds production to the appropriate internal data structures.
@param [Production] production The production to add to the grammar.
@return [void]
# File lib/rltk/cfg.rb, line 99 def add_production(production) @productions_sym[production.lhs] << (@productions_id[production.id] = production) production end
Builds a production representing a (possibly empty) list of tokens. These tokens may optionally be separated by a provided token. This function is used to eliminate the EBNF * operator.
@param [Symbol] name The name of the production to add @param [String, Symbol, Array<String>] list_elements Expression(s) that may appear in the list @param [Symbol, String] separator The list separator symbol or symbols
@return [void]
# File lib/rltk/cfg.rb, line 132 def build_list_production(name, list_elements, separator = '') # Add the items for the following productions: # # name: | name_prime name_prime = "#{name}_prime".to_sym # 1st Production production, _ = self.production(name, '') @callback.call(:elp, :empty, production) # 2nd Production production, _ = self.production(name, name_prime) @callback.call(:elp, :nonempty, production) # Add remaining productions via nonempty_list helper. self.nonempty_list(name_prime, list_elements, separator) name end
Builds a production representing a non-empty list of tokens. These tokens may optionally be separated by a provided token. This function is used to eliminate the EBNF + operator.
@param [Symbol] name The name of the production to add @param [String, Symbol, Array<String, Symbol>] list_elements Expression(s) that may appear in the list @param [Symbol, String] separator The list separator symbol or symbols
@return [void]
# File lib/rltk/cfg.rb, line 181 def build_nonempty_list_production(name, list_elements, separator = '') # Add the items for the following productions: # # If there is only one list element: # # name: list_element | name separator list_element # # else # # name: name_list_elements | name separator name_list_elements # # name_list_elements: #{list_elements.join('|')} build_elements_productions = false list_element_string = if list_elements.is_a?(Array) if list_elements.empty? raise ArgumentError, 'Parameter list_elements must not be empty.' elsif list_elements.length == 1 list_elements.first else build_elements_productions = true "#{name}_list_elements" end else list_elements end list_element_selected_string = list_element_string.to_s.split.map { |s| ".#{s}" }.join(' ') # Single Element Production production, _ = self.production(name, list_element_string) @callback.call(:nelp, :single, production) # Multiple Element Production production, selections = self.production(name, ".#{name} #{separator} #{list_element_selected_string}") @callback.call(:nelp, :multiple, production, selections) if build_elements_productions # List Element Productions list_elements.each do |element| production, _ = self.production(list_element_string, element) @callback.call(:nelp, :elements, production) end end name end
Build a production for an optional symbol. This is used to eliminate the EBNF ? operator.
@param [Symbol] name The name for the new production @param [Symbol] opt_symbol Symbol to expand
@return [Symbol] The value of the name argument
# File lib/rltk/cfg.rb, line 258 def build_optional_production(name, opt_symbol) if not @productions_sym.has_key?(name) # Add the items for the following productions: # # name: | opt_symbol # Empty production. production = self.add_production(Production.new(self.next_id, name, [])) @callback.call(:optional, :empty, production) # Nonempty production production = self.add_production(Production.new(self.next_id, name, [opt_symbol])) @callback.call(:optional, :nonempty, production) # Add the new symbol to the list of nonterminals. @nonterms << name end name end
Sets the EBNF callback to callback.
@param [Proc] callback A Proc object to be called when EBNF operators are expanded and list productions are added.
@return [void]
# File lib/rltk/cfg.rb, line 285 def callback(&callback) @callback = callback if callback nil end
This function MUST be called inside a CFG.production
block. It will make a new production with the left-hand side specified by the CFG.production
call’s argument. This is the function that is responsible for removing EBNF symbols from the grammar.
@param [String, Symbol] expression The right-hand side of a CFG
production.
@return [Array(Production
, Array<Integer>)]
# File lib/rltk/cfg.rb, line 299 def clause(expression) raise GrammarError, 'CFG#clause called outside of CFG#production block.' if not @curr_lhs lhs = @curr_lhs.to_sym rhs = Array.new tokens = @lexer.lex(expression.to_s) selections = Array.new # Set this as the start symbol if there isn't one already # defined. @start_symbol ||= lhs # Remove EBNF tokens and replace them with new productions. symbol_count = 0 tokens.each_index do |i| ttype0 = tokens[i].type tvalue0 = tokens[i].value if ttype0 == :TERM or ttype0 == :NONTERM # Add this symbol to the correct collection. (ttype0 == :TERM ? @terms : @nonterms) << tvalue0 if i + 1 < tokens.length ttype1 = tokens[i + 1].type tvalue1 = tokens[i + 1].value rhs << case ttype1 when :QUESTION then self.get_optional_production("#{tvalue0.downcase}_optional".to_sym, tvalue0) when :STAR then self.get_list_production("#{tvalue0.downcase}_list".to_sym, tvalue0) when :PLUS then self.get_nonempty_list_production("#{tvalue0.downcase}_nonempty_list".to_sym, tvalue0) else tvalue0 end else rhs << tvalue0 end symbol_count += 1 elsif ttype0 == :DOT selections << symbol_count end end # Make the production. @production_buffer << [(production = Production.new(self.next_id, lhs, rhs)), selections] # Make sure the production symbol is collected. @nonterms << lhs # Add the new production to our collections. self.add_production(production) return [production, selections] end
This function calculates the first set of a series of tokens. It uses the {CFG#first_set} helper function to find the first set of individual symbols.
@param [Symbol, Array<Symbol>] sentence Sentence to find the *first set* for.
@return [Array<Symbol>] The *first set* for the given sentence.
# File lib/rltk/cfg.rb, line 363 def first_set(sentence) if sentence.is_a?(Symbol) first_set_prime(sentence) elsif sentence.inject(true) { |m, sym| m and self.symbols.include?(sym) } set0 = [] all_have_empty = true sentence.each do |sym| set0 |= (set1 = self.first_set(sym)) - [:'ɛ'] break if not (all_have_empty = set1.include?(:'ɛ')) end if all_have_empty then set0 + [:'ɛ'] else set0 end end end
Returns the follow set for a given symbol. The second argument is used to avoid infinite recursion when mutually recursive rules are encountered.
@param [Symbol] sym0 The symbol to find the *follow set* for. @param [Array<Symbol>] seen_lh_sides Previously seen LHS symbols.
@return [Array<Symbol>]
# File lib/rltk/cfg.rb, line 443 def follow_set(sym0, seen_lh_sides = []) # Use the memoized set if possible. return @follows[sym0] if @follows.has_key?(sym0) if @nonterms.member? sym0 set0 = [] # Add EOS to the start symbol's follow set. set0 << :EOS if sym0 == @start_symbol @productions_id.values.each do |production| production.rhs.each_with_index do |sym1, i| if i + 1 < production.rhs.length if sym0 == sym1 set0 |= (set1 = self.first_set(production.rhs[(i + 1)..-1])) - [:'ɛ'] set0 |= self.follow_set(production.lhs) if set1.include?(:'ɛ') end elsif sym0 != production.lhs and sym0 == sym1 and not seen_lh_sides.include?(production.lhs) set0 |= self.follow_set(production.lhs, seen_lh_sides << production.lhs) end end end if seen_lh_sides.empty? or not set0.empty? # Memoize the result for later. @follows[sym0] |= set0 else set0 end else [] end end
If the production already exists it will be returned. If it does not exist then it will be created and then returned.
@param [Symbol] name The name of the production to add @param [String, Symbol, Array<String>] list_elements Expression(s) that may appear in the list @param [Symbol, String] separator The list separator symbol or symbols
@return [void]
# File lib/rltk/cfg.rb, line 113 def get_list_production(name, list_elements, separator = '') if @nonterms.include?(name) name else build_list_production(name, list_elements, separator) end end
If the production already exists it will be returned. If it does not exist then it will be created and then returned.
@param [Symbol] name The name of the production to add @param [String, Symbol, Array<String>] list_elements Expression(s) that may appear in the list @param [Symbol, String] separator The list separator symbol or symbols
@return [void]
# File lib/rltk/cfg.rb, line 162 def get_nonempty_list_production(name, list_elements, separator = '') if @nonterms.include?(name) name else build_nonempty_list_production(name, list_elements, separator) end end
If the production already exists it will be returned. If it does not exist then it will be created and then returned.
@param [Symbol] name The name of the production to add @param [String, Symbol, Array<String>] list_elements Expression(s) that may appear in the list
@return [void]
# File lib/rltk/cfg.rb, line 241 def get_optional_production(name, list_elements) if @nonterms.include?(name) name else build_optional_production(name, list_elements) end end
@return [Integer] ID for the next production to be defined.
# File lib/rltk/cfg.rb, line 480 def next_id @production_counter += 1 end
@return [Set<Symbol>] All terminal symbols used in the grammar’s definition.
# File lib/rltk/cfg.rb, line 485 def nonterms @nonterms.clone end
Builds a new production with the left-hand side value of symbol. If expression is specified it is take as the right-hand side of production. If expression is nil then block is evaluated, and expected to make one or more calls to {CFG#clause}.
@param [Symbol] symbol The left-hand side of a production @param [String, Symbol] expression The right-hand side of a production @param [Proc] block Optional block for defining production clauses
@return [Production, Array<Production>] A single production if called with an expression;
an array of productions otherwise
# File lib/rltk/cfg.rb, line 500 def production(symbol, expression = nil, &block) @production_buffer = Array.new prev_lhs = @curr_lhs @curr_lhs = symbol ret_val = if expression self.clause(expression) else self.instance_exec(&block) @production_buffer.clone end @curr_lhs = prev_lhs return ret_val end
If by is :sym, returns a hash of the grammar’s productions, using the productions’ left-hand side symbol as the key. If by is :id an array of productions is returned in the order of their definition.
@param [:sym, :id] by The way in which productions should be returned.
@return [Array<Production>, Hash{Symbol => Production}]
# File lib/rltk/cfg.rb, line 527 def productions(by = :sym) if by == :sym @productions_sym elsif by == :id @productions_id else nil end end
Sets the start symbol for this grammar.
@param [Symbol] symbol The new start symbol.
@return [Symbol]
# File lib/rltk/cfg.rb, line 542 def start(symbol) if not CFG::is_nonterminal?(symbol) raise GrammarError, 'Start symbol must be a non-terminal.' end @start_symbol = symbol end
@return [Array<Symbol>] All symbols used in the grammar’s definition.
# File lib/rltk/cfg.rb, line 551 def symbols self.terms + self.nonterms end
@return [Set<Symbol>] All terminal symbols used in the grammar’s definition.
# File lib/rltk/cfg.rb, line 556 def terms @terms.clone end
Private Instance Methods
This function is responsible for calculating the first set of individual symbols.
@param [Symbol] sym0 The symbol to find the *first set* of. @param [Array<Symbol>] seen_lh_sides Previously seen LHS symbols.
@return [Array<Symbol>]
# File lib/rltk/cfg.rb, line 388 def first_set_prime(sym0, seen_lh_sides = []) if self.symbols.include?(sym0) # Memoize the result for later. @firsts[sym0] ||= if CFG::is_terminal?(sym0) # If the symbol is a terminal, it is the only symbol in # its follow set. [sym0] else set0 = [] @productions_sym[sym0].each do |production| if production.rhs.empty? # If this is an empty production we should # add the empty string to the First set. set0 << :'ɛ' else all_have_empty = true production.rhs.each do |sym1| set1 = [] # Grab the First set for the current # symbol in this production. if not seen_lh_sides.include?(sym1) set0 |= (set1 = first_set_prime(sym1, seen_lh_sides << sym1)) - [:'ɛ'] end break if not (all_have_empty = set1.include?(:'ɛ')) end # Add the empty production if this production # is all non-terminals that can be reduced to # the empty string. set0 << :'ɛ' if all_have_empty end end set0.uniq end else [] end end