class RLTK::CFG

The CFG class is used to represent context-free grammars. It is used by the RLTK::Parser class to represent the parser’s grammar, but can also be used to manipulate arbitrary CFGs.

Attributes

curr_lhs[RW]

This is used by the {CFG#production} method to wrap {CFG#clause} calls.

@return [Symbol] The current left-hand side symbol.

start_symbol[R]

@return [Symbol] The grammar’s starting symbol.

Public Class Methods

is_nonterminal?(sym) click to toggle source

Tests to see if a symbol is a non-terminal symbol, as used by the CFG class.

@param [Symbol] sym The symbol to test.

@return [Boolean]

# File lib/rltk/cfg.rb, line 62
def self.is_nonterminal?(sym)
        sym and (s = sym.to_s) == s.downcase
end
is_terminal?(sym) click to toggle source

Tests to see if a symbol is a terminal symbol, as used by the CFG class.

@param [Symbol] sym The symbol to test.

@return [Boolean]

# File lib/rltk/cfg.rb, line 52
def self.is_terminal?(sym)
        sym and (s = sym.to_s) == s.upcase
end
new(&callback) click to toggle source

Instantiates a new CFG object that uses callback to inform the programmer of the generation of new productions due to EBNF operators.

@param [Proc] callback A Proc object to be called when EBNF operators are expanded.

# File lib/rltk/cfg.rb, line 75
def initialize(&callback)
        @curr_lhs           = nil
        @callback           = callback || Proc.new {}
        @lexer              = Lexers::EBNF.new
        @production_counter = -1
        @start_symbol       = nil
        @wrapper_symbol     = nil

        @productions_id     = Hash.new
        @productions_sym    = Hash.new { |h, k| h[k] = [] }
        @production_buffer  = Array.new

        @terms    = Set.new([:EOS])
        @nonterms = Set.new

        @firsts   = Hash.new
        @follows  = Hash.new { |h,k| h[k] = Array.new }
end

Public Instance Methods

add_production(production) click to toggle source

Adds production to the appropriate internal data structures.

@param [Production] production The production to add to the grammar.

@return [void]

# File lib/rltk/cfg.rb, line 99
def add_production(production)
        @productions_sym[production.lhs] << (@productions_id[production.id] = production)

        production
end
build_list_production(name, list_elements, separator = '') click to toggle source

Builds a production representing a (possibly empty) list of tokens. These tokens may optionally be separated by a provided token. This function is used to eliminate the EBNF * operator.

@param [Symbol] name The name of the production to add @param [String, Symbol, Array<String>] list_elements Expression(s) that may appear in the list @param [Symbol, String] separator The list separator symbol or symbols

@return [void]

# File lib/rltk/cfg.rb, line 132
def build_list_production(name, list_elements, separator = '')
        # Add the items for the following productions:
        #
        # name: | name_prime

        name_prime = "#{name}_prime".to_sym

        # 1st Production
        production, _ = self.production(name, '')
        @callback.call(:elp, :empty, production)

        # 2nd Production
        production, _ = self.production(name, name_prime)
        @callback.call(:elp, :nonempty, production)

        # Add remaining productions via nonempty_list helper.
        self.nonempty_list(name_prime, list_elements, separator)

        name
end
Also aliased as: list
build_nonempty_list_production(name, list_elements, separator = '') click to toggle source

Builds a production representing a non-empty list of tokens. These tokens may optionally be separated by a provided token. This function is used to eliminate the EBNF + operator.

@param [Symbol] name The name of the production to add @param [String, Symbol, Array<String, Symbol>] list_elements Expression(s) that may appear in the list @param [Symbol, String] separator The list separator symbol or symbols

@return [void]

# File lib/rltk/cfg.rb, line 181
def build_nonempty_list_production(name, list_elements, separator = '')
        # Add the items for the following productions:
        #
        # If there is only one list element:
        #
        #   name: list_element | name separator list_element
        #
        # else
        #
        #   name: name_list_elements | name separator name_list_elements
        #
        #   name_list_elements: #{list_elements.join('|')}

        build_elements_productions = false

        list_element_string =
        if list_elements.is_a?(Array)
                if list_elements.empty?
                        raise ArgumentError, 'Parameter list_elements must not be empty.'

                elsif list_elements.length == 1
                        list_elements.first

                else
                        build_elements_productions = true
                        "#{name}_list_elements"
                end
        else
                list_elements
        end

        list_element_selected_string = list_element_string.to_s.split.map { |s| ".#{s}" }.join(' ')

        # Single Element Production
        production, _ = self.production(name, list_element_string)
        @callback.call(:nelp, :single, production)

        # Multiple Element Production
        production, selections = self.production(name, ".#{name} #{separator} #{list_element_selected_string}")
        @callback.call(:nelp, :multiple, production, selections)

        if build_elements_productions
                # List Element Productions
                list_elements.each do |element|
                        production, _ = self.production(list_element_string, element)
                        @callback.call(:nelp, :elements, production)
                end
        end

        name
end
Also aliased as: nonempty_list
build_optional_production(name, opt_symbol) click to toggle source

Build a production for an optional symbol. This is used to eliminate the EBNF ? operator.

@param [Symbol] name The name for the new production @param [Symbol] opt_symbol Symbol to expand

@return [Symbol] The value of the name argument

# File lib/rltk/cfg.rb, line 258
def build_optional_production(name, opt_symbol)
        if not @productions_sym.has_key?(name)
                # Add the items for the following productions:
                #
                # name: | opt_symbol

                # Empty production.
                production = self.add_production(Production.new(self.next_id, name, []))
                @callback.call(:optional, :empty, production)

                # Nonempty production
                production = self.add_production(Production.new(self.next_id, name, [opt_symbol]))
                @callback.call(:optional, :nonempty, production)

                # Add the new symbol to the list of nonterminals.
                @nonterms << name
        end

        name
end
Also aliased as: optional
callback(&callback) click to toggle source

Sets the EBNF callback to callback.

@param [Proc] callback A Proc object to be called when EBNF operators are expanded and list productions are added.

@return [void]

# File lib/rltk/cfg.rb, line 285
def callback(&callback)
        @callback = callback if callback

        nil
end
clause(expression) click to toggle source

This function MUST be called inside a CFG.production block. It will make a new production with the left-hand side specified by the CFG.production call’s argument. This is the function that is responsible for removing EBNF symbols from the grammar.

@param [String, Symbol] expression The right-hand side of a CFG production.

@return [Array(Production, Array<Integer>)]

# File lib/rltk/cfg.rb, line 299
def clause(expression)
        raise GrammarError, 'CFG#clause called outside of CFG#production block.' if not @curr_lhs

        lhs        = @curr_lhs.to_sym
        rhs        = Array.new
        tokens     = @lexer.lex(expression.to_s)
        selections = Array.new

        # Set this as the start symbol if there isn't one already
        # defined.
        @start_symbol ||= lhs

        # Remove EBNF tokens and replace them with new productions.
        symbol_count = 0
        tokens.each_index do |i|
                ttype0  = tokens[i].type
                tvalue0 = tokens[i].value

                if ttype0 == :TERM or ttype0 == :NONTERM

                        # Add this symbol to the correct collection.
                        (ttype0 == :TERM ? @terms : @nonterms) << tvalue0

                        if i + 1 < tokens.length
                                ttype1  = tokens[i + 1].type
                                tvalue1 = tokens[i + 1].value

                                rhs <<
                                case ttype1
                                when :QUESTION then self.get_optional_production("#{tvalue0.downcase}_optional".to_sym, tvalue0)
                                when :STAR     then self.get_list_production("#{tvalue0.downcase}_list".to_sym, tvalue0)
                                when :PLUS     then self.get_nonempty_list_production("#{tvalue0.downcase}_nonempty_list".to_sym, tvalue0)
                                else                tvalue0
                                end
                        else
                                rhs << tvalue0
                        end

                        symbol_count += 1

                elsif ttype0 == :DOT
                        selections << symbol_count
                end
        end

        # Make the production.
        @production_buffer << [(production = Production.new(self.next_id, lhs, rhs)), selections]

        # Make sure the production symbol is collected.
        @nonterms << lhs

        # Add the new production to our collections.
        self.add_production(production)

        return [production, selections]
end
first_set(sentence) click to toggle source

This function calculates the first set of a series of tokens. It uses the {CFG#first_set} helper function to find the first set of individual symbols.

@param [Symbol, Array<Symbol>] sentence Sentence to find the *first set* for.

@return [Array<Symbol>] The *first set* for the given sentence.

# File lib/rltk/cfg.rb, line 363
def first_set(sentence)
        if sentence.is_a?(Symbol)
                first_set_prime(sentence)

        elsif sentence.inject(true) { |m, sym| m and self.symbols.include?(sym) }
                set0 = []
                all_have_empty = true

                sentence.each do |sym|
                        set0 |= (set1 = self.first_set(sym)) - [:'ɛ']

                        break if not (all_have_empty = set1.include?(:'ɛ'))
                end

                if all_have_empty then set0 + [:'ɛ'] else set0 end
        end
end
follow_set(sym0, seen_lh_sides = []) click to toggle source

Returns the follow set for a given symbol. The second argument is used to avoid infinite recursion when mutually recursive rules are encountered.

@param [Symbol] sym0 The symbol to find the *follow set* for. @param [Array<Symbol>] seen_lh_sides Previously seen LHS symbols.

@return [Array<Symbol>]

# File lib/rltk/cfg.rb, line 443
def follow_set(sym0, seen_lh_sides = [])

        # Use the memoized set if possible.
        return @follows[sym0] if @follows.has_key?(sym0)

        if @nonterms.member? sym0
                set0 = []

                # Add EOS to the start symbol's follow set.
                set0 << :EOS if sym0 == @start_symbol

                @productions_id.values.each do |production|
                        production.rhs.each_with_index do |sym1, i|
                                if i + 1 < production.rhs.length
                                        if sym0 == sym1
                                                set0 |= (set1 = self.first_set(production.rhs[(i + 1)..-1])) - [:'ɛ']

                                                set0 |= self.follow_set(production.lhs) if set1.include?(:'ɛ')
                                        end
                                elsif sym0 != production.lhs and sym0 == sym1 and not seen_lh_sides.include?(production.lhs)
                                        set0 |= self.follow_set(production.lhs, seen_lh_sides << production.lhs)
                                end
                        end
                end

                if seen_lh_sides.empty? or not set0.empty?
                        # Memoize the result for later.
                        @follows[sym0] |= set0
                else
                        set0
                end
        else
                []
        end
end
get_list(name, list_elements, separator = '')
Alias for: get_list_production
get_list_production(name, list_elements, separator = '') click to toggle source

If the production already exists it will be returned. If it does not exist then it will be created and then returned.

@param [Symbol] name The name of the production to add @param [String, Symbol, Array<String>] list_elements Expression(s) that may appear in the list @param [Symbol, String] separator The list separator symbol or symbols

@return [void]

# File lib/rltk/cfg.rb, line 113
def get_list_production(name, list_elements, separator = '')
        if @nonterms.include?(name)
                name

        else
                build_list_production(name, list_elements, separator)
        end
end
Also aliased as: get_list
get_nonempty_list(name, list_elements, separator = '')
get_nonempty_list_production(name, list_elements, separator = '') click to toggle source

If the production already exists it will be returned. If it does not exist then it will be created and then returned.

@param [Symbol] name The name of the production to add @param [String, Symbol, Array<String>] list_elements Expression(s) that may appear in the list @param [Symbol, String] separator The list separator symbol or symbols

@return [void]

# File lib/rltk/cfg.rb, line 162
def get_nonempty_list_production(name, list_elements, separator = '')
        if @nonterms.include?(name)
                name

        else
                build_nonempty_list_production(name, list_elements, separator)
        end
end
Also aliased as: get_nonempty_list
get_optional(name, list_elements)
get_optional_production(name, list_elements) click to toggle source

If the production already exists it will be returned. If it does not exist then it will be created and then returned.

@param [Symbol] name The name of the production to add @param [String, Symbol, Array<String>] list_elements Expression(s) that may appear in the list

@return [void]

# File lib/rltk/cfg.rb, line 241
def get_optional_production(name, list_elements)
        if @nonterms.include?(name)
                name

        else
                build_optional_production(name, list_elements)
        end
end
Also aliased as: get_optional
list(name, list_elements, separator = '')
next_id() click to toggle source

@return [Integer] ID for the next production to be defined.

# File lib/rltk/cfg.rb, line 480
def next_id
        @production_counter += 1
end
nonempty_list(name, list_elements, separator = '')
nonterms() click to toggle source

@return [Set<Symbol>] All terminal symbols used in the grammar’s definition.

# File lib/rltk/cfg.rb, line 485
def nonterms
        @nonterms.clone
end
optional(name, opt_symbol)
production(symbol, expression = nil, &block) click to toggle source

Builds a new production with the left-hand side value of symbol. If expression is specified it is take as the right-hand side of production. If expression is nil then block is evaluated, and expected to make one or more calls to {CFG#clause}.

@param [Symbol] symbol The left-hand side of a production @param [String, Symbol] expression The right-hand side of a production @param [Proc] block Optional block for defining production clauses

@return [Production, Array<Production>] A single production if called with an expression;

an array of productions otherwise
# File lib/rltk/cfg.rb, line 500
def production(symbol, expression = nil, &block)
        @production_buffer = Array.new

        prev_lhs  = @curr_lhs
        @curr_lhs = symbol

        ret_val =
        if expression
                self.clause(expression)
        else
                self.instance_exec(&block)

                @production_buffer.clone
        end

        @curr_lhs = prev_lhs
        return ret_val
end
productions(by = :sym) click to toggle source

If by is :sym, returns a hash of the grammar’s productions, using the productions’ left-hand side symbol as the key. If by is :id an array of productions is returned in the order of their definition.

@param [:sym, :id] by The way in which productions should be returned.

@return [Array<Production>, Hash{Symbol => Production}]

# File lib/rltk/cfg.rb, line 527
def productions(by = :sym)
        if by == :sym
                @productions_sym
        elsif by == :id
                @productions_id
        else
                nil
        end
end
start(symbol) click to toggle source

Sets the start symbol for this grammar.

@param [Symbol] symbol The new start symbol.

@return [Symbol]

# File lib/rltk/cfg.rb, line 542
def start(symbol)
        if not CFG::is_nonterminal?(symbol)
                raise GrammarError, 'Start symbol must be a non-terminal.'
        end

        @start_symbol = symbol
end
symbols() click to toggle source

@return [Array<Symbol>] All symbols used in the grammar’s definition.

# File lib/rltk/cfg.rb, line 551
def symbols
        self.terms + self.nonterms
end
terms() click to toggle source

@return [Set<Symbol>] All terminal symbols used in the grammar’s definition.

# File lib/rltk/cfg.rb, line 556
def terms
        @terms.clone
end

Private Instance Methods

first_set_prime(sym0, seen_lh_sides = []) click to toggle source

This function is responsible for calculating the first set of individual symbols.

@param [Symbol] sym0 The symbol to find the *first set* of. @param [Array<Symbol>] seen_lh_sides Previously seen LHS symbols.

@return [Array<Symbol>]

# File lib/rltk/cfg.rb, line 388
def first_set_prime(sym0, seen_lh_sides = [])
        if self.symbols.include?(sym0)
                # Memoize the result for later.
                @firsts[sym0] ||=
                if CFG::is_terminal?(sym0)
                        # If the symbol is a terminal, it is the only symbol in
                        # its follow set.
                        [sym0]
                else
                        set0 = []

                        @productions_sym[sym0].each do |production|
                                if production.rhs.empty?
                                        # If this is an empty production we should
                                        # add the empty string to the First set.
                                        set0 << :'ɛ'
                                else
                                        all_have_empty = true

                                        production.rhs.each do |sym1|

                                                set1 = []

                                                # Grab the First set for the current
                                                # symbol in this production.
                                                if not seen_lh_sides.include?(sym1)
                                                        set0 |= (set1 = first_set_prime(sym1, seen_lh_sides << sym1)) - [:'ɛ']
                                                end

                                                break if not (all_have_empty = set1.include?(:'ɛ'))
                                        end

                                        # Add the empty production if this production
                                        # is all non-terminals that can be reduced to
                                        # the empty string.
                                        set0 << :'ɛ' if all_have_empty
                                end
                        end

                        set0.uniq
                end
        else
                []
        end
end