class RLTK::Lexer

The Lexer class may be sub-classed to produce new lexers. The features these lexers support are described in the main documentation.

Attributes

start_state[R]

@return [Symbol] State in which the lexer starts.

env[R]

@return [Environment] Environment used by an instantiated lexer.

Public Class Methods

inherited(klass)

Called when the Lexer class is sub-classed. It installs the necessary instance class variables.

@return [void]

# File lib/rltk/lexer.rb, line 72
def inherited(klass)
        klass.install_icvars
end
install_icvars()

Installs instance class variables into a class.

@return [void]

# File lib/rltk/lexer.rb, line 79
def install_icvars
        @match_type         = :longest
        @rules              = Hash.new {|h,k| h[k] = Array.new}
        @start_state        = :default
end
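
The two methods above rely on Ruby's inherited hook to give every subclass its own class-level instance variables, so rules defined in one lexer do not leak into another. The following is a minimal sketch of that pattern, not RLTK's code: BaseLexer, LexerA, and LexerB are hypothetical names, and the attr_reader declarations are added only so the result can be inspected.

```ruby
# Toy stand-in for the pattern above; BaseLexer and its subclasses are
# hypothetical names, not part of RLTK.
class BaseLexer
  class << self
    attr_reader :rules, :start_state, :match_type

    # Ruby calls this hook on the parent whenever a subclass is created,
    # before the subclass body is evaluated.
    def inherited(klass)
      super
      klass.install_icvars
    end

    # Give the receiving class its own, independent configuration.
    def install_icvars
      @match_type  = :longest
      @rules       = Hash.new { |h, k| h[k] = [] }
      @start_state = :default
    end
  end
end

class LexerA < BaseLexer
  # @rules here is LexerA's own table, installed by the hook.
  @rules[:default] << :rule_for_a
end

class LexerB < BaseLexer
end
```

In this sketch LexerA.rules[:default] holds one entry while LexerB.rules[:default] stays empty, because each subclass received a separate Hash.
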
lex(string, file_name = nil, env = self::Environment.new(@start_state))

Lexes string, using env as the environment. Returns the array of tokens generated by the lexer, with a token of type EOS (End of Stream) appended to the end.

@param [String] string String to be lexed.
@param [String] file_name File name used for recording token positions.
@param [Environment] env Lexing environment.

@return [Array<Token>]

# File lib/rltk/lexer.rb, line 94
def lex(string, file_name = nil, env = self::Environment.new(@start_state))
        # Offset from start of stream.
        stream_offset = 0

        # Offset from the start of the line.
        line_offset = 0
        line_number = 1

        # Empty token list.
        tokens = Array.new

        # The scanner.
        scanner = StringScanner.new(string)

        # Start scanning the input string.
        until scanner.eos?
                match = nil

                # If the match_type is set to :longest all of the
                # rules for the current state need to be scanned
                # and the longest match returned.  If the
                # match_type is :first, we only need to scan until
                # we find a match.
                @rules[env.state].each do |rule|
                        if (rule.flags - env.flags).empty?
                                if txt = scanner.check(rule.pattern)
                                        if not match or match.first.length < txt.length
                                                match = [txt, rule]

                                                break if @match_type == :first
                                        end
                                end
                        end
                end

                if match
                        rule = match.last

                        txt = scanner.scan(rule.pattern)
                        type, value = env.rule_exec(rule.pattern.match(txt), txt, &rule.action)

                        if type
                                pos = StreamPosition.new(stream_offset, line_number, line_offset, txt.length, file_name)
                                tokens << Token.new(type, value, pos)
                        end

                        # Advance our position counters.
                        stream_offset += txt.length

                        if (newlines = txt.count("\n")) > 0
                                line_number += newlines
                                line_offset = txt.rpartition("\n").last.length
                        else
                                line_offset += txt.length()
                        end
                else
                        error = LexingError.new(stream_offset, line_number, line_offset, scanner.rest)
                        raise(error, 'Unable to match string with any of the given rules')
                end
        end

        return tokens << Token.new(:EOS)
end
lex_file(file_name, env = self::Environment.new(@start_state))

A wrapper function that calls {Lexer.lex} on the contents of a file.

@param [String] file_name File to be lexed.
@param [Environment] env Lexing environment.

@return [Array<Token>]

# File lib/rltk/lexer.rb, line 165
def lex_file(file_name, env = self::Environment.new(@start_state))
        File.open(file_name, 'r') { |f| self.lex(f.read, file_name, env) }
end
match_first()

Used to tell a lexer to use the first match found instead of the longest match found.

@return [void]

# File lib/rltk/lexer.rb, line 173
def match_first
        @match_type = :first
end
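
The difference between the two match types can be seen with a small sketch built on StringScanner#check, the same look-ahead call the class-level lex method uses. pick_match and the two patterns are hypothetical and exist only for illustration.

```ruby
require 'strscan'

# Hypothetical helper: returns the text the selection loop would choose
# for the given patterns, tried in definition order.
def pick_match(input, patterns, match_type)
  scanner = StringScanner.new(input)
  best = nil

  patterns.each do |pattern|
    txt = scanner.check(pattern) # look ahead without consuming input
    next unless txt

    if best.nil? || best.length < txt.length
      best = txt
      break if match_type == :first
    end
  end

  best
end

patterns = [/if/, /[a-z]+/]

longest = pick_match('iffy', patterns, :longest) # "iffy"
first   = pick_match('iffy', patterns, :first)   # "if"
```

With :longest every pattern is tried and the identifier pattern wins; with :first the scan stops at the keyword pattern. Because the comparison is strict, a later rule replaces an earlier one only when its match is longer, so ties go to the rule defined first.
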
new()

Instantiates a new lexer and creates an environment to be used for subsequent calls.

# File lib/rltk/lexer.rb, line 222
def initialize
        @env = self.class::Environment.new(self.class.start_state)
end
r(pattern, state = :default, flags = [], &action)
Alias for: rule
rule(pattern, state = :default, flags = [], &action)

Defines a new lexing rule. The first argument is the regular expression used to match substrings of the input; a String is converted to a Regexp. The second argument is the state to which the rule belongs. The third argument lists the flags that must be set for the rule to be considered. The last argument is a block that returns a type and value to be used in constructing a Token. If no block is specified, the matched substring is discarded and lexing continues.

@param [Regexp, String] pattern Pattern for matching text.
@param [Symbol] state State in which this rule is active.
@param [Array<Symbol>] flags Flags which must be set for rule to be active.
@param [Proc] action Proc object that produces Tokens.

@return [void]

# File lib/rltk/lexer.rb, line 193
def rule(pattern, state = :default, flags = [], &action)
        # If no action is given we will set it to an empty
        # action.
        action ||= Proc.new() {}

        pattern = Regexp.new(pattern) if pattern.is_a?(String)

        r = Rule.new(pattern, action, state, flags)

        if state == :ALL
                @rules.each_key { |k| @rules[k] << r }
        else
                @rules[state] << r
        end
end
Also aliased as: r
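
As the listing reads, a rule registered with the :ALL state is appended to the states that already exist in the rule table at the time of the call. A minimal sketch with a plain Hash shows the effect of definition order; add_rule and the rule names are hypothetical, and symbols stand in for Rule objects.

```ruby
# Hypothetical stand-in for the @rules table built by install_icvars.
rules = Hash.new { |h, k| h[k] = [] }

# Hypothetical helper mirroring the last statement of rule.
def add_rule(rules, name, state = :default)
  if state == :ALL
    rules.each_key { |k| rules[k] << name }
  else
    rules[state] << name
  end
end

add_rule(rules, :word)                  # creates the :default state
add_rule(rules, :comment_end, :comment) # creates the :comment state
add_rule(rules, :whitespace, :ALL)      # appended to both existing states
add_rule(rules, :quote, :string)        # :string is created afterwards
```

Here :whitespace ends up in :default and :comment but not in :string, which suggests defining :ALL rules after every state has received at least one rule.
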
start(state)

Changes the starting state of the lexer.

@param [Symbol] state Starting state for this lexer.

@return [void]

# File lib/rltk/lexer.rb, line 211
def start(state)
        @start_state = state
end

Public Instance Methods

lex(string, file_name = nil)

Lexes a string using the encapsulated environment.

@param [String] string String to be lexed.
@param [String] file_name File name used for Token positions.

@return [Array<Token>]

# File lib/rltk/lexer.rb, line 232
def lex(string, file_name = nil)
        self.class.lex(string, file_name, @env)
end
lex_file(file_name)

Lexes a file using the encapsulated environment.

@param [String] file_name File to be lexed.

@return [Array<Token>]

# File lib/rltk/lexer.rb, line 241
def lex_file(file_name)
        self.class.lex_file(file_name, @env)
end
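
The instance methods differ from their class-level counterparts mainly in which environment they use: the class methods build a fresh environment by default on each call, while an instance reuses the one created in initialize. The sketch below illustrates that difference with toy classes. ToyEnv and ToyLexer are hypothetical, and the state switch on a double quote is an invented stand-in for whatever a rule action might do.

```ruby
# ToyEnv and ToyLexer are hypothetical stand-ins, not RLTK classes.
class ToyEnv
  attr_accessor :state

  def initialize(start_state)
    @state = start_state
  end
end

class ToyLexer
  # Class-level entry point: the default argument is evaluated on every
  # call, so each call gets a fresh environment unless one is passed in.
  def self.lex(string, env = ToyEnv.new(:default))
    # Pretend a rule action switched states on seeing an opening quote.
    env.state = :string if string.include?('"')
    env.state
  end

  def initialize
    @env = ToyEnv.new(:default)
  end

  # Instance-level entry point: always reuses the stored environment.
  def lex(string)
    self.class.lex(string, @env)
  end
end

lexer = ToyLexer.new
a = lexer.lex('say "hi')      # leaves the instance environment in :string
b = lexer.lex('more text')    # still :string, the state carried over
c = ToyLexer.lex('more text') # fresh environment, so :default
```

An instance is therefore the natural choice when state or flags set while lexing one chunk should still apply to the next.
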