class RLTK::Lexer
The Lexer class may be subclassed to produce new lexers. These lexers have many features, which are described in the main documentation.
Attributes
start_state
@return [Symbol] State in which the lexer starts.
env
@return [Environment] Environment used by an instantiated lexer.
Public Class Methods
Called when the Lexer class is subclassed; it installs the necessary class instance variables on the new subclass.
@return [void]
```ruby
# File lib/rltk/lexer.rb, line 72
def inherited(klass)
  klass.install_icvars
end
```
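This plain-Ruby sketch imitates the `inherited`/`install_icvars` pattern outside RLTK, to show why the hook matters. `MiniLexer`, `WordLexer`, `NumberLexer`, and the simplified `rule` method are hypothetical names used only for this example. The hook runs once for every subclass, so each subclass gets its own rule table. Without it, subclasses would share a single table with their parent and with each other.

```ruby
# Hypothetical base class that mimics the inherited/install_icvars pattern.
class MiniLexer
  class << self
    attr_reader :rules, :start_state

    # Ruby calls this hook on the parent whenever a subclass is defined,
    # before the subclass body is evaluated.
    def inherited(klass)
      super
      klass.install_icvars
    end

    # Give the receiving class its own, independent instance variables.
    def install_icvars
      @rules       = Hash.new { |h, k| h[k] = [] }
      @start_state = :default
    end

    # Simplified registration: no actions or flags, just a pattern per state.
    def rule(pattern, state = :default)
      @rules[state] << pattern
    end
  end
end

# Two hypothetical subclasses, each filling its own rule table.
class WordLexer < MiniLexer
  rule(/[a-z]+/)
end

class NumberLexer < MiniLexer
  rule(/\d+/)
end

p WordLexer.rules[:default]    # => [/[a-z]+/]
p NumberLexer.rules[:default]  # => [/\d+/]
```

In this sketch the base class never receives a table, because the hook only fires for subclasses.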
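Installs the class instance variables (the match type, the rule table, and the start state) into a class, defaulting to longest-match lexing in the :default state.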
@return [void]
```ruby
# File lib/rltk/lexer.rb, line 79
def install_icvars
  @match_type  = :longest
  @rules       = Hash.new { |h, k| h[k] = Array.new }
  @start_state = :default
end
```
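The block form of `Hash.new` used for `@rules` is significant. The comparison below is generic Ruby rather than RLTK code, and `:rule_a`/`:rule_b` are placeholder values. `Hash.new([])` returns one shared default array for every missing key and never stores it in the hash. The block form, by contrast, stores a fresh array for each state the first time that state is looked up.

```ruby
# Default *object*: every missing key returns the same Array instance,
# and << mutates that shared array without storing anything in the hash.
shared = Hash.new([])
shared[:default] << :rule_a
shared[:string]  << :rule_b

p shared.empty?    # => true
p shared[:other]   # => [:rule_a, :rule_b]

# Default *block*: each missing key gets a fresh Array stored under that key.
rules = Hash.new { |h, k| h[k] = Array.new }
rules[:default] << :rule_a
rules[:string]  << :rule_b

p rules.keys       # => [:default, :string]
p rules[:default]  # => [:rule_a]
```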
Lexes string using env as the environment. Returns the array of tokens generated by the lexer, with a token of type EOS (End of Stream) appended.
@param [String] string String to be lexed.
@param [String] file_name File name used for recording token positions.
@param [Environment] env Lexing environment.
@return [Array<Token>]
```ruby
# File lib/rltk/lexer.rb, line 94
def lex(string, file_name = nil, env = self::Environment.new(@start_state))
  # Offset from start of stream.
  stream_offset = 0

  # Offset from the start of the line.
  line_offset = 0
  line_number = 1

  # Empty token list.
  tokens = Array.new

  # The scanner.
  scanner = StringScanner.new(string)

  # Start scanning the input string.
  until scanner.eos?
    match = nil

    # If the match_type is set to :longest all of the
    # rules for the current state need to be scanned
    # and the longest match returned. If the
    # match_type is :first, we only need to scan until
    # we find a match.
    @rules[env.state].each do |rule|
      if (rule.flags - env.flags).empty?
        if (txt = scanner.check(rule.pattern))
          if match.nil? || match.first.length < txt.length
            match = [txt, rule]

            break if @match_type == :first
          end
        end
      end
    end

    if match
      rule = match.last

      txt = scanner.scan(rule.pattern)
      type, value = env.rule_exec(rule.pattern.match(txt), txt, &rule.action)

      if type
        pos = StreamPosition.new(stream_offset, line_number, line_offset, txt.length, file_name)
        tokens << Token.new(type, value, pos)
      end

      # Advance the position counters.
      stream_offset += txt.length

      if (newlines = txt.count("\n")) > 0
        line_number += newlines
        line_offset = txt.rpartition("\n").last.length
      else
        line_offset += txt.length
      end
    else
      error = LexingError.new(stream_offset, line_number, line_offset, scanner.rest)
      raise(error, 'Unable to match string with any of the given rules')
    end
  end

  return tokens << Token.new(:EOS)
end
```
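The position bookkeeping at the end of the loop can be tried on its own. This sketch moves the counter updates into a hypothetical `advance` helper that works on a hypothetical `Position` struct. Neither is part of RLTK, but the arithmetic is the same as in `lex` above. The stream offset always grows by the matched length. A match that contains newlines increases the line number and resets the line offset to the length of the text after the last newline.

```ruby
# Hypothetical position record: offset from the start of the stream,
# 1-based line number, and offset from the start of the current line.
Position = Struct.new(:stream_offset, :line_number, :line_offset)

# Returns the position just past +txt+, using the same bookkeeping as
# Lexer.lex after each match.
def advance(pos, txt)
  stream_offset = pos.stream_offset + txt.length
  newlines      = txt.count("\n")

  if newlines > 0
    # The line offset restarts at the text following the last newline.
    Position.new(stream_offset, pos.line_number + newlines, txt.rpartition("\n").last.length)
  else
    Position.new(stream_offset, pos.line_number, pos.line_offset + txt.length)
  end
end

# Record where each piece of matched text starts.
starts = []
pos    = Position.new(0, 1, 0)
["let", " ", "x", "\n  ", "y"].each do |txt|
  starts << [txt, pos.to_a]
  pos = advance(pos, txt)
end

starts.each do |txt, (offset, line, column)|
  puts "#{txt.inspect}: offset #{offset}, line #{line}, column #{column}"
end
# The last line printed is: "y": offset 8, line 2, column 2
```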
A wrapper method that calls Lexer.lex on the contents of a file.
@param [String] file_name File to be lexed.
@param [Environment] env Lexing environment.
@return [Array<Token>]
```ruby
# File lib/rltk/lexer.rb, line 165
def lex_file(file_name, env = self::Environment.new(@start_state))
  File.open(file_name, 'r') { |f| self.lex(f.read, file_name, env) }
end
```
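Here is a minimal sketch of the same wrapper shape. It uses a hypothetical word-splitting `tokenize` function in place of `Lexer.lex`. `tokenize`, `tokenize_file`, and the `[text, file_name]` token layout are illustrative only. The wrapper reads the whole file and passes the file name through, so every token can record its source.

```ruby
require 'strscan'
require 'tempfile'

# Hypothetical toy lexer: splits input into words and records the file name
# with each token, much as Lexer.lex records file_name in token positions.
def tokenize(string, file_name = nil)
  scanner = StringScanner.new(string)
  tokens  = []
  until scanner.eos?
    if (word = scanner.scan(/\w+/))
      tokens << [word, file_name]
    elsif scanner.scan(/\s+/).nil?
      raise ArgumentError, "unexpected character #{scanner.peek(1).inspect}"
    end
  end
  tokens
end

# Wrapper in the style of Lexer.lex_file: read the whole file, then delegate,
# passing the file name through so tokens know where they came from.
def tokenize_file(file_name)
  File.open(file_name, 'r') { |f| tokenize(f.read, file_name) }
end

result = nil
Tempfile.create('demo') do |f|
  f.write("hello world\n")
  f.flush
  result = tokenize_file(f.path)
end
p result.map(&:first) # => ["hello", "world"]
```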
Tells the lexer to use the first match found instead of the longest match.
@return [void]
```ruby
# File lib/rltk/lexer.rb, line 173
def match_first
  @match_type = :first
end
```
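To show how `match_first` changes results, this sketch reimplements only the rule-selection loop of `Lexer.lex` with `StringScanner`, leaving out flags, states, and actions. The rule list, the token types, and `scan_tokens` are hypothetical. With longest match, the identifier rule wins on `iffy`. With first match, the earlier keyword rule claims `if` and splits the word, so rule order starts to matter.

```ruby
require 'strscan'

# Hypothetical rules for one state: [pattern, token type]. A nil type means
# the matched text is consumed but no token is produced.
RULES = [
  [/if/,     :IF],
  [/[a-z]+/, :IDENT],
  [/\s+/,    nil]
].freeze

# Selects a rule the same way Lexer.lex does: with :longest, every rule is
# tried and the longest match wins (earlier rules win ties); with :first,
# scanning stops at the first rule that matches.
def scan_tokens(input, match_type)
  scanner = StringScanner.new(input)
  tokens  = []

  until scanner.eos?
    match = nil
    RULES.each do |pattern, type|
      txt = scanner.check(pattern) # peek at the current position without advancing
      next unless txt

      if match.nil? || match.first.length < txt.length
        match = [txt, pattern, type]
        break if match_type == :first
      end
    end
    raise ArgumentError, "no rule matches #{scanner.rest.inspect}" unless match

    _, pattern, type = match
    txt = scanner.scan(pattern) # now consume the chosen match
    tokens << [type, txt] if type
  end

  tokens
end

p scan_tokens('iffy if', :longest) # => [[:IDENT, "iffy"], [:IF, "if"]]
p scan_tokens('iffy if', :first)   # => [[:IF, "if"], [:IDENT, "fy"], [:IF, "if"]]
```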
Instantiates a new lexer and creates an environment to be used for subsequent calls.
```ruby
# File lib/rltk/lexer.rb, line 222
def initialize
  @env = self.class::Environment.new(self.class.start_state)
end
```
Defines a new lexing rule. The first argument is the regular expression used to match substrings of the input; a String is converted to a Regexp with Regexp.new. The second argument is the state to which the rule belongs; the special state :ALL adds the rule to every state that already has rules. The third argument lists the flags that must be set for the rule to be considered. The last argument is a block that returns a type and value used to construct a Token. If no block is given, or the block returns no type, the matched substring is discarded and lexing continues.
@param [Regexp, String] pattern Pattern for matching text.
@param [Symbol] state State in which this rule is active.
@param [Array<Symbol>] flags Flags which must be set for the rule to be active.
@param [Proc] action Proc object that produces Tokens.
@return [void]
```ruby
# File lib/rltk/lexer.rb, line 193
def rule(pattern, state = :default, flags = [], &action)
  # If no action is given we will set it to an empty
  # action.
  action ||= Proc.new {}

  pattern = Regexp.new(pattern) if pattern.is_a?(String)

  r = Rule.new(pattern, action, state, flags)

  if state == :ALL
    @rules.each_key { |k| @rules[k] << r }
  else
    @rules[state] << r
  end
end
```
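The `:ALL` branch iterates over `@rules.each_key`, so the rule is copied only into states that already exist when `rule` is called. This plain-Ruby sketch reproduces that registration logic to make the ordering effect visible. The `Rule` struct and the `RULES_BY_STATE` table are hypothetical stand-ins, not RLTK's own objects. A state that is first mentioned after the `:ALL` rule does not receive it.

```ruby
# Hypothetical stand-ins for the rule object and the per-state rule table.
Rule = Struct.new(:pattern, :action, :state, :flags)
RULES_BY_STATE = Hash.new { |h, k| h[k] = Array.new }

# Mirrors the registration logic of Lexer.rule.
def rule(pattern, state = :default, flags = [], &action)
  action ||= proc {}                                     # no block: match is discarded
  pattern = Regexp.new(pattern) if pattern.is_a?(String) # string is used as regexp source

  r = Rule.new(pattern, action, state, flags)

  if state == :ALL
    # Only states that already have an entry receive the rule.
    RULES_BY_STATE.each_key { |k| RULES_BY_STATE[k] << r }
  else
    RULES_BY_STATE[state] << r
  end
end

rule('\s+')            # :default; the string becomes /\s+/
rule(/"/, :string)     # first mention of :string
rule(/\n/, :ALL)       # copied into :default and :string only
rule(/#.*/, :comment)  # :comment appears too late to receive the :ALL rule

RULES_BY_STATE.each do |state, rules|
  puts "#{state}: #{rules.map { |r| r.pattern.inspect }.join(', ')}"
end
```

In a real lexer, rules meant to apply everywhere should be declared after the states they must reach. Otherwise, those rules need to be repeated per state.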
Changes the starting state of the lexer.
@param [Symbol] state Starting state for this lexer.
@return [void]
```ruby
# File lib/rltk/lexer.rb, line 211
def start(state)
  @start_state = state
end
```
Public Instance Methods
Lexes a string using the encapsulated environment.
@param [String] string String to be lexed.
@param [String] file_name File name used for Token positions.
@return [Array<Token>]
```ruby
# File lib/rltk/lexer.rb, line 232
def lex(string, file_name = nil)
  self.class.lex(string, file_name, @env)
end
```
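The instance method always passes the same `@env`. As a result, whatever state the environment holds at the end of one call is still there at the start of the next. The class method, by contrast, builds a fresh environment by default. This sketch models the difference with hypothetical `Env` and `ToyLexer` classes, in which a quote character toggles a `:string` state. It is a toy built on that assumption, not RLTK's `Environment`.

```ruby
# Hypothetical environment that carries lexer state between calls.
class Env
  attr_accessor :state

  def initialize(start_state)
    @state = start_state
  end
end

class ToyLexer
  START_STATE = :default

  # Class-level entry point: a fresh environment unless one is supplied.
  def self.lex(string, env = Env.new(START_STATE))
    tokens = []
    string.each_char do |ch|
      if ch == '"'
        # Quotes toggle between the :default and :string states.
        env.state = env.state == :string ? :default : :string
      else
        tokens << [env.state, ch]
      end
    end
    tokens
  end

  # Instance entry point: every call shares the environment created here.
  def initialize
    @env = Env.new(START_STATE)
  end

  def lex(string)
    self.class.lex(string, @env)
  end
end

lexer = ToyLexer.new
p lexer.lex('a"b')  # => [[:default, "a"], [:string, "b"]]
p lexer.lex('c')    # => [[:string, "c"]]   (state carried over)
p ToyLexer.lex('c') # => [[:default, "c"]]  (fresh environment)
```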
Lexes a file using the encapsulated environment.
@param [String] file_name File to be lexed.
@return [Array<Token>]
```ruby
# File lib/rltk/lexer.rb, line 241
def lex_file(file_name)
  self.class.lex_file(file_name, @env)
end
```