class ANTLR3::Lexer

Lexer

Lexer is the default superclass of all lexers generated by ANTLR. The class tailors the core functionality provided by Recognizer to the task of matching patterns in the text input and breaking the input into tokens.

About Lexers

A lexer’s job is to take input text and break it up into tokens – objects that encapsulate a piece of text, a type label (such as ID or INTEGER), and the position of the text within the input. A lexer is thus essentially a complicated iterator that steps through an input stream and produces a sequence of tokens. For some tasks, such as source code highlighting and simple code analysis, a lexer is sufficient on its own. Usually, however, the lexer converts text into tokens for use by a parser, which recognizes larger structures within the text.
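
To make the iterator analogy concrete, here is a toy lexer written without ANTLR. It is a sketch only. ToyLexer, ToyToken and the RULES table are hypothetical names, and trying one regular expression per token type stands in for the prediction code ANTLR would generate. Each token carries a type label, its text, and its position (offset, line, column). Whitespace is matched but not yielded.

```ruby
require 'strscan'

# Hypothetical token record: type label, matched text, and position.
ToyToken = Struct.new(:type, :text, :start, :line, :column)

# A toy lexer (not ANTLR-generated) that behaves like an iterator.
class ToyLexer
  include Enumerable

  # Ordered token definitions; the first pattern that matches wins.
  RULES = [
    [:WS,      /\s+/],
    [:INTEGER, /\d+/],
    [:ID,      /[A-Za-z_]\w*/],
    [:OP,      /[-+*\/=]/]
  ].freeze

  def initialize(source)
    @source = source
  end

  # Yields one ToyToken per match until the input is exhausted.
  def each
    return enum_for(:each) unless block_given?
    scanner = StringScanner.new(@source)
    line, column = 1, 0
    until scanner.eos?
      start = scanner.pos
      type = RULES.find { |_t, re| scanner.scan(re) }&.first
      raise ArgumentError, "unexpected character #{@source[start].inspect}" unless type
      text = scanner.matched
      yield ToyToken.new(type, text, start, line, column) unless type == :WS
      # Advance line/column bookkeeping past the consumed text.
      if text.include?("\n")
        line += text.count("\n")
        column = text.length - text.rindex("\n") - 1
      else
        column += text.length
      end
    end
  end
end

ToyLexer.new("x = 42").map { |t| [t.type, t.text] }
# => [[:ID, "x"], [:OP, "="], [:INTEGER, "42"]]
```
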

ANTLR parsers have a variety of entry points specified by parser rules, each of which defines the structure of a specific type of sentence in a grammar. Lexers, however, are primarily intended to have a single entry point. That entry point looks at the characters starting at the current input position, decides whether the chunk of text matches one of a number of possible token type definitions, wraps the chunk into a token with information on its type and location, and advances the input stream past the matched text.

ANTLR Lexers and the Lexer API

ANTLR-generated lexers subclass this class unless the grammar file specifies otherwise. The generated class provides an implementation of each lexer rule as a method of the same name. The subclass also implements the abstract multiplexing method token! (the rule named by default_rule), whose purpose is to choose among the token type definitions and predict which rule to execute to fetch a token. The primary method in the lexer API, next_token, calls token! to fetch each token and thereby drives the iteration.
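
The division of labour between rule methods, a multiplexer, and a driver can be sketched in plain Ruby. This is not ANTLR's generated code. SketchLexerBase, SketchCalcLexer, integer! and plus! are hypothetical, and token! uses a hand-written lookahead test where ANTLR would generate its own decision logic. To keep the sketch short, its rules also return their token type. The real lexer instead records the type in the recognizer state (create_token reads @state.type).

```ruby
# Hypothetical stand-in for ANTLR3::Lexer; only the overall shape is similar.
class SketchLexerBase
  EOF = -1
  Token = Struct.new(:type, :text)

  def initialize(source)
    @chars = source.codepoints   # integer character codes
    @pos = 0
  end

  # Driver: fetch one token by delegating to the subclass's token! method.
  def next_token
    return Token.new(:EOF, "<EOF>") if peek == EOF
    start = @pos
    type = token!                # the subclass picks and runs a rule
    Token.new(type, @chars[start...@pos].pack("U*"))
  end

  # Abstract multiplexer: subclasses must override it.
  def token!
    raise NotImplementedError, "#{self.class} must implement token!"
  end

  private

  def peek
    @pos < @chars.length ? @chars[@pos] : EOF
  end

  def consume
    @pos += 1
  end
end

# Shape of a generated subclass: one method per lexer rule, plus token!
# choosing among them by looking at the next character.
class SketchCalcLexer < SketchLexerBase
  def token!
    c = peek
    if c.between?('0'.ord, '9'.ord)
      integer!
    elsif c == '+'.ord
      plus!
    else
      raise ArgumentError, "no viable alternative at #{[c].pack('U').inspect}"
    end
  end

  # INTEGER : '0'..'9'+ ;
  def integer!
    consume while peek.between?('0'.ord, '9'.ord)
    :INTEGER
  end

  # PLUS : '+' ;
  def plus!
    consume
    :PLUS
  end
end

lexer = SketchCalcLexer.new("12+3")
4.times.map { lexer.next_token.to_a }
# => [[:INTEGER, "12"], [:PLUS, "+"], [:INTEGER, "3"], [:EOF, "<EOF>"]]
```
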

If the lexer is preparing tokens for use by an ANTLR-generated parser, it is generally used to build a TokenStream object. The following code example demonstrates the typical setup for using ANTLR parsers and lexers in Ruby.

# in HypotheticalLexer.rb
module Hypothetical
class Lexer < ANTLR3::Lexer
  # ...
  # ANTLR generated code
  # ...
end
end

# in HypotheticalParser.rb
module Hypothetical
class Parser < ANTLR3::Parser
  # ...
  # more ANTLR generated code
  # ...
end
end

# to take hypothetical source code and prepare it for parsing,
# there is generally a four-step construction process

source = "some hypothetical source code"
input = ANTLR3::StringStream.new(source, :file => 'blah-de-blah.hyp')
lexer = Hypothetical::Lexer.new( input )
tokens = ANTLR3::CommonTokenStream.new( lexer )
parser = Hypothetical::Parser.new( tokens )

# if you're using the standard streams, ANTLR3::StringStream and
# ANTLR3::CommonTokenStream, you can write the same process
# shown above more succinctly:

lexer  = Hypothetical::Lexer.new("some hypothetical source code", :file => 'blah-de-blah.hyp')
parser = Hypothetical::Parser.new( lexer )

Public Class Methods

associated_parser()
# File lib/antlr3/recognizers.rb, line 1001
def self.associated_parser
  @associated_parser ||= begin
    @grammar_home and @grammar_home::Parser
  rescue NameError
    grammar_name = @grammar_home.name.split( "::" ).last
    begin
      require "#{ grammar_name }Parser"
      @grammar_home::Parser
    rescue LoadError, NameError
    end
  end
end
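
associated_parser relies on a naming convention. The lexer and parser classes live in the same grammar module (held in @grammar_home). If that module has no Parser constant yet, the method tries to require a file named after the grammar; for the module Hypothetical, that would be HypotheticalParser. The sketch below reproduces that lookup outside ANTLR. DemoGrammar and the loader argument are hypothetical: the loader replaces require so the example runs without files on disk. The usage also shows a consequence of the ||= memoization: a nil result is not cached, so a later call retries the lookup.

```ruby
# Hypothetical grammar module that does not define a Parser yet.
module DemoGrammar
  class Lexer
    @grammar_home = DemoGrammar   # class-level ivar, as associated_parser expects

    # Same lookup shape as ANTLR3::Lexer.associated_parser, except that a
    # caller-supplied loader (hypothetical) stands in for Kernel#require.
    def self.associated_parser(loader = nil)
      @associated_parser ||= begin
        @grammar_home and @grammar_home::Parser
      rescue NameError
        grammar_name = @grammar_home.name.split("::").last
        begin
          loader.call("#{grammar_name}Parser") if loader
          @grammar_home::Parser
        rescue LoadError, NameError
          nil   # no parser could be found
        end
      end
    end
  end
end

seen = []
DemoGrammar::Lexer.associated_parser(->(feature) { seen << feature })  # => nil
seen                                                                    # => ["DemoGrammarParser"]
DemoGrammar.const_set(:Parser, Class.new)
DemoGrammar::Lexer.associated_parser                                    # => DemoGrammar::Parser
```
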
default_rule()
# File lib/antlr3/recognizers.rb, line 991
def self.default_rule
  @default_rule ||= :token!
end
main( argv = ARGV, options = {} ) { |main| ... }
# File lib/antlr3/recognizers.rb, line 995
def self.main( argv = ARGV, options = {} )
  if argv.is_a?( ::Hash ) then argv, options = ARGV, argv end
  main = ANTLR3::Main::LexerMain.new( self, options )
  block_given? ? yield( main ) : main.execute( argv )
end
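
The first line of main lets callers omit argv. When the first argument is a Hash, it is taken as the options and argv falls back to ARGV. The helper below isolates that normalization so its effect can be checked directly. normalize_main_args is a hypothetical name. After normalizing, the real method either yields its LexerMain object to the block or calls execute( argv ) on it.

```ruby
# Hypothetical helper reproducing the argument handling at the top of Lexer.main.
def normalize_main_args(argv = ARGV, options = {})
  # A Hash in the argv position means "options only"; argv defaults to ARGV.
  if argv.is_a?(::Hash) then argv, options = ARGV, argv end
  [argv, options]
end

normalize_main_args(%w[input.hyp])         # => [["input.hyp"], {}]
normalize_main_args({ :file => "x.hyp" })  # argv becomes ARGV, options the hash
```
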
new( input, options = {} )
Calls superclass method ANTLR3::Recognizer::new
# File lib/antlr3/recognizers.rb, line 1014
def initialize( input, options = {} )
  super( options )
  @input = cast_input( input, options )
end

Public Instance Methods

char_stream=( input )
# File lib/antlr3/recognizers.rb, line 1060
def char_stream=( input )
  @input = nil
  reset()
  @input = input
end
Also aliased as: input=
character_error_display( char )
# File lib/antlr3/recognizers.rb, line 1163
def character_error_display( char )
  case char
  when EOF then '<EOF>'
  when Integer then char.chr.inspect
  else char.inspect
  end
end
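
character_error_display turns whatever symbol an error carries into printable text. It handles three cases: the EOF marker, an integer character code, or anything else (such as a string or range) via inspect. A standalone copy follows so the outputs can be checked. EOF_CHAR = -1 is an assumption standing in for the EOF constant the method compares against.

```ruby
EOF_CHAR = -1  # assumption: stands in for the EOF constant used by the lexer

# Standalone copy of the logic in Lexer#character_error_display.
def display_char(char)
  case char
  when EOF_CHAR then '<EOF>'
  when Integer  then char.chr.inspect   # character code -> quoted character
  else char.inspect                     # strings, ranges, sets, ...
  end
end

display_char(-1)        # => "<EOF>"
display_char(97)        # => "\"a\""
display_char(10)        # => "\"\\n\""
display_char("a".."z")  # => "\"a\"..\"z\""
```

Integer#chr without an encoding argument raises RangeError for codes above 255, so this form cannot display such characters. error_message guards its first call to character_error_display with rescue nil.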
character_index()
# File lib/antlr3/recognizers.rb, line 1124
def character_index
  @input.index
end
column()
# File lib/antlr3/recognizers.rb, line 1120
def column
  @input.column
end
current_symbol()
# File lib/antlr3/recognizers.rb, line 1019
def current_symbol
  nil
end
emit( token = @state.token )
# File lib/antlr3/recognizers.rb, line 1070
def emit( token = @state.token )
  token ||= create_token
  @state.token = token
  return token
end
error_message( e )
Calls superclass method ANTLR3::Recognizer#error_message
# File lib/antlr3/recognizers.rb, line 1141
def error_message( e )
  char = character_error_display( e.symbol ) rescue nil
  case e
  when Error::MismatchedToken
    expecting = character_error_display( e.expecting )
    "mismatched character #{ char }; expecting #{ expecting }"
  when Error::NoViableAlternative
    "no viable alternative at character #{ char }"
  when Error::EarlyExit
    "required ( ... )+ loop did not match anything at character #{ char }"
  when Error::MismatchedNotSet
    "mismatched character %s; expecting set %p" % [ char, e.expecting ]
  when Error::MismatchedSet
    "mismatched character %s; expecting set %p" % [ char, e.expecting ]
  when Error::MismatchedRange
    a = character_error_display( e.min )
    b = character_error_display( e.max )
    "mismatched character %s; expecting set %s..%s" % [ char, a, b ]
  else super
  end
end
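
error_message dispatches on the exception's class with case/when. Each branch is tested in order with Module#=== (an is_a? check), so a branch for a subclass must come before any branch for its superclass. The sketch below reproduces three of the branches with hypothetical error classes; the SketchErrors module is not part of ANTLR3. Where the real method calls super, the sketch falls back to the exception's own message.

```ruby
# Hypothetical error classes standing in for ANTLR3::Error::*.
module SketchErrors
  class RecognitionError < StandardError
    attr_reader :symbol
    def initialize(symbol)
      @symbol = symbol
      super("recognition error")
    end
  end

  class MismatchedToken < RecognitionError
    attr_reader :expecting
    def initialize(symbol, expecting)
      @expecting = expecting
      super(symbol)
    end
  end

  class NoViableAlternative < RecognitionError; end

  class MismatchedRange < RecognitionError
    attr_reader :min, :max
    def initialize(symbol, min, max)
      @min, @max = min, max
      super(symbol)
    end
  end
end

# Same branching as Lexer#error_message, reduced to three cases.
def sketch_error_message(e)
  show = ->(c) { c.is_a?(Integer) ? c.chr.inspect : c.inspect }
  char = show.(e.symbol)
  case e
  when SketchErrors::MismatchedToken
    "mismatched character #{char}; expecting #{show.(e.expecting)}"
  when SketchErrors::NoViableAlternative
    "no viable alternative at character #{char}"
  when SketchErrors::MismatchedRange
    "mismatched character %s; expecting set %s..%s" % [char, show.(e.min), show.(e.max)]
  else
    e.message   # the real method calls super here
  end
end

sketch_error_message(SketchErrors::MismatchedRange.new('0'.ord, 'a', 'z'))
# => "mismatched character \"0\"; expecting set \"a\"..\"z\""
```
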
exhaust()
# File lib/antlr3/recognizers.rb, line 1056
def exhaust
  self.to_a
end
input=( input )
Alias for: char_stream=
line()
# File lib/antlr3/recognizers.rb, line 1116
def line
  @input.line
end
match( expected )
# File lib/antlr3/recognizers.rb, line 1076
def match( expected )
  case expected
  when String
    expected.each_byte do |char|
      unless @input.peek == char
        @state.backtracking > 0 and raise BacktrackingFailed
        error = MismatchedToken( char )
        recover( error )
        raise error
      end
      @input.consume()
    end
  else # single integer character
    unless @input.peek == expected
      @state.backtracking > 0 and raise BacktrackingFailed
      error = MismatchedToken( expected )
      recover( error )
      raise error
    end
    @input.consume
  end
  return true
end
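
match accepts either a whole string literal, checked character by character, or a single integer character. On a mismatch outside backtracking, it builds the error, calls recover (which consumes one character) and raises. The sketch below follows the same steps and omits the backtracking branch. SketchCharStream and SketchMismatch are hypothetical. Unlike match, the sketch iterates over codepoints rather than each_byte. The two agree only for ASCII, since each_byte yields the individual UTF-8 bytes of a non-ASCII character.

```ruby
# Hypothetical character stream: integer codepoints with peek/consume/index.
class SketchCharStream
  EOF = -1
  attr_reader :index

  def initialize(text)
    @chars = text.codepoints
    @index = 0
  end

  def peek
    @index < @chars.length ? @chars[@index] : EOF
  end

  def consume
    @index += 1 if @index < @chars.length
  end
end

class SketchMismatch < StandardError; end

# Mirrors Lexer#match without the backtracking branch: a String is matched
# one character at a time, anything else is a single integer character.
def sketch_match(input, expected)
  chars = expected.is_a?(String) ? expected.codepoints : [expected]
  chars.each do |char|
    unless input.peek == char
      error = SketchMismatch.new("expected #{[char].pack('U').inspect} at index #{input.index}")
      input.consume   # like Lexer#recover: skip the offending character
      raise error
    end
    input.consume
  end
  true
end

stream = SketchCharStream.new("while(")
sketch_match(stream, "while")   # => true
sketch_match(stream, "(".ord)   # => true
stream.index                    # => 6
```
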
match_any()
# File lib/antlr3/recognizers.rb, line 1100
def match_any
  @input.consume
end
match_range( min, max )
# File lib/antlr3/recognizers.rb, line 1104
def match_range( min, max )
  char = @input.peek
  if char.between?( min, max ) then @input.consume
  else
    @state.backtracking > 0 and raise BacktrackingFailed
    error = MismatchedRange( min.chr, max.chr )
    recover( error )
    raise( error )
  end
  return true
end
next_token()
# File lib/antlr3/recognizers.rb, line 1023
def next_token
  loop do
    @state.token = nil
    @state.channel = DEFAULT_CHANNEL
    @state.token_start_position = @input.index
    @state.token_start_column = @input.column
    @state.token_start_line = @input.line
    @state.text = nil
    @input.peek == EOF and return EOF_TOKEN
    begin
      token!
      
      case token = @state.token
      when nil then return( emit )
      when SKIP_TOKEN then next
      else
        return token
      end
    rescue NoViableAlternative => re
      report_error( re )
      recover( re )
    rescue Error::RecognitionError => re
      report_error( re )
    end
  end
end
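
The control flow of next_token runs in four steps:

1. Reset the per-token state.
2. Return the EOF token at end of input.
3. Run the multiplexing rule.
4. Act on the token the rule left in the state: emit a token built from the state (nil), loop again (SKIP_TOKEN), or return the explicitly set token as-is.

A no-viable-alternative error is reported, and one character is dropped before trying again. The toy below mirrors that loop. SketchLoopLexer, SKETCH_SKIP and the digit/space rule are hypothetical. The toy models only the no-viable-alternative path, not the other RecognitionError case, which reports without consuming.

```ruby
SketchTok = Struct.new(:type, :text)
SKETCH_SKIP = Object.new.freeze                          # stands in for SKIP_TOKEN
SKETCH_EOF_TOKEN = SketchTok.new(:EOF, "<EOF>").freeze   # stands in for EOF_TOKEN

class SketchNoViableAlt < StandardError; end

class SketchLoopLexer
  attr_reader :errors

  def initialize(text)
    @chars = text.chars
    @pos = 0
    @errors = []
  end

  def next_token
    loop do
      @token = nil                      # reset per-token state
      @type = nil
      start = @pos
      return SKETCH_EOF_TOKEN if @pos >= @chars.length
      begin
        token!
        case @token
        when nil         then return SketchTok.new(@type, @chars[start...@pos].join)  # like emit
        when SKETCH_SKIP then next      # skipped: start over with the next token
        else return @token
        end
      rescue SketchNoViableAlt => e
        @errors << e.message            # like report_error
        @pos += 1                       # like recover: drop one character
      end
    end
  end

  private

  def digit?(c)
    !c.nil? && c.between?("0", "9")
  end

  # Toy multiplexer: digit runs become NUM, space runs are skipped.
  def token!
    c = @chars[@pos]
    if digit?(c)
      @pos += 1 while digit?(@chars[@pos])
      @type = :NUM
    elsif c == " "
      @pos += 1 while @chars[@pos] == " "
      @token = SKETCH_SKIP              # like Lexer#skip
    else
      raise SketchNoViableAlt, "no viable alternative at character #{c.inspect}"
    end
  end
end

lexer = SketchLoopLexer.new("12 ?3")
3.times.map { lexer.next_token.to_a }
# => [[:NUM, "12"], [:NUM, "3"], [:EOF, "<EOF>"]]
```
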
recover( re )
# File lib/antlr3/recognizers.rb, line 1171
def recover( re )
  @input.consume
end
report_error( e )
# File lib/antlr3/recognizers.rb, line 1137
def report_error( e )
  display_recognition_error( e )
end
skip()
# File lib/antlr3/recognizers.rb, line 1050
def skip
  @state.token = SKIP_TOKEN
end
source_name()
# File lib/antlr3/recognizers.rb, line 1066
def source_name
  @input.source_name
end
text()
# File lib/antlr3/recognizers.rb, line 1128
def text
  @state.text and return @state.text
  @input.substring( @state.token_start_position, character_index - 1 )
end
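
text returns an explicit override when one has been set with text=, presumably from a rule action. Otherwise it slices the input from the token's start position up to character_index - 1. That is the same inclusive stop that create_token stores in the token's stop field. A plain-string version of the rule follows. sketch_token_text and its arguments are hypothetical, and Ruby's inclusive range stands in for the stream's substring call.

```ruby
# Hypothetical: input is the whole source text, start is where the current
# token began, index is the current position after the rule has consumed it.
def sketch_token_text(input, start, index, override = nil)
  return override if override       # like @state.text set via text=
  input[start..(index - 1)]         # inclusive stop, as in substring(start, index - 1)
end

source = 'name = "hi"'
sketch_token_text(source, 0, 4)                          # => "name"
# A STRING rule could strip its quotes by overriding the text:
sketch_token_text(source, 7, 11, source[7...11][1..-2])  # => "hi"
```
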
text=( text )
# File lib/antlr3/recognizers.rb, line 1133
def text=( text )
  @state.text = text
end

Private Instance Methods

cast_input( input, options )
# File lib/antlr3/recognizers.rb, line 1179
def cast_input( input, options )
  case input
  when CharacterStream then input
  when ::String then StringStream.new( input, options )
  when ::IO, ARGF then FileStream.new( input, options )
  else input
  end
end
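
cast_input is what lets the constructor accept a raw String or IO in place of a prepared character stream, as in the shortened setup example near the top. The copy below uses hypothetical Struct stand-ins for the stream classes so it runs without ANTLR, and passes only options[:file] to them. Two details of the case statement are worth noting. First, `when ARGF` matches the ARGF object itself, which is not an IO instance. Second, StringIO is not an IO subclass, so a StringIO falls through to the else branch unchanged.

```ruby
require 'stringio'

# Hypothetical stand-ins for ANTLR3's character stream classes.
SketchStringStream = Struct.new(:data, :name)
SketchFileStream   = Struct.new(:io, :name)

# Same dispatch as Lexer#cast_input, using the stand-in classes.
def sketch_cast_input(input, options = {})
  case input
  when SketchStringStream, SketchFileStream then input   # already a stream
  when ::String   then SketchStringStream.new(input, options[:file])
  when ::IO, ARGF then SketchFileStream.new(input, options[:file])
  else input                                              # passed through untouched
  end
end

sketch_cast_input("abc", :file => "t.hyp").name   # => "t.hyp"
sio = StringIO.new("abc")
sketch_cast_input(sio).equal?(sio)                # => true
```
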
create_token( &b )
Calls superclass method ANTLR3::TokenFactory#create_token
# File lib/antlr3/recognizers.rb, line 1202
def create_token( &b )
  if block_given? then super( &b )
  else
    super do |t|
      t.input = @input
      t.type = @state.type
      t.channel = @state.channel
      t.start = @state.token_start_position
      t.stop = @input.index - 1
      t.line = @state.token_start_line
      t.text = self.text
      t.column = @state.token_start_column
    end
  end
end
trace_in( rule_name, rule_index )
Calls superclass method ANTLR3::Recognizer#trace_in
# File lib/antlr3/recognizers.rb, line 1188
def trace_in( rule_name, rule_index )
  if symbol = @input.look and symbol != EOF then symbol = symbol.inspect
  else symbol = '<EOF>' end
  input_symbol = "#{ symbol } @ line #{ line } / col #{ column }"
  super( rule_name, rule_index, input_symbol )
end
trace_out( rule_name, rule_index )
Calls superclass method ANTLR3::Recognizer#trace_out
# File lib/antlr3/recognizers.rb, line 1195
def trace_out( rule_name, rule_index )
  if symbol = @input.look and symbol != EOF then symbol = symbol.inspect
  else symbol = '<EOF>' end
  input_symbol = "#{ symbol } @ line #{ line } / col #{ column }"
  super( rule_name, rule_index, input_symbol )
end