class ANTLR3::Lexer
Lexer
Lexer is the default superclass of all lexers generated by ANTLR. The class tailors the core functionality provided by Recognizer to the task of matching patterns in the text input and breaking the input into tokens.
About Lexers
A lexer’s job is to take input text and break it up into tokens – objects that encapsulate a piece of text, a type label (such as ID or INTEGER), and the position of the text with respect to the input. Thus, a lexer is essentially a complicated iterator that steps through an input stream and produces a sequence of tokens. Sometimes a lexer alone is enough to carry out a goal, as in tasks like source code highlighting and simple code analysis. Usually, however, the lexer converts text into tokens for use by a parser, which recognizes larger structures within the text.
ANTLR parsers have a variety of entry points specified by parser rules, each of which defines the structure of a specific type of sentence in a grammar. Lexers, however, are primarily intended to have a single entry point: it looks at the characters starting at the current input position, decides whether the text matches one of a number of possible token type definitions, wraps the matched text into a token with information on its type and location, and advances the input stream past the matched text.
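Because a lexer is essentially an iterator over tokens, a generated lexer can be stepped through directly. The following is a minimal sketch; Hypothetical::Lexer is an assumed, hypothetical generated class, not part of this library:

require 'antlr3'
require 'HypotheticalLexer'   # hypothetical generated lexer file

lexer = Hypothetical::Lexer.new( "x = 42" )

# step through the tokens the lexer produces, one at a time
lexer.each do |token|
  printf( "%4d %-12p line %d, col %d\n", token.type, token.text, token.line, token.column )
end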
ANTLR Lexers and the Lexer API
ANTLR-generated lexers will subclass this class, unless specified otherwise within a grammar file. The generated class will provide an implementation of each lexer rule as a method of the same name. The subclass will also provide an implementation of the abstract method token! (the rule-multiplexing method, known as mTokens in other ANTLR runtimes), whose purpose is to multiplex the token type definitions and predict which rule definition to execute to fetch a token. The primary method in the lexer API, next_token, uses token! to fetch the next token and drive the iteration.
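The iteration can also be driven by hand with next_token, which returns EOF_TOKEN once the input is used up. A rough sketch, again using a hypothetical generated lexer:

lexer = Hypothetical::Lexer.new( "a + b" )

# fetch tokens one at a time until the lexer signals the end of input
while ( token = lexer.next_token ).type != ANTLR3::EOF
  puts "#{ token.text.inspect } -> type #{ token.type }"
end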
If the lexer is preparing tokens for use by an ANTLR-generated parser, the lexer will generally be used to build a TokenStream object. The following code example demonstrates the typical setup for using ANTLR parsers and lexers in Ruby.
# in HypotheticalLexer.rb
module Hypothetical
  class Lexer < ANTLR3::Lexer
    # ...
    # ANTLR generated code
    # ...
  end
end

# in HypotheticalParser.rb
module Hypothetical
  class Parser < ANTLR3::Parser
    # ...
    # more ANTLR generated code
    # ...
  end
end

# to take hypothetical source code and prepare it for parsing,
# there is generally a four-step construction process
source = "some hypothetical source code"
input  = ANTLR3::StringStream.new( source, :file => 'blah-de-blah.hyp' )
lexer  = Hypothetical::Lexer.new( input )
tokens = ANTLR3::CommonTokenStream.new( lexer )
parser = Hypothetical::Parser.new( tokens )

# if you're using the standard streams, ANTLR3::StringStream and
# ANTLR3::CommonTokenStream, you can write the same process
# shown above more succinctly:
lexer  = Hypothetical::Lexer.new( "some hypothetical source code", :file => 'blah-de-blah.hyp' )
parser = Hypothetical::Parser.new( lexer )
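The constructor is flexible about its input argument: as cast_input (listed under the private methods below) shows, a plain String is wrapped in a StringStream, an IO object in a FileStream, and an existing character stream is used as-is. A brief, illustrative sketch (the file name is made up):

# an IO object is wrapped in a FileStream automatically
lexer = Hypothetical::Lexer.new( File.open( 'source.hyp' ) )

# an existing character stream is passed through untouched
stream = ANTLR3::StringStream.new( "more hypothetical source", :file => 'inline.hyp' )
lexer  = Hypothetical::Lexer.new( stream )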
Public Class Methods
associated_parser()

# File lib/antlr3/recognizers.rb, line 1001
def self.associated_parser
  @associated_parser ||= begin
    @grammar_home and @grammar_home::Parser
  rescue NameError
    grammar_name = @grammar_home.name.split( "::" ).last
    begin
      require "#{ grammar_name }Parser"
      @grammar_home::Parser
    rescue LoadError, NameError
    end
  end
end
default_rule()

# File lib/antlr3/recognizers.rb, line 991
def self.default_rule
  @default_rule ||= :token!
end
main( argv = ARGV, options = {} )

# File lib/antlr3/recognizers.rb, line 995
def self.main( argv = ARGV, options = {} )
  if argv.is_a?( ::Hash ) then argv, options = ARGV, argv end
  main = ANTLR3::Main::LexerMain.new( self, options )
  block_given? ? yield( main ) : main.execute( argv )
end
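main builds an ANTLR3::Main::LexerMain around the lexer class, providing a simple command-line harness for trying out a lexer on its own. A hedged sketch of how it might be invoked at the end of a lexer script (the guard and file layout are illustrative):

# tokenize input named on the command line (or read from standard input)
Hypothetical::Lexer.main( ARGV ) if __FILE__ == $0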
new( input, options = {} )

Calls superclass method ANTLR3::Recognizer::new

# File lib/antlr3/recognizers.rb, line 1014
def initialize( input, options = {} )
  super( options )
  @input = cast_input( input, options )
end
Public Instance Methods
char_stream=( input )

# File lib/antlr3/recognizers.rb, line 1060
def char_stream=( input )
  @input = nil
  reset()
  @input = input
end
character_error_display( char )

# File lib/antlr3/recognizers.rb, line 1163
def character_error_display( char )
  case char
  when EOF then '<EOF>'
  when Integer then char.chr.inspect
  else char.inspect
  end
end
character_index()

# File lib/antlr3/recognizers.rb, line 1124
def character_index
  @input.index
end
column()

# File lib/antlr3/recognizers.rb, line 1120
def column
  @input.column
end
current_symbol()

# File lib/antlr3/recognizers.rb, line 1019
def current_symbol
  nil
end
emit( token = @state.token )

# File lib/antlr3/recognizers.rb, line 1070
def emit( token = @state.token )
  token ||= create_token
  @state.token = token
  return token
end
error_message( e )

Calls superclass method ANTLR3::Recognizer#error_message

# File lib/antlr3/recognizers.rb, line 1141
def error_message( e )
  char = character_error_display( e.symbol ) rescue nil
  case e
  when Error::MismatchedToken
    expecting = character_error_display( e.expecting )
    "mismatched character #{ char }; expecting #{ expecting }"
  when Error::NoViableAlternative
    "no viable alternative at character #{ char }"
  when Error::EarlyExit
    "required ( ... )+ loop did not match anything at character #{ char }"
  when Error::MismatchedNotSet
    "mismatched character %s; expecting set %p" % [ char, e.expecting ]
  when Error::MismatchedSet
    "mismatched character %s; expecting set %p" % [ char, e.expecting ]
  when Error::MismatchedRange
    a = character_error_display( e.min )
    b = character_error_display( e.max )
    "mismatched character %s; expecting set %s..%s" % [ char, a, b ]
  else super
  end
end
exhaust()

# File lib/antlr3/recognizers.rb, line 1056
def exhaust
  self.to_a
end
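Since exhaust simply runs the lexer to the end of the input and collects the results into an array, it is handy for quickly inspecting everything a lexer produces. An illustrative sketch, using the hypothetical lexer from the earlier examples:

tokens = Hypothetical::Lexer.new( "x = 42" ).exhaust
tokens.each { |tok| p tok }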
line()

# File lib/antlr3/recognizers.rb, line 1116
def line
  @input.line
end
match( expected )

# File lib/antlr3/recognizers.rb, line 1076
def match( expected )
  case expected
  when String
    expected.each_byte do |char|
      unless @input.peek == char
        @state.backtracking > 0 and raise BacktrackingFailed
        error = MismatchedToken( char )
        recover( error )
        raise error
      end
      @input.consume()
    end
  else # single integer character
    unless @input.peek == expected
      @state.backtracking > 0 and raise BacktrackingFailed
      error = MismatchedToken( expected )
      recover( error )
      raise error
    end
    @input.consume
  end
  return true
end
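Generated lexer rule methods use match to consume literal characters and strings against the input. A rough, hypothetical sketch of the kind of rule method a grammar rule such as ARROW : '->' ; might produce (simplified for illustration; real generated code differs):

def arrow!
  type = ARROW          # ARROW is a hypothetical generated token type constant
  match( "->" )         # consume the literal '->' or raise a mismatch error
  @state.type = type
end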
match_any()

# File lib/antlr3/recognizers.rb, line 1100
def match_any
  @input.consume
end
match_range( min, max )

# File lib/antlr3/recognizers.rb, line 1104
def match_range( min, max )
  char = @input.peek
  if char.between?( min, max ) then
    @input.consume
  else
    @state.backtracking > 0 and raise BacktrackingFailed
    error = MismatchedRange( min.chr, max.chr )
    recover( error )
    raise( error )
  end
  return true
end
next_token()

# File lib/antlr3/recognizers.rb, line 1023
def next_token
  loop do
    @state.token = nil
    @state.channel = DEFAULT_CHANNEL
    @state.token_start_position = @input.index
    @state.token_start_column = @input.column
    @state.token_start_line = @input.line
    @state.text = nil
    @input.peek == EOF and return EOF_TOKEN
    begin
      token!
      case token = @state.token
      when nil then return( emit )
      when SKIP_TOKEN then next
      else return token
      end
    rescue NoViableAlternative => re
      report_error( re )
      recover( re )
    rescue Error::RecognitionError => re
      report_error( re )
    end
  end
end
recover( re )

# File lib/antlr3/recognizers.rb, line 1171
def recover( re )
  @input.consume
end
report_error( e )

# File lib/antlr3/recognizers.rb, line 1137
def report_error( e )
  display_recognition_error( e )
end
skip()

# File lib/antlr3/recognizers.rb, line 1050
def skip
  @state.token = SKIP_TOKEN
end
source_name()

# File lib/antlr3/recognizers.rb, line 1066
def source_name
  @input.source_name
end
text()

# File lib/antlr3/recognizers.rb, line 1128
def text
  @state.text and return @state.text
  @input.substring( @state.token_start_position, character_index - 1 )
end
text=( text )

# File lib/antlr3/recognizers.rb, line 1133
def text=( text )
  @state.text = text
end
Private Instance Methods
cast_input( input, options )

# File lib/antlr3/recognizers.rb, line 1179
def cast_input( input, options )
  case input
  when CharacterStream then input
  when ::String then StringStream.new( input, options )
  when ::IO, ARGF then FileStream.new( input, options )
  else input
  end
end
create_token( &b )

Calls superclass method ANTLR3::TokenFactory#create_token

# File lib/antlr3/recognizers.rb, line 1202
def create_token( &b )
  if block_given? then super( &b )
  else
    super do |t|
      t.input = @input
      t.type = @state.type
      t.channel = @state.channel
      t.start = @state.token_start_position
      t.stop = @input.index - 1
      t.line = @state.token_start_line
      t.text = self.text
      t.column = @state.token_start_column
    end
  end
end
trace_in( rule_name, rule_index )

Calls superclass method ANTLR3::Recognizer#trace_in

# File lib/antlr3/recognizers.rb, line 1188
def trace_in( rule_name, rule_index )
  if symbol = @input.look and symbol != EOF then symbol = symbol.inspect
  else symbol = '<EOF>' end
  input_symbol = "#{ symbol } @ line #{ line } / col #{ column }"
  super( rule_name, rule_index, input_symbol )
end
trace_out( rule_name, rule_index )

Calls superclass method ANTLR3::Recognizer#trace_out

# File lib/antlr3/recognizers.rb, line 1195
def trace_out( rule_name, rule_index )
  if symbol = @input.look and symbol != EOF then symbol = symbol.inspect
  else symbol = '<EOF>' end
  input_symbol = "#{ symbol } @ line #{ line } / col #{ column }"
  super( rule_name, rule_index, input_symbol )
end