class Minilex::Lexer
Attributes

pos, rules, scanner, tokens
Public Class Methods
Creates a Lexer instance
  Expression = Minilex::Lexer.new do
    skip :whitespace, /\s+/
    tok :number, /\d+(?:\.\d+)?/
    tok :operator, /[\+\=\/\*]/
  end
You don't have to pass a block. This also works:
  Expression = Minilex::Lexer.new
  Expression.skip :whitespace, /\s+/
  Expression.tok :number, /\d+(?:\.\d+)?/
  Expression.tok :operator, /[\+\=\/\*]/
  # File lib/minilex.rb, line 24
  def initialize(&block)
    @rules = []
    instance_eval &block if block
  end
Public Instance Methods
Makes the end-of-stream token
Similar to `append_token`, this is used to make the final token. Appends [:eos] to the `tokens` array.
  # File lib/minilex.rb, line 90
  def append_eos
    tokens << [:eos]
  end
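If you override `append_token` (see below) you may want to override `append_eos` too so the final token has the same shape. A minimal sketch; `PositionedLexer` is a hypothetical subclass, not part of Minilex:

  # Hypothetical subclass: the end-of-stream token carries position info too.
  class PositionedLexer < Minilex::Lexer
    def append_eos
      tokens << [:eos, nil, pos.line, pos.offset]
    end
  end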
Makes a token
id    - the id of the matched rule
value - the value that was matched
Called when a rule is matched to build the resulting token.
Override this method if you'd like your tokens in a different form. You have access to the array of tokens via `tokens` and the current token's position information via `pos`.
returns an Array of [id, value, line, offset]
  # File lib/minilex.rb, line 82
  def append_token(id, value)
    tokens << [id, value, pos.line, pos.offset]
  end
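For example, a subclass could emit Hash tokens instead of Arrays. A minimal sketch; `HashLexer` and the Hash shape are illustrative, not part of Minilex:

  # Hypothetical subclass: tokens become Hashes instead of Arrays.
  class HashLexer < Minilex::Lexer
    def append_token(id, value)
      tokens << { id: id, value: value, line: pos.line, offset: pos.offset }
    end
  end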
Runs the lexer on the given input
returns an Array of tokens
  # File lib/minilex.rb, line 52
  def lex(input)
    @tokens = []
    @pos = Pos.new(1, 0)
    @scanner = StringScanner.new(input)

    until scanner.eos?
      rule, text = match
      value = rule.processor ? send(rule.processor, text) : text
      append_token(rule.id, value) unless rule.skip
      update_pos(text)
    end

    append_eos
    tokens
  end
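Using the Expression lexer defined above (the exact values depend on your rules, but the shape is [id, value, line, offset], terminated by [:eos]):

  Expression.lex("1 + 2")
  # => [[:number, "1", 1, 0], [:operator, "+", 1, 2], [:number, "2", 1, 4], [:eos]]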
Finds the matching rule (internal)
Tries the rules in defined order until there's a match. Raises an UnrecognizedInput error if there isn't one.
returns a 2-element Array of [rule, matched_text]
  # File lib/minilex.rb, line 101
  def match
    rules.each do |rule|
      next unless text = scanner.scan(rule.pattern)
      return [rule, text]
    end

    raise UnrecognizedInput.new(scanner, pos)
  end
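Input that no rule matches therefore raises at lex time. A sketch, assuming the error class is namespaced as Minilex::UnrecognizedInput:

  begin
    Expression.lex("1 ? 2")   # '?' matches none of the Expression rules
  rescue Minilex::UnrecognizedInput => e
    # report or recover; the exception is built with the scanner and pos
  end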
Defines patterns to ignore
id      - an identifier, it's nice to name things
pattern - the Regexp to skip
  # File lib/minilex.rb, line 45
  def skip(id, pattern)
    rules << Rule.new(id, pattern, nil, true)
  end
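Skip rules participate in matching like any other rule; they just never produce tokens. A sketch that ignores shell-style comments as well as whitespace (`Config` and the :comment rule are illustrative):

  Config = Minilex::Lexer.new do
    skip :whitespace, /\s+/
    skip :comment,    /#[^\n]*/
    tok  :word,       /\w+/
  end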
Defines a token-matching rule
id        - this token's identifier
pattern   - a Regexp to match this token
processor - a Symbol naming a method on this Lexer instance, which will be
            called to produce the `value` for this token (defaults to nil)
  # File lib/minilex.rb, line 37
  def tok(id, pattern, processor=nil)
    rules << Rule.new(id, pattern, processor)
  end
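The processor is invoked with `send` on the lexer itself, so it can be a singleton method defined on the instance. A minimal sketch; `Numbers` and `integerize` are illustrative names:

  Numbers = Minilex::Lexer.new do
    skip :whitespace, /\s+/
    tok  :integer, /\d+/, :integerize
  end

  # illustrative processor: turns the matched text into an Integer value
  def Numbers.integerize(text)
    text.to_i
  end

  Numbers.lex("40 2")
  # => [[:integer, 40, 1, 0], [:integer, 2, 1, 3], [:eos]]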
Updates the position information (internal)
text - the String that was matched by `match`
Inspects the matched text for newlines and updates the line number and offset accordingly
  # File lib/minilex.rb, line 115
  def update_pos(text)
    pos.line += newlines = text.count(?\n)

    if newlines > 0
      pos.offset = text.rpartition(?\n)[2].length
    else
      pos.offset += text.length
    end
  end
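The effect is that offsets are relative to the start of the current line. Using the Expression lexer from above:

  Expression.lex("1 +\n2")
  # => [[:number, "1", 1, 0], [:operator, "+", 1, 2], [:number, "2", 2, 0], [:eos]]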