class Minilex::Lexer

Attributes

pos[R]
rules[R]
scanner[R]
tokens[R]

Public Class Methods

new(&block)

Creates a Lexer instance

Expression = Minilex::Lexer.new do
  skip :whitespace, /\s+/
  tok :number, /\d+(?:\.\d+)?/
  tok :operator, /[\+\=\/\*]/
end

You don't have to pass a block. This also works:

Expression = Minilex::Lexer.new
Expression.skip :whitespace, /\s+/
Expression.tok :number, /\d+(?:\.\d+)?/
Expression.tok :operator, /[\+\=\/\*]/
# File lib/minilex.rb, line 24
def initialize(&block)
  @rules = []
  instance_eval &block if block
end

Public Instance Methods

append_eos()

Makes the end-of-stream token

Similar to `append_token`, but used to make the final token. Appends [:eos] to the `tokens` array.

# File lib/minilex.rb, line 90
def append_eos
  tokens << [:eos]
end
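
If you override `append_token` to change your token format, you may want to override this too. A sketch (the subclass name is illustrative) that gives the end-of-stream token the same position fields as the others:

class PositionedEosLexer < Minilex::Lexer
  # Emit [:eos, nil, line, offset] instead of the bare [:eos].
  def append_eos
    tokens << [:eos, nil, pos.line, pos.offset]
  end
end
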
append_token(id, value)

Makes a token

id - the id of the matched rule
value - the value that was matched

Called when a rule is matched to build the resulting token.

Override this method if you'd like your tokens in a different form. You have access to the array of tokens via `tokens` and the current token's position information via `pos`.

returns an Array of [id, value, line, offset]

# File lib/minilex.rb, line 82
def append_token(id, value)
  tokens << [id, value, pos.line, pos.offset]
end
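
For example, a sketch of overriding `append_token` to emit Hash-shaped tokens instead of Arrays (the subclass name is illustrative):

class HashTokenLexer < Minilex::Lexer
  # Build a Hash per token; `tokens` and `pos` come from the attribute readers.
  def append_token(id, value)
    tokens << { id: id, value: value, line: pos.line, offset: pos.offset }
  end
end
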
lex(input)

Runs the lexer on the given input

returns an Array of tokens

# File lib/minilex.rb, line 52
def lex(input)
  @tokens = []
  @pos = Pos.new(1, 0)
  @scanner = StringScanner.new(input)

  until scanner.eos?
    rule, text = match
    value = rule.processor ? send(rule.processor, text) : text
    append_token(rule.id, value) unless rule.skip
    update_pos(text)
  end

  append_eos
  tokens
end
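
For example, with the Expression lexer defined above (token values are the matched strings, since no processors are given):

Expression.lex("1 + 2.5")
# => [[:number, "1", 1, 0],
#     [:operator, "+", 1, 2],
#     [:number, "2.5", 1, 4],
#     [:eos]]
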
match()
internal

Finds the matching rule

Tries the rules in defined order until there's a match. Raises an UnrecognizedInput error if there isn't one.

returns a 2-element Array of [rule, matched_text]

# File lib/minilex.rb, line 101
def match
  rules.each do |rule|
    next unless text = scanner.scan(rule.pattern)
    return [rule, text]
  end
  raise UnrecognizedInput.new(scanner, pos)
end
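
Unmatched input surfaces as that error from `lex`. For example, with the Expression lexer above, no rule matches the "&":

Expression.lex("1 & 2")
# raises UnrecognizedInput; the exception is constructed with the scanner
# and the current pos, so the failure position is available
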
skip(id, pattern)

Defines patterns to ignore

id - an identifier, it's nice to name things
pattern - the Regexp to skip

# File lib/minilex.rb, line 45
def skip(id, pattern)
  rules << Rule.new(id, pattern, nil, true)
end
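
Skipped rules consume input and advance the position, but never produce tokens. For example, to also ignore shell-style comments in the Expression lexer above:

Expression.skip :comment, /#[^\n]*/
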
tok(id, pattern, processor=nil)

Defines a token-matching rule

id - this token's identifier
pattern - a Regexp to match this token
processor - a Symbol that references a method on this Lexer instance, which will be called to produce the `value` for this token (defaults to nil)
# File lib/minilex.rb, line 37
def tok(id, pattern, processor=nil)
  rules << Rule.new(id, pattern, processor)
end
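
A sketch of using a processor, here defined as an instance method via a subclass (the subclass name and method are illustrative):

class NumberLexer < Minilex::Lexer
  def initialize
    super do
      skip :whitespace, /\s+/
      tok :number, /\d+(?:\.\d+)?/, :to_number
    end
  end

  # Receives the matched text; the return value becomes the token's value.
  def to_number(text)
    text.include?(".") ? Float(text) : Integer(text)
  end
end

NumberLexer.new.lex("1 2.5")
# => [[:number, 1, 1, 0], [:number, 2.5, 1, 2], [:eos]]
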
update_pos(text)
internal

Updates the position information

text - the String that was matched by `match`

Inspects the matched text for newlines and updates the line number and offset accordingly.

# File lib/minilex.rb, line 115
def update_pos(text)
  pos.line += newlines = text.count(?\n)
  if newlines > 0
    pos.offset = text.rpartition(?\n)[2].length
  else
    pos.offset += text.length
  end
end
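
For example, with the Expression lexer above, the newline inside the skipped whitespace bumps the line number and resets the offset:

Expression.lex("1 +\n2")
# => [[:number, "1", 1, 0],
#     [:operator, "+", 1, 2],
#     [:number, "2", 2, 0],
#     [:eos]]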