class RMail::Address::Parser

This class parses a string containing one or more RFC2822 addresses into an array of RMail::Address objects. You can use it directly, but the RMail::Address.parse method is usually more convenient.

Constants

SYM_ATOM
SYM_ATOM_NON_ASCII
SYM_AT_SIGN
SYM_COLON
SYM_COMMA
SYM_DOMAIN_LITERAL
SYM_GREATER_THAN
SYM_LESS_THAN
SYM_PERIOD
SYM_QTEXT
SYM_SEMI_COLON

Public Class Methods

new(string)

Create an RMail::Address::Parser object that will parse string. See also the RMail::Address.parse method.

# File lib/rmail/address.rb, line 279
def initialize(string)
  @string = string
end

Public Instance Methods

parse()

This method attempts to extract mailing addresses from the string passed to new. It returns an RMail::Address::List of RMail::Address objects (RMail::Address::List is a subclass of Array). A malformed input string will not raise an exception. Instead, the returned list will simply not contain the malformed addresses.

The string is expected to follow the mailbox-list grammar documented in RFC2822. This works for the address lists found in email headers such as To: and From:.

# File lib/rmail/address.rb, line 295
def parse
  @lexemes = []
  @tokens = []
  @addresses = RMail::Address::List.new
  @errors = 0
  new_address
  get
  address_list
  reset_errors
  @addresses.delete_if { |a|
    !a.local || !a.domain
  }
end
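
The filtering step at the end of parse can be illustrated in isolation. The following is a minimal sketch, not the library's code: SketchAddress and drop_incomplete are hypothetical names standing in for RMail::Address and the final delete_if call.

```ruby
# Hypothetical stand-in for RMail::Address; only the two fields that
# the filter inspects are modeled here.
SketchAddress = Struct.new(:local, :domain)

# Mirrors the final delete_if in parse: drop any entry that lacks a
# local part or a domain.
def drop_incomplete(addresses)
  addresses.reject { |a| !a.local || !a.domain }
end

candidates = [
  SketchAddress.new("alice", "example.com"),
  SketchAddress.new("bob", nil),           # no domain was parsed
  SketchAddress.new(nil, "example.org")    # no local part was parsed
]

complete = drop_incomplete(candidates)
puts complete.map { |a| "#{a.local}@#{a.domain}" }.inspect
```

Only the first candidate survives, which is why a caller sees a shorter list rather than an exception when part of the input is malformed.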

Private Instance Methods

addr_spec()

Parse this:

addrSpec = localPart "@" domain
# File lib/rmail/address.rb, line 574
def addr_spec
  local_part
  expect(SYM_AT_SIGN)
  domain
end
address()

Parse this: address = mailbox | group

# File lib/rmail/address.rb, line 413
def address
  # At this point we could be looking at a display-name, angle
  # addr, or local-part.  If looking at a local-part, it could
  # actually be a display-name, according to the following:
  #
  # local-part '@' -> it is a local part of a local-part @ domain
  # local-part '<' -> it is a display-name of a mailbox
  # local-part ':' -> it is a display-name of a group
  # display-name '<' -> it is a mailbox display name
  # display-name ':' -> it is a group display name

  # set lookahead to '@' '<' or ':' (or another value for
  # invalid input)
  lookahead = address_lookahead

  if lookahead == SYM_COLON
    group
  else
    mailbox(lookahead)
  end
end
address_list()

Parse this: address_list = ([address] SYNC ",") {[address] SYNC "," } [address] .

# File lib/rmail/address.rb, line 361
def address_list
  if @sym == SYM_ATOM ||
      @sym == SYM_ATOM_NON_ASCII ||
      @sym == SYM_QTEXT ||
      @sym == SYM_LESS_THAN
    address
  end
  sync(SYM_COMMA)
  return if @sym.nil?
  expect(SYM_COMMA)
  new_address
  while @sym == SYM_ATOM ||
      @sym == SYM_ATOM_NON_ASCII ||
      @sym == SYM_QTEXT ||
      @sym == SYM_LESS_THAN ||
      @sym == SYM_COMMA
    if @sym == SYM_ATOM ||
        @sym == SYM_ATOM_NON_ASCII ||
        @sym == SYM_QTEXT ||
        @sym == SYM_LESS_THAN
      address
    end
    sync(SYM_COMMA)
    return if @sym.nil?
    expect(SYM_COMMA)
    new_address
  end
  if @sym == SYM_ATOM || @sym == SYM_QTEXT || @sym == SYM_LESS_THAN
    address
  end
end
address_lookahead()

Parses ahead through a local-part or display-name until no longer looking at a word or “.” and returns the next symbol.

# File lib/rmail/address.rb, line 395
def address_lookahead
  lookahead = []
  while @sym == SYM_ATOM ||
      @sym == SYM_ATOM_NON_ASCII ||
      @sym == SYM_QTEXT ||
      @sym == SYM_PERIOD
    lookahead.push([@sym, @lexeme])
    get
  end
  retval = @sym
  putback(@sym, @lexeme)
  putback_array(lookahead)
  get
  retval
end
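
The decision that address_lookahead supports can be sketched on a plain token array. This is an illustration under assumptions, not the library's implementation: the Ruby symbols below are hypothetical stand-ins for the SYM_ constants, and peek_past_words scans a copy of the tokens instead of consuming them and putting them back.

```ruby
# Symbols that may appear in a local-part or display-name.
WORD_LIKE = [:atom, :atom_non_ascii, :qtext, :period].freeze

# Return the first symbol that is not word-like, or nil if the tokens
# run out. Each token is a [symbol, lexeme] pair.
def peek_past_words(tokens)
  rest = tokens.drop_while { |sym, _lexeme| WORD_LIKE.include?(sym) }
  rest.empty? ? nil : rest.first[0]
end

display_name = [[:atom, "John"], [:atom, "Doe"], [:less_than, "<"]]
bare_address = [[:atom, "john"], [:period, "."], [:atom, "doe"], [:at_sign, "@"]]
group_name   = [[:atom, "Friends"], [:colon, ":"]]

puts peek_past_words(display_name).inspect
puts peek_past_words(bare_address).inspect
puts peek_past_words(group_name).inspect
```

The three inputs correspond to the three cases that address distinguishes: a display name followed by an angle address, a bare local-part followed by "@", and a group name followed by ":".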
angle_addr()

Parse this:

angleAddr = SYNC "<" [obsRoute] addrSpec SYNC ">"
# File lib/rmail/address.rb, line 548
def angle_addr
  expect(SYM_LESS_THAN)
  if @sym == SYM_AT_SIGN
    obs_route
  end
  addr_spec
  expect(SYM_GREATER_THAN)
end
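comment()

Consume a parenthesized comment, which may contain nested comments, from the front of the string. The comment text, with whitespace runs collapsed, the outer parentheses removed, and backslash escapes resolved, is appended to the comments of the address currently being parsed.
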
# File lib/rmail/address.rb, line 726
def comment
  depth = 0
  comment = ''
  catch(:done) {
    while @string =~ /\A(\(([^\(\)\\]|\\.)*)/m
      @string = $'
      comment += $1
      depth += 1
      while @string =~ /\A(([^\(\)\\]|\\.)*\))/m
        @string = $'
        comment += $1
        depth -= 1
        throw :done if depth == 0
        if @string =~ /\A(([^\(\)\\]|\\.)+)/
          @string = $'
          comment += $1
        end
      end
    end
  }
  comment = comment.gsub(/[\r\n\t ]+/m, ' ').
    sub(/\A\((.*)\)$/m, '\1').
    gsub(/\\(.)/, '\1')
  @addresses.last.comments =
    (@addresses.last.comments || []) + [comment]
end
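
The clean-up applied to the collected comment text can be tried on its own. The sketch below copies the three substitutions from the method above into a hypothetical helper named normalize_comment; the helper itself is not part of RMail.

```ruby
# Collapse whitespace runs, strip the outer parentheses, then resolve
# backslash escapes, in the same order as the comment method.
def normalize_comment(raw)
  raw.gsub(/[\r\n\t ]+/m, ' ').
    sub(/\A\((.*)\)$/m, '\1').
    gsub(/\\(.)/, '\1')
end

puts normalize_comment("(work\r\n  address)")
puts normalize_comment('(a \( nested)')
```

The first call shows a folded header line being flattened to a single space, and the second shows an escaped parenthesis being kept as a literal character.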
display_name_word()

Parse this:

word = atom | atom_non_ascii | quotedString
# File lib/rmail/address.rb, line 516
def display_name_word
  if @sym == SYM_ATOM || @sym == SYM_ATOM_NON_ASCII || @sym == SYM_QTEXT
    save_text
    get
  else
    error "expected word, got #{@sym.inspect}"
  end
end
domain()

Parse this:

domain = domainLiteral | obsDomain
# File lib/rmail/address.rb, line 559
def domain
  if @sym == SYM_DOMAIN_LITERAL
    save_text
    @addresses.last.domain = get_text
    get
  elsif @sym == SYM_ATOM
    obs_domain
    @addresses.last.domain = get_text
  else
    error "expected start of domain, got #{@sym.inspect}"
  end
end
error(s)

Record a parse error by incrementing the error count. The message s is not stored.

# File lib/rmail/address.rb, line 775
def error(s)
  @errors += 1
end
expect(token)

Advance to the next token if the current symbol is token; otherwise record an error.

# File lib/rmail/address.rb, line 753
def expect(token)
  if @sym == token
    get
  else
    error("expected #{token.inspect} but got #{@sym.inspect}")
  end
end
expect_save(token)

Like expect, but first save the current lexeme if the current symbol is token.

# File lib/rmail/address.rb, line 761
def expect_save(token)
  if @sym == token
    save_text
  end
  expect(token)
end
get()

Get a single token from the string or from the @tokens array if somebody used putback.

# File lib/rmail/address.rb, line 639
def get
  unless @tokens.empty?
    @sym, @lexeme = @tokens.pop
  else
    get_tokenize
  end
end
get_text()

Get the text that has been saved up to this point.

# File lib/rmail/address.rb, line 336
def get_text
  text = ''
  sep = ''
  @lexemes.each { |lexeme|
    if lexeme == '.'
      text << lexeme
      sep = ''
    else
      text << sep
      text << lexeme
      sep = ' '
    end
  }
  @lexemes = []
  text
end
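
The joining rule in get_text is easier to see with concrete input. The sketch below reproduces the loop as a standalone function; join_lexemes is a hypothetical name, and unlike get_text it takes the lexemes as an argument and does not clear any saved state.

```ruby
# Join lexemes with single spaces, except that a "." is attached
# directly to its neighbors on both sides.
def join_lexemes(lexemes)
  text = +''
  sep = ''
  lexemes.each do |lexeme|
    if lexeme == '.'
      text << lexeme
      sep = ''          # no space after a period
    else
      text << sep << lexeme
      sep = ' '         # a space before the next word
    end
  end
  text
end

puts join_lexemes(%w[john . doe])
puts join_lexemes(["Mary", "Smith"])
puts join_lexemes(["John", "Q", ".", "Public"])
```

This is why a dotted local-part is reassembled without spaces while the words of a display name are separated by one space each.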
get_tokenize()

Get a single token from the string.

# File lib/rmail/address.rb, line 648
def get_tokenize
  @lexeme = nil
  loop {
    case @string
    when nil             # the end
      @sym = nil
      break
    when ""               # the end
      @sym = nil
      break
    when /\A[\r\n\t ]+/m  # skip whitespace
      @string = $'
    when /\A\(/m          # skip comment
      comment
    when /\A""/           # skip empty quoted text
      @string = $'
    when /\A[\w!$%&\'*+\/=?^\`{\}|~#-]+/m
      @string = $'
      @sym = SYM_ATOM
      break
    when /\A"(.*?([^\\]|\\\\))"/m
      @string = $'
      @sym = SYM_QTEXT
      @lexeme = $1.gsub(/\\(.)/, '\1')
      break
    when /\A</
      @string = $'
      @sym = SYM_LESS_THAN
      break
    when /\A>/
      @string = $'
      @sym = SYM_GREATER_THAN
      break
    when /\A@/
      @string = $'
      @sym = SYM_AT_SIGN
      break
    when /\A,/
      @string = $'
      @sym = SYM_COMMA
      break
    when /\A:/
      @string = $'
      @sym = SYM_COLON
      break
    when /\A;/
      @string = $'
      @sym = SYM_SEMI_COLON
      break
    when /\A\./
      @string = $'
      @sym = SYM_PERIOD
      break
    when /\A(\[.*?([^\\]|\\\\)\])/m
      @string = $'
      @sym = SYM_DOMAIN_LITERAL
      @lexeme = $1.gsub(/(^|[^\\])[\r\n\t ]+/, '\1').gsub(/\\(.)/, '\1')
      break
    when /\A[\200-\377\w!$%&\'*+\/=?^\`{\}|~#-]+/nm
      # This is just like SYM_ATOM, but includes all characters
      # with high bits.  This is so we can allow such tokens in
      # the display name portion of an address even though it
      # violates the RFCs.
      @string = $'
      @sym = SYM_ATOM_NON_ASCII
      break
    when /\A./
      @string = $'        # garbage
      error('garbage character in string')
    else
      raise "internal error, @string is #{@string.inspect}"
    end
  }
  if @sym
    @lexeme ||= $&
  end
end
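
To show the kind of token stream this method produces, here is a much-reduced tokenizer built on Ruby's standard StringScanner. It is a sketch under assumptions, not a port: the symbols and the name sketch_tokenize are hypothetical, the quoted-string pattern is a simplified one, and comments, domain literals, and non-ASCII atoms are left out entirely.

```ruby
require 'strscan'

# Reduced token table; the atom pattern uses the same character set
# as the SYM_ATOM case above.
SKETCH_TOKENS = [
  [:atom,         /[\w!$%&'*+\/=?^`{}|~#-]+/],
  [:less_than,    /</],
  [:greater_than, />/],
  [:at_sign,      /@/],
  [:comma,        /,/],
  [:colon,        /:/],
  [:semi_colon,   /;/],
  [:period,       /\./]
].freeze

# Return an array of [symbol, lexeme] pairs for the whole input.
def sketch_tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/[\r\n\t ]+/)   # whitespace only separates tokens
    next if scanner.skip(/""/)           # empty quoted text is dropped
    if (quoted = scanner.scan(/"(?:[^"\\]|\\.)*"/m))
      # Strip the surrounding quotes and resolve backslash escapes.
      tokens << [:qtext, quoted[1..-2].gsub(/\\(.)/m, '\1')]
      next
    end
    matched = SKETCH_TOKENS.any? do |sym, pattern|
      lexeme = scanner.scan(pattern)
      tokens << [sym, lexeme] if lexeme
      lexeme
    end
    scanner.getch unless matched         # unknown character: skip it
  end
  tokens
end

sketch_tokenize('"John Q." <john@example.com>').each { |t| puts t.inspect }
```

Unlike get_tokenize, which returns one token per call and records an error for garbage characters, this sketch tokenizes the whole string at once and silently skips anything it does not recognize.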
group()

Parse this:

group = word {word | "."} SYNC ":" [mailbox_list] SYNC ";"
# File lib/rmail/address.rb, line 492
def group
  word
  while @sym == SYM_ATOM || @sym == SYM_QTEXT || @sym == SYM_PERIOD
    if @sym == SYM_ATOM || @sym == SYM_QTEXT
      word
    else
      save_text
      get
    end
  end
  sync(SYM_COLON)
  expect(SYM_COLON)
  get_text               # throw away group name
  @addresses.last.comments = nil
  if @sym == SYM_ATOM || @sym == SYM_QTEXT ||
      @sym == SYM_COMMA || @sym == SYM_LESS_THAN
    mailbox_list
  end
  sync(SYM_SEMI_COLON)
  expect(SYM_SEMI_COLON)
end
local_part()

Parse this:

local_part = word *( "." word )
# File lib/rmail/address.rb, line 582
def local_part
  word
  while @sym == SYM_PERIOD
    save_text
    get
    word
  end
  @addresses.last.local = get_text
end
mailbox(lookahead)

Parse this:

mailbox = angleAddr |
          word {word | "."} angleAddr |
          word {"." word} "@" domain .

lookahead is the return value of address_lookahead, which will be ‘@’ or ‘<’ (or another value for invalid input).

# File lib/rmail/address.rb, line 443
def mailbox(lookahead)
  if @sym == SYM_LESS_THAN
    angle_addr
  elsif lookahead == SYM_LESS_THAN
    display_name_word
    while @sym == SYM_ATOM ||
        @sym == SYM_ATOM_NON_ASCII ||
        @sym == SYM_QTEXT ||
        @sym == SYM_PERIOD
      if @sym == SYM_ATOM ||
          @sym == SYM_ATOM_NON_ASCII ||
          @sym == SYM_QTEXT
        display_name_word
      else
        save_text
        get
      end
    end
    @addresses.last.display_name = get_text
    angle_addr
  else
    word
    while @sym == SYM_PERIOD
      save_text
      get
      word
    end
    @addresses.last.local = get_text
    expect(SYM_AT_SIGN)
    domain

    if @sym == SYM_LESS_THAN
      # Workaround for invalid input.  Treat 'foo@bar <foo@bar>' as if it
      # were '"foo@bar" <foo@bar>'.  The domain parser will eat
      # 'bar' but stop at '<'.  At this point, we've been
      # parsing the display name as if it were an address, so we
      # throw the address into display_name and parse an
      # angle_addr.
      @addresses.last.display_name =
        format("%s@%s", @addresses.last.local, @addresses.last.domain)
      @addresses.last.local = nil
      @addresses.last.domain = nil
      angle_addr
    end
  end
end
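
The workaround for input such as 'foo@bar <foo@bar>' amounts to moving an already-parsed address into the display name. The sketch below isolates that step; SketchMailbox and promote_to_display_name are hypothetical names, and the subsequent angle_addr parse is not modeled.

```ruby
# Hypothetical stand-in for the address being built.
SketchMailbox = Struct.new(:display_name, :local, :domain)

# What was parsed as local@domain turns out to be a display name, so
# move it there and clear the fields for the angle address to fill.
def promote_to_display_name(address)
  address.display_name = format("%s@%s", address.local, address.domain)
  address.local = nil
  address.domain = nil
  address
end

entry = promote_to_display_name(SketchMailbox.new(nil, "foo", "bar"))
puts entry.display_name
```

After this step the parser reads the angle address, which sets local and domain again from the text between "<" and ">".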
mailbox_list()

Parse a mailbox list.

# File lib/rmail/address.rb, line 537
def mailbox_list
  mailbox(address_lookahead)
  while @sym == SYM_COMMA
    get
    new_address
    mailbox(address_lookahead)
  end
end
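new_address()

Start a new address: discard the previous address if errors were recorded for it, then push an empty Address onto the list.
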
# File lib/rmail/address.rb, line 330
def new_address
  reset_errors
  @addresses.push(Address.new)
end
obs_domain()

Parse this:

obs_domain =  atom  *( "."  atom ) .
# File lib/rmail/address.rb, line 594
def obs_domain
  expect_save(SYM_ATOM)
  while @sym == SYM_PERIOD
    save_text
    get
    expect_save(SYM_ATOM)
  end
end
obs_domain_list()

Parse this:

obs_domain_list = "@" domain *( *( "," ) "@" domain )
# File lib/rmail/address.rb, line 612
def obs_domain_list
  expect(SYM_AT_SIGN)
  domain
  while @sym == SYM_COMMA || @sym == SYM_AT_SIGN
    while @sym == SYM_COMMA
      get
    end
    expect(SYM_AT_SIGN)
    domain
  end
end
obs_route()

Parse this:

obs_route = obs_domain_list ":"
# File lib/rmail/address.rb, line 605
def obs_route
  obs_domain_list
  expect(SYM_COLON)
end
putback(sym, lexeme)

Put a token back into the input stream. This token will be retrieved by the next call to get.

# File lib/rmail/address.rb, line 626
def putback(sym, lexeme)
  @tokens.push([sym, lexeme])
end
putback_array(a)

Put back an array of tokens into the input stream.

# File lib/rmail/address.rb, line 631
def putback_array(a)
  a.reverse_each { |e|
    putback(*e)
  }
end
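
The reason putback_array walks the array in reverse is that the pushed-back tokens form a stack: get pops from the end, so the first token to be read again must be pushed last. The class below is a minimal sketch of that interplay; PushbackSketch is a hypothetical name, and a plain array stands in for the tokenizer.

```ruby
class PushbackSketch
  def initialize(source_tokens)
    @source = source_tokens.dup   # stands in for the tokenizer
    @tokens = []                  # pushed-back tokens, used as a stack
  end

  # Prefer pushed-back tokens; fall back to the source.
  def get
    @tokens.empty? ? @source.shift : @tokens.pop
  end

  def putback(token)
    @tokens.push(token)
  end

  # Push in reverse so that get returns the tokens in original order.
  def putback_array(tokens)
    tokens.reverse_each { |t| putback(t) }
  end
end

stream = PushbackSketch.new([:a, :b, :c])
seen = [stream.get, stream.get]   # read two tokens ahead
stream.putback_array(seen)        # then undo the lookahead
replayed = [stream.get, stream.get, stream.get]
puts replayed.inspect
```

Reading two tokens and putting them back leaves the stream exactly where it started, which is the property address_lookahead relies on.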
reset_errors()

If any errors were recorded, remove the most recent address from the list and reset the error count to zero.

# File lib/rmail/address.rb, line 323
def reset_errors
  if @errors > 0
    @addresses.pop
    @errors = 0
  end
end
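
How new_address, error, and reset_errors cooperate to drop a malformed address can be sketched with a small class. DiscardSketch is a hypothetical name, and a Hash stands in for Address.new; the logic of the three methods follows the listings on this page.

```ruby
class DiscardSketch
  attr_reader :addresses

  def initialize
    @addresses = []
    @errors = 0
  end

  # Discard the previous entry if it had errors, then start a new one.
  def new_address
    reset_errors
    @addresses.push({})
  end

  def error
    @errors += 1
  end

  def reset_errors
    return unless @errors > 0
    @addresses.pop
    @errors = 0
  end
end

sketch = DiscardSketch.new
sketch.new_address
sketch.addresses.last[:local] = "alice"
sketch.new_address                       # no errors so far, alice stays
sketch.addresses.last[:local] = "broken"
sketch.error                             # the second address is malformed
sketch.new_address                       # pops "broken", pushes a fresh entry
puts sketch.addresses.inspect
```

Errors are therefore scoped to one address at a time: a problem in one list entry removes only that entry.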
save_text()

Save the current lexeme away for later retrieval with get_text.

# File lib/rmail/address.rb, line 355
def save_text
  @lexemes << @lexeme
end
sync(token)

Skip tokens until the current symbol is token or the input is exhausted, recording an error for each token skipped.

# File lib/rmail/address.rb, line 768
def sync(token)
  while @sym && @sym != token
    error "expected #{token.inspect} but got #{@sym.inspect}"
    get
  end
end
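
The resynchronization performed by sync can be sketched on a plain array of symbols. The function below is illustrative only: skip_until is a hypothetical name, and it returns the remaining tokens together with an error count instead of updating parser state.

```ruby
# Drop leading tokens until `wanted` is at the front or nothing is
# left, counting one error per dropped token.
def skip_until(tokens, wanted)
  remaining = tokens.dup
  errors = 0
  while remaining.first && remaining.first != wanted
    errors += 1
    remaining.shift
  end
  [remaining, errors]
end

remaining, errors = skip_until([:greater_than, :at_sign, :comma, :atom], :comma)
puts remaining.inspect
puts errors
```

Because every skipped token counts as an error, the address containing the unexpected tokens is later discarded, while parsing resumes cleanly at the next comma.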
word()

Parse this:

word = atom | quotedString
# File lib/rmail/address.rb, line 527
def word
  if @sym == SYM_ATOM || @sym == SYM_QTEXT
    save_text
    get
  else
    error "expected word, got #{@sym.inspect}"
  end
end