class RMail::Address::Parser
This class parses a string containing one or more RFC2822 addresses into an array of RMail::Address objects. You can use it directly, but it is more conveniently used through the RMail::Address.parse method.
Constants
- SYM_ATOM
- SYM_ATOM_NON_ASCII
- SYM_AT_SIGN
- SYM_COLON
- SYM_COMMA
- SYM_DOMAIN_LITERAL
- SYM_GREATER_THAN
- SYM_LESS_THAN
- SYM_PERIOD
- SYM_QTEXT
- SYM_SEMI_COLON
Public Class Methods
Create an RMail::Address::Parser object that will parse string. See also the RMail::Address.parse method.
# File lib/rmail/address.rb, line 279
def initialize(string)
  @string = string
end
Public Instance Methods
This function attempts to extract mailing addresses from the string passed to new. The function returns an RMail::Address::List of RMail::Address objects (RMail::Address::List is a subclass of Array). A malformed input string will not raise an exception. Instead, the returned array will simply not contain the malformed addresses.
The string is expected to be in a valid format as documented in RFC2822’s mailbox-list grammar. This will work for lists of addresses in the To:, From:, etc. headers in email.
# File lib/rmail/address.rb, line 295
def parse
  @lexemes = []
  @tokens = []
  @addresses = RMail::Address::List.new
  @errors = 0
  new_address
  get
  address_list
  reset_errors
  @addresses.delete_if { |a| !a.local || !a.domain }
end
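The last statement of parse drops every entry that lacks a local part or a domain. The following is a minimal sketch of that filter only; the `Candidate` struct and the sample data are assumptions made for illustration and are not part of RMail.

```ruby
# Hypothetical stand-in for RMail::Address with just the fields the filter reads.
Candidate = Struct.new(:local, :domain)

candidates = [
  Candidate.new("bob", "example.com"),
  Candidate.new("alice", nil),  # no domain: treated as incomplete
  Candidate.new(nil, nil)       # empty placeholder such as new_address creates
]

# Same predicate as the delete_if block in parse, applied without mutation.
complete = candidates.reject { |a| !a.local || !a.domain }
```

Only the first candidate survives, which matches the documented behaviour that malformed addresses are silently omitted from the result.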
Private Instance Methods
Parse this:
addrSpec = localPart "@" domain
# File lib/rmail/address.rb, line 574
def addr_spec
  local_part
  expect(SYM_AT_SIGN)
  domain
end
Parse this: address = mailbox | group
# File lib/rmail/address.rb, line 413
def address
  # At this point we could be looking at a display-name, angle
  # addr, or local-part. If looking at a local-part, it could
  # actually be a display-name, according to the following:
  #
  # local-part '@' -> it is a local part of a local-part @ domain
  # local-part '<' -> it is a display-name of a mailbox
  # local-part ':' -> it is a display-name of a group
  # display-name '<' -> it is a mailbox display name
  # display-name ':' -> it is a group display name

  # set lookahead to '@' '<' or ':' (or another value for
  # invalid input)
  lookahead = address_lookahead

  if lookahead == SYM_COLON
    group
  else
    mailbox(lookahead)
  end
end
Parse this:
address_list = ([address] SYNC ",") {[address] SYNC "," } [address] .
# File lib/rmail/address.rb, line 361
def address_list
  if @sym == SYM_ATOM ||
      @sym == SYM_ATOM_NON_ASCII ||
      @sym == SYM_QTEXT ||
      @sym == SYM_LESS_THAN
    address
  end
  sync(SYM_COMMA)
  return if @sym.nil?
  expect(SYM_COMMA)
  new_address
  while @sym == SYM_ATOM ||
      @sym == SYM_ATOM_NON_ASCII ||
      @sym == SYM_QTEXT ||
      @sym == SYM_LESS_THAN ||
      @sym == SYM_COMMA
    if @sym == SYM_ATOM ||
        @sym == SYM_ATOM_NON_ASCII ||
        @sym == SYM_QTEXT ||
        @sym == SYM_LESS_THAN
      address
    end
    sync(SYM_COMMA)
    return if @sym.nil?
    expect(SYM_COMMA)
    new_address
  end
  if @sym == SYM_ATOM || @sym == SYM_QTEXT || @sym == SYM_LESS_THAN
    address
  end
end
Parses ahead through a local-part or display-name until no longer looking at a word or “.” and returns the next symbol.
# File lib/rmail/address.rb, line 395
def address_lookahead
  lookahead = []
  while @sym == SYM_ATOM ||
      @sym == SYM_ATOM_NON_ASCII ||
      @sym == SYM_QTEXT ||
      @sym == SYM_PERIOD
    lookahead.push([@sym, @lexeme])
    get
  end
  retval = @sym
  putback(@sym, @lexeme)
  putback_array(lookahead)
  get
  retval
end
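The lookahead decides whether a run of words belongs to a group or to a mailbox by inspecting the first symbol after that run. A simplified sketch of the idea, working on a plain array of symbols rather than on the parser state; the symbol names and both helper methods are hypothetical and stand in for the SYM_* constants:

```ruby
# Hypothetical symbols standing in for SYM_ATOM, SYM_ATOM_NON_ASCII,
# SYM_QTEXT and SYM_PERIOD.
WORD_LIKE = [:atom, :atom_non_ascii, :qtext, :period].freeze

# Return the first symbol after the leading run of words and periods,
# or nil when the input ends first. The input array is not modified,
# which plays the role of the putback calls in the real method.
def symbol_after_words(symbols)
  symbols.drop_while { |sym| WORD_LIKE.include?(sym) }.first
end

# Mirrors the dispatch in address: ':' selects a group, anything else a mailbox.
def classify(symbols)
  symbol_after_words(symbols) == :colon ? :group : :mailbox
end
```

For instance, `classify([:atom, :colon, :atom])` selects the group branch, while a sequence ending in `:less_than` or `:at_sign` selects the mailbox branch.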
Parse this:
angleAddr = SYNC "<" [obsRoute] addrSpec SYNC ">"
# File lib/rmail/address.rb, line 548
def angle_addr
  expect(SYM_LESS_THAN)
  if @sym == SYM_AT_SIGN
    obs_route
  end
  addr_spec
  expect(SYM_GREATER_THAN)
end
# File lib/rmail/address.rb, line 726
def comment
  depth = 0
  comment = ''
  catch(:done) {
    while @string =~ /\A(\(([^\(\)\\]|\\.)*)/m
      @string = $'
      comment += $1
      depth += 1
      while @string =~ /\A(([^\(\)\\]|\\.)*\))/m
        @string = $'
        comment += $1
        depth -= 1
        throw :done if depth == 0
        if @string =~ /\A(([^\(\)\\]|\\.)+)/
          @string = $'
          comment += $1
        end
      end
    end
  }
  comment = comment.gsub(/[\r\n\t ]+/m, ' ').
    sub(/\A\((.*)\)$/m, '\1').
    gsub(/\\(.)/, '\1')
  @addresses.last.comments =
    (@addresses.last.comments || []) + [comment]
end
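After the nested parentheses have been collected, the comment text is normalized in three steps: whitespace runs collapse to one space, the outermost parentheses are removed, and backslash escapes are resolved. A small sketch of just that normalization, using the same three regular expressions; the method name `normalize_comment` is hypothetical:

```ruby
# Apply the same three substitutions that the comment method applies
# to the raw text it has collected.
def normalize_comment(raw)
  raw.gsub(/[\r\n\t ]+/m, ' ')    # collapse whitespace runs
     .sub(/\A\((.*)\)$/m, '\1')   # strip the outermost parentheses
     .gsub(/\\(.)/, '\1')         # resolve backslash-escaped characters
end
```

A folded comment such as `"(Bob\n  Smith)"` becomes `Bob Smith`, and escaped parentheses inside the comment come out as literal parentheses.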
Parse this:
word = atom | atom_non_ascii | quotedString
# File lib/rmail/address.rb, line 516
def display_name_word
  if @sym == SYM_ATOM || @sym == SYM_ATOM_NON_ASCII || @sym == SYM_QTEXT
    save_text
    get
  else
    error "expected word, got #{@sym.inspect}"
  end
end
Parse this:
domain = domainLiteral | obsDomain
# File lib/rmail/address.rb, line 559
def domain
  if @sym == SYM_DOMAIN_LITERAL
    save_text
    @addresses.last.domain = get_text
    get
  elsif @sym == SYM_ATOM
    obs_domain
    @addresses.last.domain = get_text
  else
    error "expected start of domain, got #{@sym.inspect}"
  end
end
# File lib/rmail/address.rb, line 775
def error(s)
  @errors += 1
end
# File lib/rmail/address.rb, line 753
def expect(token)
  if @sym == token
    get
  else
    error("expected #{token.inspect} but got #{@sym.inspect}")
  end
end
# File lib/rmail/address.rb, line 761
def expect_save(token)
  if @sym == token
    save_text
  end
  expect(token)
end
Get a single token from the string or from the @tokens array if somebody used putback.
# File lib/rmail/address.rb, line 639
def get
  unless @tokens.empty?
    @sym, @lexeme = @tokens.pop
  else
    get_tokenize
  end
end
Get the text that has been saved up to this point.
# File lib/rmail/address.rb, line 336
def get_text
  text = ''
  sep = ''
  @lexemes.each { |lexeme|
    if lexeme == '.'
      text << lexeme
      sep = ''
    else
      text << sep
      text << lexeme
      sep = ' '
    end
  }
  @lexemes = []
  text
end
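The joining rule in get_text places a single space between adjacent words but no space on either side of a period. The sketch below restates that rule as a standalone method over an array of strings; the name `join_lexemes` is hypothetical, and unlike get_text it does not clear any saved state.

```ruby
# Join lexemes with single spaces, except that a "." is attached
# directly to the words on both sides of it.
def join_lexemes(lexemes)
  text = +''
  sep = ''
  lexemes.each do |lexeme|
    if lexeme == '.'
      text << lexeme
      sep = ''          # no space after a period
    else
      text << sep << lexeme
      sep = ' '         # space before the next word
    end
  end
  text
end
```

With this rule, `["Bob", "Smith"]` joins to `Bob Smith`, while `["bob", ".", "smith"]` joins to `bob.smith`.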
Get a single token from the string.
# File lib/rmail/address.rb, line 648
def get_tokenize
  @lexeme = nil
  loop {
    case @string
    when nil                    # the end
      @sym = nil
      break
    when ""                     # the end
      @sym = nil
      break
    when /\A[\r\n\t ]+/m        # skip whitespace
      @string = $'
    when /\A\(/m                # skip comment
      comment
    when /\A""/                 # skip empty quoted text
      @string = $'
    when /\A[\w!$%&\'*+\/=?^\`{\}|~#-]+/m
      @string = $'
      @sym = SYM_ATOM
      break
    when /\A"(.*?([^\\]|\\\\))"/m
      @string = $'
      @sym = SYM_QTEXT
      @lexeme = $1.gsub(/\\(.)/, '\1')
      break
    when /\A</
      @string = $'
      @sym = SYM_LESS_THAN
      break
    when /\A>/
      @string = $'
      @sym = SYM_GREATER_THAN
      break
    when /\A@/
      @string = $'
      @sym = SYM_AT_SIGN
      break
    when /\A,/
      @string = $'
      @sym = SYM_COMMA
      break
    when /\A:/
      @string = $'
      @sym = SYM_COLON
      break
    when /\A;/
      @string = $'
      @sym = SYM_SEMI_COLON
      break
    when /\A\./
      @string = $'
      @sym = SYM_PERIOD
      break
    when /\A(\[.*?([^\\]|\\\\)\])/m
      @string = $'
      @sym = SYM_DOMAIN_LITERAL
      @lexeme = $1.gsub(/(^|[^\\])[\r\n\t ]+/, '\1').gsub(/\\(.)/, '\1')
      break
    when /\A[\200-\377\w!$%&\'*+\/=?^\`{\}|~#-]+/nm
      # This is just like SYM_ATOM, but includes all characters
      # with high bits. This is so we can allow such tokens in
      # the display name portion of an address even though it
      # violates the RFCs.
      @string = $'
      @sym = SYM_ATOM_NON_ASCII
      break
    when /\A./
      @string = $'              # garbage
      error('garbage character in string')
    else
      raise "internal error, @string is #{@string.inspect}"
    end
  }
  if @sym
    @lexeme ||= $&
  end
end
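The tokenizer works by repeatedly matching a pattern at the front of the remaining input and classifying what it consumed. The sketch below shows the same approach with Ruby's standard StringScanner instead of `$'`. It is an illustration under simplifying assumptions: the token table is hypothetical, the atom pattern accepts fewer characters than the library's, and comments, quoted text and domain literals are left out.

```ruby
require 'strscan'

# Simplified, hypothetical token table (not the library's own patterns).
TOKEN_PATTERNS = [
  [:atom,         /[\w!%&*+=?^~-]+/],
  [:less_than,    /</],
  [:greater_than, />/],
  [:at_sign,      /@/],
  [:comma,        /,/],
  [:colon,        /:/],
  [:semi_colon,   /;/],
  [:period,       /\./]
].freeze

# Split input into [symbol, lexeme] pairs.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/[\r\n\t ]+/)  # whitespace only separates tokens
    matched = TOKEN_PATTERNS.any? do |sym, pattern|
      lexeme = scanner.scan(pattern)    # matches only at the current position
      tokens << [sym, lexeme] if lexeme
      lexeme
    end
    scanner.getch unless matched        # drop one unrecognized character
  end
  tokens
end
```

Unlike get_tokenize, this sketch produces the whole token list up front and silently drops unrecognized characters rather than counting them as errors.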
Parse this:
group = word {word | "."} SYNC ":" [mailbox_list] SYNC ";"
# File lib/rmail/address.rb, line 492
def group
  word
  while @sym == SYM_ATOM || @sym == SYM_QTEXT || @sym == SYM_PERIOD
    if @sym == SYM_ATOM || @sym == SYM_QTEXT
      word
    else
      save_text
      get
    end
  end
  sync(SYM_COLON)
  expect(SYM_COLON)
  get_text                      # throw away group name
  @addresses.last.comments = nil
  if @sym == SYM_ATOM || @sym == SYM_QTEXT ||
      @sym == SYM_COMMA || @sym == SYM_LESS_THAN
    mailbox_list
  end
  sync(SYM_SEMI_COLON)
  expect(SYM_SEMI_COLON)
end
Parse this:
local_part = word *( "." word )
# File lib/rmail/address.rb, line 582
def local_part
  word
  while @sym == SYM_PERIOD
    save_text
    get
    word
  end
  @addresses.last.local = get_text
end
Parse this:
mailbox = angleAddr | word {word | "."} angleAddr | word {"." word} "@" domain .
lookahead will be set to the return value of address_lookahead, which will be ‘@’ or ‘<’ (or another value for invalid input).
# File lib/rmail/address.rb, line 443
def mailbox(lookahead)
  if @sym == SYM_LESS_THAN
    angle_addr
  elsif lookahead == SYM_LESS_THAN
    display_name_word
    while @sym == SYM_ATOM ||
        @sym == SYM_ATOM_NON_ASCII ||
        @sym == SYM_QTEXT ||
        @sym == SYM_PERIOD
      if @sym == SYM_ATOM ||
          @sym == SYM_ATOM_NON_ASCII ||
          @sym == SYM_QTEXT
        display_name_word
      else
        save_text
        get
      end
    end
    @addresses.last.display_name = get_text
    angle_addr
  else
    word
    while @sym == SYM_PERIOD
      save_text
      get
      word
    end
    @addresses.last.local = get_text
    expect(SYM_AT_SIGN)
    domain

    if @sym == SYM_LESS_THAN
      # Workaround for invalid input. Treat 'foo@bar <foo@bar>' as if it
      # were '"foo@bar" <foo@bar>'. The domain parser will eat
      # 'bar' but stop at '<'. At this point, we've been
      # parsing the display name as if it were an address, so we
      # throw the address into display_name and parse an
      # angle_addr.
      @addresses.last.display_name =
        format("%s@%s", @addresses.last.local, @addresses.last.domain)
      @addresses.last.local = nil
      @addresses.last.domain = nil
      angle_addr
    end
  end
end
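The workaround at the end of mailbox reinterprets an address that has already been parsed as a display name once a '<' shows up behind it. A minimal sketch of that single step, using a hypothetical `Entry` struct in place of RMail::Address and a hypothetical helper name:

```ruby
# Hypothetical stand-in for RMail::Address with the three fields involved.
Entry = Struct.new(:display_name, :local, :domain)

# Move an already-parsed local@domain pair into display_name and clear
# the address fields so that a following angle_addr can fill them again.
def demote_to_display_name(entry)
  entry.display_name = format("%s@%s", entry.local, entry.domain)
  entry.local = nil
  entry.domain = nil
  entry
end

entry = demote_to_display_name(Entry.new(nil, "foo", "bar"))
```

After the call, `entry.display_name` is `"foo@bar"` and both `local` and `domain` are nil, which is the state the parser is in just before it calls angle_addr.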
Parse a mailbox list.
# File lib/rmail/address.rb, line 537
def mailbox_list
  mailbox(address_lookahead)
  while @sym == SYM_COMMA
    get
    new_address
    mailbox(address_lookahead)
  end
end
# File lib/rmail/address.rb, line 330
def new_address
  reset_errors
  @addresses.push(Address.new)
end
Parse this:
obs_domain = atom *( "." atom ) .
# File lib/rmail/address.rb, line 594
def obs_domain
  expect_save(SYM_ATOM)
  while @sym == SYM_PERIOD
    save_text
    get
    expect_save(SYM_ATOM)
  end
end
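The production `atom *( "." atom )` maps directly onto a loop: read one atom, then keep reading period-atom pairs. The sketch below applies that shape to an array of `[symbol, lexeme]` pairs. The method name and the token representation are assumptions for illustration, and where the library records an error for a period that is not followed by an atom, this sketch simply stops before the period.

```ruby
# Consume `atom *( "." atom )` from the front of a token array.
# Returns the joined text, or nil if the tokens do not start with an atom.
def parse_dot_atom(tokens)
  sym, lexeme = tokens.first
  return nil unless sym == :atom

  tokens.shift
  text = lexeme.dup
  # Continue only while a period is immediately followed by another atom.
  while tokens.first&.first == :period && tokens[1]&.first == :atom
    tokens.shift                      # the "."
    text << "." << tokens.shift.last  # the atom after it
  end
  text
end
```

Tokens that are not part of the dotted name, such as a closing `>`, are left in the array for the caller.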
Parse this:
obs_domain_list = "@" domain *( *( "," ) "@" domain )
# File lib/rmail/address.rb, line 612
def obs_domain_list
  expect(SYM_AT_SIGN)
  domain
  while @sym == SYM_COMMA || @sym == SYM_AT_SIGN
    while @sym == SYM_COMMA
      get
    end
    expect(SYM_AT_SIGN)
    domain
  end
end
Parse this:
obs_route = obs_domain_list ":"
# File lib/rmail/address.rb, line 605
def obs_route
  obs_domain_list
  expect(SYM_COLON)
end
Put a token back into the input stream. This token will be retrieved by the next call to get.
# File lib/rmail/address.rb, line 626
def putback(sym, lexeme)
  @tokens.push([sym, lexeme])
end
Put back an array of tokens into the input stream.
# File lib/rmail/address.rb, line 631
def putback_array(a)
  a.reverse_each { |e|
    putback(*e)
  }
end
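Because get pops from the end of the pushback array, tokens must be pushed back in reverse to come out again in their original order. The sketch below demonstrates this with a small hypothetical `TokenStream` class; it reads from a prepared array rather than tokenizing a string.

```ruby
# Hypothetical token source with a pushback stack, for illustration only.
class TokenStream
  def initialize(tokens)
    @source = tokens.dup  # tokens not yet read
    @pushed = []          # tokens handed back by the caller
  end

  # Pushed-back tokens take priority over unread input.
  def get
    @pushed.empty? ? @source.shift : @pushed.pop
  end

  def putback(token)
    @pushed.push(token)
  end

  # Push in reverse so that pop returns the tokens in their original order.
  def putback_array(tokens)
    tokens.reverse_each { |t| putback(t) }
  end
end

stream = TokenStream.new(%i[a b c])
seen = [stream.get, stream.get]  # consume :a and :b
stream.putback_array(seen)       # hand both back
replayed = [stream.get, stream.get, stream.get]
```

Here `replayed` comes out in the original order, with the pushed-back tokens first and the unread input after them.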
# File lib/rmail/address.rb, line 323
def reset_errors
  if @errors > 0
    @addresses.pop
    @errors = 0
  end
end
Save the current lexeme away for later retrieval with get_text.
# File lib/rmail/address.rb, line 355
def save_text
  @lexemes << @lexeme
end
# File lib/rmail/address.rb, line 768
def sync(token)
  while @sym && @sym != token
    error "expected #{token.inspect} but got #{@sym.inspect}"
    get
  end
end
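sync implements a simple form of error recovery: it skips input until the expected token appears and records one error per skipped token, so the damaged address can be discarded later by reset_errors. A sketch of the skipping step over a plain array of symbols; the method name `skip_to` and the symbols are hypothetical:

```ruby
# Discard tokens from the front of the array until `expected` is first
# (or the array is empty). Returns the number of tokens skipped, which
# corresponds to the number of errors the parser would record.
def skip_to(tokens, expected)
  errors = 0
  while tokens.first && tokens.first != expected
    errors += 1
    tokens.shift
  end
  errors
end

tokens = [:atom, :garbage, :comma, :atom]
errors = skip_to(tokens, :comma)
```

Two tokens are skipped in this run, and parsing can resume at the comma that separates the broken address from the next one.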
Parse this:
word = atom | quotedString
# File lib/rmail/address.rb, line 527
def word
  if @sym == SYM_ATOM || @sym == SYM_QTEXT
    save_text
    get
  else
    error "expected word, got #{@sym.inspect}"
  end
end