module Krikri::Util::ExtendedDateParser

Utilities to parse string values into EDTF dates or Intervals.

Public Instance Methods

circa(str) click to toggle source

Remove 'circa' or 'about' or variations and return an uncertian ETDF dates.

@param str [String] @return [Date, nil] an EDTF date, marked uncertian; or `nil` @see parse

# File lib/krikri/util/extended_date_parser.rb, line 86
def circa(str)
  run = str.gsub!(/.*c[irca\.]*/i, '')
  run ||= str.gsub!(/.*about/i, '')
  date = parse(str) if run

  return nil if date.nil?

  # The EDTF grammar does not support uncertainty on masked precision dates
  if date.respond_to? :uncertain!
    date.uncertain!
  elsif date.is_a? EDTF::Interval
    # Interval uncertainty is scoped to the begin and end dates;
    # to be safe, we mark both.
    date.from = date.from.uncertain! if date.from.respond_to? :uncertain!
    date.to   = date.to.uncertain!   if date.to.respond_to? :uncertain!
  end

  date
end
decade_hyphen(str) click to toggle source

e.g. 199-

# File lib/krikri/util/extended_date_parser.rb, line 189
def decade_hyphen(str)
  /^(\d{3})-$/.match(str) do |m|
    Date.edtf("#{m[1]}x")
  end
end
decade_s(str) click to toggle source

e.g. 1990s

# File lib/krikri/util/extended_date_parser.rb, line 181
def decade_s(str)
  /^(\d{3})0s$/.match(str) do |m|
    Date.edtf("#{m[1]}x")
  end
end
hyphenated_partial_range(str) click to toggle source

e.g. 1990-92

# File lib/krikri/util/extended_date_parser.rb, line 165
def hyphenated_partial_range(str)
  /^(\d{2})(\d{2})-(\d{2})$/.match(str) do |m|
    Date.edtf("#{m[1]}#{m[2]}/#{m[1]}#{m[3]}")
  end
end
month_year(str) click to toggle source

e.g. 01-2045

# File lib/krikri/util/extended_date_parser.rb, line 157
def month_year(str)
  /^(\d{2})-(\d{4})$/.match(str) do |m|
    Date.edtf("#{m[2]}-#{m[1]}")
  end
end
parse(date_str, allow_interval = false) click to toggle source

Attempts to parse a string into a valid EDTF or `Date` format.

- Attempts to split `#providedLabel` on '-', '/', '..', 'to', 'until', and
  looks for EDTF and `Date.parse` patterns on either side, setting them to
  `#begin` and `#end`. Both split and unsplit dates are parsed as follows:
- Attempts to parse `#providedLabel` as an EDTF interval and populates
  begin and end with their respective values.
- Attempts to match to a number of regular expressions which specify
  ranges informally.
- Attempts to parse `#providedLabel` as a single date value with
  `Date.parse` and enters that value to both `#begin` and `#end`.

@param date_str [String] a string which may contain a date range @param allow_interval [Boolean] a flag specifing whethe to use

#range_match to look for range values.

@return [Date, EDTF::Epoch, EDTF::Interval, nil] the date parsed or nil @see www.loc.gov/standards/datetime/

# File lib/krikri/util/extended_date_parser.rb, line 26
def parse(date_str, allow_interval = false)
  str = preprocess(date_str.dup)
  date = parse_interval(str) if allow_interval
  date ||= parse_m_d_y(str)
  date ||= Date.edtf(str.gsub('.', '-'))
  date ||= partial_edtf(str)
  date ||= decade_hyphen(str)
  date ||= month_year(str)
  date ||= decade_s(str)
  date ||= hyphenated_partial_range(str)
  date ||= parse_date(str)
  # Only do this if certian letters are present to avoid infinite loops.
  date ||= circa(str) if str.match(/[circabout]/i)
  date = date.first if date.is_a? EDTF::Set
  date || nil
end
parse_date(*args) click to toggle source

Runs `Date#parse`; if arguments are invalid (as with an invalid date string) returns `nil`.

@return [Date, nil] the parsed date or nil @see Date#parse

# File lib/krikri/util/extended_date_parser.rb, line 132
def parse_date(*args)
  begin
    Date.parse(*args)
  rescue ArgumentError
    nil
  end
end
parse_interval(str) click to toggle source

Creates an EDTF::Interval from a string

@param str [String] a string which may contain a date range @return [ETDF::Interval, nil] an EDTF object representing a date range

or nil if none can be found

@see range_match

# File lib/krikri/util/extended_date_parser.rb, line 114
def parse_interval(str)
  match = range_match(str)
  return nil if match.nil?

  begin_date, end_date = match.map { |date| parse(date) || :unknown }

  begin_date = begin_date.first if begin_date.respond_to? :first
  end_date = end_date.last if end_date.respond_to? :last

  EDTF::Interval.new(begin_date, end_date)
end
parse_m_d_y(value) click to toggle source

Runs `Date#strptime` with '%m-%d-%Y'; if arguments are invalid (as with an invalid date string) returns `nil`.

@param value [String] the string to parse @return [Date, nil] the parsed date or nil @see Date#strptime

# File lib/krikri/util/extended_date_parser.rb, line 147
def parse_m_d_y(value)
  begin
    Date.strptime(value.gsub(/[^0-9]/, '-'), '%m-%d-%Y')
  rescue ArgumentError
    nil
  end
end
partial_edtf(str) click to toggle source

e.g. 1970-08-01/02 or 1970-12/10

# File lib/krikri/util/extended_date_parser.rb, line 173
def partial_edtf(str)
  /^(\d{4}(-\d{2})*)-(\d{2})\/(\d{2})$/.match(str) do |m|
    Date.edtf("#{m[1]}-#{m[3]}/#{m[1]}-#{m[4]}")
  end
end
preprocess(str) click to toggle source

Preprocess the date string to remove extra whitespace and convert ad hoc formatting to equivalent EDTF.

@todo should '-` be intepreted as 'x' or '?' @see www.loc.gov/standards/datetime/pre-submission.html#maskedprecision

# File lib/krikri/util/extended_date_parser.rb, line 69
def preprocess(str)
  str.gsub!(/late/i, '')
  str.gsub!(/early/i, '')
  str.strip!
  str.gsub!(/\s+/, ' ')
  str.gsub!('0s', 'x') if str.match(/^[1-9]+0s$/)
  str.gsub!('-', 'x') if str.match(/^[1-9]+\-+$/)
  str
end
range_match(str) click to toggle source

Matches a wide variety of date ranges separated by '..' or '-'

@param str [String] a string which may contain a date range @return [Array(String)] the begining and ending dates of an identified

range
# File lib/krikri/util/extended_date_parser.rb, line 49
def range_match(str)
  str = str.gsub('to', '-').gsub('until', '-')
  regexp = %r{
    ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
    \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
    \s*[-\.]+\s*
    ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
    \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
  }x
  regexp.match(str) do |m|
    [m[1], m[2]]
  end
end