class Milemarker

Milemarker class, for tracking progress over time in long-running iterative processes

@author Bill Dueber <bill@dueber.com>

Constants

VERSION

Attributes

batch_end_time[R]

@return [Time] Time the last batch ended processing

batch_number[R]

@return [Integer] which batch number (total increment / batch_size)

batch_size[RW]

@return [Integer] batch size for computing `on_batch` calls

batch_start_time[R]

@return [Time] Time the last batch started processing

count[R]

@return [Integer] Total records (really, increments) for the full run

last_batch_seconds[R]

@return [Integer] number of seconds taken to process the last batch

last_batch_size[R]

@return [Integer] number of records (really, number of increments) in the last batch

logger[RW]

@return [Logger, info] logging object for automatic logging methods

name[RW]

@return [String] optional “name” of this milemarker, for logging purposes

prev_count[R]

@return [Integer] Total count at the time of the last on_batch call. Used to figure out how many records were in the final batch

start_time[R]

@return [Time] Time the full process started

Public Class Methods

new(batch_size: 1000, name: nil, logger: nil)

Create a new milemarker tracker, with an optional name and logger.

@param [Integer] batch_size How often the on_batch block will be called

@param [String] name Optional “name” for this milemarker, included in the generated log lines

@param [Logger, info, warn] logger Optional logger that responds to the normal info, warn, etc.

# File lib/milemarker.rb, line 52
def initialize(batch_size: 1000, name: nil, logger: nil)
  @batch_size = batch_size
  @name       = name
  @logger     = logger

  @batch_number = 0
  @last_batch_size    = 0
  @last_batch_seconds = 0

  @start_time       = Time.now
  @batch_start_time = @start_time
  @batch_end_time   = @start_time

  @count      = 0
  @prev_count = 0
end

Public Instance Methods

_increment_and_on_batch(&blk)

Single call to increment and run (if needed) the on_batch block

# File lib/milemarker.rb, line 107
def _increment_and_on_batch(&blk)
  incr.on_batch(&blk)
end
Also aliased as: increment_and_on_batch
batch_count_so_far()
Alias for: final_batch_size
batch_line()

A line describing the batch suitable for logging, of the form

load records.ndj   8_000_000. This batch 2_000_000 in 26.2s (76_469 r/s). Overall 72_705 r/s.

@return [String] The batch log line

# File lib/milemarker.rb, line 142
def batch_line
  # rubocop:disable Layout/LineLength
  "#{name} #{ppnum(count, 10)}. This batch #{ppnum(last_batch_size, 5)} in #{ppnum(last_batch_seconds, 4, 1)}s (#{batch_rate_str} r/s). Overall #{total_rate_str} r/s."
  # rubocop:enable Layout/LineLength
end
batch_rate()

@return [Float] rate of the last batch (in recs/second)

# File lib/milemarker.rb, line 169
def batch_rate
  return 0.0 if count.zero?

  last_batch_size.to_f / last_batch_seconds
end
batch_rate_str(decimals = 0)

@param [Integer] decimals Number of decimal places to the right of the decimal point

@return [String] Rate-per-second in form XXX.YY

# File lib/milemarker.rb, line 178
def batch_rate_str(decimals = 0)
  ppnum(batch_rate, 0, decimals)
end
batch_seconds_so_far()

Total seconds since this batch started.

@return [Float] seconds since the beginning of this batch

# File lib/milemarker.rb, line 204
def batch_seconds_so_far
  Time.now - batch_start_time
end
create_logger!(*args, **kwargs)

Create a logger for use in logging milemarker information. The arguments are passed straight through to Logger.new.

@example mm.create_logger!(STDOUT)

@return [Milemarker] self

# File lib/milemarker.rb, line 92
def create_logger!(*args, **kwargs)
  @logger = Logger.new(*args, **kwargs)
  self
end
final_batch_size()

Report how many increments there have been since the last on_batch call. Most useful for counting how many items are in the final (usually incomplete) batch. Since Milemarker can’t tell when you’re done processing, you can call this at any time to get the number of items processed since the last on_batch call.

@return [Integer] Number of items processed in the final batch

# File lib/milemarker.rb, line 153
def final_batch_size
  count - prev_count
end
Also aliased as: batch_count_so_far
final_line()

A line describing the entire run, suitable for logging, of the form

load records.ndj FINISHED. 27_138_118 total records in 00h 12m 39s. Overall 35_718 r/s.

@return [String] The full log line

# File lib/milemarker.rb, line 162
def final_line
  # rubocop:disable Layout/LineLength
  "#{name} FINISHED. #{ppnum(count, 10)} total records in #{seconds_to_time_string(total_seconds_so_far)}. Overall #{total_rate_str} r/s."
  # rubocop:enable Layout/LineLength
end
incr(increase = 1)

Increment the counter (e.g., the number of records processed so far).

@return [Milemarker] self

# File lib/milemarker.rb, line 82
def incr(increase = 1)
  @count += increase
  self
end
Also aliased as: increment
increment(increase = 1)
Alias for: incr
increment_and_log_batch_line(level: :info)

Convenience method, exactly the same as the common idiom

`mm.incr; mm.on_batch {|mm| log.info mm.batch_line}`

@param [Symbol] level The level to log at

# File lib/milemarker.rb, line 123
def increment_and_log_batch_line(level: :info)
  increment_and_on_batch { log_batch_line(level: level) }
end
increment_and_on_batch(&blk)
Alias for: _increment_and_on_batch
log(msg, level: :info)

Log a line using the internal logger. Do nothing if no logger is configured.

@param [String] msg The message to log

@param [Symbol] level The level to log at

# File lib/milemarker.rb, line 229
def log(msg, level: :info)
  logger&.send(level, msg)
end
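The safe-navigation call in the listing above can be tried in isolation. The following is a minimal sketch, not part of the library: `log_with` is a hypothetical top-level helper that takes the logger as an argument instead of reading it from the instance, and the `StringIO` target is only there so the output can be inspected.

```ruby
require "logger"
require "stringio"

# Hypothetical helper mirroring the body of Milemarker#log shown above.
def log_with(logger, msg, level: :info)
  logger&.send(level, msg)
end

# With no logger configured, &. short-circuits and nothing is logged.
no_logger_result = log_with(nil, "ignored")
puts no_logger_result.inspect

# With a logger, the level symbol is dispatched as a method name (info, warn, ...).
io = StringIO.new
log_with(Logger.new(io), "halfway there", level: :warn)
puts io.string
```

Because the level is sent as a method name, any object that responds to `info`, `warn`, and so on should work as the logger.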
log_batch_line(level: :info)

Log the batch line, as described in batch_line.

@param [Symbol] level The level to log at

# File lib/milemarker.rb, line 129
def log_batch_line(level: :info)
  log(batch_line, level: level)
end
log_final_line(level: :info)

Log the final line, as described in final_line.

@param [Symbol] level The level to log at

# File lib/milemarker.rb, line 135
def log_final_line(level: :info)
  log(final_line, level: level)
end
on_batch() { |self| ... }

Run the given block if the count has reached or passed the next multiple of batch_size since the block last ran.

@yield [Milemarker] self

# File lib/milemarker.rb, line 99
def on_batch
  if batch_size_exceeded?
    set_milemarker!
    yield self
  end
end
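The usual `incr` / `on_batch` loop can be sketched as follows. `TinyMarker` is a hypothetical stand-in that copies only the counting logic from the listings on this page (no timers, no logging), so that the example runs without the gem installed; `run_tiny_marker` is likewise an illustrative helper, not a library method.

```ruby
# TinyMarker is a hypothetical stand-in, trimmed down from the listings on
# this page. It keeps the counters and drops the timing and logging.
class TinyMarker
  attr_reader :count, :prev_count, :batch_size

  def initialize(batch_size: 1000)
    @batch_size   = batch_size
    @batch_number = 0
    @count        = 0
    @prev_count   = 0
  end

  def incr(increase = 1)
    @count += increase
    self
  end

  # Same test as batch_size_exceeded?, same bookkeeping as reset_for_next_batch!.
  def on_batch
    return unless count.div(batch_size) > @batch_number

    @prev_count   = count
    @batch_number = count.div(batch_size)
    yield self
  end

  def final_batch_size
    count - prev_count
  end
end

# Returns the counts at which the block fired, plus the leftover batch size.
def run_tiny_marker(total, batch_size)
  mm = TinyMarker.new(batch_size: batch_size)
  fired_at = []
  total.times { mm.incr.on_batch { |m| fired_at << m.count } }
  [fired_at, mm.final_batch_size]
end

fired_at, leftover = run_tiny_marker(10, 3)
puts "block ran at counts #{fired_at.inspect}; #{leftover} left in the final batch"
```

With ten increments and a batch size of three, the block runs at counts 3, 6, and 9, and `final_batch_size` reports the one record left over.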
reset_for_next_batch!()

Reset the internal counters/timers at the end of a batch. Taken care of by on_batch; should probably not be called manually.

# File lib/milemarker.rb, line 220
def reset_for_next_batch!
  @batch_start_time  = batch_end_time
  @prev_count        = count
  @batch_number = batch_divisor
end
set_milemarker!()

Set/reset all the internal state. Called by on_batch when necessary; should probably not be called manually

# File lib/milemarker.rb, line 210
def set_milemarker!
  @batch_end_time     = Time.now
  @last_batch_size    = @count - @prev_count
  @last_batch_seconds = @batch_end_time - @batch_start_time

  reset_for_next_batch!
end
threadsafe_increment_and_on_batch(&blk)

Threadsafe version of increment_and_on_batch, doing the whole thing as a single atomic action. The mutex it relies on is created by threadsafify!, so call that first.

# File lib/milemarker.rb, line 114
def threadsafe_increment_and_on_batch(&blk)
  @mutex.synchronize do
    _increment_and_on_batch(&blk)
  end
end
threadsafify!()

Turn `increment_and_on_batch` (and thus `increment_and_log_batch_line`) into a threadsafe version.

@return [Milemarker] self

# File lib/milemarker.rb, line 72
def threadsafify!
  @mutex = Mutex.new
  define_singleton_method(:increment_and_on_batch) do |&blk|
    threadsafe_increment_and_on_batch(&blk)
  end
  self
end
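The technique used here is a singleton method that shadows the aliased instance method and wraps the original body in a `Mutex`. A minimal sketch of that pattern, using a hypothetical `SafeCounter` class and `hammer` driver rather than Milemarker itself:

```ruby
# SafeCounter is a hypothetical stand-in, not part of Milemarker. Its
# threadsafify! follows the same shape as the listing above.
class SafeCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  def _bump
    @count += 1
    self
  end
  alias bump _bump

  def threadsafify!
    @mutex = Mutex.new
    # Defined on this one object only; it takes precedence over the alias.
    define_singleton_method(:bump) do
      @mutex.synchronize { _bump }
    end
    self
  end
end

# Hypothetical driver: several threads increment the same counter.
def hammer(counter, threads: 8, per_thread: 1_000)
  Array.new(threads) { Thread.new { per_thread.times { counter.bump } } }.each(&:join)
  counter.count
end

total = hammer(SafeCounter.new.threadsafify!)
puts total
```

Only the object that received `threadsafify!` pays for the lock; other instances keep the plain, unsynchronized method.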
total_rate()

@return [Float] total rate so far (in rec/second)

# File lib/milemarker.rb, line 183
def total_rate
  return 0.0 if @count.zero?

  count / total_seconds_so_far
end
total_rate_str(decimals = 0)

@param [Integer] decimals Number of decimal places to the right of the decimal point

@return [String] Rate-per-second in form XXX.YY

# File lib/milemarker.rb, line 192
def total_rate_str(decimals = 0)
  ppnum(total_rate, 0, decimals)
end
total_seconds_so_far()

Total seconds since the beginning of this milemarker.

@return [Float] seconds since the milemarker was created

# File lib/milemarker.rb, line 198
def total_seconds_so_far
  Time.now - start_time
end

Private Instance Methods

batch_divisor()
# File lib/milemarker.rb, line 239
def batch_divisor
  count.div batch_size
end
batch_size_exceeded?()
# File lib/milemarker.rb, line 235
def batch_size_exceeded?
  batch_divisor > @batch_number
end
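Because the check compares an integer quotient with the stored batch number, batches need not land on exact multiples of batch_size when `incr` is called with a step larger than 1. A small sketch of that arithmetic, where `batches_fired` is a hypothetical helper that replays a list of increments outside the class:

```ruby
# Hypothetical helper: replays increments through the same integer-division
# test used by batch_divisor and batch_size_exceeded?.
def batches_fired(increments, batch_size)
  count = 0
  batch_number = 0
  fired = []
  increments.each do |increase|
    count += increase
    divisor = count.div(batch_size)      # batch_divisor
    next unless divisor > batch_number   # batch_size_exceeded?

    batch_number = divisor               # as reset_for_next_batch! does
    fired << count
  end
  fired
end

p batches_fired([400, 400, 400], 1000) # fires once, when the count reaches 1200
p batches_fired([2500, 100], 1000)     # fires once at 2500, though two boundaries were crossed
```

In the second call a single increment crosses two batch boundaries, yet the check passes only once, because the batch number is set to the current quotient rather than advanced by one.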
seconds_to_time_string(sec)
# File lib/milemarker.rb, line 243
def seconds_to_time_string(sec)
  hours, leftover = sec.divmod(3600)
  minutes, secs   = leftover.divmod(60)
  format("%02dh %02dm %02ds", hours, minutes, secs)
end
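The two `divmod` calls peel off hours first, then minutes, leaving seconds as the final remainder. As a worked example, here is a top-level copy of the private helper (copied only so that it can be called directly), applied to a few durations:

```ruby
# Top-level copy of the private helper listed above, for illustration only.
def seconds_to_time_string(sec)
  hours, leftover = sec.divmod(3600)
  minutes, secs   = leftover.divmod(60)
  format("%02dh %02dm %02ds", hours, minutes, secs)
end

puts seconds_to_time_string(759)    # 12 * 60 + 39, the duration in the final_line sample
puts seconds_to_time_string(3_725)  # 1 * 3600 + 2 * 60 + 5
puts seconds_to_time_string(759.5)  # Float input, as Time subtraction produces
```

Since `total_seconds_so_far` returns a Float, the `%02d` directive is what discards the fractional second in the printed string.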