class HighLevelBrowse::DB

Constants

FILENAME

Hard-code filename. If you need more than one, put them in different directories

Public Class Methods

load(dir:) click to toggle source

Load from disk @param [String] dir The directory where the hlb.json.gz file is located @return [DB] The loaded database

# File lib/high_level_browse/db.rb, line 80
def self.load(dir:)
  simple_array_of_cnrs = Zlib::GzipReader.open(File.join(dir, FILENAME)) do |infile|
    JSON.load(infile.read).to_a
  end
  db                   = self.new(simple_array_of_cnrs)
  db.freeze
  db
end
new(array_of_ranges) click to toggle source

Given a bunch of CallNumberRange objects, create a new database with an efficient structure for querying @param [Array<HighLevelBrowse::CallNumberRange>] array_of_ranges

# File lib/high_level_browse/db.rb, line 15
def initialize(array_of_ranges)
  @all    = array_of_ranges
  @ranges = self.create_letter_indexed_ranges(@all)
end
new_from_xml(xml) click to toggle source

Create a new object from a string with the XML in it. @param [String] xml The contents of the HLB XML dump

(e.g., from 'https://www.lib.umich.edu/browse/categories/xml.php')

@return [DB]

# File lib/high_level_browse/db.rb, line 60
def self.new_from_xml(xml)
  oga_doc_root         = Oga.parse_xml(xml)
  simple_array_of_cnrs = cnrs_within_oga_node(node: oga_doc_root)
  self.new(simple_array_of_cnrs).freeze
end

Private Class Methods

call_numbers_list_from_leaves(node:, topic_array:) click to toggle source

Given a second-to-lowest-level node, get its topic and extract call number ranges from its children

# File lib/high_level_browse/db.rb, line 131
def self.call_numbers_list_from_leaves(node:, topic_array:)
  cnrs      = []
  new_topic = topic_array.dup.push node.get(:name)
  node.xpath('call-numbers').each do |cn_node|
    min = cn_node.get(:start)
    max = cn_node.get(:end)

    new_cnr = HighLevelBrowse::CallNumberRange.new(min: min, max: max, topic_array: new_topic)
    if new_cnr.illegal?
      # do some sort of logging
    else
      cnrs.push new_cnr
    end
  end
  cnrs

end
cnrs_within_oga_node(node:, decendent_xpaths: ['/hlb/subject', 'topic', 'sub-topic'], topic_array: []) click to toggle source

Recurse through the parsed XML document, at each stage keeping track of

* where we are (what are the xpath children?)
* what the current topics are ([level1, level2])

Get all the call numbers assocaited with the topic represented by the given node, as well as all the children of the given node, and send it back as a big ol' array @param [Oga::Node] node A node of the parsed HLB XML file @param [Array<String>] decendent_xpaths A list of xpaths to the decendents of this node @param [Array<String>] topic_array An array with all levels of the topics associated with this node @return [Array<HighLevelBrowse::CallNumberRange>]

# File lib/high_level_browse/db.rb, line 109
def self.cnrs_within_oga_node(node:, decendent_xpaths: ['/hlb/subject', 'topic', 'sub-topic'], topic_array: [])
  if decendent_xpaths.empty?
    [] # base case -- we're as low as we're going to go
  else
    current_xpath_component = decendent_xpaths[0]
    new_xpath               = decendent_xpaths[1..-1]
    new_topic               = topic_array.dup
    new_topic.push node.get(:name) unless node == node.root_node # skip the root
    cnrs = []
    # For each sub-component, get both the call-number-ranges (cnrs) assocaited
    # with this level, as well as recusively getting from all the children
    node.xpath(current_xpath_component).each do |c|
      cnrs += call_numbers_list_from_leaves(node: c, topic_array: new_topic)
      cnrs += cnrs_within_oga_node(node: c, decendent_xpaths: new_xpath, topic_array: new_topic)
    end
    cnrs
  end
end

Public Instance Methods

[](*raw_callnumber_strings)
Alias for: topics
create_letter_indexed_ranges(all) click to toggle source

Given an array of ranges, create efficient search structures @private

# File lib/high_level_browse/db.rb, line 23
def create_letter_indexed_ranges(all)
  bins = {}
  ('A'..'Z').each do |letter|
    cnrs         = all.find_all {|x| x.firstletter == letter}
    bins[letter] = HighLevelBrowse::CallNumberRangeSet.new(cnrs)
  end
  bins
end
freeze() click to toggle source

Freeze everything @return [DB] the frozen db

# File lib/high_level_browse/db.rb, line 92
def freeze
  @ranges.freeze
  @all.freeze
  self
end
save(dir:) click to toggle source

Save to disk @param [String] dir The directory where the hlb.json.gz file will be saved @return [DB] The loaded database

# File lib/high_level_browse/db.rb, line 70
def save(dir:)
  Zlib::GzipWriter.open(File.join(dir, FILENAME)) do |out|
    out.puts JSON.fast_generate(@all)
  end
end
topics(*raw_callnumber_strings) click to toggle source

Get the topic arrays associated with this callnumber of the form:

[
  [toplevel, secondlevel],
  [toplevel, secondlevel, thirdlevel],
   ...
]

@param [String] raw_callnumber_string @return [Array<Array>] A (possibly empty) array of arrays of topics

# File lib/high_level_browse/db.rb, line 41
def topics(*raw_callnumber_strings)
  raw_callnumber_strings.reduce([]) do |acc, raw_callnumber_string|
    firstletter = raw_callnumber_string.strip.upcase[0]
    if @ranges.has_key? firstletter
      acc + @ranges[firstletter].topics_for(raw_callnumber_string)
    else
      acc
    end
  end.uniq
end
Also aliased as: []