class Moab::FileGroupDifference

Performs analysis and reports the differences between two matching {FileGroup} objects. The descendant elements of the report hold a detailed breakdown of file-level differences, organized by change type. This stanza is a child element of {FileInventoryDifference}, the documentation of which contains a full example.

To determine the detailed nature of the differences between the two manifests, the algorithm first compares the sets of file signatures present in the two groups, then uses the result of that comparison to analyze filename correspondences.

For the first step, a Ruby Hash is extracted from each of the two groups, with {FileSignature} objects used as hash keys, and the corresponding {FileInstance} arrays as the hash values. The set of keys from the basis hash can be compared against the keys from the other hash using {Array} operators: intersection (&) yields the matching keys, and difference (-) yields the keys found only in the basis hash or only in the other hash.
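As a minimal sketch of this first step, the three key sets can be derived with plain Array operators. The method name `partition_signature_keys` and the string keys are illustrative stand-ins and not part of Moab; in the library the keys would be {FileSignature} objects.

```ruby
# Hypothetical helper: digest-like strings stand in for FileSignature keys,
# and arrays of paths stand in for the FileInstance values.
def partition_signature_keys(basis_hash, other_hash)
  {
    matching: basis_hash.keys & other_hash.keys,   # present in both groups
    basis_only: basis_hash.keys - other_hash.keys, # present only in the basis group
    other_only: other_hash.keys - basis_hash.keys  # present only in the other group
  }
end

basis = { 'sig-a' => ['page-1.jpg'], 'sig-b' => ['page-2.jpg'] }
other = { 'sig-b' => ['page-2.jpg'], 'sig-c' => ['page-3.jpg'] }
sets = partition_signature_keys(basis, other)
puts sets[:matching].inspect
puts sets[:basis_only].inspect
puts sets[:other_only].inspect
```

These operators rely on the keys implementing `eql?` and `hash` consistently, which holds for the strings used here.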

For the second step of the comparison, the matching and non-matching sets of hash entries are further categorized by change type: identical, renamed, copyadded, copydeleted, modified, added, or deleted.
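The second step can be illustrated with a simplified classifier. This is a sketch under stated assumptions, not the library's implementation: each group is reduced to a plain Hash of path to signature string, every signature is assumed to occur at most once per group (so the copyadded and copydeleted cases never arise), and `classify_changes` is a hypothetical name.

```ruby
# Simplified sketch: basis and other are Hashes of path => signature string.
# Assumes each signature appears at most once per group.
def classify_changes(basis, other)
  changes = Hash.new { |hash, key| hash[key] = [] }
  basis_by_sig = basis.invert # signature => path
  other_by_sig = other.invert

  # Signatures present in both groups: same path is identical, else renamed.
  (basis_by_sig.keys & other_by_sig.keys).each do |sig|
    old_path = basis_by_sig[sig]
    new_path = other_by_sig[sig]
    if old_path == new_path
      changes[:identical] << old_path
    else
      changes[:renamed] << [old_path, new_path]
    end
  end

  # Paths whose signature exists in only one of the two groups.
  basis_rest = basis.reject { |_path, sig| other_by_sig.key?(sig) }
  other_rest = other.reject { |_path, sig| basis_by_sig.key?(sig) }
  (basis_rest.keys & other_rest.keys).each { |path| changes[:modified] << path }
  (other_rest.keys - basis_rest.keys).each { |path| changes[:added] << path }
  (basis_rest.keys - other_rest.keys).each { |path| changes[:deleted] << path }
  changes
end
```

A path counts as modified when it survives in both groups but its signature does not, which is why that check runs only on the entries left over after signature matching.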

Data Model

@note Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University.

All rights reserved.  See {file:LICENSE.rdoc} for details.

Attributes

subset_hash[RW]

@return [Hash<Symbol,FileGroupDifferenceSubset>] A set of containers (one for each change type), each of which contains a collection of file-level differences having that change type.

Public Class Methods

new(opts = {})

(see Serializable#initialize)

Calls superclass method Serializer::Serializable::new
# File lib/moab/file_group_difference.rb, line 55
def initialize(opts = {})
  @subset_hash = Hash.new { |hash, key| hash[key] = FileGroupDifferenceSubset.new(change: key.to_s) }
  super(opts)
end
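The constructor relies on a Hash default block, so a subset for a change type is created the first time that key is looked up. The following sketch shows the idiom in isolation; `SketchSubset` and `build_subset_hash` are hypothetical stand-ins for illustration only.

```ruby
# Hypothetical stand-in for FileGroupDifferenceSubset, used only for illustration.
SketchSubset = Struct.new(:change, :files)

# Build a hash whose missing keys are filled in on first access.
def build_subset_hash
  Hash.new { |hash, key| hash[key] = SketchSubset.new(key.to_s, []) }
end

subsets = build_subset_hash
subsets[:added].files << 'page-9.jpg' # creates the :added subset, then appends
subsets[:renamed]                     # a plain read also creates an empty subset
puts subsets.keys.inspect             # prints [:added, :renamed]
```

One consequence of this idiom is that merely reading a key stores an entry, so the set of keys grows with every change type that has been queried, not only those that hold files.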

Public Instance Methods

added()
# File lib/moab/file_group_difference.rb, line 115
def added
  subset_hash[:added].count
end
basis_only_keys(basis_hash, other_hash)

@api internal
@param (see matching_keys)
@return [Array] Compare the keys of two hashes and return the keys unique to the first hash

# File lib/moab/file_group_difference.rb, line 172
def basis_only_keys(basis_hash, other_hash)
  basis_hash.keys - other_hash.keys
end
compare_file_groups(basis_group, other_group)

@api internal
@param basis_group [FileGroup] The file group that is the basis of the comparison
@param other_group [FileGroup] The file group that is compared against the basis group
@return [FileGroupDifference] Compare two file groups and return a differences report

# File lib/moab/file_group_difference.rb, line 187
def compare_file_groups(basis_group, other_group)
  @group_id = basis_group.group_id
  compare_matching_signatures(basis_group, other_group)
  compare_non_matching_signatures(basis_group, other_group)
  self
end
compare_matching_signatures(basis_group, other_group)

@api internal
@param (see compare_file_groups)
@return [FileGroupDifference] For signatures that are present in both groups, report which file instances are identical or renamed
# File lib/moab/file_group_difference.rb, line 198
def compare_matching_signatures(basis_group, other_group)
  matching_signatures = matching_keys(basis_group.signature_hash, other_group.signature_hash)
  tabulate_unchanged_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  tabulate_renamed_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  self
end
compare_non_matching_signatures(basis_group, other_group)

@api internal
@param (see compare_file_groups)
@return [FileGroupDifference] For signatures that are present in only one or the other group, report which file instances are modified, deleted, or added
# File lib/moab/file_group_difference.rb, line 209
def compare_non_matching_signatures(basis_group, other_group)
  basis_only_signatures = basis_only_keys(basis_group.signature_hash, other_group.signature_hash)
  other_only_signatures = other_only_keys(basis_group.signature_hash, other_group.signature_hash)
  basis_path_hash = basis_group.path_hash_subset(basis_only_signatures)
  other_path_hash = other_group.path_hash_subset(other_only_signatures)
  tabulate_modified_files(basis_path_hash, other_path_hash)
  tabulate_added_files(basis_path_hash, other_path_hash)
  tabulate_deleted_files(basis_path_hash, other_path_hash)
  self
end
copyadded()
# File lib/moab/file_group_difference.rb, line 87
def copyadded
  subset_hash[:copyadded].count
end
copydeleted()
# File lib/moab/file_group_difference.rb, line 94
def copydeleted
  subset_hash[:copydeleted].count
end
deleted()
# File lib/moab/file_group_difference.rb, line 122
def deleted
  subset_hash[:deleted].count
end
difference_count()
# File lib/moab/file_group_difference.rb, line 69
def difference_count
  count = 0
  subset_hash.each do |type, subset|
    count += subset.count if type != :identical
  end
  count
end
file_deltas()

@return [Hash<Symbol,Array>] Sets of filenames grouped by change type for use in performing file or metadata operations

# File lib/moab/file_group_difference.rb, line 334
def file_deltas
  # The hash to be returned
  deltas = Hash.new { |hash, key| hash[key] = [] }
  # case where other_path is empty or 'same'.  (create array of strings)
  %i[identical modified deleted copydeleted].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:basis_path))
  end
  # case where basis_path and other_path are both present.  (create array of arrays)
  %i[copyadded renamed].each do |change|
    deltas[change].concat(subset_hash[change].files.collect { |file| [file.basis_path, file.other_path] })
  end
  # case where basis_path is empty.  (create array of strings)
  [:added].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:other_path))
  end
  deltas
end
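The shape of the returned hash differs by change type: single paths for most types, and path pairs for renamed and copyadded. The sketch below reproduces only that shaping logic on stand-in data; `SketchFileDiff` and `sketch_file_deltas` are hypothetical names and not part of the library.

```ruby
# Hypothetical stand-in for FileInstanceDifference, holding only the two paths.
SketchFileDiff = Struct.new(:basis_path, :other_path)

# Reduce per-change lists of differences to the path shapes used by file_deltas.
def sketch_file_deltas(files_by_change)
  deltas = Hash.new { |hash, key| hash[key] = [] }
  files_by_change.each do |change, files|
    deltas[change] =
      case change
      when :added then files.map(&:other_path)                  # basis_path is empty
      when :renamed, :copyadded                                 # both paths are present
        files.map { |file| [file.basis_path, file.other_path] }
      else files.map(&:basis_path)                              # other_path is empty or 'same'
      end
  end
  deltas
end
```

A caller could then iterate `deltas[:renamed]` as old and new name pairs while treating `deltas[:deleted]` as a flat list of paths.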
identical()
# File lib/moab/file_group_difference.rb, line 80
def identical
  subset_hash[:identical].count
end
matching_keys(basis_hash, other_hash)

@api internal
@param basis_hash [Hash] The first hash being compared
@param other_hash [Hash] The second hash being compared
@return [Array] Compare the keys of two hashes and return the intersection

# File lib/moab/file_group_difference.rb, line 165
def matching_keys(basis_hash, other_hash)
  basis_hash.keys & other_hash.keys
end
modified()
# File lib/moab/file_group_difference.rb, line 108
def modified
  subset_hash[:modified].count
end
other_only_keys(basis_hash, other_hash)

@api internal
@param (see matching_keys)
@return [Array] Compare the keys of two hashes and return the keys unique to the second hash

# File lib/moab/file_group_difference.rb, line 179
def other_only_keys(basis_hash, other_hash)
  other_hash.keys - basis_hash.keys
end
rename_require_temp_files(filepairs)

@param [Array<Array<String>>] filepairs The set of oldname, newname pairs for all files being renamed
@return [Boolean] Test whether any of the new names are the same as one of the old names, such as would be true for insertion of a new file into a page sequence, or a circular rename. In such a case, return true, indicating that use of intermediate temporary files would be required when updating a copy of an object's files at a given location.
# File lib/moab/file_group_difference.rb, line 357
def rename_require_temp_files(filepairs)
  # Split the filepairs into two arrays
  oldnames = []
  newnames = []
  filepairs.each do |old, new|
    oldnames << old
    newnames << new
  end
  # Are any of the filenames the same in set of oldnames and set of newnames?
  intersection = oldnames & newnames
  intersection.count > 0
end
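To make the collision test concrete, the sketch below applies the same old-name and new-name intersection to three sets of rename pairs. The method name `names_collide?` and the file names are made up for illustration.

```ruby
# Hypothetical re-statement of the check: true when any new name is also an old name.
def names_collide?(filepairs)
  oldnames = filepairs.map(&:first)
  newnames = filepairs.map(&:last)
  !(oldnames & newnames).empty?
end

# Inserting a page shifts later pages onto names that are still in use.
insertion = [['page-2.jpg', 'page-3.jpg'], ['page-3.jpg', 'page-4.jpg']]
# A circular rename swaps two names.
circular = [['a.txt', 'b.txt'], ['b.txt', 'a.txt']]
# Renames onto fresh names never overwrite a pending source file.
unrelated = [['a.txt', 'x.txt'], ['b.txt', 'y.txt']]

puts names_collide?(insertion) # prints true
puts names_collide?(circular)  # prints true
puts names_collide?(unrelated) # prints false
```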
rename_tempfile_triplets(filepairs)

@param [Array<Array<String>>] filepairs The set of oldname, newname pairs for all files being renamed
@return [Array<Array<String>>] A set of file triples containing oldname, newname, tempname

# File lib/moab/file_group_difference.rb, line 372
def rename_tempfile_triplets(filepairs)
  filepairs.collect { |old, new| [old, new, "#{new}-#{Time.now.strftime('%Y%m%d%H%M%S')}-tmp"] }
end
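One way such triplets might be consumed is a two-phase rename: first move every old name to its temporary name, then move every temporary name to its final name. The sketch below is an assumption about usage, not code from the library; `rename_via_temp_files` is a hypothetical helper, and it expects each triplet in the order the method body above builds it (old, new, temp).

```ruby
require 'tmpdir'

# Hypothetical two-phase rename inside one directory.
# Each triplet is [old_name, new_name, temp_name].
def rename_via_temp_files(dir, triplets)
  # Phase 1: move every source out of the way so no target name is occupied.
  triplets.each do |old, _new, temp|
    File.rename(File.join(dir, old), File.join(dir, temp))
  end
  # Phase 2: move each temporary file to its final name.
  triplets.each do |_old, new, temp|
    File.rename(File.join(dir, temp), File.join(dir, new))
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.txt'), 'A')
  File.write(File.join(dir, 'b.txt'), 'B')
  # Swap the two files, which a direct rename could not do without data loss.
  rename_via_temp_files(dir, [['a.txt', 'b.txt', 'b.txt-tmp'], ['b.txt', 'a.txt', 'a.txt-tmp']])
  puts File.read(File.join(dir, 'a.txt')) # prints B
end
```

Completing phase 1 for all files before starting phase 2 is what makes circular renames safe, since no final name is written while its original occupant is still in place.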
renamed()
# File lib/moab/file_group_difference.rb, line 101
def renamed
  subset_hash[:renamed].count
end
subset(change)

@param change [String] the change type to search for
@return [FileGroupDifferenceSubset] Find a specified subset of changes

# File lib/moab/file_group_difference.rb, line 50
def subset(change)
  subset_hash[change.to_sym]
end
subsets()
# File lib/moab/file_group_difference.rb, line 131
def subsets
  subset_hash.values
end
subsets=(array)
# File lib/moab/file_group_difference.rb, line 135
def subsets=(array)
  return unless array

  array.each { |subset| subset_hash[subset.change.to_sym] = subset }
end
summary()

@api internal
@return [FileGroupDifference] Clone just this element for inclusion in a versionMetadata structure

# File lib/moab/file_group_difference.rb, line 148
def summary
  FileGroupDifference.new(
    group_id: group_id,
    identical: identical,
    copyadded: copyadded,
    copydeleted: copydeleted,
    renamed: renamed,
    modified: modified,
    added: added,
    deleted: deleted
  )
end
summary_fields()

@return [Array<String>] The data fields to include in summary reports

# File lib/moab/file_group_difference.rb, line 142
def summary_fields
  %w[group_id difference_count identical copyadded copydeleted renamed modified deleted added]
end
tabulate_added_files(basis_path_hash, other_path_hash)

@api internal @param basis_path_hash [Hash<String,FileSignature>]

The file paths and associated signatures for manifestations appearing only in the basis group

@param other_path_hash [Hash<String,FileSignature>]

The file paths and associated signatures for manifestations appearing only in the other group

@return [FileGroupDifference]

Container for reporting the set of file-level differences of type 'added'
# File lib/moab/file_group_difference.rb, line 304
def tabulate_added_files(basis_path_hash, other_path_hash)
  other_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(change: 'added')
    fid.basis_path = ''
    fid.other_path = path
    fid.signatures << other_path_hash[path]
    subset_hash[:added].files << fid
  end
  self
end
tabulate_deleted_files(basis_path_hash, other_path_hash)

@api internal @param basis_path_hash [Hash<String,FileSignature>]

The file paths and associated signatures for manifestations appearing only in the basis group

@param other_path_hash [Hash<String,FileSignature>]

The file paths and associated signatures for manifestations appearing only in the other group

@return [FileGroupDifference]

Container for reporting the set of file-level differences of type 'deleted'
# File lib/moab/file_group_difference.rb, line 322
def tabulate_deleted_files(basis_path_hash, other_path_hash)
  basis_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(change: 'deleted')
    fid.basis_path = path
    fid.other_path = ''
    fid.signatures << basis_path_hash[path]
    subset_hash[:deleted].files << fid
  end
  self
end
tabulate_modified_files(basis_path_hash, other_path_hash)

@api internal @param basis_path_hash [Hash<String,FileSignature>]

The file paths and associated signatures for manifestations appearing only in the basis group

@param other_path_hash [Hash<String,FileSignature>]

The file paths and associated signatures for manifestations appearing only in the other group

@return [FileGroupDifference]

Container for reporting the set of file-level differences of type 'modified'
# File lib/moab/file_group_difference.rb, line 285
def tabulate_modified_files(basis_path_hash, other_path_hash)
  matching_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(change: 'modified')
    fid.basis_path = path
    fid.other_path = 'same'
    fid.signatures << basis_path_hash[path]
    fid.signatures << other_path_hash[path]
    subset_hash[:modified].files << fid
  end
  self
end
tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash)

@api internal
@param matching_signatures [Array<FileSignature>] The file signatures of the file manifestations being compared
@param basis_signature_hash [Hash<FileSignature, FileManifestation>] Signature to file path mapping from the file group that is the basis of the comparison
@param other_signature_hash [Hash<FileSignature, FileManifestation>] Signature to file path mapping from the file group that is being compared to the basis group
@return [FileGroupDifference] Container for reporting the set of file-level differences of type 'renamed', 'copyadded', or 'copydeleted'
# File lib/moab/file_group_difference.rb, line 252
def tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    basis_only_paths = basis_paths - other_paths
    other_only_paths = other_paths - basis_paths
    maxsize = [basis_only_paths.size, other_only_paths.size].max
    (0..maxsize - 1).each do |n|
      fid = FileInstanceDifference.new
      fid.basis_path = basis_only_paths[n]
      fid.other_path = other_only_paths[n]
      fid.signatures << signature
      if fid.basis_path.nil?
        fid.change = 'copyadded'
        fid.basis_path = basis_paths[0]
      elsif fid.other_path.nil?
        fid.change = 'copydeleted'
      else
        fid.change = 'renamed'
      end
      subset_hash[fid.change.to_sym].files << fid
    end
  end
  self
end
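The pairing logic above matches leftover paths by position: the nth basis-only path is paired with the nth other-only path, and whichever side runs out first determines whether the remainder counts as copies added or copies deleted. A self-contained sketch of that rule for a single signature follows; `pair_paths` is a hypothetical name and the result tuples are an illustrative format.

```ruby
# Hypothetical sketch for one signature: returns [change, basis_path, other_path] tuples.
def pair_paths(basis_paths, other_paths)
  basis_only = basis_paths - other_paths
  other_only = other_paths - basis_paths
  count = [basis_only.size, other_only.size].max
  (0...count).map do |n|
    if basis_only[n].nil?
      # No unmatched basis path left: the extra other path is a new copy
      # of a file that already exists, reported against the first basis path.
      [:copyadded, basis_paths[0], other_only[n]]
    elsif other_only[n].nil?
      # No unmatched other path left: a duplicate copy was removed.
      [:copydeleted, basis_only[n], nil]
    else
      [:renamed, basis_only[n], other_only[n]]
    end
  end
end

puts pair_paths(['a.txt'], ['b.txt']).inspect
puts pair_paths(['a.txt'], ['a.txt', 'copy.txt']).inspect
puts pair_paths(['a.txt', 'dup.txt'], ['a.txt']).inspect
```

Because the pairing is purely positional, which old path is reported as renamed to which new path depends on the order of the path arrays when several files share one signature.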
tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash)

@api internal
@param matching_signatures [Array<FileSignature>] The file signatures of the file manifestations being compared
@param basis_signature_hash [Hash<FileSignature, FileManifestation>] Signature to file path mapping from the file group that is the basis of the comparison
@param other_signature_hash [Hash<FileSignature, FileManifestation>] Signature to file path mapping from the file group that is being compared to the basis group
@return [FileGroupDifference] Container for reporting the set of file-level differences of type 'identical'
# File lib/moab/file_group_difference.rb, line 228
def tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    matching_paths = basis_paths & other_paths
    matching_paths.each do |path|
      fid = FileInstanceDifference.new(change: 'identical')
      fid.basis_path = path
      fid.other_path = 'same'
      fid.signatures << signature
      subset_hash[:identical].files << fid
    end
  end
  self
end