class Moab::FileGroupDifference
Performs analysis and reports the differences between two matching {FileGroup} objects. The descendant elements of the report hold a detailed breakdown of file-level differences, organized by change type. This stanza is a child element of {FileInventoryDifference}, the documentation of which contains a full example.
To determine the detailed nature of the differences between the two manifests, the algorithm first compares the sets of file signatures present in the two groups, then uses the result of that comparison for the subsequent analysis of filename correspondences.
For the first step, a Ruby Hash is extracted from each of the two groups, with {FileSignature} objects used as hash keys and the corresponding {FileInstance} arrays as the hash values. The set of keys from the basis hash can be compared against the keys from the other hash using {Array} operators:
- matching = basis_array & other_array
- basis_only = basis_array - other_array
- other_only = other_array - basis_array
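The three set operations can be sketched with plain Ruby hashes. This is a minimal illustration only: the signature strings stand in for {FileSignature} objects, the path arrays stand in for file instances, and all of the data is invented for the example.

```ruby
# Hypothetical data: signature strings stand in for FileSignature objects,
# and arrays of paths stand in for the corresponding file instances.
basis_hash = {
  'sig-a' => ['page-1.jpg'],
  'sig-b' => ['page-2.jpg'],
  'sig-c' => ['notes.txt']
}
other_hash = {
  'sig-a' => ['page-1.jpg'],
  'sig-b' => ['page-3.jpg'],
  'sig-d' => ['notes.txt']
}

basis_array = basis_hash.keys
other_array = other_hash.keys

matching   = basis_array & other_array # signatures present in both groups
basis_only = basis_array - other_array # signatures present only in the basis group
other_only = other_array - basis_array # signatures present only in the other group
```

Here `matching` holds the two shared signatures, while `basis_only` and `other_only` each hold the one signature that has no counterpart in the opposite group.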
For the second step of the comparison, the matching and non-matching sets of hash entries are further categorized as follows:
- identical = signature and file path are the same in both the basis and other file groups
- renamed = signature is unchanged, but the path has changed
- copyadded = duplicate copy of file was added
- copydeleted = duplicate copy of file was deleted
- modified = path is the same in both groups, but the signature has changed
- added = signature and path are only in the other inventory
- deleted = signature and path are only in the basis inventory
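The two-step categorization can be sketched end to end on simplified data. The helper below is hypothetical and not part of Moab: it assumes each group is a plain hash mapping a signature string to an array of paths, and it returns bare paths (or path pairs) rather than {FileInstanceDifference} objects.

```ruby
# Hypothetical helper, not part of Moab: classifies the changes between two
# { signature => [paths] } hashes using the two-step approach described above.
def classify_changes(basis, other)
  changes = Hash.new { |hash, key| hash[key] = [] }

  # Step 1: signatures found in both groups (identical, renamed, copy changes).
  (basis.keys & other.keys).each do |sig|
    (basis[sig] & other[sig]).each { |path| changes[:identical] << path }
    basis_only = basis[sig] - other[sig]
    other_only = other[sig] - basis[sig]
    [basis_only.size, other_only.size].max.times do |n|
      if basis_only[n].nil?
        changes[:copyadded] << [basis[sig].first, other_only[n]]
      elsif other_only[n].nil?
        changes[:copydeleted] << basis_only[n]
      else
        changes[:renamed] << [basis_only[n], other_only[n]]
      end
    end
  end

  # Step 2: signatures found in only one group, compared by path.
  basis_paths = (basis.keys - other.keys).flat_map { |sig| basis[sig] }
  other_paths = (other.keys - basis.keys).flat_map { |sig| other[sig] }
  changes[:modified].concat(basis_paths & other_paths)
  changes[:added].concat(other_paths - basis_paths)
  changes[:deleted].concat(basis_paths - other_paths)
  changes
end

# Invented sample data for illustration.
basis = { 'sig-a' => ['a.txt'], 'sig-b' => ['b.txt'], 'sig-c' => ['c.txt'], 'sig-d' => ['d.txt'] }
other = { 'sig-a' => ['a.txt'], 'sig-b' => ['b2.txt'], 'sig-x' => ['c.txt'], 'sig-y' => ['e.txt'] }
result = classify_changes(basis, other)
```

With this sample data, `a.txt` is identical, `b.txt` is renamed to `b2.txt`, `c.txt` is modified, `e.txt` is added, and `d.txt` is deleted.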
Data Model
- {FileInventoryDifference} = compares two {FileInventory} instances based on file signatures and pathnames
  - {FileGroupDifference} [1..*] = performs analysis and reports differences between two matching {FileGroup} objects
    - {FileGroupDifferenceSubset} [1..5] = collects a set of file-level differences of a given change type
      - {FileInstanceDifference} [1..*] = contains difference information at the file level
        - {FileSignature} [1..2] = contains the file signature(s) of two file instances being compared
@note Copyright © 2012 by The Board of Trustees of the Leland Stanford
Junior University.
All rights reserved. See {file:LICENSE.rdoc} for details.
Attributes
@return [Hash<Symbol,FileGroupDifferenceSubset>] A set of containers (one for each change type),
each of which contains a collection of file-level differences having that change type.
Public Class Methods
(see Serializable#initialize)
Serializer::Serializable::new
# File lib/moab/file_group_difference.rb, line 55
def initialize(opts = {})
  @subset_hash = Hash.new { |hash, key| hash[key] = FileGroupDifferenceSubset.new(change: key.to_s) }
  super(opts)
end
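The default block passed to `Hash.new` means a subset container is created the first time a change type is looked up. A minimal sketch of that behaviour follows; the `Subset` struct is a stand-in defined only for this illustration and is not the real {FileGroupDifferenceSubset}.

```ruby
# Stand-in for FileGroupDifferenceSubset, defined here only for illustration.
Subset = Struct.new(:change, :files) do
  def count
    files.size
  end
end

# Looking up a missing key builds and stores a new, empty subset for that change type.
subset_hash = Hash.new { |hash, key| hash[key] = Subset.new(key.to_s, []) }

subset_hash[:added].files << 'new.txt' # creates the :added subset on first access
added_count   = subset_hash[:added].count
renamed_count = subset_hash[:renamed].count # creates an empty :renamed subset
```

One consequence of this design is that the counting methods never have to guard against a missing subset: an unused change type simply reports a count of zero.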
Public Instance Methods
# File lib/moab/file_group_difference.rb, line 115
def added
  subset_hash[:added].count
end
@api internal
@param (see matching_keys)
@return [Array] Compare the keys of two hashes and return the keys unique to the first hash
# File lib/moab/file_group_difference.rb, line 172
def basis_only_keys(basis_hash, other_hash)
  basis_hash.keys - other_hash.keys
end
@api internal
@param basis_group [FileGroup] The file group that is the basis of the comparison
@param other_group [FileGroup] The file group that is compared against the basis group
@return [FileGroupDifference] Compare two file groups and return a differences report
# File lib/moab/file_group_difference.rb, line 187
def compare_file_groups(basis_group, other_group)
  @group_id = basis_group.group_id
  compare_matching_signatures(basis_group, other_group)
  compare_non_matching_signatures(basis_group, other_group)
  self
end
@api internal
@param (see compare_file_groups)
@return [FileGroupDifference] For signatures that are present in both groups, report which file instances are identical or renamed
# File lib/moab/file_group_difference.rb, line 198
def compare_matching_signatures(basis_group, other_group)
  matching_signatures = matching_keys(basis_group.signature_hash, other_group.signature_hash)
  tabulate_unchanged_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  tabulate_renamed_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  self
end
@api internal
@param (see compare_file_groups)
@return [FileGroupDifference] For signatures that are present in only one or the other group, report which file instances are modified, deleted, or added
# File lib/moab/file_group_difference.rb, line 209
def compare_non_matching_signatures(basis_group, other_group)
  basis_only_signatures = basis_only_keys(basis_group.signature_hash, other_group.signature_hash)
  other_only_signatures = other_only_keys(basis_group.signature_hash, other_group.signature_hash)
  basis_path_hash = basis_group.path_hash_subset(basis_only_signatures)
  other_path_hash = other_group.path_hash_subset(other_only_signatures)
  tabulate_modified_files(basis_path_hash, other_path_hash)
  tabulate_added_files(basis_path_hash, other_path_hash)
  tabulate_deleted_files(basis_path_hash, other_path_hash)
  self
end
# File lib/moab/file_group_difference.rb, line 87
def copyadded
  subset_hash[:copyadded].count
end
# File lib/moab/file_group_difference.rb, line 94
def copydeleted
  subset_hash[:copydeleted].count
end
# File lib/moab/file_group_difference.rb, line 122
def deleted
  subset_hash[:deleted].count
end
# File lib/moab/file_group_difference.rb, line 69
def difference_count
  count = 0
  subset_hash.each do |type, subset|
    count += subset.count if type != :identical
  end
  count
end
@return [Hash<Symbol,Array>] Sets of filenames grouped by change type for use in performing file or metadata operations
# File lib/moab/file_group_difference.rb, line 334
def file_deltas
  # The hash to be returned
  deltas = Hash.new { |hash, key| hash[key] = [] }
  # case where other_path is empty or 'same'. (create array of strings)
  %i[identical modified deleted copydeleted].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:basis_path))
  end
  # case where basis_path and other_path are both present. (create array of arrays)
  %i[copyadded renamed].each do |change|
    deltas[change].concat(subset_hash[change].files.collect { |file| [file.basis_path, file.other_path] })
  end
  # case where basis_path is empty. (create array of strings)
  [:added].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:other_path))
  end
  deltas
end
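As a rough illustration of the return shape, the hash below mimics what file_deltas might produce for a small file group. The paths are invented for the example; only the structure (strings for single-path change types, two-element arrays for copyadded and renamed) follows from the code above.

```ruby
# Hypothetical example of the structure returned by file_deltas;
# the paths are invented for illustration.
deltas = {
  identical: ['page-1.jpg'],
  modified: ['title.jpg'],
  deleted: ['page-2.jpg'],
  copydeleted: [],
  copyadded: [['page-1.jpg', 'page-1-copy.jpg']],
  renamed: [['page-3.jpg', 'page-2.jpg']],
  added: ['page-4.jpg']
}

# Single-path change types hold strings; two-path types hold [basis_path, other_path] pairs.
rename_messages = deltas[:renamed].map { |old, new| "rename #{old} -> #{new}" }
```

A caller applying these changes to a copy of the files can therefore destructure the renamed and copyadded entries into a source and a destination path, and treat every other entry as a single path.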
# File lib/moab/file_group_difference.rb, line 80
def identical
  subset_hash[:identical].count
end
@api internal
@param basis_hash [Hash] The first hash being compared
@param other_hash [Hash] The second hash being compared
@return [Array] Compare the keys of two hashes and return the intersection
# File lib/moab/file_group_difference.rb, line 165
def matching_keys(basis_hash, other_hash)
  basis_hash.keys & other_hash.keys
end
# File lib/moab/file_group_difference.rb, line 108
def modified
  subset_hash[:modified].count
end
@api internal
@param (see matching_keys)
@return [Array] Compare the keys of two hashes and return the keys unique to the second hash
# File lib/moab/file_group_difference.rb, line 179
def other_only_keys(basis_hash, other_hash)
  other_hash.keys - basis_hash.keys
end
@param [Array<Array<String>>] filepairs The set of oldname, newname pairs for all files being renamed
@return [Boolean] Test whether any of the new names are the same as one of the old names, such as would be true for insertion of a new file into a page sequence, or a circular rename. In such a case, return true, indicating that use of intermediate temporary files would be required when updating a copy of an object's files at a given location.
# File lib/moab/file_group_difference.rb, line 357
def rename_require_temp_files(filepairs)
  # Split the filepairs into two arrays
  oldnames = []
  newnames = []
  filepairs.each do |old, new|
    oldnames << old
    newnames << new
  end
  # Are any of the filenames the same in set of oldnames and set of newnames?
  intersection = oldnames & newnames
  intersection.count > 0
end
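The overlap test can be tried out in isolation. The method below is a free-standing sketch that mirrors the logic shown above; its name and the sample file pairs are assumptions made for the illustration.

```ruby
# Free-standing sketch of the overlap test; the method name is hypothetical.
def rename_requires_temp_files?(filepairs)
  oldnames = filepairs.map(&:first)
  newnames = filepairs.map(&:last)
  # True when at least one name appears as both an old name and a new name.
  (oldnames & newnames).any?
end

# Inserting a new page 2 shifts later pages, so 'page-3.jpg' is both an old and a new name.
shifted = [['page-2.jpg', 'page-3.jpg'], ['page-3.jpg', 'page-4.jpg']]
# Independent renames with no overlap between old and new names.
independent = [['a.txt', 'b.txt'], ['c.txt', 'd.txt']]

shifted_needs_temp     = rename_requires_temp_files?(shifted)
independent_needs_temp = rename_requires_temp_files?(independent)
```

The shifted page sequence needs temporary files because renaming `page-2.jpg` to `page-3.jpg` directly would overwrite a file that still has to be moved; the independent renames do not.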
@param [Array<Array<String>>] filepairs The set of oldname, newname pairs for all files being renamed
@return [Array<Array<String>>] a set of file triples containing oldname, newname, tempname (in that order, as returned by the code below)
# File lib/moab/file_group_difference.rb, line 372
def rename_tempfile_triplets(filepairs)
  filepairs.collect { |old, new| [old, new, "#{new}-#{Time.now.strftime('%Y%m%d%H%H%S')}-tmp"] }
end
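One way such triplets could be consumed is a two-pass rename: move every old file to its temporary name first, then move every temporary file to its final name. The helper below is a hypothetical sketch and not part of Moab; it assumes triplets in [old, new, temp] order and uses fixed temp names in a scratch directory instead of timestamped ones.

```ruby
require 'tmpdir'

# Hypothetical helper, not part of Moab: applies [old, new, temp] triplets in two
# passes so that a circular rename never overwrites a file that has not yet moved.
def two_pass_rename(dir, triplets)
  triplets.each { |old, _new, temp| File.rename(File.join(dir, old), File.join(dir, temp)) }
  triplets.each { |_old, new, temp| File.rename(File.join(dir, temp), File.join(dir, new)) }
end

contents = Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.txt'), 'A')
  File.write(File.join(dir, 'b.txt'), 'B')
  # Swap the two files, which is a circular rename.
  triplets = [['a.txt', 'b.txt', 'b.txt-tmp'], ['b.txt', 'a.txt', 'a.txt-tmp']]
  two_pass_rename(dir, triplets)
  { 'a.txt' => File.read(File.join(dir, 'a.txt')), 'b.txt' => File.read(File.join(dir, 'b.txt')) }
end
```

After the two passes the contents of the two files have been exchanged, which a single pass of direct renames could not achieve without losing one of them.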
# File lib/moab/file_group_difference.rb, line 101
def renamed
  subset_hash[:renamed].count
end
@param change [String] the change type to search for
@return [FileGroupDifferenceSubset] Find a specified subset of changes
# File lib/moab/file_group_difference.rb, line 50
def subset(change)
  subset_hash[change.to_sym]
end
# File lib/moab/file_group_difference.rb, line 131
def subsets
  subset_hash.values
end
# File lib/moab/file_group_difference.rb, line 135
def subsets=(array)
  return unless array
  array.each { |subset| subset_hash[subset.change.to_sym] = subset }
end
@api internal
@return [FileGroupDifference] Clone just this element for inclusion in a versionMetadata structure
# File lib/moab/file_group_difference.rb, line 148
def summary
  FileGroupDifference.new(
    group_id: group_id,
    identical: identical,
    copyadded: copyadded,
    copydeleted: copydeleted,
    renamed: renamed,
    modified: modified,
    added: added,
    deleted: deleted
  )
end
@return [Array<String>] The data fields to include in summary reports
# File lib/moab/file_group_difference.rb, line 142
def summary_fields
  %w[group_id difference_count identical copyadded copydeleted renamed modified deleted added]
end
@api internal
@param basis_path_hash [Hash<String,FileSignature>] The file paths and associated signatures for manifestations appearing only in the basis group
@param other_path_hash [Hash<String,FileSignature>] The file paths and associated signatures for manifestations appearing only in the other group
@return [FileGroupDifference] Container for reporting the set of file-level differences of type 'added'
# File lib/moab/file_group_difference.rb, line 304
def tabulate_added_files(basis_path_hash, other_path_hash)
  other_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(change: 'added')
    fid.basis_path = ''
    fid.other_path = path
    fid.signatures << other_path_hash[path]
    subset_hash[:added].files << fid
  end
  self
end
@api internal
@param basis_path_hash [Hash<String,FileSignature>] The file paths and associated signatures for manifestations appearing only in the basis group
@param other_path_hash [Hash<String,FileSignature>] The file paths and associated signatures for manifestations appearing only in the other group
@return [FileGroupDifference] Container for reporting the set of file-level differences of type 'deleted'
# File lib/moab/file_group_difference.rb, line 322
def tabulate_deleted_files(basis_path_hash, other_path_hash)
  basis_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(change: 'deleted')
    fid.basis_path = path
    fid.other_path = ''
    fid.signatures << basis_path_hash[path]
    subset_hash[:deleted].files << fid
  end
  self
end
@api internal
@param basis_path_hash [Hash<String,FileSignature>] The file paths and associated signatures for manifestations appearing only in the basis group
@param other_path_hash [Hash<String,FileSignature>] The file paths and associated signatures for manifestations appearing only in the other group
@return [FileGroupDifference] Container for reporting the set of file-level differences of type 'modified'
# File lib/moab/file_group_difference.rb, line 285
def tabulate_modified_files(basis_path_hash, other_path_hash)
  matching_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(change: 'modified')
    fid.basis_path = path
    fid.other_path = 'same'
    fid.signatures << basis_path_hash[path]
    fid.signatures << other_path_hash[path]
    subset_hash[:modified].files << fid
  end
  self
end
@api internal
@param matching_signatures [Array<FileSignature>] The file signatures of the file manifestations being compared
@param basis_signature_hash [Hash<FileSignature, FileManifestation>] Signature to file path mapping from the file group that is the basis of the comparison
@param other_signature_hash [Hash<FileSignature, FileManifestation>] Signature to file path mapping from the file group that is being compared to the basis group
@return [FileGroupDifference] Container for reporting the set of file-level differences of type 'renamed', 'copyadded', or 'copydeleted'
# File lib/moab/file_group_difference.rb, line 252
def tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    basis_only_paths = basis_paths - other_paths
    other_only_paths = other_paths - basis_paths
    maxsize = [basis_only_paths.size, other_only_paths.size].max
    (0..maxsize - 1).each do |n|
      fid = FileInstanceDifference.new
      fid.basis_path = basis_only_paths[n]
      fid.other_path = other_only_paths[n]
      fid.signatures << signature
      if fid.basis_path.nil?
        fid.change = 'copyadded'
        fid.basis_path = basis_paths[0]
      elsif fid.other_path.nil?
        fid.change = 'copydeleted'
      else
        fid.change = 'renamed'
      end
      subset_hash[fid.change.to_sym].files << fid
    end
  end
  self
end
@api internal
@param matching_signatures [Array<FileSignature>] The file signatures of the file manifestations being compared
@param basis_signature_hash [Hash<FileSignature, FileManifestation>] Signature to file path mapping from the file group that is the basis of the comparison
@param other_signature_hash [Hash<FileSignature, FileManifestation>] Signature to file path mapping from the file group that is being compared to the basis group
@return [FileGroupDifference] Container for reporting the set of file-level differences of type 'identical'
# File lib/moab/file_group_difference.rb, line 228
def tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    matching_paths = basis_paths & other_paths
    matching_paths.each do |path|
      fid = FileInstanceDifference.new(change: 'identical')
      fid.basis_path = path
      fid.other_path = 'same'
      fid.signatures << signature
      subset_hash[:identical].files << fid
    end
  end
  self
end