class Eco::API::Organization::PeopleSimilarity

Class to find duplicates in the People Manager.

@attr_writer attribute [String, Proc, nil] the target attribute to be read.

Attributes

attribute[RW]

Public Instance Methods

analyse(needle_read: nil, keep_empty: false, **options)

Analyses people based on `options`.

@param needle_read [Proc, Symbol] used when the value to read from the `needle` object differs from the `:read` option (`attribute`). This allows you, for example, to facet `needle.name` (needle_read) against `haystack_item.details[alt_id]` (read).

@param keep_empty [Boolean] whether to keep people with no results (based on the threshold); when `false`, they are removed.

@return [Hash] where the keys are the people `id`s and the values are the `Eco::Data::FuzzyMatch::Results`

# File lib/eco/api/organization/people_similarity.rb, line 124
def analyse(needle_read: nil, keep_empty: false, **options)
  options = { read: self.attribute }.merge(options)
  total = count; i = 1
  each_with_object({}) do |person, results|
    needle_str = needle_read ? item_string(person, needle_read) : nil
    results[person.id] = find_all_with_score(person, needle_str: needle_str, **options)
    print_progress("Analysed", total, i)
    i += 1
  end.yield_self do |analysed|
    analysed = clean_empty(analysed) unless keep_empty
    #puts "... #{analysed.count} results after cleaning empty"
    analysed
  end
end
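
The difference between `needle_read` and `read` can be illustrated with a standalone sketch. Nothing here uses the gem: the `Person` struct, the `dice` scorer, `analyse_sketch` and its `threshold` default are all hypothetical stand-ins, and the real method returns `Eco::Data::FuzzyMatch::Results` rather than arrays of ids.

```ruby
# Hypothetical stand-in for a person record (not the gem's Person class).
Person = Struct.new(:id, :name, :alt_name)

# Illustrative scorer: Sorensen-Dice coefficient over unique character bigrams.
def dice(a, b)
  bigrams = ->(s) { s.downcase.gsub(/\s+/, "").chars.each_cons(2).map(&:join).uniq }
  x = bigrams.call(a)
  y = bigrams.call(b)
  return 0.0 if x.empty? || y.empty?
  2.0 * (x & y).size / (x.size + y.size)
end

# Reads `needle_read` (or `read` when nil) from the needle,
# and `read` from every other person in the haystack.
def analyse_sketch(people, read:, needle_read: nil, threshold: 0.5)
  people.each_with_object({}) do |needle, results|
    needle_str = needle.public_send(needle_read || read).to_s
    others     = people.reject { |p| p.equal?(needle) }
    results[needle.id] = others.select do |p|
      dice(needle_str, p.public_send(read).to_s) >= threshold
    end.map(&:id)
  end
end

people = [
  Person.new(1, "John Smith", "J. Smith"),
  Person.new(2, "Jon Smith",  "John Smith"),
  Person.new(3, "Alice Wong", "A. Wong")
]
# Compare each person's `name` against everybody else's `alt_name`.
puts analyse_sketch(people, read: :alt_name, needle_read: :name).inspect
```

Person 3 ends up with an empty list, which is the kind of entry that `keep_empty: false` would discard.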
attribute=(attr)

@!group Config

@return [String, Proc, nil] the target attribute to be read.

# File lib/eco/api/organization/people_similarity.rb, line 15
def attribute=(attr)
  @attribute = attr
end
attribute_present()

It returns all the entries with `attribute` not empty.

@return [Eco::API::Organization::PeopleSimilarity]

# File lib/eco/api/organization/people_similarity.rb, line 107
def attribute_present
  reject do |person|
    item_value(person).to_s.strip.length < 2
  end.yield_self do |results|
    newFrom(results)
  end
end
blank_attribute()

It returns all the entries with `attribute` empty.

@return [Eco::API::Organization::PeopleSimilarity]

# File lib/eco/api/organization/people_similarity.rb, line 97
def blank_attribute
  select do |person|
    item_value(person).to_s.strip.length < 2
  end.yield_self do |results|
    newFrom(results)
  end
end
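
Both `attribute_present` and `blank_attribute` rely on the same `to_s.strip.length < 2` test, so a single-character value counts as empty. The sketch below shows that split in isolation; `Entry`, `blank_value?` and `split_by_attribute` are hypothetical names used only for illustration.

```ruby
# Hypothetical record type, for illustration only.
Entry = Struct.new(:id, :email)

# Same emptiness test as the two methods above:
# nil, whitespace and single-character values are all treated as blank.
def blank_value?(value)
  value.to_s.strip.length < 2
end

# Returns [present, blank] for the given attribute name.
def split_by_attribute(entries, attribute)
  entries.partition { |entry| !blank_value?(entry.public_send(attribute)) }
end

entries = [
  Entry.new(1, "a@example.com"),
  Entry.new(2, "  "),
  Entry.new(3, nil),
  Entry.new(4, "x")
]
present, blank = split_by_attribute(entries, :email)
puts "present: #{present.map(&:id).inspect}"
puts "blank:   #{blank.map(&:id).inspect}"
```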
clean_empty(analysed)

Removes from results those that do not have similar entries

# File lib/eco/api/organization/people_similarity.rb, line 160
def clean_empty(analysed)
  analysed.select do |id, results|
    !results.empty?
  end
end
ignore_matching_words(analysed, **options)

Reanalyses by ignoring matching words between the `needle` and those found in `results`

# File lib/eco/api/organization/people_similarity.rb, line 198
def ignore_matching_words(analysed, **options)
  prompt = "Reanalysing by ignoring matching words"
  reanalyse(analysed, msg: prompt, **options) do |needle_str, item_str, needle, item|
    self.class.remove_matching_words(needle_str, item_str)
  end
end
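
The body of `remove_matching_words` is not shown on this page. As one plausible reading of "ignoring matching words", the sketch below drops the words both strings share before they would be compared again. `strip_shared_words` is a hypothetical helper and its case-insensitive matching is an assumption.

```ruby
# Hypothetical helper: removes the words that appear in both strings
# (case-insensitive) and returns the two remainders as [needle, item].
def strip_shared_words(needle_str, item_str)
  needle_words = needle_str.split
  item_words   = item_str.split
  shared       = needle_words.map(&:downcase) & item_words.map(&:downcase)

  [needle_words, item_words].map do |words|
    words.reject { |word| shared.include?(word.downcase) }.join(" ")
  end
end

needle_rest, item_rest = strip_shared_words("Anna Maria Lopez", "maria Lopes")
puts needle_rest
puts item_rest
```

Once the shared word is removed, only the differing parts of the two strings remain to be compared.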
ignore_matching_words_old(analysed, **options)

Reanalyses by ignoring matching words between the `needle` and those found in `results`

# File lib/eco/api/organization/people_similarity.rb, line 206
def ignore_matching_words_old(analysed, **options)
  options = { read: self.attribute }.merge(options)
  total = analysed.count; i = 1
  with_analysed(analysed) do |person, results|
    print_progress("Reanalysing by ignoring matching words", total, i)
    i += 1
    ignore_same_words_score(results, **options)
  end
end
item_value(person)

Returns the target value to analyse.

@param person [Ecoportal::API::V1::Person]

# File lib/eco/api/organization/people_similarity.rb, line 25
def item_value(person)
  # A Proc attribute is called with the person itself
  return attribute.call(person) if attribute.is_a?(Proc)
  attr = attribute.to_sym
  return person.send(attr) if person.respond_to?(attr)
end
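
Since `attribute` may be either a `Proc` or a method name, the lookup has two branches. Below is a standalone sketch of that Proc-or-name read; `Member` and `read_value` are hypothetical and independent of the gem.

```ruby
# Hypothetical record with a nested `details` hash.
Member = Struct.new(:name, :details)

# Reads a value either by calling a Proc with the item
# or by sending the attribute name as a method.
# Returns nil when the item does not respond to the name.
def read_value(item, attribute)
  return attribute.call(item) if attribute.is_a?(Proc)
  attr = attribute.to_sym
  item.public_send(attr) if item.respond_to?(attr)
end

member = Member.new("Ann Lee", { alt_id: "A-001" })
puts read_value(member, "name")
puts read_value(member, ->(m) { m.details[:alt_id] })
puts read_value(member, :missing).inspect
```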
named()

It returns all people that have a name.

@return [Eco::API::Organization::PeopleSimilarity]

# File lib/eco/api/organization/people_similarity.rb, line 87
def named
  reject do |person|
    person.name.to_s.strip.length < 2
  end.yield_self do |results|
    newFrom(results)
  end
end
newFrom(data)

Generates a new object with the same config but different base `data`.

@return [Eco::API::Organization::PeopleSimilarity]

Calls superclass method Eco::API::Organization::People#newFrom
# File lib/eco/api/organization/people_similarity.rb, line 54
def newFrom(data)
  super(data).tap do |simil|
    simil.threshold = threshold
    simil.order     = order
    simil.attribute = attribute
  end
end
newSimilarity(analysed)

Gets a new instance of this class, with only the people in the results.

@param analysed [Hash] where the keys are the people `id`s and the values are the `Eco::Data::FuzzyMatch::Results`

@return [Eco::API::Organization::PeopleSimilarity]

# File lib/eco/api/organization/people_similarity.rb, line 146
def newSimilarity(analysed)
  newFrom(people_in_results(analysed))
end
order()
# File lib/eco/api/organization/people_similarity.rb, line 38
def order
  @order ||= [:words_ngrams, :dice]
end
order=(values)
people_in_results(analysed)
# File lib/eco/api/organization/people_similarity.rb, line 150
def people_in_results(analysed)
  analysed.each_with_object([]) do |(id, results), people|
    related = results.each_with_object([self[id]]) do |result, related|
      related << result.match
    end
    related.each {|person| people << person unless people.include?(person)}
  end
end
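
`people_in_results` flattens the analysis into one de-duplicated list: each needle followed by its matches. A minimal standalone sketch, assuming only that a result exposes `match`; `Match` and `people_in` are hypothetical, and a plain Hash stands in for `self[id]`.

```ruby
# Hypothetical result wrapper exposing `match`, as the results above do.
Match = Struct.new(:match)

# Collects each needle (looked up by id) plus its matches,
# skipping anyone already collected.
def people_in(analysed, lookup)
  analysed.each_with_object([]) do |(id, results), people|
    related = [lookup[id]] + results.map(&:match)
    related.each { |person| people << person unless people.include?(person) }
  end
end

lookup   = { 1 => "ann", 2 => "anne", 3 => "bob" }
analysed = { 1 => [Match.new("anne")], 2 => [Match.new("ann")] }
puts people_in(analysed, lookup).inspect
```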
print_analysis(**options)

@note

1. Unless `:analysed` is provided, it launches an analysis with a minimum Jaro-Winkler score of 0.5 as the cut-off
2. It then re-sorts and cuts based on `options`

@return [Hash] where the keys are the people `id`s and the values the `Eco::Data::FuzzyMatch::Results`

reanalyse(analysed, msg: "Reanalysing", **options, &block)

Reanalyses by using a block to treat the needle and item values

# File lib/eco/api/organization/people_similarity.rb, line 187
def reanalyse(analysed, msg: "Reanalysing", **options, &block)
  options = { read: self.attribute }.merge(options)
  total = analysed.count; i = 1
  with_analysed(analysed) do |person, results|
    print_progress(msg, total, i)
    i += 1
    recalculate_results(results, &block)
  end
end
rearrange(analysed, **options)

Launches a reanalysis on `analysed` based on `options`.

@param analysed [Hash] where the keys are the people `id`s and the values are the `Eco::Data::FuzzyMatch::Results`

# File lib/eco/api/organization/people_similarity.rb, line 180
def rearrange(analysed, **options)
  with_analysed(analysed) do |person, results|
    results.relevant_results(**options)
  end
end
repeated_emails()

It gathers those that have the same `email`.

@return [Hash] where the `keys` are `email`s and the `values` are an `Array<Person>`

# File lib/eco/api/organization/people_similarity.rb, line 68
def repeated_emails
  init_caches
  @by_email.select do |email, people|
    people.count > 1
  end
end
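
The `@by_email` cache is built by `init_caches`, which is not shown here. The grouping it implies can be sketched with plain `group_by`. `Contact` and `repeated_emails_sketch` are hypothetical, and the normalisation (strip, downcase, skipping blank emails) is an assumption rather than the gem's documented behaviour.

```ruby
# Hypothetical record type, for illustration only.
Contact = Struct.new(:id, :email)

# Groups contacts by normalised email and keeps only the groups
# that contain more than one contact.
def repeated_emails_sketch(people)
  people
    .reject   { |person| person.email.to_s.strip.empty? }
    .group_by { |person| person.email.strip.downcase }
    .select   { |_email, group| group.count > 1 }
end

contacts = [
  Contact.new(1, "a@example.com"),
  Contact.new(2, "A@example.com "),
  Contact.new(3, "b@example.com"),
  Contact.new(4, nil)
]
repeated_emails_sketch(contacts).each do |email, group|
  puts "#{email}: #{group.map(&:id).inspect}"
end
```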
report(analysed, format: :txt)

@return [String] well structured text

# File lib/eco/api/organization/people_similarity.rb, line 221
def report(analysed, format: :txt)
  case format
  when :txt
    analysed.each_with_object("") do |(id, results), out|
      msg = results.results.map(&:print).join("\n  ")
      out << "#{self[id].identify}:\n  " + msg + "\n"
    end
  end
end
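
The `:txt` layout is one header line per person followed by its matches, each indented by two spaces. A standalone sketch of that layout follows; `report_sketch` is hypothetical, with `labels` standing in for `self[id].identify` and arrays of strings standing in for each result's printed form.

```ruby
# Builds the text report: a "<label>:" header per id,
# then one indented line per match.
def report_sketch(analysed, labels)
  analysed.each_with_object(+"") do |(id, lines), out|
    out << "#{labels[id]}:\n  " << lines.join("\n  ") << "\n"
  end
end

labels   = { 1 => "Ann Lee", 2 => "Bob Ray" }
analysed = { 1 => ["Anne Lee (0.92)", "Ann Leigh (0.81)"], 2 => ["Rob Ray (0.77)"] }
puts report_sketch(analysed, labels)
```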
threshold()
# File lib/eco/api/organization/people_similarity.rb, line 48
def threshold
  @threshold ||= 0.15
end
threshold=(value)

Defines the relevance threshold for per-user matches.

@param value [Float] the threshold that all of the algorithms should comply with

# File lib/eco/api/organization/people_similarity.rb, line 44
def threshold=(value)
  @threshold = value
end
unnamed()

It returns all people with no name.

@return [Eco::API::Organization::PeopleSimilarity]

# File lib/eco/api/organization/people_similarity.rb, line 77
def unnamed
  select do |person|
    person.name.to_s.strip.length < 2
  end.yield_self do |results|
    newFrom(results)
  end
end
with_analysed(analysed, keep_empty: false) { |self, results| ... }

Helper to apply some treatment to the results.

@param analysed [Hash] where the keys are the people `id`s and the values are the `Eco::Data::FuzzyMatch::Results`

@return [Hash] where the keys are the people `id`s and the values are the `Eco::Data::FuzzyMatch::Results`

# File lib/eco/api/organization/people_similarity.rb, line 169
def with_analysed(analysed, keep_empty: false)
  analysed.each_with_object({}) do |(id, results), reanalysed|
    reanalysed[id] = yield(self[id], results)
  end.yield_self do |reanalysed|
    reanalysed = clean_empty(reanalysed) unless keep_empty
    reanalysed
  end.tap {|out| "with_analysed... returns #{out.count} records"}
end
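
The pattern in `with_analysed` is to transform every entry with the block and then drop the entries left empty, unless `keep_empty` is set. A minimal standalone sketch of that pattern, with plain arrays of scores standing in for `Eco::Data::FuzzyMatch::Results`; `with_analysed_sketch` is a hypothetical name, and it yields the `id` where the real method yields the person.

```ruby
# Rebuilds the hash by yielding each (id, results) pair to the block,
# then removes the ids whose new results are empty unless keep_empty is true.
def with_analysed_sketch(analysed, keep_empty: false)
  reanalysed = analysed.each_with_object({}) do |(id, results), out|
    out[id] = yield(id, results)
  end
  keep_empty ? reanalysed : reanalysed.reject { |_id, results| results.empty? }
end

analysed = { 1 => [0.9, 0.3], 2 => [0.2] }
strict   = with_analysed_sketch(analysed) { |_id, scores| scores.select { |s| s >= 0.5 } }
puts strict.inspect
```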

Protected Instance Methods

on_change()

@!endgroup

# File lib/eco/api/organization/people_similarity.rb, line 245
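def on_change
  # remove_instance_variable expects the variable name (a Symbol), not its value
  remove_instance_variable(:@fuzzy_match) if instance_variable_defined?(:@fuzzy_match)
  super
end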
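
`on_change` discards a memoised helper so that it is rebuilt after the underlying data changes. The sketch below shows that invalidation pattern on a hypothetical `CachedList` class. Note that `remove_instance_variable` takes the variable name as a Symbol and raises `NameError` when the variable is not defined, hence the guard.

```ruby
# Hypothetical class with a memoised index that must be reset on change.
class CachedList
  def initialize(items)
    @items = items
  end

  # Memoised lookup of item => position.
  def index
    @index ||= @items.each_with_index.to_h
  end

  def add(item)
    @items << item
    on_change
  end

  private

  # Drops the memoised index so the next call to `index` rebuilds it.
  def on_change
    remove_instance_variable(:@index) if instance_variable_defined?(:@index)
  end
end

list = CachedList.new(%w[a b])
list.index        # builds and caches the index
list.add("c")     # invalidates the cache
puts list.index.inspect
```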

Private Instance Methods

print_progress(msg, total, num)