Muster
 All Classes Namespaces Files Functions Variables Typedefs Macros
partition.h
Go to the documentation of this file.
1 //////////////////////////////////////////////////////////////////////////////////////////////////
2 // Copyright (c) 2010, Lawrence Livermore National Security, LLC.
3 // Produced at the Lawrence Livermore National Laboratory
4 // LLNL-CODE-433662
5 // All rights reserved.
6 //
7 // This file is part of Muster. For details, see http://github.com/tgamblin/muster.
8 // Please also read the LICENSE file for further information.
9 //
10 // Redistribution and use in source and binary forms, with or without modification, are
11 // permitted provided that the following conditions are met:
12 //
13 // * Redistributions of source code must retain the above copyright notice, this list of
14 // conditions and the disclaimer below.
15 // * Redistributions in binary form must reproduce the above copyright notice, this list of
16 // conditions and the disclaimer (as noted below) in the documentation and/or other materials
17 // provided with the distribution.
18 // * Neither the name of the LLNS/LLNL nor the names of its contributors may be used to endorse
19 // or promote products derived from this software without specific prior written permission.
20 //
21 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS
22 // OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
23 // MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
24 // LAWRENCE LIVERMORE NATIONAL SECURITY, LLC, THE U.S. DEPARTMENT OF ENERGY OR CONTRIBUTORS BE
25 // LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
26 // (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
27 // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
28 // WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
29 // ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 //////////////////////////////////////////////////////////////////////////////////////////////////
31 
32 ///
33 /// @file partition.h
34 /// @author Todd Gamblin tgamblin@llnl.gov
35 /// @brief Class to represent a partitioning of a data set.
36 ///
37 #ifndef PARTITION_H
38 #define PARTITION_H
39 
40 #include <cstddef>
41 #include <vector>
42 #include <set>
43 #include <ostream>
44 #include <stdint.h>
45 
46 ///
47 /// Namespace for everything in the cluster library.
48 ///
49 namespace cluster {
50 
51  typedef size_t medoid_id; ///< More descriptive type for medoid index
52  typedef size_t object_id; ///< More descriptive type for object index
53 
54  ///
55  /// Explicit representation of a clustering. Instead of a vecto of representative
56  /// ids, this has <i>k</i> sets of object_ids indicating which objects are in a
57  /// particular cluster. You can convert a partition to a cluster_list with
58  /// to_cluster_list().
59  ///
60  typedef std::vector< std::set<object_id> > cluster_list;
61 
62  ///
63  /// This represents a partitioning of a data set. The data set consists of
64  /// <i>objects</i>, each of which has an associated non-negative object_id.
65  ///
66  /// A partitioning divides a data set into groups, or <i>clusters</i>, and the
67  /// partition object stores information about these clusters internally. In
68  /// particular, it contains a vector of object_ids that indicate the
69  /// <i>representative</i> object, or <i>medoid</i> of each cluster. It also
70  /// contains a vector of medoid_ids indicating which cluster each object
71  /// in the data set belongs to.
72  ///
73  /// Partition objects can be converted to a cluster_list, which is a more
74  /// explicit representation of the partitioning.
75  ///
76  struct partition {
77  /// Gives the index of the object that is the ith medoid.
78  /// medoids[i] == index in object array for last call to findClusters()
79  std::vector<object_id> medoid_ids;
80 
81  /// Gives cluster id (index in medoids) for the ith object.
82  /// clusterid[i] == id of cluster of which object i is a member.
83  /// medoids[clusterid[i]] == representative of that cluster.
84  std::vector<medoid_id> cluster_ids;
85 
86  /// Constructor. Can optionall supply the number of objects to be partitioned
87  /// and this will start out with one cluster containing all of them.
88  partition(size_t num_objects = 0);
89 
90  /// Virtual destructor; currently does nothing.
91  virtual ~partition();
92 
93  /// True if and only if object i is a medoid.
94  bool is_medoid(object_id oi) const {
95  return medoid_ids[cluster_ids[oi]] == oi;
96  }
97 
98  /// Creates a list of std::sets from the partition info in
99  /// medoids and cluster_ids.
100  void to_cluster_list(cluster_list& list) const;
101 
102  /// Fast swap with other patrition objects
103  void swap(partition& other);
104 
105  /// puts medoids in order by their object id, and adjusts cluster_ids accordingly.
106  void sort();
107 
108  /// Total number of objects in the partition
109  size_t size() const { return cluster_ids.size(); }
110 
111  /// Total number of clusters in the partition
112  size_t num_clusters() const { return medoid_ids.size(); }
113 
114  /// Number of objects in cluster i
115  size_t size(size_t i) const;
116 
117  /// Write the members of cluster m out to the output stream as object_ids
118  template <class OutputIterator>
119  void write_members(medoid_id m, OutputIterator out) {
120  for (object_id o=0; o < cluster_ids.size(); o++) {
121  if (cluster_ids[o] == m) {
122  *out++ = o;
123  }
124  }
125  }
126 
127  /// Write the members of cluster m out to the output stream formatted nicely with
128  /// hyphenated runs of consecutive numbers
129  void write_members_with_runs(medoid_id m, std::ostream& out);
130 
131  /// writable structure returned by members() function.
132  struct member_writer {
135  member_writer(partition *_p, medoid_id _m) : p(_p), m(_m) { }
136  };
138 
139  }; // struct partition
140 
141  inline std::ostream& operator<<(std::ostream& out, const partition::member_writer& mw) {
142  mw.p->write_members_with_runs(mw.m, out);
143  return out;
144  }
145 
146  /// Prints out nicely formatted clustering
147  std::ostream& operator<<(std::ostream& out, const cluster_list& list);
148 
149  /// For convenience
150  std::ostream& operator<<(std::ostream& out, const partition& km);
151 
152  /// Mirkin distance bt/w two clusterings.
153  double mirkin_distance(const cluster_list& c1, const cluster_list& c2);
154 
155  /// Convenience overload for comparing partition objects directly
156  double mirkin_distance(const partition& c1, const partition& c2);
157 
158  ///
159  /// Expand a cluster_list by l levels. That is, replace each index i
160  /// in the cluster_list with indices in [2^l * i ... 2^l * (i+1) - 1]
161  ///
162  /// @todo deprecate and delete this.
163  ///
164  void expand(cluster_list& list, size_t level = 1);
165 
166  ///
167  /// Compute the total dissimilarity between all objects and their medoids.
168  ///
169  template <typename D>
170  double total_dissimilarity(const partition& p, D dist) {
171  double dissim = 0.0;
172  for (size_t i=0; i < p.cluster_ids.size(); i++) {
173  dissim += dist(i, p.medoid_ids[p.cluster_ids[i]]);
174  }
175  return dissim;
176  }
177 
178  ///
179  /// Compute the total dissimilarity between all objects in a particular cluster and its medoid.
180  ///
181  template <typename D>
182  double total_dissimilarity(const partition& p, D dist, medoid_id m) {
183  double dissim = 0.0;
184  for (size_t i=0; i < p.cluster_ids.size(); i++) {
185  if (p.cluster_ids[i] == m) {
186  dissim += dist(i, p.medoid_ids[p.cluster_ids[i]]);
187  }
188  }
189  return dissim;
190  }
191 
192  ///
193  /// Compute the total squared dissimilarity between all objects and their medoids.
194  ///
195  template <typename D>
196  double total_squared_dissimilarity(const partition& p, D dist) {
197  double dissim2 = 0.0;
198  for (size_t i=0; i < p.cluster_ids.size(); i++) {
199  double d = dist(i, p.medoid_ids[p.cluster_ids[i]]);
200  dissim2 += d * d;
201  }
202  return dissim2;
203  }
204 
205  ///
206  /// Compute the total squared dissimilarity between all objects in a particular
207  /// cluster and its medoid.
208  ///
209  template <typename D>
210  double total_squared_dissimilarity(const partition& p, D dist, medoid_id m) {
211  double dissim2 = 0.0;
212  for (size_t i=0; i < p.cluster_ids.size(); i++) {
213  if (p.cluster_ids[i] == m) {
214  double d = dist(i, p.medoid_ids[p.cluster_ids[i]]);
215  dissim2 += d * d;
216  }
217  }
218  return dissim2;
219  }
220 
221 
222 } // namespace cluster
223 
224 #endif // PARTITION_H
double total_squared_dissimilarity(const partition &p, D dist)
Compute the total squared dissimilarity between all objects and their medoids.
Definition: partition.h:196
void write_members(medoid_id m, OutputIterator out)
Write the members of cluster m out to the output stream as object_ids.
Definition: partition.h:119
std::vector< object_id > medoid_ids
Gives the index of the object that is the ith medoid.
Definition: partition.h:79
std::vector< std::set< object_id > > cluster_list
Explicit representation of a clustering.
Definition: partition.h:60
void swap(partition &other)
Fast swap with other patrition objects.
Definition: partition.cpp:68
member_writer members(medoid_id m)
Definition: partition.h:137
double total_dissimilarity(const partition &p, D dist)
Compute the total dissimilarity between all objects and their medoids.
Definition: partition.h:170
size_t size() const
Total number of objects in the partition.
Definition: partition.h:109
void sort()
puts medoids in order by their object id, and adjusts cluster_ids accordingly.
Definition: partition.cpp:74
std::ostream & operator<<(std::ostream &out, const id_pair< T > &p)
Print out an id_pair as a tuple of its element and its source rank.
Definition: id_pair.h:103
virtual ~partition()
Virtual destructor; currently does nothing.
Definition: partition.cpp:56
partition(size_t num_objects=0)
Constructor.
Definition: partition.cpp:51
size_t object_id
More descriptive type for object index.
Definition: partition.h:52
bool is_medoid(object_id oi) const
True if and only if object i is a medoid.
Definition: partition.h:94
size_t medoid_id
More descriptive type for medoid index.
Definition: partition.h:51
void expand(cluster_list &list, size_t level)
Expand a cluster_list by l levels.
Definition: partition.cpp:170
member_writer(partition *_p, medoid_id _m)
Definition: partition.h:135
void write_members_with_runs(medoid_id m, std::ostream &out)
Write the members of cluster m out to the output stream formatted nicely with hyphenated runs of cons...
Definition: partition.cpp:187
void to_cluster_list(cluster_list &list) const
Creates a list of std::sets from the partition info in medoids and cluster_ids.
Definition: partition.cpp:59
std::vector< medoid_id > cluster_ids
Gives cluster id (index in medoids) for the ith object.
Definition: partition.h:84
double mirkin_distance(const cluster_list &c1, const cluster_list &c2)
Mirkin distance bt/w two clusterings.
Definition: partition.cpp:125
This represents a partitioning of a data set.
Definition: partition.h:76
writable structure returned by members() function.
Definition: partition.h:132
size_t num_clusters() const
Total number of clusters in the partition.
Definition: partition.h:112
Muster. Copyright © 2010, Lawrence Livermore National Laboratory, LLNL-CODE-433662.
Distribution of Muster and its documentation is subject to terms of the Muster LICENSE.
Generated on Thu Sep 1 2016 using Doxygen 1.8.5