Muster
par_kmedoids.cpp
//////////////////////////////////////////////////////////////////////////////////////////////////
// Copyright (c) 2010, Lawrence Livermore National Security, LLC.
// Produced at the Lawrence Livermore National Laboratory
// LLNL-CODE-433662
// All rights reserved.
//
// This file is part of Muster. For details, see http://github.com/tgamblin/muster.
// Please also read the LICENSE file for further information.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// * Redistributions of source code must retain the above copyright notice, this list of
// conditions and the disclaimer below.
// * Redistributions in binary form must reproduce the above copyright notice, this list of
// conditions and the disclaimer (as noted below) in the documentation and/or other materials
// provided with the distribution.
// * Neither the name of the LLNS/LLNL nor the names of its contributors may be used to endorse
// or promote products derived from this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS
// OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// LAWRENCE LIVERMORE NATIONAL SECURITY, LLC, THE U.S. DEPARTMENT OF ENERGY OR CONTRIBUTORS BE
// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
// WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//////////////////////////////////////////////////////////////////////////////////////////////////

///
/// @file par_kmedoids.cpp
/// @author Todd Gamblin tgamblin@llnl.gov
///
#include "par_kmedoids.h"

#include <cstdlib>
#include <limits>     // numeric_limits, used in the constructor
#include <stdint.h>
#include <sys/time.h>
using namespace std;

#include "random.h"

namespace cluster {

  par_kmedoids::par_kmedoids(MPI_Comm comm)
    : par_partition(comm),
      seed_set(false),
      total_dissimilarity(numeric_limits<double>::infinity()),
      best_bic_score(0),
      init_size(40),
      max_reps(5),
      epsilon(1e-15)
  { }

  void par_kmedoids::set_seed(uint32_t s) {
    random.seed(s);
    seed_set = true;
  }

  void par_kmedoids::set_epsilon(double e) {
    epsilon = e;
  }


  double par_kmedoids::average_dissimilarity() {
    int size;
    CMPI_Comm_size(comm, &size);
    return total_dissimilarity / size;
  }

  double par_kmedoids::bic_score() {
    return best_bic_score;
  }

  void par_kmedoids::seed_random_uniform(MPI_Comm comm) {
    int rank;
    CMPI_Comm_rank(comm, &rank);

    // same seed on all processes.
    uint32_t seed = get_time_seed();
    CMPI_Bcast(&seed, 1, MPI_INT, 0, comm);
    random.seed(seed);
    seed_set = true;
  }

} // namespace cluster
par_kmedoids.h
CAPEK and XCAPEK scalable parallel clustering algorithms.
long get_time_seed()
Returns a seed for random number generators based on the product of sec and usec from gettimeofday()...
Definition: random.h:119
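get_time_seed() is described above as multiplying the seconds and microseconds fields returned by gettimeofday(). The following is a minimal sketch of that description, not the code in random.h; time_seed_from() and time_seed() are hypothetical names, and the product is computed in unsigned long so that a large value wraps instead of overflowing a signed type.

```cpp
#include <sys/time.h>

// Hypothetical helper: multiply the seconds and microseconds fields of a
// timeval, following the description of get_time_seed(). Unsigned
// arithmetic is used so that a large product wraps rather than overflowing.
unsigned long time_seed_from(const timeval& tv) {
    return static_cast<unsigned long>(tv.tv_sec) *
           static_cast<unsigned long>(tv.tv_usec);
}

// Hypothetical wrapper: read the current wall-clock time and derive a seed.
unsigned long time_seed() {
    timeval tv;
    gettimeofday(&tv, nullptr);
    return time_seed_from(tv);
}
```

A plain product is 0 whenever tv_usec happens to be 0, which is one reason a caller might prefer to mix the two fields differently.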
double average_dissimilarity()
Get the average dissimilarity of objects from their medoids for the last run.
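In the listing, average_dissimilarity() divides total_dissimilarity by the number of processes in comm. Assuming one object per process, that is the mean distance from each object to its medoid. The sketch below computes the same two quantities serially for 1-D points with absolute-difference distance; total_dissimilarity_of(), average_dissimilarity_of(), and the medoid_of assignment vector are hypothetical and are not Muster's partition API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of distances from each object to the medoid it is assigned to.
// medoid_of[i] is the index (into points) of object i's medoid.
double total_dissimilarity_of(const std::vector<double>& points,
                              const std::vector<std::size_t>& medoid_of) {
    double total = 0.0;
    for (std::size_t i = 0; i < points.size(); ++i)
        total += std::fabs(points[i] - points[medoid_of[i]]);
    return total;
}

// Mean dissimilarity: the total divided by the number of objects.
double average_dissimilarity_of(const std::vector<double>& points,
                                const std::vector<std::size_t>& medoid_of) {
    if (points.empty()) return 0.0;
    return total_dissimilarity_of(points, medoid_of) / points.size();
}
```

For points {1, 2, 10, 12} with objects 0 and 1 assigned to medoid 0 and objects 2 and 3 assigned to medoid 2, the total is 1 + 2 = 3 and the average is 3 / 4 = 0.75.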
void seed_random_uniform(MPI_Comm comm)
Seeds random number generators across all processes with the same number, taken from the time in micr...
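seed_random_uniform() obtains a seed from get_time_seed(), broadcasts rank 0's value with CMPI_Bcast, and seeds random with it, so every process draws the same sequence of random choices. The sketch below imitates that effect in a single process without MPI: each simulated rank receives a copy of the root seed and constructs its own std::mt19937 (an assumption; Muster's random_t may be a different generator). picks_after_broadcast() is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Simulate nprocs ranks that all received the same broadcast seed; each
// rank picks one index in [0, n) with its own generator. Requires n > 0.
std::vector<std::size_t> picks_after_broadcast(std::uint32_t root_seed,
                                               int nprocs, std::size_t n) {
    std::vector<std::size_t> picks;
    for (int rank = 0; rank < nprocs; ++rank) {
        std::uint32_t seed = root_seed;   // the value every rank receives
        std::mt19937 rng(seed);           // per-rank generator
        std::uniform_int_distribution<std::size_t> dist(0, n - 1);
        picks.push_back(dist(rng));
    }
    return picks;
}
```

Because every simulated rank starts from the same seed, all entries of the returned vector are equal; seeding each rank from its own clock reading instead could give each rank different picks, which is what the broadcast avoids.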
MPI_Comm comm
Communicator whose processes this partition divides.
Definition: par_partition.h:79
double total_dissimilarity(const partition &p, D dist)
Compute the total dissimilarity between all objects and their medoids.
Definition: partition.h:170
void set_seed(uint32_t seed)
Set the random seed.
#define CMPI_Comm_rank
Definition: mpi_bindings.h:81
par_partition represents a partitioning of a distributed data set.
Definition: par_partition.h:70
double best_bic_score
BIC score for the clustering found.
Definition: par_kmedoids.h:564
double bic_score()
BIC score for the selected clustering.
random_t random
Random number distribution to be used for samples.
Definition: par_kmedoids.h:560
double epsilon
Tolerance for convergence tests in kmedoids PAM runs.
Definition: par_kmedoids.h:567
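epsilon is the convergence tolerance for the PAM runs inside par_kmedoids, and the constructor sets it to 1e-15. To illustrate how such a tolerance is typically applied, here is a minimal greedy PAM-style swap loop on 1-D points that stops once the best available swap improves the total cost by no more than epsilon. pam_cost() and pam_swap() are hypothetical names, and this is not Muster's PAM implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Total cost: each point's distance to its nearest medoid, summed.
// Medoids are given as indices into pts.
double pam_cost(const std::vector<double>& pts,
                const std::vector<std::size_t>& medoids) {
    double total = 0.0;
    for (double p : pts) {
        double nearest = std::numeric_limits<double>::infinity();
        for (std::size_t m : medoids)
            nearest = std::min(nearest, std::fabs(p - pts[m]));
        total += nearest;
    }
    return total;
}

// Greedy swap phase: apply the single medoid/non-medoid swap that lowers
// the cost the most; stop when no swap helps or the gain is <= epsilon.
std::vector<std::size_t> pam_swap(const std::vector<double>& pts,
                                  std::vector<std::size_t> medoids,
                                  double epsilon) {
    double current = pam_cost(pts, medoids);
    for (;;) {
        double best = current;
        std::size_t best_slot = 0, best_obj = 0;
        bool improved = false;
        for (std::size_t slot = 0; slot < medoids.size(); ++slot) {
            for (std::size_t o = 0; o < pts.size(); ++o) {
                // Only non-medoids are candidates for a swap.
                if (std::find(medoids.begin(), medoids.end(), o) != medoids.end())
                    continue;
                std::vector<std::size_t> trial = medoids;
                trial[slot] = o;
                double c = pam_cost(pts, trial);
                if (c < best) { best = c; best_slot = slot; best_obj = o; improved = true; }
            }
        }
        // Converged: no swap improves the cost by more than epsilon.
        if (!improved || current - best <= epsilon) break;
        medoids[best_slot] = best_obj;
        current = best;
    }
    return medoids;
}
```

With pts = {1, 2, 3, 10, 11, 12} and initial medoids at indices {0, 1}, one swap moves a medoid to index 4 (value 11); after that no swap lowers the cost, and the loop stops with medoids {1, 4} and cost 4. A tiny epsilon such as 1e-15 effectively means "stop when no swap helps", while a larger value trades accuracy for fewer iterations.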
void set_epsilon(double epsilon)
Set the tolerance for convergence.
#define CMPI_Bcast
Definition: mpi_bindings.h:80
random.h
Helper functions for taking random samples and seeding RNGs from the system clock.
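random.h is described as providing helpers for taking random samples. One standard way to draw n distinct indices uniformly from [0, N) is Robert Floyd's sampling algorithm, sketched below with std::mt19937. sample_indices() is a hypothetical name, and random.h may use a different method.

```cpp
#include <cstddef>
#include <random>
#include <set>
#include <stdexcept>

// Robert Floyd's algorithm: draw n distinct indices uniformly from [0, N).
std::set<std::size_t> sample_indices(std::size_t N, std::size_t n,
                                     std::mt19937& rng) {
    if (n > N) throw std::invalid_argument("sample larger than population");
    std::set<std::size_t> chosen;
    for (std::size_t j = N - n; j < N; ++j) {
        std::uniform_int_distribution<std::size_t> pick(0, j);
        std::size_t t = pick(rng);
        // If t is already chosen, take j instead; j cannot have been chosen yet.
        if (!chosen.insert(t).second) chosen.insert(j);
    }
    return chosen;
}
```

Floyd's method makes exactly n draws regardless of N, which keeps sampling cheap when the population (for example, the number of processes) is large.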
double total_dissimilarity
Total dissimilarity of objects from their medoids for the last run; average_dissimilarity() divides it by the number of processes in comm.
Definition: par_kmedoids.h:563
#define CMPI_Comm_size
Definition: mpi_bindings.h:82
Muster. Copyright © 2010, Lawrence Livermore National Laboratory, LLNL-CODE-433662.
Distribution of Muster and its documentation is subject to terms of the Muster LICENSE.
Generated on Thu Sep 1 2016 using Doxygen 1.8.5