vtkKMeansStatistics Class Reference

#include <vtkKMeansStatistics.h>

Inheritance diagram for vtkKMeansStatistics:

Inheritance graph
[legend]
Collaboration diagram for vtkKMeansStatistics:

Collaboration graph
[legend]

List of all members.


Detailed Description

A class for KMeans clustering.

This class takes as input an optional vtkTable on port LEARN_PARAMETERS specifying initial set(s) of cluster values of the following form:

           K     | Col1            |  ...    | ColN    
      -----------+-----------------+---------+---------------
           M     |clustCoord(1, 1) |  ...    | clustCoord(1, N)
           M     |clustCoord(2, 1) |  ...    | clustCoord(2, N)
           .     |       .         |   .     |        .
           .     |       .         |   .     |        .
           .     |       .         |   .     |        .
           M     |clustCoord(M, 1) |  ...    | clustCoord(M, N)
           L     |clustCoord(1, 1) |  ...    | clustCoord(1, N)
           L     |clustCoord(2, 1) |  ...    | clustCoord(2, N)
           .     |       .         |   .     |        .
           .     |       .         |   .     |        .
           .     |       .         |   .     |        .
           L     |clustCoord(L, 1) |  ...    | clustCoord(L, N)
 

Because the desired value of K is often not known in advance and the results of the algorithm are dependent on the initial cluster centers, we provide a mechanism for the user to test multiple runs or sets of cluster centers within a single call to the Learn phase. The first column of the table identifies the number of clusters K in the particular run (the entries in this column should be of type vtkIdType), while the remaining columns are a subset of the columns contained in the table on port INPUT_DATA. We require that all user specified clusters be of the same dimension N and consequently, that the LEARN_PARAMETERS table have N+1 columns. Due to this restriction, only one request can be processed for each call to the Learn phase and subsequent requests are silently ignored. Note that, if the first column of the LEARN_PARAMETERS table is not of type vtkIdType, then the table will be ignored and a single run will be performed using the first DefaultNumberOfClusters input data observations as initial cluster centers.

When the user does not supply an initial set of clusters, then the first DefaultNumberOfClusters input data observations are used as initial cluster centers and a single run is performed.

This class provides the following functionalities, depending on the mode it is executed in: Learn: calculates new cluster centers for each run. The output metadata on port OUTPUT_MODEL is a multiblock dataset containing at a minimum one vtkTable with columns specifying the following for each run: the run ID, number of clusters, number of iterations required for convergence, total error associated with the cluster (sum of squared Euclidean distance from each observation to its nearest cluster center), the cardinality of the cluster, and the new cluster coordinates.

Derive: An additional vtkTable is stored in the multiblock dataset output on port OUTPUT_MODEL. This table contains columns that store for each run: the runID, number of clusters, total error for all clusters in the run, local rank, and global rank. The local rank is computed by comparing squared Euclidean errors of all runs with the same number of clusters. The global rank is computed analagously across all runs.

Assess: This requires a multiblock dataset (as computed from Learn and Derive) on input port INPUT_MODEL and tabular data on input port INPUT_DATA that contains column names matching those of the tables on input port INPUT_MODEL. The assess mode reports the closest cluster center and associated squared Euclidean distance of each observation in port INPUT_DATA's table to the cluster centers for each run in the multiblock dataset provided on port INPUT_MODEL.

The code can handle a wide variety of data types as it operates on vtkAbstractArrays and is not limited to vtkDataArrays. A default distance functor that computes the sum of the squares of the Euclidean distance between two objects is provided (vtkKMeansDistanceFunctor). The default distance functor can be overridden to use alternative distance metrics.

Thanks:
Thanks to Janine Bennett, David Thompson, and Philippe Pebay of Sandia National Laboratories for implementing this class.
Examples:
vtkKMeansStatistics (Examples)
Tests:
vtkKMeansStatistics (Tests)

Definition at line 110 of file vtkKMeansStatistics.h.


Public Types

typedef vtkStatisticsAlgorithm Superclass

Public Member Functions

virtual const char * GetClassName ()
virtual int IsA (const char *type)
virtual void PrintSelf (ostream &os, vtkIndent indent)
virtual void SetDistanceFunctor (vtkKMeansDistanceFunctor *)
virtual vtkKMeansDistanceFunctorGetDistanceFunctor ()
virtual void SetDefaultNumberOfClusters (int)
virtual int GetDefaultNumberOfClusters ()
virtual void SetKValuesArrayName (const char *)
virtual char * GetKValuesArrayName ()
virtual void SetMaxNumIterations (int)
virtual int GetMaxNumIterations ()
virtual void SetTolerance (double)
virtual double GetTolerance ()
virtual void Aggregate (vtkDataObjectCollection *, vtkDataObject *)

Static Public Member Functions

static int IsTypeOf (const char *type)
static vtkKMeansStatisticsSafeDownCast (vtkObject *o)
static vtkKMeansStatisticsNew ()

Protected Member Functions

 vtkKMeansStatistics ()
 ~vtkKMeansStatistics ()
virtual void Derive (vtkDataObject *)
virtual vtkIdType GetTotalNumberOfObservations (vtkIdType numObservations)
virtual int FillOutputPortInformation (int port, vtkInformation *info)
virtual int FillInputPortInformation (int port, vtkInformation *info)
virtual void Learn (vtkTable *inData, vtkTable *inParameters, vtkDataObject *outMeta)
virtual void Assess (vtkTable *, vtkDataObject *, vtkTable *)
virtual void Test (vtkTable *, vtkDataObject *, vtkDataObject *)
virtual void SelectAssessFunctor (vtkTable *inData, vtkDataObject *inMeta, vtkStringArray *rowNames, AssessFunctor *&dfunc)
virtual void UpdateClusterCenters (vtkTable *newClusterElements, vtkTable *curClusterElements, vtkIdTypeArray *numMembershipChanges, vtkIdTypeArray *numElementsInCluster, vtkDoubleArray *error, vtkIdTypeArray *startRunID, vtkIdTypeArray *endRunID, vtkIntArray *computeRun)
int InitializeDataAndClusterCenters (vtkTable *inParameters, vtkTable *inData, vtkTable *dataElements, vtkIdTypeArray *numberOfClusters, vtkTable *curClusterElements, vtkTable *newClusterElements, vtkIdTypeArray *startRunID, vtkIdTypeArray *endRunID)
virtual void CreateInitialClusterCenters (vtkIdType numToAllocate, vtkIdTypeArray *numberOfClusters, vtkTable *inData, vtkTable *curClusterElements, vtkTable *newClusterElements)

Protected Attributes

int DefaultNumberOfClusters
char * KValuesArrayName
int MaxNumIterations
double Tolerance
vtkKMeansDistanceFunctorDistanceFunctor

Member Typedef Documentation

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

Definition at line 113 of file vtkKMeansStatistics.h.


Constructor & Destructor Documentation

vtkKMeansStatistics::vtkKMeansStatistics (  )  [protected]

vtkKMeansStatistics::~vtkKMeansStatistics (  )  [protected]


Member Function Documentation

virtual const char* vtkKMeansStatistics::GetClassName (  )  [virtual]

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

static int vtkKMeansStatistics::IsTypeOf ( const char *  name  )  [static]

Return 1 if this class type is the same type of (or a subclass of) the named class. Returns 0 otherwise. This method works in combination with vtkTypeMacro found in vtkSetGet.h.

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

virtual int vtkKMeansStatistics::IsA ( const char *  name  )  [virtual]

Return 1 if this class is the same type of (or a subclass of) the named class. Returns 0 otherwise. This method works in combination with vtkTypeMacro found in vtkSetGet.h.

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

static vtkKMeansStatistics* vtkKMeansStatistics::SafeDownCast ( vtkObject o  )  [static]

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

virtual void vtkKMeansStatistics::PrintSelf ( ostream &  os,
vtkIndent  indent 
) [virtual]

Methods invoked by print to print information about the object including superclasses. Typically not called by the user (use Print() instead) but used in the hierarchical print process to combine the output of several classes.

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

static vtkKMeansStatistics* vtkKMeansStatistics::New (  )  [static]

Create an object with Debug turned off, modified time initialized to zero, and reference counting on.

Reimplemented from vtkTableAlgorithm.

Reimplemented in vtkPKMeansStatistics.

virtual void vtkKMeansStatistics::SetDistanceFunctor ( vtkKMeansDistanceFunctor  )  [virtual]

Set the DistanceFunctor.

virtual vtkKMeansDistanceFunctor* vtkKMeansStatistics::GetDistanceFunctor (  )  [virtual]

Set the DistanceFunctor.

virtual void vtkKMeansStatistics::SetDefaultNumberOfClusters ( int   )  [virtual]

Set/get the DefaultNumberOfClusters, used when no initial cluster coordinates are specified.

virtual int vtkKMeansStatistics::GetDefaultNumberOfClusters (  )  [virtual]

Set/get the DefaultNumberOfClusters, used when no initial cluster coordinates are specified.

virtual void vtkKMeansStatistics::SetKValuesArrayName ( const char *   )  [virtual]

Set/get the KValuesArrayName.

virtual char* vtkKMeansStatistics::GetKValuesArrayName (  )  [virtual]

Set/get the KValuesArrayName.

virtual void vtkKMeansStatistics::SetMaxNumIterations ( int   )  [virtual]

Set/get the MaxNumIterations used to terminate iterations on cluster center coordinates when the relative tolerance can not be met.

virtual int vtkKMeansStatistics::GetMaxNumIterations (  )  [virtual]

Set/get the MaxNumIterations used to terminate iterations on cluster center coordinates when the relative tolerance can not be met.

virtual void vtkKMeansStatistics::SetTolerance ( double   )  [virtual]

Set/get the relative Tolerance used to terminate iterations on cluster center coordinates.

virtual double vtkKMeansStatistics::GetTolerance (  )  [virtual]

Set/get the relative Tolerance used to terminate iterations on cluster center coordinates.

virtual void vtkKMeansStatistics::Aggregate ( vtkDataObjectCollection ,
vtkDataObject  
) [inline, virtual]

Given a collection of models, calculate aggregate model NB: not implemented

Implements vtkStatisticsAlgorithm.

Definition at line 153 of file vtkKMeansStatistics.h.

virtual int vtkKMeansStatistics::FillOutputPortInformation ( int  port,
vtkInformation info 
) [protected, virtual]

This algorithm returns a multiblock dataset containing several tables for its meta output (port OUTPUT_MODEL) instead of a single vtkTable. FillOutputPortInformation overridden accordingly.

Reimplemented from vtkStatisticsAlgorithm.

virtual int vtkKMeansStatistics::FillInputPortInformation ( int  port,
vtkInformation info 
) [protected, virtual]

This algorithm returns a multiblock dataset containing several tables for its meta output (port OUTPUT_MODEL) instead of a single vtkTable. FillOutputPortInformation overridden accordingly.

Reimplemented from vtkStatisticsAlgorithm.

virtual void vtkKMeansStatistics::Learn ( vtkTable inData,
vtkTable inParameters,
vtkDataObject outMeta 
) [protected, virtual]

Execute the calculations required by the Learn option.

Implements vtkStatisticsAlgorithm.

virtual void vtkKMeansStatistics::Derive ( vtkDataObject  )  [protected, virtual]

Execute the calculations required by the Derive option.

Implements vtkStatisticsAlgorithm.

virtual void vtkKMeansStatistics::Assess ( vtkTable ,
vtkDataObject ,
vtkTable  
) [protected, virtual]

Execute the calculations required by the Assess option.

Implements vtkStatisticsAlgorithm.

virtual void vtkKMeansStatistics::Test ( vtkTable ,
vtkDataObject ,
vtkDataObject  
) [inline, protected, virtual]

Execute the calculations required by the Test option.

Implements vtkStatisticsAlgorithm.

Definition at line 188 of file vtkKMeansStatistics.h.

virtual void vtkKMeansStatistics::SelectAssessFunctor ( vtkTable inData,
vtkDataObject inMeta,
vtkStringArray rowNames,
AssessFunctor *&  dfunc 
) [protected, virtual]

Provide the appropriate assessment functor.

Implements vtkStatisticsAlgorithm.

virtual void vtkKMeansStatistics::UpdateClusterCenters ( vtkTable newClusterElements,
vtkTable curClusterElements,
vtkIdTypeArray numMembershipChanges,
vtkIdTypeArray numElementsInCluster,
vtkDoubleArray error,
vtkIdTypeArray startRunID,
vtkIdTypeArray endRunID,
vtkIntArray computeRun 
) [protected, virtual]

Provide the appropriate assessment functor.

Reimplemented in vtkPKMeansStatistics.

virtual vtkIdType vtkKMeansStatistics::GetTotalNumberOfObservations ( vtkIdType  numObservations  )  [protected, virtual]

Subroutine to get the total number of observations. Called from within Learn (and will be overridden by vtkPKMeansStatistics to handle distributed datasets).

Reimplemented in vtkPKMeansStatistics.

int vtkKMeansStatistics::InitializeDataAndClusterCenters ( vtkTable inParameters,
vtkTable inData,
vtkTable dataElements,
vtkIdTypeArray numberOfClusters,
vtkTable curClusterElements,
vtkTable newClusterElements,
vtkIdTypeArray startRunID,
vtkIdTypeArray endRunID 
) [protected]

Subroutine to initalize the cluster centers using those provided by the user in input port LEARN_PARAMETERS. If no cluster centers are provided, the subroutine uses the first DefaultNumberOfClusters input data points as initial cluster centers. Called from within Learn.

virtual void vtkKMeansStatistics::CreateInitialClusterCenters ( vtkIdType  numToAllocate,
vtkIdTypeArray numberOfClusters,
vtkTable inData,
vtkTable curClusterElements,
vtkTable newClusterElements 
) [protected, virtual]

Subroutine to initialize cluster centerss if not provided by the user. Called from within Learn (and will be overridden by vtkPKMeansStatistics to handle distributed datasets).

Reimplemented in vtkPKMeansStatistics.


Member Data Documentation

This is the default number of clusters used when the user does not provide initial cluster centers.

Definition at line 250 of file vtkKMeansStatistics.h.

This is the default number of clusters used when the user does not provide initial cluster centers.

Definition at line 254 of file vtkKMeansStatistics.h.

This is the default number of clusters used when the user does not provide initial cluster centers.

Definition at line 257 of file vtkKMeansStatistics.h.

double vtkKMeansStatistics::Tolerance [protected]

This is the default number of clusters used when the user does not provide initial cluster centers.

Definition at line 260 of file vtkKMeansStatistics.h.

This is the default number of clusters used when the user does not provide initial cluster centers.

Definition at line 263 of file vtkKMeansStatistics.h.


The documentation for this class was generated from the following file:

Generated on Mon Sep 27 18:31:49 2010 for VTK by  doxygen 1.5.6