kmeansPredict

Purpose

Partitions data into k clusters by assigning each observation to the nearest of k user-supplied centroids.

Format

assignments = kmeansPredict(mdl, X)
assignments = kmeansPredict(centroids, X)

Parameters:
  • mdl (struct) – Instance of a kmeansModel structure.

  • centroids (kxP matrix) – Cluster centers.

  • X (NxP matrix) – The data to partition.

Returns:

assignments (Nx1 matrix) – The cluster to which each corresponding row of X has been assigned. Values range from 1 to k.

Examples

Example 1: Basic example with a matrix of centroids.

library gml;

centroids = { 2  3,
             -2 -3 };

X = { 1  1,
      0 -2,
      2  0 };

// Assign each row of 'X' to either cluster 1 or cluster 2
assignments = kmeansPredict(centroids, X);

After the above code runs, assignments will be equal to:

1
2
1

because the points (1,1) and (2,0), the first and third rows of X, are closer (by Euclidean distance) to the first centroid at (2,3), while the second row of X, (0,-2), is closer to the second centroid at (-2,-3).
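
The distances behind this result can be checked by hand. The following is a minimal sketch of a nearest-centroid assignment for the data in Example 1, written with base GAUSS functions only. It is an illustration of the distance comparison, not the implementation used by kmeansPredict, and the names dist and manual_assignments are introduced here for the example.

// Illustrative check only: not the internal kmeansPredict code
centroids = { 2  3,
             -2 -3 };

X = { 1  1,
      0 -2,
      2  0 };

k = rows(centroids);

// Squared Euclidean distance from each row of 'X'
// to each centroid, stored as an N x k matrix
dist = zeros(rows(X), k);

for i(1, k, 1);
    dist[., i] = sumr((X - centroids[i, .]).^2);
endfor;

// 'minindc' returns the index of the minimum of each column,
// so transpose 'dist' to get the nearest centroid for each row of 'X'
manual_assignments = minindc(dist');

print dist;
print manual_assignments;

The squared distances are 5 and 25 for the first row, 29 and 5 for the second row, and 9 and 25 for the third row, so the smallest value in each row falls in columns 1, 2 and 1 respectively, which matches the assignments shown above. Squared distances are compared here because taking the square root does not change which centroid is nearest.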

Example 2: Use centroids from a kmeansModel structure.

new;
library gml;

// Set seed for a repeatable sample
rndseed 234234;

// Get full path to the dataset
fname = getGAUSSHome("pkgs/gml/examples/iris.csv");

// Load all variables except 'species'
X = loadd(fname, ". -species");

// Split data into x_train and x_test
{ x_train, x_test } = splitData(X, 0.70);

// Number of clusters
n_clusters = 3;

// Declare kmeansModel struct
struct kmeansModel mdl;

// Fit kmeans model
mdl = kmeansFit(x_train , n_clusters);

// Assign test data to clusters
test_clusters = kmeansPredict(mdl, x_test);

The above code will print the following:

=================================================================
Model:                      K-Means         Number clusters:    3
Number observations:            105         Number features:    4
Init method:              K-means++           Number starts:    3
Tolerance:                   0.0001
=================================================================

K-means fit performance statistics:
============================================================
Total sum of squares:                                477.576
Between group sum of squares:                      419.05229
Within group sum of squares:                        58.52371
The ratio of BSS/TSS:                             0.87745676
============================================================
Centroids:
====================================================================
  SepalLength       SepalWidth      PetalLength       PetalWidth

      5.82381          2.70952             4.35          1.42143
      5.00937          3.40625            1.475             0.25
       6.8871          3.06129          5.73226          2.06129
====================================================================

K-Means Prediction Clusters Frequencies:
=============================================
Label      Count   Total %    Cum. %
    1         19     42.22     42.22
    2         18        40     82.22
    3          8     17.78       100
Total         45       100
=============================================