kmeansPredict

Purpose

Partitions data into k clusters, based upon k user supplied centroids.

Format

assignments = kmeansPredict(mdl, X)
assignments = kmeansPredict(centroids, X)
Parameters:
  • mdl (struct) – Instance of a kmeansModel structure.
  • centroids (kxP matrix) – Cluster centers.
  • X (NxP matrix) – The data to partition.
Returns:

assignments (Nx1 matrix) – The cluster to which each corresponding index of X has been assigned. range = 1-k.

Examples

Example 1: Basic example with a matrix of centroids.

library gml;

centroids = { 2  3,
             -2 -3 };

X = { 1  1,
      0 -2,
      2  0 };

// Assign each row of 'X' to either cluster 1 or cluster 2
assignments = kmeansPredict(centroids, X);

The above code will assign assignments equal to:

1
2
1

because, the points (1,1) and (2,0) are closer (euclidean distance) to the first centroid at point (2,3), while the second row of X (0,-2) is closer to the second centroid (-2,-3).

Example 2: Use centroids from a kmeansModel structure

new;
library gml;


// Get dataset with full name
fname = getGAUSSHome() $+ "pkgs/gml/examples/iris.csv";

// Load data
X = loadd(fname, ". -species");

// For repeatable sample
rndseed 234234;

// Split data into x_train and x_test
{ x_train, x_test } = splitData(X, 0.70);

// Number of clusters
n_clusters = 3;

// Declare kmeansModel struct
struct kmeansModel mdl;

// Fit kmeans model
mdl = kmeansFit(x_train , n_clusters);

// Assign test data to clusters
test_clusters = kmeansPredict(mdl, x_test);

print "The first three cluster assignments are: " test_clusters[1:3];

The above code will print the following:

The first three cluster assignments are:
  2
  2
  1

See also

kmeansFit(), kmeansControlCreate()