Performs dimension reduction using principal component analysis (PCA).


mdl = pcaFit(X, n_components)
  • X (NxP matrix) – Independent variables.
  • n_components (Scalar) – The number of principal component vectors to return. \(1 \le n\_components \le P\)

mdl (struct) – An instance of a pcaModel structure. For an instance named mdl, the members will be:

  • mdl.singular_values – n_components x 1 vector, containing the largest singular values of X.
  • mdl.components – P x n_components matrix, containing the principal component vectors, which represent the directions of greatest variance.
  • mdl.explained_variance_ratio – n_components x 1 vector, the proportion of variance explained by each of the returned component vectors.
  • mdl.explained_variance – n_components x 1 vector, the variance explained by each of the returned component vectors.
  • mdl.mean – 1 x P vector, the means of each column of the input matrix X.
  • mdl.n_components – Scalar, the number of component vectors returned.
  • mdl.n_samples – Scalar, the number of rows of the input matrix X.
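
As a quick way to check the member shapes described above, the following minimal sketch fits a model to random data and prints a few of the members. The random 100 x 5 input and the choice of 2 components are illustrative assumptions, not part of the library's documentation.

```gauss
library gml;

// Hypothetical input: 100 observations of 5 variables
X = rndn(100, 5);

// Fit a PCA model that keeps 2 components
struct pcaModel mdl;
mdl = pcaFit(X, 2);

// Per the member descriptions, components should be P x n_components (5 x 2)
print rows(mdl.components);
print cols(mdl.components);

// Per the member descriptions, mean should be 1 x P (1 x 5)
print rows(mdl.mean);
print cols(mdl.mean);

// Scalars stored with the model
print mdl.n_components;
print mdl.n_samples;
```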


library gml;

// Get file name with full path
fname = getGAUSSHome() $+ "pkgs/gml/examples/winequality.csv";

// Load data
X = loadd(fname, ". -quality");

n_components = 3;

struct pcaModel mdl;
mdl = pcaFit(X, n_components);

print mdl.explained_variance_ratio;

The above code prints the explained variance ratio of each of the three components. For this dataset, the first principal component accounts for nearly 95% of the variance.
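
When choosing n_components, it can help to look at how much variance the leading components explain together. The sketch below repeats the setup from this example and takes a running total of mdl.explained_variance_ratio with cumsumc. It is one possible way to inspect the result, not a step the library requires.

```gauss
library gml;

// Same data as in the example on this page
fname = getGAUSSHome() $+ "pkgs/gml/examples/winequality.csv";
X = loadd(fname, ". -quality");

struct pcaModel mdl;
mdl = pcaFit(X, 3);

// Running total of the explained variance ratio:
// element k is the share explained by the first k components together
print cumsumc(mdl.explained_variance_ratio);
```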


We can now transform the input data to the new 3-dimensional space with pcaTransform():

X_transform = pcaTransform(X, mdl);

After the above code, the first 5 rows of X_transform will be:

 13.2249  -2.0239   1.1268
-22.0377   4.4083   0.3104
 -7.1627  -2.5015   0.5819
-13.4301  -1.9511  -2.6340
 13.2249  -2.0239   1.1268
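
The sketch below shows how the first rows of X_transform can be printed, and compares the result against a manual projection. The manual formula, centering X with mdl.mean and multiplying by mdl.components, is the usual PCA projection and is an assumption about what pcaTransform does internally, not something this page states.

```gauss
library gml;

// Same data and model as in the example on this page
fname = getGAUSSHome() $+ "pkgs/gml/examples/winequality.csv";
X = loadd(fname, ". -quality");

struct pcaModel mdl;
mdl = pcaFit(X, 3);

X_transform = pcaTransform(X, mdl);

// Print the first 5 rows of the transformed data
print X_transform[1:5, .];

// Assumption: the projection is the centered data times the component matrix
X_manual = (X - mdl.mean) * mdl.components;

// Largest absolute difference between the two results;
// a value near zero would support the assumption above
print maxc(maxc(abs(X_manual - X_transform)));
```

If the printed difference is not close to zero, pcaTransform applies some additional step (for example, scaling) and the manual formula should not be relied on.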

See also