pcaFit#

Purpose#

Performs principal component dimension reduction.

Format#

mdl = pcaFit(X, n_components)#

Parameters:

X (NxP matrix) – Independent variables.
n_components (Scalar) – The number of principal component vectors to return. \(1 \le n\_components \le P\)

Returns:

mdl (struct) –

An instance of a pcaModel structure. For an instance named mdl, the members will be:

mdl.singular_values	n_components x 1 vector, containing the largest singular values of X.
mdl.components	P x n_components matrix, containing the principal component vectors which represent the directions of greatest variance.
mdl.explained_variance_ratio	n_components x 1 vector, the percentage of variance explained by each of the returned component vectors.
mdl.explained_variance	n_components x 1 vector, the variance explained by each of the returned component vectors.
mdl.mean	1 x P vector, the means for each column of the input matrix X.
mdl.n_components	Scalar, the number of component vectors returned.
mdl.n_samples	Scalar, the number of rows of the input matrix X.

Examples#

new;
library gml;

/*
** Load data and prepare
*/
// Get file name with full path
fname = getGAUSSHome("pkgs/gml/examples/winequality.csv");

// Load data
X = loadd(fname, ". -quality");

/*
** Train the model
*/
// Number of components
n_components = 3;

struct pcaModel mdl;
mdl = pcaFit(X, n_components);

The above code will print the following output, which shows us that the first principal component accounts for nearly 95% of the variance.

==================================================
Model:                                         PCA
Number observations:                          1599
Number variables:                               11
Number components:                               3
==================================================

Component                Proportion     Cumulative
                        Of Variance     Proportion
PC1                           0.947          0.947
PC2                           0.048          0.995
PC3                           0.003          0.998

==================================================
Principal components       PC1       PC2       PC3
==================================================
fixed acidity           0.0061   -0.0239   -0.9531
volatile acidity       -0.0004   -0.0020    0.0251
citric acid            -0.0002   -0.0030   -0.0737
residual sugar         -0.0086    0.0111   -0.2809
chlorides              -0.0001   -0.0002   -0.0029
free sulfur dioxide    -0.2189    0.9753   -0.0209
total sulfur dioxide   -0.9757   -0.2189    0.0015
density                -0.0000   -0.0000   -0.0008
pH                      0.0003    0.0033    0.0586
sulphates              -0.0002    0.0006   -0.0175
alcohol                 0.0064    0.0146    0.0486

We can now transform the input data to the new 3-dimensional space with pcaTransform():

X_transform = pcaTransform(X, mdl);

After the above code, the first 5 rows of X_transform will be:

       PC1              PC2              PC3
 13.224905       -2.0238998        1.1268205
-22.037724        4.4083216       0.31037799
-7.1626733       -2.5014609       0.58186830
-13.430063       -1.9511222       -2.6340395
 13.224905       -2.0238998        1.1268205