# pcaFit

## Purpose

Performs principal component dimension reduction.

## Format

mdl = pcaFit(X, n_components)
Parameters:
• X (NxP matrix) – Independent variables.

• n_components (Scalar) – The number of principal component vectors to return. $$1 \le n\_components \le P$$

Returns:

mdl (struct) –

An instance of a pcaModel structure. For an instance named mdl, the members will be:

• mdl.singular_values (n_components x 1 vector) – The largest singular values of X.

• mdl.components (P x n_components matrix) – The principal component vectors, which represent the directions of greatest variance.

• mdl.explained_variance_ratio (n_components x 1 vector) – The proportion of variance explained by each of the returned component vectors.

• mdl.explained_variance (n_components x 1 vector) – The variance explained by each of the returned component vectors.

• mdl.mean (1 x P vector) – The means for each column of the input matrix X.

• mdl.n_components (Scalar) – The number of component vectors returned.

• mdl.n_samples (Scalar) – The number of rows of the input matrix X.

## Examples

new;
library gml;

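/*
** Load and prepare data
*/
// Get file name with full path
fname = getGAUSSHome("pkgs/gml/examples/winequality.csv");

// Load the 11 predictor columns, excluding the response
// (assumes the response column is named 'quality')
X = loadd(fname, ". -quality");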

/*
** Train the model
*/
// Number of components
n_components = 3;

struct pcaModel mdl;
mdl = pcaFit(X, n_components);


The above code will print the following output, which shows us that the first principal component accounts for nearly 95% of the variance.

==================================================
Model:                                         PCA
Number observations:                          1599
Number variables:                               11
Number components:                               3
==================================================

Component                Proportion     Cumulative
                        Of Variance     Proportion
PC1                           0.947          0.947
PC2                           0.048          0.995
PC3                           0.003          0.998

==================================================
Principal components       PC1       PC2       PC3
==================================================
fixed acidity           0.0061   -0.0239   -0.9531
volatile acidity       -0.0004   -0.0020    0.0251
citric acid            -0.0002   -0.0030   -0.0737
residual sugar         -0.0086    0.0111   -0.2809
chlorides              -0.0001   -0.0002   -0.0029
free sulfur dioxide    -0.2189    0.9753   -0.0209
total sulfur dioxide   -0.9757   -0.2189    0.0015
density                -0.0000   -0.0000   -0.0008
pH                      0.0003    0.0033    0.0586
sulphates              -0.0002    0.0006   -0.0175
alcohol                 0.0064    0.0146    0.0486
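
The Cumulative Proportion column of the report can be reproduced from the returned structure. The following is a minimal sketch, assuming mdl is the pcaModel instance fitted above and that mdl.explained_variance_ratio holds the per-component proportions shown in the report. The name cum_ratio is illustrative and not part of the library.

```gauss
// Assumes mdl was returned by pcaFit() in the example above

// Running total of the proportion of variance explained
cum_ratio = cumsumc(mdl.explained_variance_ratio);

// Print per-component and cumulative proportions side by side
print mdl.explained_variance_ratio~cum_ratio;
```

If the assumption holds, the second column should agree with the report's cumulative values up to rounding.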


We can now transform the input data to the new 3-dimensional space with pcaTransform():

X_transform = pcaTransform(X, mdl);


After the above code, the first 5 rows of X_transform will be:

       PC1              PC2              PC3
 13.224905       -2.0238998        1.1268205
-22.037724        4.4083216       0.31037799
-7.1626733       -2.5014609       0.58186830
-13.430063       -1.9511222       -2.6340395
 13.224905       -2.0238998        1.1268205
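
For intuition, projecting data onto principal components usually means centering each column and multiplying by the component matrix. The sketch below assumes that pcaTransform() does this using mdl.mean and mdl.components. This page does not state that, so treat the check as exploratory. The names X_manual and max_diff are illustrative.

```gauss
// Assumption: pcaTransform() centers X with mdl.mean (1 x P) and
// projects onto the columns of mdl.components (P x n_components)
X_manual = (X - mdl.mean) * mdl.components;

// Largest absolute difference from the pcaTransform() result;
// a value near zero would support the assumption
max_diff = maxc(maxc(abs(X_manual - X_transform)));
print "Max abs difference: " max_diff;
```

A noticeably non-zero difference would suggest that pcaTransform() applies additional scaling or a different sign convention.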