pcaFit
Purpose
Performs dimension reduction using principal component analysis (PCA).
Format
- mdl = pcaFit(X, n_components)
- Parameters:
X (NxP matrix) – Data matrix with N observations (rows) and P variables (columns).
n_components (Scalar) – The number of principal component vectors to return. Must satisfy \(1 \le n\_components \le P\).
- Returns:
mdl (struct) – An instance of a pcaModel structure. For an instance named mdl, the members are:
mdl.singular_values
n_components x 1 vector, containing the n_components largest singular values of the mean-centered X.
mdl.components
P x n_components matrix whose columns are the principal component vectors. These are the orthogonal directions of greatest variance, ordered from most to least variance explained.
mdl.explained_variance_ratio
n_components x 1 vector, the proportion of the total variance of X explained by each of the returned component vectors.
mdl.explained_variance
n_components x 1 vector, the variance explained by each of the returned component vectors.
mdl.mean
1 x P vector, the means for each column of the input matrix X.
mdl.n_components
Scalar, the number of component vectors returned.
mdl.n_samples
Scalar, the number of rows of the input matrix X.
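The members above correspond to the usual SVD formulation of PCA. The sketch below is a NumPy illustration, not GAUSS code and not the gml implementation. It assumes the data are mean-centered but not scaled, and that explained variance uses the N - 1 divisor. That is the scikit-learn convention, and these member names mirror scikit-learn's. The function name pca_fit_sketch is hypothetical.

```python
import numpy as np

def pca_fit_sketch(X, n_components):
    """Hypothetical NumPy analogue of pcaFit; returns a dict with the same member names."""
    X = np.asarray(X, dtype=float)
    n_samples, p = X.shape
    if not 1 <= n_components <= p:
        raise ValueError("n_components must satisfy 1 <= n_components <= P")

    mean = X.mean(axis=0)                      # column means of X
    Xc = X - mean                              # center each column (no scaling assumed)

    # Thin SVD: Xc = U @ diag(s) @ Vt, with s sorted in descending order
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_all = s**2 / (n_samples - 1)           # variance along every principal direction

    return {
        "singular_values": s[:n_components],
        "components": Vt[:n_components].T,     # P x n_components, one column per PC
        "explained_variance": var_all[:n_components],
        "explained_variance_ratio": var_all[:n_components] / var_all.sum(),
        "mean": mean,
        "n_components": n_components,
        "n_samples": n_samples,
    }

# Example on synthetic data with 4 variables of very different spread
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) * np.array([10.0, 3.0, 1.0, 0.5])
mdl = pca_fit_sketch(X, 2)
print(mdl["components"].shape)                 # (4, 2)
```

Each explained_variance_ratio entry divides by the variance summed over all P directions. The returned ratios therefore sum to less than 1 unless n_components equals P. In the wine example, three components cover 0.998.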
Examples#
new;
library gml;
/*
** Load data and prepare
*/
// Get file name with full path
fname = getGAUSSHome("pkgs/gml/examples/winequality.csv");
// Load data
X = loadd(fname, ". -quality");
/*
** Train the model
*/
// Number of components
n_components = 3;
struct pcaModel mdl;
mdl = pcaFit(X, n_components);
Running the code above prints the following summary. It shows that the first principal component accounts for nearly 95% of the total variance, and that the first three components together account for 99.8%.
==================================================
Model:                PCA
Number observations:  1599
Number variables:     11
Number components:    3
==================================================
Component               Proportion  Cumulative
                       Of Variance  Proportion

PC1                          0.947       0.947
PC2                          0.048       0.995
PC3                          0.003       0.998
==================================================
Principal components         PC1       PC2       PC3
==================================================
fixed acidity             0.0061   -0.0239   -0.9531
volatile acidity         -0.0004   -0.0020    0.0251
citric acid              -0.0002   -0.0030   -0.0737
residual sugar           -0.0086    0.0111   -0.2809
chlorides                -0.0001   -0.0002   -0.0029
free sulfur dioxide      -0.2189    0.9753   -0.0209
total sulfur dioxide     -0.9757   -0.2189    0.0015
density                  -0.0000   -0.0000   -0.0008
pH                        0.0003    0.0033    0.0586
sulphates                -0.0002    0.0006   -0.0175
alcohol                   0.0064    0.0146    0.0486
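The Cumulative Proportion column is the running sum of the Proportion Of Variance column (for example, 0.947 + 0.048 = 0.995). The loadings also show that PC1 is almost entirely total sulfur dioxide and PC2 almost entirely free sulfur dioxide. pcaFit appears to center the data without rescaling it, so variables with a large numeric spread dominate the leading components. The NumPy sketch below is not GAUSS code and not gml's implementation. It reproduces both effects on synthetic data, and the column scales are made up for illustration.

```python
import numpy as np

# Cumulative column from the printed wine output: running sum of the proportions
print(np.round(np.cumsum([0.947, 0.048, 0.003]), 3))

# Synthetic data: column 2 has a much larger spread than the others,
# loosely mimicking "total sulfur dioxide" in the wine data (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) * np.array([1.0, 2.0, 30.0])

Xc = X - X.mean(axis=0)                        # center, but do not scale
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
ratio = s**2 / np.sum(s**2)                    # proportion of variance per PC
cumulative = np.cumsum(ratio)                  # "Cumulative Proportion" column

pc1 = Vt[0]                                    # PC1 loadings (overall sign is arbitrary)
print(np.round(ratio, 3), np.round(cumulative, 3))
print(np.round(pc1, 3))                        # dominated by the high-variance column
```

If each variable should contribute on an equal footing, a common alternative is to standardize the columns before fitting. To do this, subtract each column's mean and divide by its standard deviation.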
We can now project the input data onto the new 3-dimensional space with pcaTransform():
X_transform = pcaTransform(X, mdl);
The first 5 rows of X_transform are:
PC1 PC2 PC3
13.224905 -2.0238998 1.1268205
-22.037724 4.4083216 0.31037799
-7.1626733 -2.5014609 0.58186830
-13.430063 -1.9511222 -2.6340395
13.224905 -2.0238998 1.1268205
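Conceptually, the transformation subtracts mdl.mean from each row and multiplies by mdl.components. The NumPy sketch below illustrates this under that assumption. It is not the gml source, and the function name pca_transform_sketch is hypothetical. The model is fit inline on synthetic data so the block stands alone.

```python
import numpy as np

def pca_transform_sketch(X, mean, components):
    """Hypothetical analogue of pcaTransform: center X, then project onto the PCs."""
    return (np.asarray(X, dtype=float) - mean) @ components   # N x n_components

# Fit on synthetic data, keeping 2 of 4 components
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
mean = X.mean(axis=0)
_, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2].T                                          # P x n_components

X_transform = pca_transform_sketch(X, mean, components)
print(X_transform[:5])
```

The projected columns have zero mean and are uncorrelated, and their variances equal the explained variances. Because the projection is a fixed linear map, identical input rows produce identical scores. This is the likely reason rows 1 and 5 above coincide. The sign of each component vector is arbitrary, so another PCA implementation may return some columns with flipped signs.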
See also