pcaFit
Purpose
Performs principal component analysis (PCA) for dimension reduction.
Format

mdl = pcaFit(X, n_components)

Parameters:
X (N x P matrix) – Independent variables.
n_components (Scalar) – The number of principal component vectors to return, where \(1 \le n\_components \le P\).
Returns:
mdl (struct) – An instance of a pcaModel structure. For an instance named mdl, the members are:

mdl.singular_values – n_components x 1 vector, the largest singular values of X.
mdl.components – P x n_components matrix, the principal component vectors, which point in the directions of greatest variance.
mdl.explained_variance_ratio – n_components x 1 vector, the proportion (between 0 and 1) of the total variance explained by each returned component vector.
mdl.explained_variance – n_components x 1 vector, the variance explained by each returned component vector.
mdl.mean – 1 x P vector, the mean of each column of the input matrix X.
mdl.n_components – Scalar, the number of component vectors returned.
mdl.n_samples – Scalar, the number of rows of the input matrix X.
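The member names mirror those of conventional SVD-based PCA (for example, scikit-learn's PCA class). Assuming pcaFit follows that convention, which is an assumption and not documented behavior, the members relate as shown below, with \(k\) = n_components, \(N\) = n_samples, and \(X_c\) the column-centered data with singular value decomposition \(X_c = U S V^\top\). The sum in the last line runs over all singular values of \(X_c\), not only the first \(k\), which is consistent with the example below, where the three returned ratios sum to 0.9976 rather than 1.

```latex
% Assumed convention (standard SVD-based PCA); not confirmed for pcaFit
\begin{aligned}
X_c &= X - \mathbf{1}_N\,\bar{x}, \qquad X_c = U S V^\top, \qquad \bar{x} = \texttt{mdl.mean} \\
\texttt{mdl.singular\_values}_j &= s_j, \qquad j = 1, \dots, k \\
\texttt{mdl.components} &= \begin{bmatrix} v_1 & v_2 & \cdots & v_k \end{bmatrix} \\
\texttt{mdl.explained\_variance}_j &= \frac{s_j^2}{N - 1} \\
\texttt{mdl.explained\_variance\_ratio}_j &= \frac{s_j^2}{\sum_{i} s_i^2}
\end{aligned}
```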
Examples
new;
library gml;
// Get file name with full path
fname = getGAUSSHome() $+ "pkgs/gml/examples/winequality.csv";
// Load all variables except 'quality'
X = loadd(fname, ". - quality");
n_components = 3;
struct pcaModel mdl;
mdl = pcaFit(X, n_components);
print mdl.explained_variance_ratio;
The above code prints the following output, which shows that the first principal component accounts for nearly 95% of the total variance.
0.9466
0.0484
0.0026
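The ratios can also guide the choice of n_components. A running total shows how much of the variance the first k components capture together; from the rounded values above, the first two components account for 0.9466 + 0.0484 = 0.9950. A minimal sketch that computes the same running total with cumsumc(), reusing the data and settings from the example (cum_ratio is an illustrative variable name):

```gauss
new;
library gml;

// Load the wine quality data, excluding the 'quality' column
fname = getGAUSSHome() $+ "pkgs/gml/examples/winequality.csv";
X = loadd(fname, ". - quality");

// Fit a 3-component PCA model, as in the example above
struct pcaModel mdl;
mdl = pcaFit(X, 3);

// Running total of the variance proportion explained by the
// first 1, 2, ..., n_components components
cum_ratio = cumsumc(mdl.explained_variance_ratio);
print cum_ratio;
```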
We can now transform the input data to the new 3-dimensional space with pcaTransform():
X_transform = pcaTransform(X, mdl);
After the above code, the first 5 rows of X_transform will be:
13.2249 2.0239 1.1268
22.0377 4.4083 0.3104
7.1627 2.5015 0.5819
13.4301 1.9511 2.6340
13.2249 2.0239 1.1268
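To see what pcaTransform() computes, you can reproduce the projection by hand. The sketch below assumes that pcaTransform() subtracts mdl.mean from each row of X and multiplies the result by mdl.components, with no whitening. That assumption comes from the usual PCA definition, not from this page. If it holds, the printed maximum absolute difference should be close to zero. X_manual is an illustrative variable name.

```gauss
new;
library gml;

// Load the wine quality data, excluding the 'quality' column
fname = getGAUSSHome() $+ "pkgs/gml/examples/winequality.csv";
X = loadd(fname, ". - quality");

// Fit the model and transform with the library function
struct pcaModel mdl;
mdl = pcaFit(X, 3);
X_transform = pcaTransform(X, mdl);

// Assumption: pcaTransform centers X with mdl.mean and projects it
// onto mdl.components (no whitening). The 1 x P mean row is
// expanded across all N rows by element-by-element subtraction.
X_manual = (X - mdl.mean) * mdl.components;

// Largest absolute difference between the two N x 3 results
print maxc(maxc(abs(X_manual - X_transform)));
```

GAUSS element-by-element operators treat an N x P matrix and a 1 x P row vector as conformable, so the centering step needs no explicit loop.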
See also