decForestCFit
Purpose
Fit a decision forest classification model.
Format
dfm = decForestCFit(y_train, x_train[, dfc])
Parameters
y_train (Nx1 vector) – The dependent variable.
x_train (NxP matrix) – The independent variables.
dfc (struct) – Optional input, an instance of the dfControl structure. For an instance named dfc, the members are:
dfc.numTrees
Scalar, number of trees (must be integer). Default = 100
dfc.obsPerTree
Scalar, observations per tree. Default = 1.0.
dfc.featurePerNode
Scalar, number of features considered at a node. Default = nvars/3.
dfc.maxTreeDepth
Scalar, maximum tree depth. Default = unlimited.
dfc.minObsNode
Scalar, minimum observations per node. Default = 1.
dfc.impurityThreshold
Scalar, impurity threshold. Default = 0.
dfc.oobError
Scalar, 1 to compute OOB error, 0 otherwise. Default = 0.
dfc.variableImpurityMethod
Scalar, method of calculating variable importance:
0 = none,
1 = mean decrease in impurity,
2 = mean decrease in accuracy (MDA),
3 = scaled MDA.
Default = 0.
Returns
dfm (struct) –
An instance of the dfModel structure. An instance named dfm will have the following members:
dfm.variableImportance
Matrix, 1xP, variable importance measure if computation of variable importance is specified, zero otherwise.
dfm.oobError
Scalar, out-of-bag error if OOB error computation is specified, zero otherwise.
dfm.numClasses
Scalar, number of classes if classification model, zero otherwise.
Examples
new;
library gml;
rndseed 23423;
// Create file name with full path
fname = getGAUSSHome() $+ "pkgs/gml/examples/breastcancer.csv";
// Load all variables from dataset, except for 'ID'
X = loadd(fname, ". -ID");
// Separate dependent and independent variables
y = X[.,cols(X)];
X = delcols(X, cols(X));
// Split data into 70% training and 30% test set
{ X_train, X_test, y_train, y_test } = trainTestSplit(X, y, 0.7);
// Declare 'df_mdl' to be a 'dfModel' structure
// to hold the trained model
struct dfModel df_mdl;
// Train the decision forest classifier with default settings
df_mdl = decForestCFit(y_train, X_train);
// Make predictions on the test set, from our trained model
y_hat = decForestPredict(df_mdl, X_test);
// Print out model quality evaluation statistics
call binaryClassMetrics(y_test, y_hat);
The code above will print the following output:
Confusion matrix
----------------
Class + 54 2
Class - 1 153
Accuracy 0.9857
Precision 0.9643
Recall 0.9818
F-score 0.973
Specificity 0.9871
AUC 0.9845
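The optional dfc input can be used to override the defaults listed under Parameters. The following is a minimal sketch rather than a definitive recipe: it assumes that the gml library provides dfControlCreate() to fill a dfControl instance with default values (that function is not described on this page), and the values chosen for numTrees and oobError are purely illustrative.

```gauss
new;
library gml;
rndseed 23423;

// Load the same example dataset used above, except for 'ID'
fname = getGAUSSHome() $+ "pkgs/gml/examples/breastcancer.csv";
X = loadd(fname, ". -ID");

// Separate dependent and independent variables
y = X[.,cols(X)];
X = delcols(X, cols(X));

// Split data into 70% training and 30% test set
{ X_train, X_test, y_train, y_test } = trainTestSplit(X, y, 0.7);

// Declare a 'dfControl' instance and fill it with defaults
// (dfControlCreate() is an assumption, see the note above)
struct dfControl dfc;
dfc = dfControlCreate();

// Override two of the documented members
dfc.numTrees = 200;   // illustrative value, default is 100
dfc.oobError = 1;     // request the out-of-bag error

// Train the classifier with the custom settings
struct dfModel df_mdl;
df_mdl = decForestCFit(y_train, X_train, dfc);

// dfm.oobError is populated only because dfc.oobError was set to 1
print "Out-of-bag error: " df_mdl.oobError;
```

Because dfc.oobError defaults to 0, df_mdl.oobError is zero unless it is requested explicitly, as documented under Returns.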
Remarks
The dfModel structure contains a fourth, internally used member, opaqueModel, which contains model details used by decForestPredict().
See also
Functions decForestPredict(), decForestRFit()