Fit a decision forest classification model.


dfm = decForestCFit(y_train, x_train[, dfc])
  • y_train (Nx1 vector) – The dependent variable.
  • x_train (NxP matrix) – The independent variables.
  • dfc (struct) –

    Optional input, an instance of the dfControl structure. For an instance named dfc, the members are:

    dfc.numTrees Scalar, number of trees (must be an integer). Default = 100.
    dfc.obsPerTree Scalar, proportion of observations sampled per tree. Default = 1.0.
    dfc.featurePerNode Scalar, number of features considered at each node. Default = nvars/3.
    dfc.maxTreeDepth Scalar, maximum tree depth. Default = unlimited.
    dfc.minObsNode Scalar, minimum observations per node. Default = 1.
    dfc.impurityThreshold Scalar, impurity threshold. Default = 0.
    dfc.oobError Scalar, 1 to compute the out-of-bag (OOB) error, 0 otherwise. Default = 0.
    dfc.variableImportanceMethod Scalar, method of calculating variable importance.
    • 0 = none
    • 1 = mean decrease in impurity (MDI)
    • 2 = mean decrease in accuracy (MDA)
    • 3 = scaled MDA

    Default = 0.
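
Any of these defaults can be overridden before fitting. The sketch below assumes a helper, dfControlCreate(), is available to fill the structure with its default values before individual members are changed:

```
// Declare a control structure and fill it with default settings
struct dfControl dfc;
dfc = dfControlCreate();

// Grow 500 trees and cap the depth of each tree at 10
dfc.numTrees = 500;
dfc.maxTreeDepth = 10;

// Pass the control structure as the optional third input
struct dfModel dfm;
dfm = decForestCFit(y_train, x_train, dfc);
```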


dfm (struct) –

An instance of the dfModel structure. An instance named dfm will have the following members:

dfm.variableImportance Matrix, 1 x P, variable importance measures if computation of variable importance is specified, zeros otherwise.
dfm.oobError Scalar, out-of-bag error if OOB error computation is specified, zero otherwise.
dfm.numClasses Scalar, number of classes in the dependent variable.
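
Note that the diagnostic members are only populated when the corresponding computations are requested through the control structure. A minimal sketch (again assuming a dfControlCreate() helper that fills in the defaults):

```
struct dfControl dfc;
dfc = dfControlCreate();

// Request the OOB error and mean-decrease-in-impurity importance
dfc.oobError = 1;
dfc.variableImportanceMethod = 1;

struct dfModel dfm;
dfm = decForestCFit(y_train, x_train, dfc);

// Both members are now populated on the returned model
print "OOB error          : " dfm.oobError;
print "Variable importance: " dfm.variableImportance;
```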


library gml;

rndseed 23423;

// Create file name with full path
fname = getGAUSSHome() $+ "pkgs/gml/examples/breastcancer.csv";

// Load all variables from dataset, except for 'ID'
X = loadd(fname, ". -ID");

// Separate dependent and independent variables
y = X[.,cols(X)];
X = delcols(X, cols(X));

// Split data into 70% training and 30% test set
{ X_train, X_test, y_train, y_test } = trainTestSplit(X, y, 0.7);

// Declare 'df_mdl' to be a 'dfModel' structure
// to hold the trained model
struct dfModel df_mdl;

// Train the decision forest classifier with default settings
df_mdl = decForestCFit(y_train, X_train);

// Make predictions on the test set, from our trained model
y_hat = decForestPredict(df_mdl, X_test);

// Print out model quality evaluation statistics
call binaryClassMetrics(y_test, y_hat);

The code above will print the following output:

            Confusion matrix

    Class +       54       2
    Class -        1     153

   Accuracy           0.9857
  Precision           0.9643
     Recall           0.9818
    F-score            0.973
Specificity           0.9871
        AUC           0.9845


The dfModel structure contains a fourth, internally used member, opaqueModel, which contains model details used by decForestPredict().

See also

Functions decForestPredict(), decForestRFit()