decForestRFit#

Purpose#

Fit a decision forest regression model.

Format#

dfm = decForestRFit(y_train, X_train[, dfc])#
Parameters:
  • y_train (Nx1 vector) – The target, or dependent variable.

  • X_train (NxP matrix) – The model features, or independent variables.

  • dfc (struct) –

    Optional input, an instance of the dfControl structure. For an instance named dfc, the members are:

    dfc.numTrees

    Scalar, number of trees (must be a positive integer). Default = 100.

    dfc.pctObsPerTree

    Scalar, the percentage of observations selected for each tree (sampling with replacement). Valid range: 0.0 < pctObsPerTree <= 1.0. Default = 1.0.

    dfc.featuresPerSplit

    Scalar, number of features considered as possible splits at each node. Default = nvars/3, where nvars is the number of features (columns of X_train).

    dfc.maxTreeDepth

    Scalar integer value, maximum tree depth. A value of 0 means unlimited depth. Default = 0.

    dfc.maxLeafNodes

    Scalar integer value, maximum number of leaf nodes in each tree. Setting this to a positive integer causes each tree to be built by making the best possible splits first, rather than growing the tree in a depth-first fashion. A value of 0 means unlimited. Default = 0.

    dfc.minObsLeaf

    Scalar integer value, minimum observations per leaf node. Default = 5.

    dfc.impurityThreshold

    Scalar, if the impurity value at a node is below this threshold, the node will not be split further. Default = 0.0.

    dfc.oobError

    Scalar, set to 1 to compute the out-of-bag (OOB) error, 0 otherwise. Default = 0.

    dfc.variableImportanceMethod

    Scalar, method of calculating variable importance.

    • 0 = none,

    • 1 = mean decrease in impurity (Gini importance),

    • 2 = mean decrease in accuracy (MDA),

    • 3 = scaled MDA.

    Default = 0.
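
    Any of the members above can be overridden before fitting. A minimal sketch, starting from the default-filled structure returned by dfControlCreate() (the specific values shown are illustrative, not recommendations):

    ```
    // Declare 'dfc' to be a dfControl structure
    // and fill it with default settings
    struct dfControl dfc;
    dfc = dfControlCreate();

    // Grow more, shallower trees with larger leaves
    dfc.numTrees = 500;
    dfc.maxTreeDepth = 8;
    dfc.minObsLeaf = 10;
    ```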

Returns:

dfm (struct) –

An instance of the dfModel structure. An instance named dfm will have the following members:

dfm.variableImportance

Matrix, 1 x p, variable importance measure if computation of variable importance is specified, zero otherwise.

dfm.oobError

Scalar, out-of-bag error if OOB error computation is specified, zero otherwise.

dfm.numClasses

Scalar, number of classes if classification model, zero otherwise.

Examples#

new;
library gml;

// Set seed for repeatable sampling
rndseed 23423;

/*
** Load data and prepare data
*/
// Load hitters dataset
dataset = getGAUSSHome("pkgs/gml/examples/hitters.xlsx");

// Load data from dataset and split
// into (70%) training and (30%) test sets
{ y_train, y_test, X_train, X_test } = trainTestSplit(dataset, "ln(salary)~ AtBat + Hits + HmRun + Runs + RBI + Walks + Years + PutOuts + Assists + Errors", 0.7);

/*
** Estimate decision forest model
*/
// Declare 'dfc' to be a dfControl structure
// and fill with default settings.
struct dfControl dfc;
dfc = dfControlCreate();

// Turn on variable importance
dfc.variableImportanceMethod = 1;

// Turn on OOB error
dfc.oobError = 1;

// Structure to hold model results
struct dfModel mdl;

// Fit training data using decision forest
mdl = decForestRFit(y_train, X_train, dfc);

The code above will print the following output:

======================================================================
Model:              Decision Forest         Target variable:ln_salary_
Number Observations:            184         Number features:        10
Number of trees:                100           Obs. per Tree:      100%
Min. Obs. Per Node:               1     Impurity Threshhold:         0
Out-of-bag error:            0.3157
======================================================================

=========================
Variable Importance Table
=========================
Years              0.6623
Walks              0.2358
Hits               0.1945
RBI                0.1895
AtBat              0.1867
Runs               0.1714
HmRun              0.1574
PutOuts            0.1543
Assists            0.1444
Errors             0.1437
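
The returned dfModel structure members can be inspected directly after the fit. A minimal continuation of the example above, using the mdl structure it created:

```
// Out-of-bag error, computed because
// dfc.oobError was set to 1 before fitting
print "Out-of-bag error:";
print mdl.oobError;

// Variable importance measures (1 x P row vector),
// computed because dfc.variableImportanceMethod was set to 1
print "Variable importance:";
print mdl.variableImportance;
```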

Remarks#

The dfModel structure contains a fourth, internally used member, opaqueModel, which contains model details used by decForestPredict().
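
Because opaqueModel carries the fitted trees, the dfModel structure can be passed on for prediction. A minimal sketch, assuming the mdl, X_test, and y_test variables from the example above; see the decForestPredict() documentation for the exact calling convention:

```
// Predict on the held-out test set
predictions = decForestPredict(mdl, X_test);

// Mean squared error on the test set
resid = y_test - predictions;
mse = meanc(resid .* resid);
```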

See also#

decForestPredict(), decForestCFit()