decForestRFit
Purpose
Fit a decision forest regression model.
Format
dfm = decForestRFit(y_train, x_train[, dfc])
Parameters
y_train (Nx1 vector) – The dependent variable.
x_train (NxP matrix) – The independent variables.
dfc (struct) –
Optional input, an instance of the dfControl structure. For an instance named dfc, the members are:
dfc.numTrees
Scalar, number of trees (must be an integer). Default = 100.
dfc.obsPerTree
Scalar, observations per tree. Default = 1.0.
dfc.featurePerNode
Scalar, number of features considered at each node. Default = nvars/3, where nvars is the number of columns (P) in x_train.
dfc.maxTreeDepth
Scalar, maximum tree depth. Default = unlimited.
dfc.minObsNode
Scalar, minimum observations per node. Default = 1.
dfc.impurityThreshold
Scalar, impurity threshold. Default = 0.
dfc.oobError
Scalar, 1 to compute OOB error, 0 otherwise. Default = 0.
dfc.variableImportanceMethod
Scalar, method of calculating variable importance.
0 = none,
1 = mean decrease in impurity (MDI),
2 = mean decrease in accuracy (MDA),
3 = scaled MDA.
Default = 0.
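The sketch below overrides several of the dfControl members listed above before fitting. It uses simulated data (the data-generating process and the specific settings are hypothetical, chosen only for illustration, not as tuning advice) and assumes the gml library is installed.

```gauss
new;
library gml;

// Hypothetical simulated data: 200 observations, 4 features
rndseed 42;
n = 200;
X = rndn(n, 4);
y = 2*X[., 1] - X[., 2] + 0.5*rndn(n, 1);

// Start from the default settings, then override selected members
struct dfControl dfc;
dfc = dfControlCreate();

dfc.numTrees = 200;                // grow 200 trees instead of the default 100
dfc.maxTreeDepth = 8;              // cap the depth of each tree
dfc.minObsNode = 5;                // require at least 5 observations per node
dfc.featurePerNode = 2;            // consider 2 of the 4 features at each node
dfc.oobError = 1;                  // compute the out-of-bag error
dfc.variableImportanceMethod = 2;  // mean decrease in accuracy (MDA)

// Fit the model with the customized settings
struct dfModel mdl;
mdl = decForestRFit(y, X, dfc);

print "Out-of-bag error:" mdl.oobError;
print "Variable importance:" mdl.variableImportance;
```

Members that are not assigned keep the defaults filled in by dfControlCreate().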
Returns
dfm (struct) –
An instance of the dfModel structure. An instance named dfm will have the following members:
dfm.variableImportance
Matrix, 1 x P, variable importance measure if computation of variable importance is specified, zero otherwise.
dfm.oobError
Scalar, out-of-bag error if OOB error computation is specified, zero otherwise.
dfm.numClasses
Scalar, number of classes if classification model, zero otherwise.
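As a rough check of the returned members, the sketch below fits a model on simulated data in which only the first column of X affects y. The data-generating process is an illustrative assumption; under it, the first entry of dfm.variableImportance would be expected to be the largest, and numClasses should be zero because this is a regression model.

```gauss
new;
library gml;

// Hypothetical simulated data: only the first column drives y
rndseed 7;
X = rndn(300, 3);
y = 3*X[., 1] + rndn(300, 1);

// Request variable importance and OOB error so both members are filled in
struct dfControl dfc;
dfc = dfControlCreate();
dfc.variableImportanceMethod = 1;   // mean decrease in impurity
dfc.oobError = 1;

struct dfModel mdl;
mdl = decForestRFit(y, X, dfc);

// One importance value per column of X (1 x P)
print "Size of variableImportance:" rows(mdl.variableImportance)~cols(mdl.variableImportance);
print "Variable importance:" mdl.variableImportance;
print "OOB error:" mdl.oobError;

// Regression model, so numClasses is expected to be zero
print "numClasses:" mdl.numClasses;
```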
Examples
new;
library gml;
/*
** Load and transform data
*/
// Load hitters dataset
dataset = getGAUSSHome $+ "pkgs/gml/examples/hitters.xlsx";
// Load salary and perform natural log transform
y = loadd(dataset, "ln(salary)");
// Load all variables except 'salary'
X = loadd(dataset, ". - salary");
/*
** Split into test and training sets
*/
// Set seed for repeatable sampling
rndseed 234234;
// Split data into training and test sets
{ y_train, y_test, X_train, X_test } = trainTestSplit(y, X, 0.7);
/*
** Estimate decision forest model
*/
// Declare 'dfc' to be a dfControl structure
// and fill with default settings.
struct dfControl dfc;
dfc = dfControlCreate();
// Turn on variable importance
dfc.variableImportanceMethod = 1;
// Turn on OOB error
dfc.oobError = 1;
// Structure to hold model results
struct dfModel mdl;
// Fit training data using decision forest
mdl = decForestRFit(y_train, X_train, dfc);
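// OOB Error
print "Out-of-bag error:" mdl.oobError;
// Predict the test set and compute the test MSE
y_hat = decForestPredict(mdl, X_test);
print "random forest test MSE:" meanc((y_test - y_hat).^2);
The code above prints the out-of-bag error followed by the test-set MSE. The test MSE line of the output is:
random forest test MSE: 0.23044959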
Remarks
The dfModel structure contains a fourth, internally used member, opaqueModel, which contains model details used by decForestPredict().
See also
Functions decForestPredict(), decForestCFit()