splitData
Purpose
Returns test and training splits for a single matrix of variables.
Format
{ X_train, X_test } = splitData(X, train_pct);
Parameters
X (Nx1 vector, or NxP matrix.) – The matrix to split.
train_pct (Scalar) – The proportion of observations to include in the training set, given as a fraction between 0 and 1 (for example, 0.67 for about two thirds).
Returns
X_train – Matrix with approximately (train_pct * N) rows and P columns, containing the observations of X selected for the training set.
X_test – The remaining observations from the original X which were not selected to be in the training set.
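Because X_test holds every row that was not placed in X_train, the row counts of the two outputs should add up to the row count of X. The following is a minimal sketch of that check, assuming the gml library is loaded; the random matrix built with rndn() is hypothetical illustration data, not part of the original example.

```gauss
library gml;

// Set seed for repeatable sampling
rndseed 23324;

// Hypothetical data: 100 observations of 3 variables
X = rndn(100, 3);

// Request roughly 80% of the rows for the training set
{ X_train, X_test } = splitData(X, 0.8);

// Row counts of each split
print rows(X_train);
print rows(X_test);

// Prints 1 if every row of X landed in exactly one of the two splits
print (rows(X_train) + rows(X_test) == rows(X));
```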
Examples
library gml;
// Set seed for repeatable sampling
rndseed 23324;
X = { 1 3,
9 6,
6 1,
8 4,
9 5,
1 8 };
// Shuffle data and create training set with 2/3 of
// the observations and 1/3 for the test set
{ X_train, X_test } = splitData(X, 0.67);
After the above code:
X_train = 9 5
          1 3
          8 4
          1 8

X_test = 9 6
         6 1
Remarks
The observations (rows) of X are kept together. For repeatable shuffling, use the rndseed keyword before calling splitData().
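Since rows are kept together, one way to split a dependent variable alongside its predictors with splitData() is to concatenate them into a single matrix first and separate the columns afterwards. The sketch below assumes this approach; the vector y and the output names (train, test, y_train, and so on) are hypothetical and chosen only for illustration.

```gauss
library gml;

// Set seed for repeatable sampling
rndseed 23324;

// Hypothetical predictors and outcome
X = { 1 3,
      9 6,
      6 1,
      8 4,
      9 5,
      1 8 };
y = { 0, 1, 0, 1, 1, 0 };

// Join y and X column-wise so each outcome stays on the same row as its predictors
data = y ~ X;

// Split the combined matrix; rows are shuffled as whole units
{ train, test } = splitData(data, 0.67);

// Separate the outcome (first column) from the predictors (remaining columns)
y_train = train[., 1];
X_train = train[., 2:cols(train)];
y_test = test[., 1];
X_test = test[., 2:cols(test)];
```

The functions listed under See also may offer a more direct way to split y and X in one call; check their own reference pages for the exact signatures.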
See also
Functions cvSplit(), rndi(), sampleData(), trainTestSplit()