cvSplit

Purpose

Returns the training and test sets for the ith of k cross-validation splits of a given set of dependent and independent variables.

Format

{ y_train, y_test, X_train, X_test } = cvSplit(y, X, k, i)

Parameters:

y (Nx1 vector or NxK matrix) – The dependent variable(s).
X (Nx1 vector or NxK matrix) – The independent variable(s).
k (Scalar) – The number of folds.
i (Scalar) – The fold number, which selects the fold used as the test set.

Returns:

y_train – The training target values for the ith CV split.
y_test – The test target values for the ith CV split.
X_train – The training predictor values for the ith CV split.
X_test – The test predictor values for the ith CV split.

Examples

y = { 7, 2, 5, 1, 3, 4 };

X = { 1   3,
      9   6,
      6   1,
      8   4,
      9   5,
      1   8 };

// Divide the dataset into 3 folds. Place the first
// 1/3 of the observations in the test set and the remaining
// observations in the training set.
{ y_train, y_test, X_train, X_test } = cvSplit(y, X, 3, 1);

After the above code runs, the variables are assigned as follows:

 y_train = 5   X_train = 6    1
           1             8    4
           3             9    5
           4             1    8

y_test  =  7  X_test  =  1    3
           2             9    6

Continuing with the same y and X from above, if we run:

// Divide the dataset into 3 folds. Place the second
// 1/3 of the observations in the test set and the remaining
// observations in the training set.
{ y_train, y_test, X_train, X_test } = cvSplit(y, X, 3, 2);

This time, the variables are assigned as follows:

 y_train = 7   X_train = 1    3
           2             9    6
           3             9    5
           4             1    8

y_test  =  5  X_test  =  6    1
           1             8    4
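
In practice, cvSplit is usually called once per fold inside a loop. The following is a minimal sketch of that pattern, not a prescribed workflow: it assumes an ordinary least squares fit without an intercept, computed with the `/` operator, and the names `b_hat`, `resid` and `mse` are illustrative only and are not part of cvSplit.

```
y = { 7, 2, 5, 1, 3, 4 };

X = { 1   3,
      9   6,
      6   1,
      8   4,
      9   5,
      1   8 };

// Number of folds
k = 3;

// Pre-allocate storage for the test error of each fold
mse = zeros(k, 1);

for i(1, k, 1);
    // Use the ith fold as the test set and the rest as the training set
    { y_train, y_test, X_train, X_test } = cvSplit(y, X, k, i);

    // Least squares coefficients estimated on the training set only
    // (illustrative model choice, no intercept)
    b_hat = y_train / X_train;

    // Mean squared prediction error on the held-out fold
    resid = y_test - X_test * b_hat;
    mse[i] = meanc(resid.^2);
endfor;

// Per-fold errors, followed by their average
print mse;
print meanc(mse);
```

Each observation appears in the test set exactly once across the k iterations, so the average of `mse` summarizes out-of-sample error over the whole dataset.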

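Remarks

The observations from X and y are NOT randomly shuffled. As the examples above show, each fold is a contiguous block of rows taken in their original order. If the rows are sorted or grouped, consider shuffling them before calling cvSplit.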
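
Because the rows are not shuffled, one way to randomize fold membership is to permute the rows of y and X together before calling cvSplit. The sketch below is one possible approach, assuming the standard GAUSS functions `rndu` and `sortind` are used to build a random permutation; the names `idx`, `y_shuf` and `X_shuf` and the seed value are hypothetical.

```
y = { 7, 2, 5, 1, 3, 4 };

X = { 1   3,
      9   6,
      6   1,
      8   4,
      9   5,
      1   8 };

// Fix the seed so the permutation is repeatable (arbitrary value)
rndseed 8675;

// Sorting a vector of uniform random draws yields a random
// permutation of the row indices 1, 2, ..., N
idx = sortind(rndu(rows(y), 1));

// Apply the same permutation to y and X so rows stay matched
y_shuf = y[idx, .];
X_shuf = X[idx, .];

// Split the shuffled data: the first fold is now a random subset
{ y_train, y_test, X_train, X_test } = cvSplit(y_shuf, X_shuf, 3, 1);
```

Permuting once, before the loop over folds, keeps the folds mutually exclusive; drawing a new permutation for each fold would not.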