logisticRegFit

Purpose

Fit a logistic regression model with an optional L1 and/or L2 penalty.

Format

mdl = logisticRegFit(y, X[, ctl])
Parameters:
  • y (Nx1 vector) – The target, or dependent variable.

  • X (NxP matrix) – The model features, or independent variables.


  • ctl (struct) –

    Optional input, an instance of a logisticRegControl structure. An instance named ctl will have the following members:

    ctl.l1

    Scalar, the L1 regularization penalty. Default = 0.

    ctl.l2

    Scalar, the L2 regularization penalty. Default = 0.

    ctl.intercept

    Scalar, indicator for including a constant term. Set to 1 to include a constant, 0 otherwise. Default = 1.

    ctl.tolerance

    Scalar, the convergence tolerance for the coordinate descent optimization. Default = 1e-4.

    ctl.batch_size

    Scalar, the number of observations to include in each batch. Default = 0 (all observations).

    ctl.max_iters

    Scalar, the maximum number of iterations for the coordinate descent optimization. Default = 1000.

    ctl.solver_type

    Solver to use for optimization. Default = lbfgs.
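
A typical workflow is to fill the control structure with defaults and then override only the members of interest. As a brief sketch (assuming y and X already hold your data; the penalty and tolerance values below are purely illustrative):

// Declare a control structure and fill it with default settings
struct logisticRegControl ctl;
ctl = logisticRegControlCreate();

// Apply a mild L2 (ridge) penalty and tighten convergence
ctl.l2 = 0.1;
ctl.tolerance = 1e-6;

// Pass the control structure as the optional third input
struct logisticRegModel mdl;
mdl = logisticRegFit(y, X, ctl);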

Returns:

mdl (struct) –

An instance of a logisticRegModel structure. An instance named mdl will have the following members:

mdl.alpha_hat

The estimated value(s) of the intercept.

mdl.beta_hat

The estimated parameter values for the model features.

mdl.l1

Scalar, the L1 regularization penalty.

mdl.l2

Scalar, the L2 regularization penalty.

mdl.nobs

Scalar, the number of observations.

mdl.nvars

Scalar, the number of variables.
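
The members above can be read directly from the returned structure. As a brief sketch (assuming mdl holds a model already fitted with logisticRegFit):

// Inspect the fitted intercept and coefficients
print "Intercept estimate:";
print mdl.alpha_hat;
print "Coefficient estimates:";
print mdl.beta_hat;

// Confirm the dimensions of the training data
print "Observations used:" mdl.nobs;
print "Features used:" mdl.nvars;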

Examples

Example 1: Basic Logistic Regression

new;
library gml;
rndseed 23423;

/*
** Load and prepare data
*/

// Load wine quality dataset
dataset = loadd(getGAUSSHome("pkgs/gml/examples/winequality.csv"));

// Separate target variable from predictive features
X = delcols(dataset, "quality");
y = dataset[.,"quality"];

// Split data into (80%) training and (20%) test sets
{ y_train, y_test, X_train, X_test } = trainTestSplit(y, X, 0.8);

// Scale training features
{ X_train, mu, sd } = rescale(X_train, "standardize");

/*
** Train model
*/
// The logisticRegModel structure holds the trained model
struct logisticRegModel lrm;

// Fit training data, using default options
lrm = logisticRegFit(y_train, X_train);

Continuing with our example, we can make test predictions like this:

/*
** Test model
*/

// Apply training scale parameters to test data
X_test = rescale(X_test, mu, sd);

// Make predictions using test data
predictions = lmPredict(lrm, X_test);

call classificationMetrics(y_test, predictions);

This prints the following evaluation metrics:

===================================================
                             Classification metrics
===================================================
       Class   Precision  Recall  F1-score  Support

           3        0.00    0.00      0.00        2
           4        0.00    0.00      0.00       12
           5        0.67    0.77      0.72      137
           6        0.56    0.63      0.60      131
           7        0.47    0.22      0.30       36
           8        0.00    0.00      0.00        2

   Macro avg        0.28    0.27      0.27      320
Weighted avg        0.57    0.61      0.59      320

    Accuracy                          0.61      320

Example 2: Basic Logistic Regression with Regularization

new;
library gml;

/*
** Load and prepare data
*/
// Load all variables from dataset, except for 'ID'
fname = getGAUSSHome("pkgs/gml/examples/breastcancer.csv");
data = loadd(fname, ". -ID");

// Remove any rows with missing values
data = packr(data);

// Extract target variable and set class names
// for more informative reporting
y = data[., "class"];
y = setcollabels(y, "Positive"$|"Negative", 1|0);

// Remove target variable to create feature dataframe
X = delcols(data, "class");

// Split data into 70% training and 30% test set
{ y_train, y_test, X_train, X_test } = trainTestSplit(y, X, 0.7);

/*
** Train model
*/
// Declare 'lr_mdl' to be a 'logisticRegModel' structure
// to hold the trained model
struct logisticRegModel lr_mdl;

// Declare 'lrc' to be a logisticRegControl
// structure and fill with default settings
struct logisticRegControl lrc;
lrc = logisticRegControlCreate();

// Set regularization parameters
lrc.l1 = 0.3;
lrc.l2 = 0.9;

// Train the logistic regression classifier
lr_mdl = logisticRegFit(y_train, X_train, lrc);

Continuing with our example, we can make test predictions like this:

/*
** Test model
*/
// Make predictions on the test set, from our trained model
y_hat = lmPredict(lr_mdl, X_test);

call classificationMetrics(y_test, y_hat);

This prints the following evaluation metrics:

===================================================
                             Classification metrics
===================================================
       Class   Precision  Recall  F1-score  Support

    Negative        0.98    0.97      0.97      131
    Positive        0.95    0.96      0.95       74

   Macro avg        0.96    0.96      0.96      205
Weighted avg        0.97    0.97      0.97      205

    Accuracy                          0.97      205