logisticRegFit
====================

Purpose
----------------

Fit a logistic regression model with an optional L1 and/or L2 penalty.

Format
------------
.. function:: mdl = logisticRegFit(y, X [, ctl])

    :param y: The target, or dependent variable.
    :type y: Nx1 vector

    :param X: The model features, or independent variables.
    :type X: NxP matrix

    :param ctl: Optional input, an instance of a :class:`logisticRegControl` structure. An instance named *ctl* will have the following members:

        .. csv-table::
            :widths: auto

            "ctl.l1","Scalar, the L1 regularization penalty. Default = 0."
            "ctl.l2","Scalar, the L2 regularization penalty. Default = 0."
            "ctl.intercept","Scalar, indicator for including a constant. Set to 1 to include a constant, 0 otherwise. Default = 1."
            "ctl.tolerance","Scalar, the convergence tolerance for the solver. Default = 1e-4."
            "ctl.batch_size","Scalar, the number of observations to use in each batch. Default = 0 (all observations)."
            "ctl.max_iters","Scalar, the maximum number of iterations allowed for the solver. Default = 1000."
            "ctl.solver_type","The solver to use for optimization. Default = lbfgs."

    :type ctl: struct

    :return mdl: An instance of a :class:`logisticRegModel` structure. An instance named *mdl* will have the following members:

        .. csv-table::
            :widths: auto

            "mdl.alpha_hat","The estimated value of the intercept."
            "mdl.beta_hat","The estimated parameter values."
            "mdl.l1","Scalar, the L1 regularization penalty."
            "mdl.l2","Scalar, the L2 regularization penalty."
            "mdl.nobs","Scalar, the number of observations."
            "mdl.nvars","Scalar, the number of variables."

    :rtype mdl: struct

Examples
-----------

Example 1: Basic Logistic Regression
+++++++++++++++++++++++++++++++++++++++++++++

::

    new;
    library gml;
    rndseed 23423;

    /*
    ** Load and prepare data
    */
    // Load wine quality dataset
    dataset = loadd(getGAUSSHome("pkgs/gml/examples/winequality.csv"));

    // Separate target variable from predictive features
    X = delcols(dataset, "quality");
    y = dataset[.,"quality"];

    // Split data into (80%) training and (20%) test sets
    { y_train, y_test, X_train, X_test } = trainTestSplit(y, X, 0.8);

    // Scale training features
    { X_train, mu, sd } = rescale(X_train, "standardize");

    /*
    ** Train model
    */
    // The logisticRegModel structure holds the trained model
    struct logisticRegModel lrm;

    // Fit training data, using default options
    lrm = logisticRegFit(y_train, X_train);

Continuing with our example, we can make test predictions like this:

::

    /*
    ** Test model
    */
    // Apply training scale parameters to test data
    X_test = rescale(X_test, mu, sd);

    // Make predictions using test data
    predictions = lmPredict(lrm, X_test);

    call classificationMetrics(y_test, predictions);

This prints the following evaluation metrics:

::

    ===================================================
                   Classification metrics
    ===================================================
           Class   Precision   Recall   F1-score   Support

               3        0.00     0.00       0.00         2
               4        0.00     0.00       0.00        12
               5        0.67     0.77       0.72       137
               6        0.56     0.63       0.60       131
               7        0.47     0.22       0.30        36
               8        0.00     0.00       0.00         2

       Macro avg        0.28     0.27       0.27       320
    Weighted avg        0.57     0.61       0.59       320

        Accuracy                            0.61       320
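The estimated coefficients and basic model information are stored in the returned :class:`logisticRegModel` structure, as described in the table above. The short sketch below is illustrative only; it assumes we are continuing from Example 1, where the trained model is stored in *lrm*:

::

    /*
    ** Inspect the trained model (illustrative sketch)
    */
    // Estimated intercept
    print "Intercept estimate:";
    print lrm.alpha_hat;

    // Estimated coefficients for the features
    print "Coefficient estimates:";
    print lrm.beta_hat;

    // Basic model information
    print "Number of observations:";
    print lrm.nobs;

    print "Number of variables:";
    print lrm.nvars;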
Example 2: Basic Logistic Regression with Regularization
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

::

    new;
    library gml;

    /*
    ** Load and prepare data
    */
    // Load all variables from dataset, except for 'ID'
    fname = getGAUSSHome("pkgs/gml/examples/breastcancer.csv");
    data = loadd(fname, ". -ID");

    // Remove any rows with missing values
    data = packr(data);

    // Extract target variable and set class names
    // for more informative reporting
    y = data[., "class"];
    y = setcollabels(y, "Positive"$|"Negative", 1|0);

    // Remove target variable to create feature dataframe
    X = delcols(data, "class");

    // Split data into 70% training and 30% test sets
    { y_train, y_test, X_train, X_test } = trainTestSplit(y, X, 0.7);

    /*
    ** Train model
    */
    // Declare 'lr_mdl' to be a 'logisticRegModel' structure
    // to hold the trained model
    struct logisticRegModel lr_mdl;

    // Declare 'lrc' to be a 'logisticRegControl'
    // structure and fill it with default settings
    struct logisticRegControl lrc;
    lrc = logisticRegControlCreate();

    // Set regularization parameters
    lrc.l1 = 0.3;
    lrc.l2 = 0.9;

    // Train the logistic regression classifier
    lr_mdl = logisticRegFit(y_train, X_train, lrc);

Continuing with our example, we can make test predictions like this:

::

    /*
    ** Test model
    */
    // Make predictions on the test set, from our trained model
    y_hat = lmPredict(lr_mdl, X_test);

    call classificationMetrics(y_test, y_hat);

This prints the following evaluation metrics:

::

    ===================================================
                   Classification metrics
    ===================================================
           Class   Precision   Recall   F1-score   Support

        Negative        0.98     0.97       0.97       131
        Positive        0.95     0.96       0.95        74

       Macro avg        0.96     0.96       0.96       205
    Weighted avg        0.97     0.97       0.97       205

        Accuracy                            0.97       205

.. seealso:: :func:`ridgeFit`, :func:`lassoFit`, :func:`lmPredict`