impute

Purpose

Replaces missing values in the columns of a matrix by a specified imputation method.

Format

x_full = impute(x[, method[, indvars[, iCtl]]])
Parameters:
  • x (NxK matrix) – Data matrix containing the missing values to be imputed. If x contains no missing values, the original matrix is returned.
  • method (string) –

    Optional input. Specifies which imputation method to use.

    Valid options:

    "mean" Replace missing values with the mean of the column (default).
    "median" Replace missing values with the median of the column.
    "mode" Replace missing values with the mode of the column.
    "pmm" Replace missing values using predictive mean matching.
    "lrd" Replace missing values using local residual draws.
    "predict" Replace missing values using linear regression prediction.
  • indvars (NxK matrix) – Optional input, matrix of variables to be used to impute the missing values. Should not contain any missing values. Must be specified if using the “pmm”, “lrd”, or “predict” methods.
  • iCtl (struct) –

    Optional input, an instance of an imputeControl structure. The following members of iCtl are referenced within the impute() “pmm”, “lrd”, and “predict” routines:

    iCtl.numberSeries Scalar, number of series to be imputed. Multiple series are only valid when x is an Nx1 vector. Default = 1.
    iCtl.numberDonors Scalar, number of donors to be considered for PMM and LRD methods if dMax member is set to zero. If the dMax member is nonzero the numberDonors member will be used to determine candidate donors only if no potential donors meet the maximum distance criteria. Default = 5.
    iCtl.dMax Scalar, maximum distance cutoff to be used to determine candidate donors. If set to zero, the numberDonors member will be used to determine candidate donors. If nonzero and adaptiveDmax is set to one, the numberDonors member will be used to determine candidate donors only if no donors meet the maximum distance criteria. Default = 0.
    iCtl.matchingType Integer, the type of matching to be used in the predictive mean matching. Default = 1. Acceptable values:
    0: Type 0 matching. Ignores variability in the estimated \(\beta\); the OLS \(\beta\) is used for predicting in both the missing and observed cases.
    1: Type 1 matching. Uses the OLS \(\beta\) for predicting in the observed cases and a \(\beta\) drawn from the posterior distribution for predicting in the missing cases.
    2: Type 2 matching. Uses the same \(\beta\), drawn from the posterior distribution, for predicting in both the missing and observed cases.
    3: Type 3 matching. Uses different \(\beta\) values, drawn from the same posterior distribution, for predicting in the missing and observed cases.
    iCtl.linearMethod String, the prediction method used for LRD or linear prediction. Default = "bayes". Acceptable values:
    "predict": OLS \(\beta\) is used for predicting in missing cases.
    "noise": OLS \(\beta\) is used for predicting in missing cases and a random disturbance drawn from \(N(0, \hat{\sigma})\) is added to the prediction.
    "bayes": Uses \(\dot{\beta}\) drawn from the posterior distribution for predicting missing cases and a random disturbance drawn from \(N(0, \dot{\sigma})\) is added to the prediction. \(\dot{\sigma}\) is drawn from the posterior distribution.
    "bootstrap": The coefficients and \(\dot{\sigma}\) are the least squares estimates calculated from a bootstrap sample taken from the observed data. A random disturbance drawn from \(N(0, \dot{\sigma})\) is added to the prediction.
    iCtl.adaptiveDmax Scalar, indicator variable, either one or zero. When set to one, an adaptive method is used: the numberDonors member determines the number of potential candidates when no potential donors meet the maximum distance criteria. When set to zero, missing values are kept in the dataset if no potential candidates meet the maximum distance criteria. Default = 0.
    iCtl.k Scalar, ridge parameter used to avoid singular matrices when computing Bayesian and bootstrap posterior distributions. Default = 0.00001.
Returns:

x_full (matrix) – the input matrix with the missing values from each column filled in by the specified imputation method.

Examples

// Create 4x3 matrix with two missing values
x = { 1    2    3,
      4    .    5,
      7    8    9,
     10   11    . };

// Replace missing values with column mean (the default method)
x_default = impute(x);

// Replace missing values with column median
x_median = impute(x, "median");

// Replace missing values with column mean, specified explicitly
x_mean = impute(x, "mean");

The above code will make the following assignments:

               1    2    3
x_default =    4    7    5
               7    8    9
              10   11    5.67

               1    2    3
x_median  =    4    8    5
               7    8    9
              10   11    5

               1    2    3
x_mean    =    4    7    5
               7    8    9
              10   11    5.67
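
The regression-based methods take the additional indvars and iCtl inputs described above. The following is a minimal sketch of how such a call might look, not output from a verified run. The data values are hypothetical, and the sketch assumes that a default control structure can be obtained from imputeControlCreate(); check that name against your GAUSS installation. Because some of these methods draw random values, no expected results are shown.

```
// Hypothetical data: y has two missing values,
// z is a fully observed predictor with the same number of rows
y = { 2.1, ., 3.9, 5.2, ., 7.8, 9.1, 10.0 };
z = { 1, 2, 3, 4, 5, 6, 7, 8 };

// Linear regression prediction using the default control settings
y_predict = impute(y, "predict", z);

// Declare the control structure and fill it with defaults
// (assumes imputeControlCreate() is available)
struct imputeControl iCtl;
iCtl = imputeControlCreate();

// Consider the 3 closest donors and use Type 0 matching
iCtl.numberDonors = 3;
iCtl.matchingType = 0;

// Predictive mean matching with the modified settings
y_pmm = impute(y, "pmm", z, iCtl);
```

Leaving iCtl.dMax at its default of zero means that numberDonors alone determines the candidate donors.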

Remarks

  • If all elements of a column passed to impute() are missing values, every element of the corresponding column returned will contain missing values.
  • To replace the missing values in each column with a constant value, use missrv(). It will allow you to specify one constant for the entire matrix, or a separate constant for each column.
  • Use the miss() function to replace specific values (for example 999) with GAUSS missing values.
  • The packr() function will remove all rows which contain one or more missing values (listwise deletion).
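
The related functions named in the remarks above can be combined as in the following sketch. The data and the 999 placeholder are hypothetical and chosen only for illustration.

```
// Hypothetical data in which 999 marks an unrecorded observation
x = {   1  999,
        4    5,
      999    8 };

// Convert the 999 placeholder to GAUSS missing values
x_miss = miss(x, 999);

// Replace every missing value with one constant for the whole matrix
x_zero = missrv(x_miss, 0);

// Replace missing values with a separate constant for each column
v = { 0 -1 };
x_col = missrv(x_miss, v);

// Listwise deletion: drop every row that contains a missing value
x_complete = packr(x_miss);
```

Calling miss() first matters here: impute(), missrv(), and packr() act only on GAUSS missing values, so a placeholder such as 999 would otherwise be treated as ordinary data.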

See also

Functions missrv(), miss(), reclassify(), packr()