pdSummary

Purpose

Generates summary statistics for panel data, including overall, between-group, and within-group statistics.

Format

pdOut = pdSummary(x, groupvar[, varlist, missings])
Parameters:
  • x (NxK matrix or dataframe) – Panel data with N rows (observations) and K columns (variables).

  • groupvar (String or Nx1 column vector) – Identifies the group (panel) to which each observation in x belongs.

  • varlist (1xP string array) – Optional, a list of the variables to include in the summary. Default is all variables.

  • missings (Scalar) – Optional, indicator that missing values are present in the data. Missing values must be removed before the statistics can be computed. Setting missings to 0 skips this step and speeds up the procedure, but should be used only when you are certain the data contain no missing values. Default = 1.

Returns:

pdOut (Dataframe) – A dataframe containing the following summary statistics:

  • Overall statistics: mean, standard deviation, minimum, and maximum for each variable.

  • Between-group statistics: mean, standard deviation, minimum, and maximum.

  • Within-group statistics: mean, standard deviation, minimum, and maximum.

  • Additional information: number of groups, average number of observations per group (T_ave), balance indicator (_isbalanced), valid and missing observation counts.
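The three groups of statistics differ only in the sample they are computed over. The block below is a sketch of that idea, assuming the convention used by Stata's xtsum command; it is not taken from the pdSummary source, and the sample-standard-deviation divisor (M − 1) is likewise an assumption. Here n is the number of groups, T_i the number of observations in group i, and N the total number of observations.

```latex
\begin{aligned}
&\bar{x}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}, \qquad
 \bar{x} = \frac{1}{N}\sum_{i=1}^{n}\sum_{t=1}^{T_i} x_{it}, \qquad
 N = \sum_{i=1}^{n} T_i, \qquad
 T_{\text{ave}} = \frac{N}{n} \\[6pt]
&\text{Overall sample: } \{\, x_{it} \,\} \quad (N \text{ values}) \\
&\text{Between sample: } \{\, \bar{x}_1, \dots, \bar{x}_n \,\} \quad (n \text{ values}) \\
&\text{Within sample: } \{\, x_{it} - \bar{x}_i + \bar{x} \,\} \quad (N \text{ values}) \\[6pt]
&\text{For a sample } z_1, \dots, z_M: \quad
 \bar{z} = \frac{1}{M}\sum_{j=1}^{M} z_j, \qquad
 s = \sqrt{\frac{1}{M-1}\sum_{j=1}^{M} \left(z_j - \bar{z}\right)^2}
\end{aligned}
```

Under this convention the within transformation removes each group's own mean and adds back the overall mean, so the within mean equals the overall mean and the within standard deviation measures variation over time inside groups.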

Examples

// Panel data matrix (4 observations, 3 variables)
x = { 1 10 100,
      2 20 200,
      3 30 300,
      4 40 400 };

// Group variable (indicating group membership for each observation)
groupvar = {1, 1, 2, 2};

// Summarize all variables, dropping missing values
pdOut = pdSummary(x, groupvar);

pdOut will be a dataframe containing the overall, between-group, and within-group summary statistics for each of the three columns of x.
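As a hand check of what these statistics measure, the first column of x in the example above can be worked out manually. This assumes the xtsum-style convention (between statistics computed over the group means, within statistics over x_it − x̄_i + x̄, sample standard deviations with an M − 1 divisor); it is arithmetic under that assumption, not output copied from pdSummary. Columns 2 and 3 are 10 and 100 times column 1, so each of their statistics scales by the same factor.

```latex
\begin{aligned}
&\bar{x}_1 = \tfrac{1+2}{2} = 1.5, \qquad \bar{x}_2 = \tfrac{3+4}{2} = 3.5, \qquad \bar{x} = 2.5, \qquad T_{\text{ave}} = \tfrac{4}{2} = 2 \\[6pt]
&\text{Overall: } \{1, 2, 3, 4\}: \quad \text{mean} = 2.5,\; s = \sqrt{5/3} \approx 1.291,\; \min = 1,\; \max = 4 \\
&\text{Between: } \{1.5, 3.5\}: \quad \text{mean} = 2.5,\; s = \sqrt{2} \approx 1.414,\; \min = 1.5,\; \max = 3.5 \\
&\text{Within: } \{2, 3, 2, 3\}: \quad \text{mean} = 2.5,\; s = \sqrt{1/3} \approx 0.577,\; \min = 2,\; \max = 3
\end{aligned}
```

Both groups have two observations, so this example panel is balanced.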

Remarks

The summary statistics generated by pdSummary() include between-group and within-group variation, which is useful for panel data analysis. If the varlist argument is provided, the summary is restricted to those variables. When the missings argument is 1 (the default), missing values are removed before the statistics are computed; set it to 0 only when the data are known to contain no missing values.

The returned data frame contains:

  • “Variable” and “Measure” columns, which hold the name of the variable and the type of statistic (Overall, Between, Within).

  • The mean, standard deviation, minimum, and maximum for each variable and measure.

See also:

pdsize()