pdSummary#

Purpose#

Generates summary statistics for panel data, including overall, between-group, and within-group statistics.

Format#

pdOut = pdSummary(df[, varlist, missings, groupvar, datevar])#
Parameters:
  • df (Dataframe) – Contains long-form panel data with \(N_i x T_i\) rows and K columns.

  • varlist (1xP string array) – Optional, A list of variables to include in the summary. Default is all variables.

  • missings (Scalar) – Optional, scalar, indicator that missings are present in data. Missing values must be removed for procedure. Setting to 0 will speed up procedure but should be used only if certain that no missings are present. Default = 1.

  • groupvar (String) – Optional, specifies the name of the variable used to identify group membership for panel observations. Defaults to the first categorical or string variable in the dataframe.

  • datevar (String) – Optional, specifies the name of the variable used to identify dates for panel observations. Defaults to the first date variable in the dataframe.

Returns:

pdOut (Dataframe) –

A dataframe containing summary statistics:

  • Overall statistics: mean, standard deviation, minimum, and maximum for each variable.

  • Between-group statistics: mean, standard deviation, minimum, and maximum.

  • Within-group statistics: mean, standard deviation, minimum, and maximum.

Examples#

// Import data
fname = getGAUSSHome("examples/pd_ab.gdat");
pd_ab = loadd(fname);

// Get summary statistics
pd_summary = pdSummary(pd_ab);
==========================================================================================
Group ID:                             id          Balanced:                             No
Valid cases:                        1031          Missings:                              0
N. Groups:                           140          T. Average:                        7.364
==========================================================================================
Variable               Measure           Mean      Std. Dev.        Minimum        Maximum
------------------------------------------------------------------------------------------
emp                    Overall          7.892         15.935          0.104        108.562
                       Between              .         16.169          0.130        102.190
                        Within              .          2.210        -14.812         34.763
wage                   Overall         23.919          5.648          8.017         45.232
                       Between              .          5.184          8.713         36.060
                        Within              .          2.068         11.722         40.935
==========================================================================================

Remarks#

This function takes long-form panel data. To transform wide data to long-form data see dfLonger().

This function assumes panel is sorted by group and date. Note that panel data can be sorted using pdSort().

A strongly balanced panel dataset contains the same time points for each group. pdAllBalanced() examines the provided dataset to determine if it meets this condition.

  • If groupvar is not provided, the function defaults to the first categorical or string variable in the dataframe.

  • If datevar is not provided, the function defaults to the first date variable in the dataframe.