pdSummary

Purpose

Generates summary statistics for panel data, including overall, between-group, and within-group statistics.

Format

pdOut = pdSummary(df[, varlist, missings, groupvar, datevar])
Parameters:
  • df (Dataframe) – Contains long-form panel data with \(N_i \times T_i\) rows and K columns.

  • varlist (1xP string array) – Optional, a list of variables to include in the summary. Default is all variables.

  • missings (Scalar) – Optional, indicator that missing values are present in the data. Missing values must be removed before the statistics are computed; setting this to 0 skips that step and speeds up the procedure, but should be used only if you are certain that no missing values are present. Default = 1.

  • groupvar (String) – Optional, specifies the name of the variable used to identify group membership for panel observations. Defaults to the first categorical or string variable in the dataframe.

  • datevar (String) – Optional, specifies the name of the variable used to identify dates for panel observations. Defaults to the first date variable in the dataframe.

Returns:

pdOut (Dataframe) –

A dataframe containing summary statistics:

  • Overall statistics: mean, standard deviation, minimum, and maximum for each variable.

  • Between-group statistics: mean, standard deviation, minimum, and maximum, describing variation across groups.

  • Within-group statistics: mean, standard deviation, minimum, and maximum, describing variation over time within each group.
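
As a sketch of what the three measures usually represent, the overall/between/within decomposition for a variable \(x_{it}\) (group \(i\), period \(t\)) can be written as below. The notation, and the convention of adding the overall mean back to the within-transformed values, are assumptions based on common practice for panel summaries, not something stated for pdSummary itself. They do appear consistent with the example report, where the within minimum and maximum are spread around the overall mean.

\[
\begin{aligned}
\bar{x}_i &= \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}
  && \text{(group mean)} \\
\bar{x} &= \frac{1}{\sum_{i} T_i}\sum_{i}\sum_{t=1}^{T_i} x_{it}
  && \text{(overall mean)} \\
\text{Overall:}\quad & x_{it} \\
\text{Between:}\quad & \bar{x}_i \\
\text{Within:}\quad & x_{it} - \bar{x}_i + \bar{x}
\end{aligned}
\]

Under this convention, the standard deviation, minimum, and maximum in each row are taken over the corresponding quantity, which is why a within minimum can be negative even when the variable itself is strictly positive.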

Examples

// Import data
fname = getGAUSSHome("examples/pd_ab.gdat");
pd_ab = loadd(fname);

// Get summary statistics
pd_summary = pdSummary(pd_ab);

The code above prints the following report:

==========================================================================================
Group ID:                             id          Balanced:                             No
Valid cases:                        1031          Missings:                              0
N. Groups:                           140          T. Average:                        7.364
==========================================================================================
Variable               Measure           Mean      Std. Dev.        Minimum        Maximum
------------------------------------------------------------------------------------------
emp                    Overall          7.892         15.935          0.104        108.562
                       Between              .         16.169          0.130        102.190
                        Within              .          2.210        -14.812         34.763
wage                   Overall         23.919          5.648          8.017         45.232
                       Between              .          5.184          8.713         36.060
                        Within              .          2.068         11.722         40.935
==========================================================================================
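
The optional arguments can be supplied in the order given in the Format section. The following is a minimal sketch, assuming the optional inputs are passed positionally; the variable names emp and wage are taken from the report above, and the local names varlist and missings are chosen here only for illustration.

```gauss
// Reload the example data so this snippet runs on its own
fname = getGAUSSHome("examples/pd_ab.gdat");
pd_ab = loadd(fname);

// Restrict the summary to emp and wage (a 1xP string array)
varlist = "emp" $~ "wage";
pd_summary = pdSummary(pd_ab, varlist);

// The report above shows 0 missings for these data, so the
// missing-value handling can be skipped by setting missings to 0
missings = 0;
pd_summary = pdSummary(pd_ab, varlist, missings);
```

Setting missings to 0 is only appropriate here because the report confirms there are no missing values; for other data, the default of 1 is the safer choice.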

Remarks

This function takes long-form panel data. To transform wide data to long-form data, see dfLonger().

This function assumes the panel is sorted by group and date. Panel data can be sorted using pdSort().
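
A sort-then-summarize workflow might look like the sketch below. It assumes that pdSort() can be called with the dataframe alone, that it detects the group and date variables by default, and that it returns the sorted dataframe; check the pdSort() reference for its exact signature. The name pd_sorted is illustrative.

```gauss
// Load the example panel
fname = getGAUSSHome("examples/pd_ab.gdat");
pd_ab = loadd(fname);

// Assumption: pdSort returns the dataframe sorted by group, then by date
pd_sorted = pdSort(pd_ab);

// Summarize the sorted panel
pd_summary = pdSummary(pd_sorted);
```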

This function determines summary statistics for panel data using the specified groupvar and datevar:

  • If groupvar is not provided, the function defaults to the first categorical or string variable in the dataframe.

  • If datevar is not provided, the function defaults to the first date variable in the dataframe.
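
When the defaults would pick the wrong columns, the group and date variables can be named explicitly. The sketch below assumes positional optional arguments, so varlist and missings are supplied first. The group variable "id" matches the Group ID shown in the example report, while "year" is an assumed name for the date variable in pd_ab.gdat and should be replaced with the actual date column.

```gauss
// Load the example panel
fname = getGAUSSHome("examples/pd_ab.gdat");
pd_ab = loadd(fname);

// Variables to summarize and the default missing-value handling
varlist = "emp" $~ "wage";
missings = 1;

// Explicit panel identifiers
groupvar = "id";     // reported as the Group ID in the example output
datevar = "year";    // assumed name of the date variable

pd_summary = pdSummary(pd_ab, varlist, missings, groupvar, datevar);
```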