pdIsBalanced

Purpose

Checks whether each group in a panel dataset spans the maximum time range observed across all groups.

Format

groupIsBalanced = pdIsBalanced(df[, groupvar, datevar])

Parameters:

df (Dataframe) – Contains long-form panel data with \(N_i \times T_i\) rows and K columns.
groupvar (String) – Optional, specifies the name of the variable used to identify group membership for panel observations. Defaults to the first categorical or string variable in the dataframe.
datevar (String) – Optional, specifies the name of the variable used to identify dates for panel observations. Defaults to the first date variable in the dataframe.

Returns:

groupIsBalanced (Dataframe) – Indicates whether each group in the panel dataset spans the full time range of the dataset. Each group is assigned a value of 1 if it covers the full time span, 0 otherwise.

Examples

// Load panel data and take the first 10 rows
pd = loadd(getGAUSSHome("examples/pd_ab.gdat"));
pd = pd[1:10,.];

print pd;

id             year              emp             wage
 1       1977-01-01        5.0409999        13.151600
 1       1978-01-01        5.5999999        12.301800
 1       1979-01-01        5.0149999        12.839500
 1       1980-01-01        4.7150002        13.803900
 1       1981-01-01        4.0929999        14.289700
 1       1982-01-01        3.1659999        14.868100
 1       1983-01-01        2.9360001        13.778400
 2       1977-01-01        71.319000        14.790900
 2       1978-01-01        70.642998        14.103600
 2       1979-01-01        70.917999        14.953400

// Check to see if each group is balanced
is_balanced = pdIsBalanced(pd);
print is_balanced;

The code above will return:

id         balanced
 1        1.0000000
 2        0.0000000
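
When a dataframe holds more than one categorical or date variable, the defaults may not select the intended columns. The following is a minimal sketch of the same check with both names passed explicitly, following the signature in the Format section. It assumes the group and date variables in pd_ab.gdat are named id and year, as the printed header suggests.

```gauss
// Load panel data and take the first 10 rows
pd = loadd(getGAUSSHome("examples/pd_ab.gdat"));
pd = pd[1:10,.];

// Name the group and date variables explicitly
// (assumes they are called "id" and "year", as in the printed header)
is_balanced = pdIsBalanced(pd, "id", "year");
print is_balanced;
```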

Remarks

This function takes long-form panel data. To transform wide data to long-form data see dfLonger().

This function assumes the panel is sorted by group and date. Unsorted panel data can be sorted first using pdSort().
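
Because the check depends on sort order, it can help to sort before calling it. The sketch below assumes that pdSort() can be called with only the dataframe and that it detects the group and date variables the same way pdIsBalanced() does. That calling convention is an assumption here, so confirm it against the pdSort() reference.

```gauss
// Load the full example panel
pd = loadd(getGAUSSHome("examples/pd_ab.gdat"));

// Sort by group and date before checking balance
// (assumption: pdSort() accepts the dataframe alone)
pd = pdSort(pd);

// Check whether each group spans the full time range
is_balanced = pdIsBalanced(pd);
print is_balanced;
```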

This function evaluates whether each group in a panel dataset spans the maximum time range observed across all groups.

If groupvar is not provided, the function defaults to the first categorical or string variable in the dataframe.
If datevar is not provided, the function defaults to the first date variable in the dataframe.

The resulting dataframe contains each group and a corresponding indicator (1 or 0) to represent whether the group covers the full time span.
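
The indicator column can be summarized to see how much of the panel is balanced. This is a rough sketch, assuming the indicator column is named balanced, as in the example output, and that sumc() can be applied to a dataframe column in the same way as to a matrix column.

```gauss
// Load panel data and take the first 10 rows
pd = loadd(getGAUSSHome("examples/pd_ab.gdat"));
pd = pd[1:10,.];

is_balanced = pdIsBalanced(pd);

// Count the groups flagged as covering the full time span
// (assumes the indicator column is named "balanced")
n_balanced = sumc(is_balanced[., "balanced"]);

// One row is returned per group
n_groups = rows(is_balanced);

print "Balanced groups:" n_balanced;
print "Total groups:" n_groups;
```

If the two counts match, every group covers the full time span.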