pdSize#
Purpose#
Provides size description of a panel dataset including the number of groups, number of time observations for each group.
Format#
- { num_grps, T, balanced } = pdSize(df, groupvar)#
- Parameters:
df (Dataframe) – Contains long-form panel data with \(N_i x T_i\) rows and K columns.
groupvar (String) – A column vector indicating group membership for panel observations.
- Returns:
num_grps (Scalar) – Number of groups in the panel.
T (Vector) – Containing number of time observations for each group.
balanced – Indicator if panel is balanced, report 1 for balanced data, 0 othewise.
- Rtype:
Scalar
Examples#
If your group variable is the first categorical variable in your dataframe and the date variable is a GAUSS date variable and not just a numeric column, you can just pass in the panel dataframe and GAUSS will locate the group and date variables for you.
// Import data
fname = getGAUSSHome("examples/pd_ab.gdat");
pd_ab = loadd(fname);
// Take a small sample for the example
pd_smpl = pd_ab[1:4 8:11,.];
// Print our sample
print pd_smpl;
id year emp wage
1 1977-01-01 5.0410 13.1516
1 1978-01-01 5.6000 12.3018
1 1979-01-01 5.0150 12.8395
1 1980-01-01 4.7150 13.8039
2 1977-01-01 71.3190 14.7909
2 1978-01-01 70.6430 14.1036
2 1979-01-01 70.9180 14.9534
2 1980-01-01 72.0310 15.4910
// Check size of panel
{ num_grps, T_2, _isbalanced } = pdSize(pd_smpl);
The above code will return:
============================================================
Group ID: id Balanced: Yes
Valid cases: 8 Missings: 0
N. Groups: 2 T. Average: 4.000
============================================================
id T[i] Start Date End Date
------------------------------------------------------------
1 4 1977-01-01 1980-01-01
2 4 1977-01-01 1980-01-01
============================================================
Remarks#
This function takes long-form panel data. To transform wide data to long-form data see dfLonger()
.
This function assumes panel is sorted by group and date. Note that panel data can be sorted using pdSort()
.
If groupvar is not provided, the function defaults to the first categorical or string variable in the dataframe.
If datevar is not provided, the function defaults to the first date variable in the dataframe.
See also