pdSort

Purpose

Sorts panel data by group and then by date variable.

Format

pd_sorted = pdSort(df[, groupvar, datevar])
Parameters:
  • df (Dataframe) – Contains long-form panel data with \(\sum_{i=1}^{N} T_i\) rows (\(T_i\) observations for each of the \(N\) groups) and K columns.

  • groupvar (String) – Optional, name of the variable used to identify group membership for panel observations. Defaults to the first categorical or string variable in the dataframe.

  • datevar (String) – Optional, name of the variable used to identify dates for panel observations. Defaults to the first date variable in the dataframe.

Returns:

pd_sorted (Dataframe) – A dataframe containing the sorted panel data.

Examples

// Import data
fname = getGAUSSHome("examples/pd_ab.gdat");
pd_ab = loadd(fname);

// Take an out-of-order sample
pd_smpl = pd_ab[3 10 8 4 2 9,.];
print pd_smpl;
id             year              emp             wage
 1       1979-01-01        5.0149999        12.839500
 2       1979-01-01        70.917999        14.953400
 2       1977-01-01        71.319000        14.790900
 1       1980-01-01        4.7150002        13.803900
 1       1978-01-01        5.5999999        12.301800
 2       1978-01-01        70.642998        14.103600
// Sort sample
pd_srted = pdSort(pd_smpl);

print pd_srted;
id             year              emp             wage
 1       1978-01-01        5.5999999        12.301800
 1       1979-01-01        5.0149999        12.839500
 1       1980-01-01        4.7150002        13.803900
 2       1977-01-01        71.319000        14.790900
 2       1978-01-01        70.642998        14.103600
 2       1979-01-01        70.917999        14.953400

Remarks

This function takes long-form panel data. To transform wide data to long form, see dfLonger().

This function sorts panel data first by groupvar and then, within each group, by datevar, so that each group's observations are contiguous and in date order, the arrangement panel data analysis expects.

  • If groupvar is not provided, the function defaults to the first categorical or string variable in the dataframe.

  • If datevar is not provided, the function defaults to the first date variable in the dataframe.
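
When the dataframe holds more than one categorical or date variable, the defaults may not pick the intended columns, so it can be safer to name both variables explicitly. The following is a minimal sketch that reuses the sample from the Examples section; it assumes the group and date columns are named id and year, as the printed column headers suggest.

```gauss
// Import data
fname = getGAUSSHome("examples/pd_ab.gdat");
pd_ab = loadd(fname);

// Take an out-of-order sample
pd_smpl = pd_ab[3 10 8 4 2 9, .];

// Name the group and date variables explicitly
// instead of relying on the defaults
// (assumes the columns are called "id" and "year")
pd_srted = pdSort(pd_smpl, "id", "year");

print pd_srted;
```

If id and year are also the variables the defaults select, this call should produce the same ordering as pdSort(pd_smpl).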

Sorting panel data is essential for consistent results in other panel data functions, such as pdLag(), pdDiff(), and pdTimeSpans().

See also

sortc(), sortmc()