dfappend#

Purpose#

Vertically concatenates two dataframes.

Format#

df_stacked = dfappend(df1, df2)#
Parameters:
  • df1 (Dataframe) – The dataframe to stack on top.

  • df2 (Dataframe) – The dataframe to add to the bottom of df1.

Returns:

df_stacked (Dataframe) – The result of df1 stacked on top of df2.

Examples#

Example#

In the example below, we will load data from a Stata dataset and a CSV file and combine selected rows of the dataframes.

// Create file name with full path and load data
fname = getGAUSSHome("examples/tips2.dta");
tips_dta = loadd(fname, "tip + day");

// Create file name with full path and load data
fname = getGAUSSHome("examples/tips2.csv");
tips_csv = loadd(fname, "tip + cat(day)");

// Take a small sample of rows
tips_dta = tips_dta[1:3,.];
tips_csv = tips_csv[220:223,.];
tips_dta =        tip     day
            1.0100000     Sun
            1.6600000     Sun
            3.5000000     Sun

tips_csv =       tip      day
           1.4400000      Sat
           3.0900000      Sat
           2.2000000      Fri
           3.4800000      Fri

Next we will vertically concatenate the dataframes:

// Create a new dataframe with both
tips_stacked = dfappend(tips_dta, tips_csv);
tips_stacked =        tip     day
                1.0100000     Sun
                1.6600000     Sun
                3.5000000     Sun
                1.4400000     Sat
                3.0900000     Sat
                2.2000000     Fri
                3.4800000     Fri

The reason to use dfappend() instead of the vertical concatenation operator for dataframes with strings and categorical variables is that dfappend() will make sure that the category labels and keys are matched in the resulting dataframe.

// Get the category labels and keys for the day variable
// for the data from the csv and dta files.
{ lab_csv, key_csv } = getcollabels(tips_csv, "day");
{ lab_dta, key_dta } = getcollabels(tips_dta, "day");

After running the above code, we see that the keys and their order are different in these dataframes.

lab_csv =  Fri    key_csv =  0
           Sat               1
           Sun               2
          Thur               3

lab_dta = Thur    key_dta =  0
           Fri               1
           Sat               2
           Sun               3

The reason that the category keys and labels are not the same is that the keys for each label are specified in the Stata dataset, however the CSV file just has the text labels. In the case where the keys are not specified, GAUSS will assign the keys based on the alphabetical order of the category labels.

The vertical concatenation operator is optimized for speed and will not check to see if the keys match. It is designed for numeric variables and dates. It should not be used for dataframes with categorical variables.

Remarks#

  • dfappend() should be used instead of the vertical concatenation operator for dataframes with categorical or string columns, because dfappend() will merge the metadata in cases where the keys and labels are not identical.

  • Both inputs must be dataframes.