dfappend
Purpose
Vertically concatenates two dataframes.
Format
- df_stacked = dfappend(df1, df2)
- Parameters:
df1 (Dataframe) – The dataframe to stack on top.
df2 (Dataframe) – The dataframe to add to the bottom of df1.
- Returns:
df_stacked (Dataframe) – The result of df1 stacked on top of df2.
Examples
Example
In the example below, we will load data from a Stata dataset and a CSV file and combine selected rows of the dataframes.
// Create file name with full path and load data
fname = getGAUSSHome("examples/tips2.dta");
tips_dta = loadd(fname, "tip + day");
// Create file name with full path and load data
fname = getGAUSSHome("examples/tips2.csv");
tips_csv = loadd(fname, "tip + cat(day)");
// Take a small sample of rows
tips_dta = tips_dta[1:3,.];
tips_csv = tips_csv[220:223,.];
tips_dta =       tip    day
           1.0100000    Sun
           1.6600000    Sun
           3.5000000    Sun

tips_csv =       tip    day
           1.4400000    Sat
           3.0900000    Sat
           2.2000000    Fri
           3.4800000    Fri
Next we will vertically concatenate the dataframes:
// Create a new dataframe with both
tips_stacked = dfappend(tips_dta, tips_csv);
tips_stacked =       tip    day
               1.0100000    Sun
               1.6600000    Sun
               3.5000000    Sun
               1.4400000    Sat
               3.0900000    Sat
               2.2000000    Fri
               3.4800000    Fri
Use dfappend() instead of the vertical concatenation operator for dataframes with string or categorical variables, because dfappend() makes sure that the category labels and keys are matched in the resulting dataframe.
// Get the category labels and keys for the day variable
// for the data from the csv and dta files.
{ lab_csv, key_csv } = getcollabels(tips_csv, "day");
{ lab_dta, key_dta } = getcollabels(tips_dta, "day");
After running the above code, we see that the keys and their order are different in these dataframes.
lab_csv =  Fri    key_csv = 0
           Sat              1
           Sun              2
          Thur              3

lab_dta = Thur    key_dta = 0
           Fri              1
           Sat              2
           Sun              3
The category keys and labels differ because the Stata dataset specifies the key for each label, whereas the CSV file contains only the text labels. When the keys are not specified, GAUSS assigns them based on the alphabetical order of the category labels.
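To see how dfappend() reconciled the two sets of keys, the same getcollabels() call can be applied to the stacked dataframe. This is only a sketch: it assumes tips_stacked was created by the dfappend() call earlier in this example, and the names lab_stacked and key_stacked are chosen here for illustration. The output is not shown because the key order in the merged result depends on how dfappend() resolves the mismatch.

```gauss
// Assumes tips_stacked was created earlier with:
//   tips_stacked = dfappend(tips_dta, tips_csv);

// Get the category labels and keys for the day variable
// of the stacked dataframe (variable names are illustrative)
{ lab_stacked, key_stacked } = getcollabels(tips_stacked, "day");

// Inspect the merged labels and their keys
print lab_stacked;
print key_stacked;
```

Each label should map to a single key in the stacked dataframe, whichever of the two input orderings is kept.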
The vertical concatenation operator is optimized for speed and will not check to see if the keys match. It is designed for numeric variables and dates. It should not be used for dataframes with categorical variables.
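For comparison, the sketch below applies the vertical concatenation operator (|) to the numeric tip column only, which is the kind of data it is intended for. It assumes tips_dta and tips_csv from the example above and that columns can be selected by name with df[., "tip"]; the names tip_dta, tip_csv and tip_all are illustrative.

```gauss
// Select only the numeric 'tip' column from each dataframe
// (assumes tips_dta and tips_csv from the example above)
tip_dta = tips_dta[., "tip"];
tip_csv = tips_csv[., "tip"];

// Stack the numeric columns with the vertical concatenation
// operator; no category keys are involved, so none can mismatch
tip_all = tip_dta | tip_csv;

// 3 rows from tips_dta plus 4 rows from tips_csv
print rows(tip_all);
```

Because no categorical metadata is involved here, the speed-optimized operator is a reasonable choice; once a column such as day is included, dfappend() is the safer option.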
Remarks
dfappend() should be used instead of the vertical concatenation operator for dataframes with categorical or string columns, because dfappend() will merge the metadata in cases where the keys and labels are not identical.
Both inputs must be dataframes.