getduplicates
Purpose
Identifies duplicate observations and prints a report.
Format
- dup_report = getduplicates(x[, varlist])
- Parameters:
x (matrix or dataframe) – data
varlist (string array) – Optional, list of variables to include in the check. By default, all variables are checked.
- Returns:
dup_report (dataframe) – A dataframe containing the duplicate observations from x, with the row number of each duplicate in the first column.
Examples
new;
// Create file name with full path
fname = getGAUSSHome("examples/tips2.dta");
// Load the dataframe
tips2 = loadd(fname, "id + total_bill + tip + cat(day) + cat(time)");
// Locate and print duplicate observations
print getduplicates(tips2);
The code above prints:
 Row Num       id   total_bill      tip     day     time
  20.000    20.00        20.65     3.35     Sat   Dinner
  21.000    20.00        20.65     3.35     Sat   Dinner
  246.00    245.0        18.78     3.00    Thur   Dinner
  247.00    245.0        18.78     3.00    Thur   Dinner
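The optional varlist argument restricts the duplicate check to a subset of variables. The following is a minimal sketch of that usage, assuming the same tips2.dta example file. The choice of variables ("id", and then "total_bill" with "tip") is illustrative only, and the resulting reports are not shown here.

```gauss
new;

// Load the same example data used above
fname = getGAUSSHome("examples/tips2.dta");
tips2 = loadd(fname, "id + total_bill + tip + cat(day) + cat(time)");

// Check for duplicates using only the id variable
// (variable choice is an assumption for illustration)
dup_id = getduplicates(tips2, "id");

// Check using several variables, passed as a string array
// built with the $| string concatenation operator
dup_bill = getduplicates(tips2, "total_bill" $| "tip");

print dup_id;
print dup_bill;
```

Restricting the check to fewer variables will generally flag at least as many rows as the default check, since rows only need to match on the listed variables.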
See also
Functions dropduplicates(), isunique()