getduplicates

Purpose

Identifies duplicate observations and returns a report of them.

Format

dup_report = getduplicates(x[, varlist])
Parameters:
  • x (matrix or dataframe) – data

  • varlist (string array) – Optional, names of the variables to include in the duplicate check. By default, all variables are checked.

Returns:

dup_report (dataframe) – A dataframe containing the duplicate observations from x, with the row number of each duplicate observation in the first column.

Examples

new;

// Create file name with full path
fname = getGAUSSHome("examples/tips2.dta");

// Load the dataframe
tips2 = loadd(fname, "id + total_bill + tip + cat(day) + cat(time)");

// Locate and print duplicate observations
print getduplicates(tips2);

After the above code, the printed output is:

Row Num         id   total_bill       tip       day        time
 20.000      20.00        20.65      3.35       Sat      Dinner
 21.000      20.00        20.65      3.35       Sat      Dinner
 246.00      245.0        18.78      3.00      Thur      Dinner
 247.00      245.0        18.78      3.00      Thur      Dinner
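
The optional varlist argument restricts the check to selected variables. The following is a minimal sketch, assuming the same tips2.dta example file as above; the names dup_id and dup_bill are illustrative only, and the output is not shown because it depends on the data.

```gauss
new;

// Create file name with full path
fname = getGAUSSHome("examples/tips2.dta");

// Load the dataframe
tips2 = loadd(fname, "id + total_bill + tip + cat(day) + cat(time)");

// Check for duplicates using only the 'id' variable
dup_id = getduplicates(tips2, "id");
print dup_id;

// Check for duplicates using 'id' and 'total_bill' together,
// passing the variable names as a string array
dup_bill = getduplicates(tips2, "id" $| "total_bill");
print dup_bill;
```

Narrowing the check to an identifier variable can flag observations that share an id even when their other columns differ.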

See also

Functions dropduplicates(), isunique()