normalizecollabels

Purpose

Finds instances of identical labels that have differing keys in a dataframe and merges them so all labels are unique.

When identical labels are merged, every reference to the key of a duplicate label is updated to the key of the label that is kept.
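The merging step can be illustrated with a short, language-neutral sketch. The Python below is not the GAUSS implementation. The function name `normalize_labels`, the parallel `keys`/`labels`/`codes` layout, and the choice to retain the first key seen for each label are all assumptions made for illustration.

```python
def normalize_labels(keys, labels, codes):
    """Merge duplicate labels and remap the data that references their keys.

    keys, labels: parallel lists describing the key -> label table.
    codes: the column data, stored as keys.
    Assumption: the first key seen for a label is the one retained.
    """
    first_key = {}  # label -> retained key
    remap = {}      # every original key -> retained key
    for key, label in zip(keys, labels):
        kept = first_key.setdefault(label, key)
        remap[key] = kept

    new_labels = list(first_key.keys())
    new_keys = list(first_key.values())
    # Update every reference to a duplicate label's key
    new_codes = [remap[c] for c in codes]
    return new_keys, new_labels, new_codes


# "low" appears twice, under keys 0 and 2
keys = [0, 1, 2]
labels = ["low", "high", "low"]
codes = [0, 1, 2, 1, 0]

new_keys, new_labels, new_codes = normalize_labels(keys, labels, codes)
print(new_keys, new_labels, new_codes)
# [0, 1] ['low', 'high'] [0, 1, 0, 1, 0]
```

After the merge, each label appears once and the observation that referenced key 2 now references key 0.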

Format

x_norm = normalizecollabels(x[, columns])
Parameters:
  • x (NxK dataframe) – Data containing the string/category columns to normalize.

  • columns (scalar or Mx1 matrix of indices, or string or string array of names) – Optional. The names or indices of the string/category columns in x to normalize. If omitted, all string/category columns are processed.

Returns:

x_norm (NxK dataframe) – Data with normalized string/category variables.

Remarks

The normalizecollabels() procedure is useful when cleaning and merging categorical variables that may come from different sources. It is primarily a convenience function used by several string-related functions, so end users should rarely need to call it explicitly.

Examples

// Create a dataframe with a single column named "group"
x = asdf(1|2|3|1|2, "group");

// Normalize category labels so all are unique
x_norm = normalizecollabels(x);
print x_norm;

See also

dfappend()