dropUnusedCategories

Purpose

Removes categories and keys from the meta data of a dataframe variable.

Format

df = dropUnusedCategories(X[, column])
Parameters:
  • X (NxK dataframe) – Data with metadata.

  • column (Scalar or string) – Optional argument, name or index of the categorical variable in X which contains categories to be removed. Must be specified if X contains more than one column. Default = 1.

Returns:

df (NxK dataframe) – Data with specified categories removed.

Examples

// Load data
fname = getGAUSSHome("examples/yarn.xlsx");
yarn = loadd(fname, "amplitude + cycles");

// Select the first 5 rows only
yarn = yarn[1:5,.];

print yarn;

This sample from the first five rows only contains the categories, low and med.

amplitude           cycles
      low        674.00000
      low        370.00000
      low        292.00000
      med        338.00000
      med        266.00000

However, when the dataframe was loaded, it also contained one more level, high. Using the getcategories() function, we can see that this information is still stored in yarn.

print getCategories(yarn, "amplitude");
categories
      high
       low
       med

There are several reasons that this is, in most cases, convenient and dramatically improves performance. However, that discussion is beyond the scope of this page.

You can use dropunusedcategories() to remove these unused categories:

// Drop all categories from the meta data
// that are not represented in the column
yarn = dropUnusedCategories(yarn, "amplitude");

Now we see that the category high is no longer recorded in the meta data for the amplitude variable.

print getCategories(yarn, "amplitude");
categories
       low
       med