dropCategories#
Purpose#
Removes categories from a dataframe variable. Resets the keyvalues and labels for the variable.
Format#
- df = dropCategories(X, categories[, column])#
- Parameters:
X (NxK dataframe) – Data with metadata.
categories (String or string array) – The categories to be removed.
column (Scalar or string) – Optional argument, name or index of the categorical variable in X which contains categories to be removed. Must be specified if X contains more than one column. Default = 1.
- Returns:
df (NxK dataframe) – Data with specified categories removed.
Examples#
// Load data
fname = getGAUSSHome("examples/yarn.xlsx");
yarn = loadd(fname, "cat(yarn_length) + cat(amplitude) + cat(load) + cycles");
// Get column labels for yarn_length
labels = getCategories(yarn, "yarn_length");
print labels;
The code above prints the following table of original labels:
categories
high
low
med
Using the frequency()
function, we can see that the yarn_length column contains 9 observations of each category:
print frequency(yarn, "yarn_length");
Label Count Total % Cum. %
high 9 33.33 33.33
low 9 33.33 66.67
med 9 33.33 100
Total 27 100
Now, use dropCategories()
to drop the "high"
category and reprint the labels.
// Drop the "high" category
yarn = dropCategories(yarn, "high", "yarn_length");
// Get updated column labels
labels = getCategories(yarn, "yarn_length");
print labels;
The code above prints the following table of updated labels:
categories
low
med
Additionally, this time when we print the frequency report, we can see that the observations where yarn_length was equal to "high"
have been removed.
print frequency(yarn, "yarn_length");
Label Count Total % Cum. %
low 9 50 50
med 9 50 100
Total 18 100
See also
Functions dropunusedcategories()
, getColLabels()
, getCategories()
, reordercatlabels()