dfType

Purpose

Set the types of columns in a matrix or dataframe.

Format

x_meta = dfType(X, types[, columns])

Parameters

  • X (NxK matrix) – data.

  • types (Mx1 string array) – Specifies the types to be assigned to the columns specified in columns. Valid options include: "string", "date", "number", and "category".

  • columns (Mx1 vector) – Optional argument, indices of columns in X to assign types to. Default = all columns.

Returns

x_meta (NxK dataframe) – Data with the types specified in types assigned to the columns specified in columns.
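
As a minimal sketch of how types and columns pair up, the snippet below assigns a different type to each of two columns in one call. The matrix x and its values are hypothetical, chosen only for illustration; the i-th element of types is assumed to apply to the i-th element of columns.

```gauss
// Hypothetical 3x2 matrix: POSIX seconds in column 1,
// integer codes in column 2
x = { 0      1,
      86400  2,
      172800 1 };

// Assign "date" to column 1 and "category" to column 2
x_meta = dfType(x, "date"$|"category", 1|2);

print x_meta;
```

If columns were omitted here, types would be applied to all columns of x, as described for the default above.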

Remarks

Date Variables

When a numeric column is set to type date with dftype():

  • The data from the column is interpreted as POSIX time (seconds since Jan 1, 1970).

  • The default date format will be used. This can be changed with asdate().

Categorical and String Variables

When a numeric column is set to type category or string with dftype():

  • The labels will be the string version of the number. The keys will be integers from 0 to n-1, where n is the number of unique values in the original data. The keys will be assigned to the labels such that the numeric ordering of the original values is maintained: the smallest value receives key 0 and the largest receives key n-1.

When a categorical or string variable is converted to a numeric column:

  • The updated column will contain the numeric keys associated with the string or category labels.
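
One consequence of the two rules above is that a round trip from number to category and back is not expected to restore the original values. The sketch below illustrates this with a hypothetical vector; the variable names x_cat and x_num are used only for illustration.

```gauss
// Hypothetical 4x1 numeric vector
x = { 4, 1, 4, 6 };

// Numeric -> category: labels are "1", "4", "6"
// and, per the rule above, the keys are 0, 1, 2
x_cat = dfType(x, "category");

// Category -> numeric: per the rule above, the column
// should now hold the keys, not the original values
x_num = dfType(x_cat, "number");

print x_num;
```

Going by the remarks above, x_num should contain the keys 1, 0, 1, 2 rather than 4, 1, 4, 6. To keep the original values available, convert a copy of the data and leave the original numeric column untouched.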

Examples

Example 1: POSIX time numeric column to date column

secs_per_day = 24 * 60 * 60;

// Create a 2x1 vector
x = 0 | secs_per_day;

// Set the numeric vector to be a date
x = dftype(x, "date", 1);
print x;

Since the date vector is interpreted as seconds since Jan 1, 1970, the code above will print:

        X1
1970-01-01
1970-01-02

Example 2: Category to number

// Load 'cycles' and load 'amplitude' as a categorical variable
fname = getGAUSSHome() $+ "examples/yarn.xlsx";
yarn = loadd(fname, "cat(amplitude) + cycles");

// Set the first column to be a numeric column
yarn_n = dftype(yarn, "number", 1);

After the above code, the first few rows look like this:

yarn = amplitude   cycles   yarn_n = amplitude   cycles
             low      674                    1      674
             low      370                    1      370
             low      292                    1      292
             med      338                    2      338

Example 3: Integer column to category

x = { 4,
      1,
      4,
      6 };

// Make 'x' a dataframe and set its
// only column to be a category
x = dftype(x, "category");

After the above code, x will be a dataframe as shown below:

X1
 4
 1
 4
 6

We can get the categorical labels and key values like this:

{ labels, keys } = getcollabels(x, 1);

They will equal:

labels = "1"   keys = 0
         "4"          1
         "6"          2

We can set new labels with recodecatlabels() like this:

// Set the labels for 0, 1, and 2 to be
// alpha, beta and gamma
x = recodecatlabels(x, labels, "alpha"$|"beta"$|"gamma", 1);

Now x will be the following dataframe:

   X1
 beta
alpha
 beta
gamma

See also

Functions dfName(), setColLabels(), asdf(), asDate()