tabulate

Purpose

Generates and returns two-way tables of frequencies.

Format

table_df = tabulate(data, formula[, tbCtl])
table_df = tabulate(filename, formula[, tbCtl])
table_df = tabulate(df1, df2[, tbCtl])
Parameters:
  • data (NxK dataframe) – Contains variables specified in formula.

  • formula (string) – Formula string, e.g. "df1 ~ df2 + df3". The categories of "df1" will be reported in rows, and separate columns will be returned for each category in "df2" and "df3".

  • filename (string) – Name of the file storing the variables specified in formula.

  • df1 (Nx1 dataframe) – Contains variable whose categories will be reported in the rows of the frequency table.

  • df2 (NxK dataframe) – Contains variables whose categories will be reported in the columns of the frequency table.

  • tbCtl (struct) –

    An optional tabControl structure with the following members:

    tbctl.exclude

    String, the categories to be excluded from table counts. Totals will not include observations in excluded categories.

    tbctl.unusedLevels

    Scalar, indicates whether to include unused levels in table. Set to 0 to remove unused levels from the table. Default = 1.

Returns:

table_df (Dataframe) – The two-way frequency table. Totals are not included.

Examples

Basic usage with a dataframe and a formula string

// Load data
fname = getGAUSSHome("examples/tips2.dta");
tips2 = loadd(fname);

// Two-way table
call tabulate(tips2, "sex ~ smoker");

This reports the two-way frequency table:

============================================================
          sex                   smoker                 Total
============================================================
                          No            Yes


       Female             55             33               88
         Male             99             60              159

        Total            154             93              247
============================================================

Tabulate can also generate multiple two-way frequency tables using the same data:

// Generate separate tables for sex vs smoker
// and sex vs time
call tabulate(tips2, "sex ~ smoker + time");

This prints one table for each column variable:

============================================================
          sex                   smoker                 Total
============================================================
                          No            Yes


       Female             55             33               88
         Male             99             60              159

        Total            154             93              247
============================================================
          sex                    time                  Total
============================================================
                       Lunch         Dinner


       Female             35             53               88
         Male             33            126              159

        Total             68            179              247
============================================================

Basic usage with a filename and a formula string

The same table can be generated directly from the file name:

// Get the file name with full path
fname = getGAUSSHome("examples/tips2.dta");

// Two-way table
call tabulate(fname, "sex ~ smoker");

The output is:

============================================================
          sex                   smoker                 Total
============================================================
                          No            Yes


       Female             55             33               88
         Male             99             60              159

        Total            154             93              247
============================================================

Tabulate separate dataframe vectors and assign the return value

// Load all variables from the dataset
tips = loadd(getGAUSSHome("examples/tips2.dta"));

// Create separate vectors for each variable
day = tips[.,"day"];
time_ = tips[.,"time"];

// Compute the frequency table and assign the result to 't'
t = tabulate(day, time_);

After running the above code, t will contain a dataframe with the frequencies. The totals will not be included:

print t;
 day       time_Lunch      time_Dinner
Thur        61.000000        2.0000000
 Fri        7.0000000        12.000000
 Sat        0.0000000        89.000000
 Sun        0.0000000        76.000000
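
Because the returned dataframe omits the totals, they have to be computed separately if needed. The following is a minimal sketch, not part of the original example. It assumes that the count columns are the second and third columns of t, as in the printout above, and that sumr and sumc accept numeric dataframe columns.

```gauss
// Load the data and build the table as in the example above
tips = loadd(getGAUSSHome("examples/tips2.dta"));
day = tips[., "day"];
time_ = tips[., "time"];
t = tabulate(day, time_);

// Assumption: columns 2 and 3 hold the numeric counts
counts = t[., 2:3];

// Row totals, one per day
row_totals = sumr(counts);

// Column totals, one per time category
col_totals = sumc(counts);

print row_totals;
print col_totals;
```

For the table shown above, the row totals should be 63, 19, 89 and 76, and the column totals 68 and 179, which agree with the totals in the printed sex ~ time table.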

Handling unrepresented categories

In this example, we will load some data and then take a sample that does not contain any observations of a particular category level.

// Load two variables from the dataset
tips = loadd(getGAUSSHome("examples/tips2.dta"), "smoker + day");

// Take the first 50 observations as a sample
tips = tips[1:50,.];

// Compute and print the frequency table
call tabulate(tips, "day ~ smoker");

In this case, the following will be printed:

============================================================
            day                   smoker               Total
============================================================
                            No            Yes


           Thur              0              0              0
            Fri              0              0              0
            Sat             23              0             23
            Sun             27              0             27

          Total             50              0             50
============================================================

In some situations, you may not want to report these unrepresented categories. In that case, you can set the unusedLevels member of the tabControl structure to 0 to suppress those levels.

struct tabControl tbctl;
tbctl = tabControlCreate();

// Suppress unrepresented categories
tbctl.unusedLevels = 0;

// Compute and print the frequency table
call tabulate(tips, "day ~ smoker", tbctl);

This time the report will omit the unrepresented levels.

=============================================
            day         smoker          Total
=============================================
                            No


            Sat             23             23
            Sun             27             27

          Total             50             50
=============================================
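
The control structure can also be combined with an assigned return value, following the tabulate(data, formula[, tbCtl]) form listed under Format. The sketch below repeats the setup so that it can be run on its own. The layout and column names of the returned dataframe are not documented here, so no output is shown.

```gauss
// Load two variables and keep the first 50 observations
tips = loadd(getGAUSSHome("examples/tips2.dta"), "smoker + day");
tips = tips[1:50, .];

// Declare the control structure and fill it with defaults
struct tabControl tbctl;
tbctl = tabControlCreate();

// Drop levels that have no observations in the sample
tbctl.unusedLevels = 0;

// Assign the frequency table instead of only printing the report
t = tabulate(tips, "day ~ smoker", tbctl);

// Inspect the returned dataframe
print t;
```

Based on the report above, the counts should cover only the Sat and Sun rows.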

See also

Functions frequency(), plotFreq()