aggregate#

Purpose#

Aggregates the data in the columns of a matrix based upon a column containing group ids with a choice of method.

Format#

x_agg = aggregate(x, method[, id_cols, skip_miss_check])#

Parameters:

x (NxK matrix or dataframe) – Data, if id_cols is not specified, the first column must contain the ids for the groups on which to aggregate.
method (String) –
Specifies which aggregation method to use.

Valid options:

”mean”

”median”

”mode”

”min”

”max”

”sd” (sample standard deviation)

”sum”

”variance” (sample variance)
id_cols (string or string array) – Optional, specifies which variable(s) contain the groups on which to aggregate. To aggregate by multiple columns, use the $| operator to concatenate column names (e.g., "day" $| "time").
skip_miss_check (scalar) – Optional. Default: 0. Set to 1 to skip checking for missing values (faster but missings may affect results). When 0, missing values are handled per-column.

Returns:

x_agg (NGROUPSxK matrix) – The input aggregated by the group id, using the specified method.

Examples#

Example 1#

This example aggregates a matrix with one group id column and one column of data by mean and then by minimum.

// Create a matrix where the first
// column is the group id
X = { 1002  7,
      1001  2,
      1004  9,
      1001  8,
      1004  6,
      1003  3,
      1002  5,
      1001  4 };

agg_mean = aggregate(X, "mean");

agg_min = aggregate(X, "min");

The above code will make the following assignments:

           1001   4.66667
agg_mean = 1002         6
           1003         3
           1004       7.5

           1001         2
agg_min  = 1002         5
           1003         3
           1004         6

Example 2#

This example aggregates the data from a matrix with one group id column and two data columns first by sample standard deviation and then by variance.

// Create a matrix where the first
// column is the group id
X = { 1002   18  -5.1,
      1001   22   0.0,
      1001   47   3.3,
      1001   94   5.6,
      1001   17  -0.5,
      1001   72   7.5,
      1002   89   4.8,
      1001   67   2.3,
      1002   54   6.6,
      1002   61  -6.8,
      1002    7   1.3,
      1002   40   -2.1 };

// aggregate by standard deviation
agg_sd = aggregate(X, "sd");

agg_var = aggregate(X, "variance");

The above code will make the following assignments:

agg_sd  = 1001    30.10     3.13
          1002    29.90     5.38

agg_var = 1001   906.17     9.77
          1002   894.17    28.93

Example 3#

This example specifies the column name to be used for aggregation.

// Load data
auto2 = loadd(getGAUSSHome("examples/auto2.dta"));

// Aggregate data using
// foreign column as group
aggregate(auto2[., "price" "mpg" "foreign"], "mean", "foreign");

The above code will make the following table

foreign     price      mpg
Domestic  6072.423   19.827
Foreign   6384.682   24.773

Example 4#

This example aggregates by multiple group columns, finding the maximum value for each combination of day and time.

// Load data
tips = loadd(getGAUSSHome("examples/tips2.dta"), "day + time + total_bill + tip");

// View first few rows
head(tips);

The above code will print:

day             time       total_bill              tip
Sun           Dinner        16.990000        1.0100000
Sun           Dinner        10.340000        1.6600000
Sun           Dinner        21.010000        3.5000000
Sun           Dinner        23.680000        3.3100000
Sun           Dinner        24.590000        3.6100000

// Aggregate by day and time, finding max values
tips_a = aggregate(tips, "max", "day" $| "time");
print tips_a;

The above code will print:

 day             time       total_bill              tip
Thur            Lunch        43.110000        6.7000000
Thur           Dinner        18.780000        3.0000000
 Fri            Lunch        16.270000        3.4800000
 Fri           Dinner        40.170000        4.7300000
 Sat           Dinner        50.810000        10.000000
 Sun           Dinner        48.170000        6.5000000