aggregate

Purpose

Aggregates the columns of a matrix by group, using a column of group ids and a chosen aggregation method.

Format

x_agg = aggregate(x, method[, column, fast])
Parameters:
  • x (NxK matrix or dataframe) – Data. If column is not specified, the first column must contain the group ids on which to aggregate.

  • method (string) –

    Specifies which aggregation method to use.

    Valid options:

    "mean"

    "median"

    "mode"

    "min"

    "max"

    "sd" (sample standard deviation)

    "sum"

    "variance" (sample variance)

  • column (string) – Optional, specifies which variable contains the groups on which to aggregate.

  • fast (scalar) – Optional, specifies a fast computation that does not check for missing values. Set to 1 to use the fast method.

Returns:

x_agg (NGROUPSxK matrix) – The input aggregated by the group id, using the specified method.

Examples

Example 1

This example aggregates a matrix with one group id column and one column of data by mean and then by minimum.

// Create a matrix where the first
// column is the group id
X = { 1002  7,
      1001  2,
      1004  9,
      1001  8,
      1004  6,
      1003  3,
      1002  5,
      1001  4 };

agg_mean = aggregate(X, "mean");

agg_min = aggregate(X, "min");

The above code will make the following assignments:

           1001   4.66667
agg_mean = 1002         6
           1003         3
           1004       7.5

           1001         2
agg_min  = 1002         5
           1003         3
           1004         6
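
As a hedged cross-check, the values for a single group can be reproduced by hand with selif(), meanc(), and minc(). This sketch assumes X from the example above is still in the workspace; grp is a name introduced here purely for illustration.

```gauss
// Select the data values (column 2) for the rows
// whose group id (column 1) equals 1001
grp = selif(X[., 2], X[., 1] .== 1001);

// Mean and minimum of the selected values;
// these should match the 1001 rows of agg_mean and agg_min
print meanc(grp);
print minc(grp);
```

Group 1001 contains the values 2, 8 and 4, so the mean is 14/3 (about 4.66667) and the minimum is 2, in agreement with the first rows of agg_mean and agg_min.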

Example 2

This example aggregates the data from a matrix with one group id column and two data columns first by sample standard deviation and then by variance.

// Create a matrix where the first
// column is the group id
X = { 1002   18  -5.1,
      1001   22   0.0,
      1001   47   3.3,
      1001   94   5.6,
      1001   17  -0.5,
      1001   72   7.5,
      1002   89   4.8,
      1001   67   2.3,
      1002   54   6.6,
      1002   61  -6.8,
      1002    7   1.3,
      1002   40  -2.1 };

// Aggregate by sample standard deviation
agg_sd = aggregate(X, "sd");

// Aggregate by sample variance
agg_var = aggregate(X, "variance");

The above code will make the following assignments:

agg_sd  = 1001    30.10     3.13
          1002    29.90     5.38

agg_var = 1001   906.17     9.77
          1002   894.17    28.93
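
The "sd" and "variance" methods are documented as sample statistics, so the variance should equal the square of the standard deviation. The sketch below is one way to check this, assuming X, agg_sd and agg_var from the example above are still in the workspace. It also assumes that stdc() computes the sample standard deviation of each column; sd_1001 is a name introduced here for illustration.

```gauss
// Squaring the aggregated standard deviations (data columns only)
// should reproduce the aggregated variances
print agg_sd[., 2:3] .^ 2;
print agg_var[., 2:3];

// Cross-check group 1001 directly: select its rows,
// then take the standard deviation of each data column
sd_1001 = stdc(selif(X[., 2:3], X[., 1] .== 1001));
print sd_1001;
```

The two matrices printed first should agree, and sd_1001 should match the 1001 row of agg_sd. Squaring the rounded values shown in the tables gives only approximate agreement (for example, 30.10 squared is about 906.0 rather than 906.17), because the displayed values are rounded to two decimals.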

Example 3

This example specifies the column name to be used for aggregation.

// Load data
auto2 = loadd(getGAUSSHome("examples/auto2.dta"));

// Aggregate data using
// foreign column as group
aggregate(auto2[., "price" "mpg" "foreign"], "mean", "foreign");

Because the result is not assigned to a variable, the above code will print the following table:

foreign     price      mpg
Domestic  6072.423   19.827
Foreign   6384.682   24.773
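
The optional fast argument follows column in the documented format, so it can be added to the same call. The sketch below assumes auto2 is loaded as above and that the price and mpg columns contain no missing values, since the fast method does not check for them; agg_fast is a name introduced here for illustration.

```gauss
// Same aggregation as above, with the fast flag set to 1.
// Only use this when the data are known to have no missing values.
agg_fast = aggregate(auto2[., "price" "mpg" "foreign"], "mean", "foreign", 1);

print agg_fast;
```

If the no-missing-values assumption holds, the group means should be the same as in the table above.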

See also

Functions meanc(), modec(), selif()