dstat

Purpose

Computes descriptive statistics.

Note

This function is deprecated; use dstatmt() instead.
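Code that still calls dstat() can usually be moved to dstatmt() with a call along the following lines, which covers the use case of Example 2. This is a minimal sketch. It assumes your GAUSS version's dstatmt() accepts a dataset name plus a formula string and returns a dstatmtOut structure, and that this structure has a member named mean. Check the dstatmt() reference for the exact members and control options.

```gauss
// Sketch only: assumes dstatmt() accepts a dataset name and a formula string
file = getGAUSShome() $+ "examples/freqdata.dat";

// dstatmt() returns its results in one structure
// instead of eight separate return values
struct dstatmtOut dout;
dout = dstatmt(file, "AGE + PAY");

// Member name assumed; see the dstatmtOut documentation
print dout.mean;
```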

Format

{ vnam, mean, var, std, min, max, valid, mis } = dstat(dataset, vars)
Parameters:
  • dataset (string) – name of the dataset. If dataset is a null string ("") or 0, vars is treated as a matrix containing the data.
  • vars (string, string array, or matrix) –

    the variables to analyze, or the data itself.

    If dataset contains the name of a dataset, vars is interpreted as one of:

    • A Kx1 character vector containing the names of variables.
    • A Kx1 numeric vector containing indices of variables.
    • A formula string, e.g. "PAY + WT" or ". - sex".

    The selected variables can be any subset of the variables in the dataset, in any order. If a scalar 0 is passed, all columns of the dataset are used.

    If dataset is null or 0, vars is interpreted as an NxK matrix containing the data on which to compute the descriptive statistics.

Returns:
  • vnam (Kx1 character vector) – the names of the variables used in the statistics.
  • mean (Kx1 vector) – means.
  • var (Kx1 vector) – sample variances (N-1 divisor).
  • std (Kx1 vector) – standard deviations.
  • min (Kx1 vector) – minima.
  • max (Kx1 vector) – maxima.
  • valid (Kx1 vector) – the number of valid cases.
  • mis (Kx1 vector) – the number of missing cases.
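All eight returns are parallel Kx1 vectors: row k of each one describes the variable named in row k of vnam. A minimal sketch that captures them without printing the table, using the 4x2 matrix from Example 6 and the __output global described under Global Input:

```gauss
x = { 1 2, 3 4, 5 6, 7 8 };

// Suppress the printed table; keep only the returned vectors
__output = 0;
{ vnam, mean, var, std, min, max, valid, mis } = dstat("", x);

// vnam is a character vector, so print it with the $ prefix
print $vnam;

// Row k of each return belongs to the variable in row k of vnam
print mean~std~valid~mis;

// Restore the default so later calls print their table again
__output = 1;
```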

Global Input

__altnam

matrix, default 0.

This can be a Kx1 character vector of alternate variable names for the output.

__maxbytes

scalar, the maximum number of bytes to be read per iteration of the read loop. Default = 1e9.

__maxvec

scalar, the largest number of elements allowed in any one matrix. Default = 20000.

__miss

scalar, default 0.

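0 assume there are no missing values (fastest). If the data do contain missing values, statistics such as the mean can come back missing, as for AGE in Examples 1 to 3.
1 listwise deletion: drop a row if any missing values occur in it.
2 pairwise deletion: each variable uses all of its own non-missing values.
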
__row

scalar, the number of rows to read per iteration of the read loop.

If 0 (default), the number of rows is calculated from __maxbytes and __maxvec.
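A minimal sketch of fixing the read-loop size by hand on the example dataset used in the Examples section. The value 100 is arbitrary and chosen only for illustration; leaving __row at 0 lets dstat() size each read from __maxbytes and __maxvec.

```gauss
file = getGAUSShome() $+ "examples/freqdata.dat";

// Read the dataset 100 rows at a time (illustrative value)
__row = 100;
call dstat(file, 0);

// Return to the default, where the row count is computed
// from __maxbytes and __maxvec
__row = 0;
```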

__output

scalar, controls output, default 1.

1 print output table.
0 do not print output.
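To see how the two deletion modes of __miss differ, the sketch below places one missing value in a small hypothetical matrix. It uses miss() to create the missing value and packr(), which drops every row containing a missing value, to reproduce the means by hand: packing the whole matrix mirrors listwise deletion, while packing each column separately mirrors pairwise deletion.

```gauss
// Hypothetical data: row 2 of column 1 is missing
x = { 1 2, -999 4, 5 6, 7 8 };
x = miss(x, -999);            // convert -999 to GAUSS's missing value

// Listwise (__miss = 1): any row with a missing value is dropped
__miss = 1;
call dstat("", x);
print meanc(packr(x));        // (1+5+7)/3 and (2+6+8)/3

// Pairwise (__miss = 2): each column keeps its own valid rows
__miss = 2;
call dstat("", x);
print meanc(packr(x[., 1]));  // (1+5+7)/3
print meanc(packr(x[., 2]));  // (2+4+6+8)/4 = 5

// Restore the default
__miss = 0;
```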

Examples

Example 1

// Create the file name with the full path
file = getGAUSShome() $+ "examples/freqdata.dat";

// Calculate statistics on all variables in dataset: AGE, PAY, sex and WT
vars = 0;
{ vnam, mean, var, std, min, max, valid, mis } = dstat(file, vars);

After the above code,

-------------------------------------------------------------------------------
Variable       Mean   Std Dev    Variance   Minimum   Maximum     Valid Missing
-------------------------------------------------------------------------------
AGE           -----     -----       -----    1.0000   10.0000       400    0
PAY          1.9675    0.8019      0.6431    1.0000    3.0000       400    0
sex           -----     -----       -----     -----     -----       400    0
WT           1.4699    0.3007      0.0904    1.0000    1.9900       400    0

Example 2

file = getGAUSShome() $+ "examples/freqdata.dat";

// Calculate statistics on just AGE and PAY
vars = "AGE" $| "PAY";
{ vnam, mean, var, std, min, max, valid, mis } = dstat(file, vars);

After the above code,

-------------------------------------------------------------------------------
Variable       Mean   Std Dev    Variance   Minimum   Maximum     Valid Missing
-------------------------------------------------------------------------------
AGE           -----     -----       -----    1.0000   10.0000       400    0
PAY          1.9675    0.8019      0.6431    1.0000    3.0000       400    0

Example 3

file = getGAUSShome() $+ "examples/freqdata.dat";

// Calculate statistics on just AGE and PAY using numerical indices
vars = { 1, 2 };
{ vnam, mean, var, std, min, max, valid, mis } = dstat(file, vars);

After the above code,

------------------------------------------------------------------------------
Variable       Mean   Std Dev    Variance   Minimum   Maximum   Valid  Missing
------------------------------------------------------------------------------
AGE           -----     -----       -----    1.0000   10.0000     400    0
PAY          1.9675    0.8019      0.6431    1.0000    3.0000     400    0

Example 4

file = getGAUSShome() $+ "examples/freqdata.dat";

// Calculate statistics on just AGE and PAY using __miss
vars = { 1, 2 };

// Drop rows with missing values
__miss = 1;
{ vnam, mean, var, std, min, max, valid, mis } = dstat(file, vars);

After the above code,

------------------------------------------------------------------------------
Variable       Mean   Std Dev    Variance   Minimum   Maximum   Valid  Missing
------------------------------------------------------------------------------
AGE          5.6784    2.9932      8.9593    1.0000   10.0000     398    2
PAY          1.9623    0.8006      0.6409    1.0000    3.0000     398    2

Example 5

file = getGAUSShome() $+ "examples/freqdata.dat";

/*
** Calculate statistics using a formula string and __miss.
** The formula selects all variables except "sex"
*/
vars = ". - sex";

// Drop rows with missing values
__miss = 1;
{ vnam, mean, var, std, min, max, valid, mis } = dstat(file, vars);

After the above code,

-----------------------------------------------------------------------------
Variable       Mean   Std Dev    Variance   Minimum   Maximum   Valid Missing
-----------------------------------------------------------------------------
AGE          5.6784    2.9932      8.9593    1.0000   10.0000     398    2
PAY          1.9623    0.8006      0.6409    1.0000    3.0000     398    2
WT           1.4713    0.3009      0.0906    1.0000    1.9900     398    2

Example 6

Descriptive statistics on a matrix.

data = { 1 2, 3 4, 5 6, 7 8 };
call dstat("", data);

After the above code,

-----------------------------------------------------------------------------
Variable       Mean   Std Dev    Variance   Minimum   Maximum   Valid Missing
-----------------------------------------------------------------------------
X1                4     2.582       6.667         1         7       4    0
X2                5     2.582       6.667         2         8       4    0
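For a matrix, the columns of this table can be reproduced with GAUSS's column-wise functions. The sketch assumes dstat() reports the sample (N-1) variance, which matches the 6.667 above: the values 1, 3, 5, 7 deviate from their mean of 4 by -3, -1, 1, 3, the squared deviations sum to 20, and 20/3 = 6.667.

```gauss
x = { 1 2, 3 4, 5 6, 7 8 };

m = meanc(x);      // column means
s = stdc(x);       // column standard deviations (N-1 divisor)
v = s.^2;          // column variances
lo = minc(x);      // column minima
hi = maxc(x);      // column maxima

print m~s~v~lo~hi;
```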

Example 7

Specify variable names.

// Using the matrix concatenation operator '|' instead of the
// string concatenation operator '$|' makes this a 2x1
// character vector rather than a string array
__altnam = "ALPHA" | "BETA";
data = { 1 2, 3 4, 5 6, 7 8 };
call dstat("", data);

After the above code,

-----------------------------------------------------------------------------
Variable       Mean   Std Dev    Variance   Minimum   Maximum   Valid Missing
-----------------------------------------------------------------------------
ALPHA             4     2.582       6.667         1         7       4    0
BETA              5     2.582       6.667         2         8       4    0
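Globals set in these examples, such as __miss in Examples 4 and 5 and __altnam in Example 7, stay in effect for later dstat() calls in the same session. A minimal sketch of restoring the documented defaults before reusing dstat():

```gauss
// Restore the documented defaults of the dstat() globals
__altnam = 0;
__miss = 0;
__output = 1;
__row = 0;

// With __altnam reset, the default names X1 and X2 are used again
data = { 1 2, 3 4, 5 6, 7 8 };
call dstat("", data);
```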

Remarks

1. If pairwise deletion is used, the minima and maxima will be the true values for the valid data. The means and standard deviations will be computed using the correct number of valid observations for each variable.

2. The supported dataset types are CSV, XLS, XLSX, HDF5, FMT, DAT, and DTA.

   For HDF5 files, the dataset string must include the file schema, and both the file name and the dataset name must be provided, e.g. dstat("h5://C:/gauss/examples/testdata.h5/mydata", formula).

See also

Formula String

Source

dstat.src