dstat#
Purpose#
Computes descriptive statistics.
Note
This function is deprecated, use dstatmt()
instead.
Format#
- { vnam, mean, var, std, min, max, valid, mis } = dstat(dataset, vars)#
- Parameters:
dataset (string) – name of dataset. If dataset is null or 0, vars will be assumed to be a matrix containing the data.
vars (string or string array) –
the variables.
If dataset contains the name of a dataset, vars will be interpreted as either:
A Kx1 character vector containing the names of variables.
A Kx1 numeric vector containing indices of variables.
A
formula string
. e.g."PAY + WT"
or". - sex"
.
These can be any size subset of the variables in the dataset and can be in any order. If a scalar 0 is passed, all columns of the dataset will be used.
If dataset is null or 0, vars will be interpreted as a NxK matrix, the data on which to compute the descriptive statistics.
- Returns:
vnam (Kx1 character vector) – the names of the variables used in the statistics.
mean (Kx1 vector) – means.
var (Kx1 vector) – variance.
std (Kx1 vector) – standard deviation.
min (Kx1 vector) – minima.
max (Kx1 vector) – maxima.
valid (Kx1 vector) – the number of valid cases.
mis (Kx1 vector) – the number of missing cases.
Global Input#
- __altnam#
matrix, default 0.
This can be a Kx1 character vector of alternate variable names for the output.
- __maxbytes#
scalar, the maximum number of bytes to be read per iteration of the read loop. Default = 1e9.
- __maxvec#
scalar, the largest number of elements allowed in any one matrix. Default = 20000.
- __miss#
scalar, default 0.
0
there are no missing values (fastest).
1
listwise deletion, drop a row if any missings occur in it.
2
pairwise deletion.
- __row#
scalar, the number of rows to read per iteration of the read loop.
if 0, (default) the number of rows will be calculated using
__maxbytes
and__maxvec
.
- __output#
scalar, controls output, default 1.
1
print output table.
0
do not print output.
Examples#
Example 1#
// Calculate statistics on all variables in dataset
file = getGAUSShome() $+ "examples/freqdata.dat";
// Calculate statistics on all variables in dataset: AGE, PAY, sex and WT
vars = 0;
{ vnam, mean, var, std, min, max, valid, mis } = dstat(file, vars);
After the above code,
-------------------------------------------------------------------------------
Variable Mean Std Dev Variance Minimum Maximum Valid Missing
-------------------------------------------------------------------------------
AGE ----- ----- ----- 1.0000 10.0000 400 0
PAY 1.9675 0.8019 0.6431 1.0000 3.0000 400 0
sex ----- ----- ----- ----- ----- 400 0
WT 1.4699 0.3007 0.0904 1.0000 1.9900 400 0
Example 2#
file = getGAUSShome() $+ "examples/freqdata.dat";
// Calculate statistics on just AGE and PAY
vars = "AGE" $| "PAY";
{ vnam, mean, var, std, min, max, valid, mis } = dstat(file, vars);
After the above code,
-------------------------------------------------------------------------------
Variable Mean Std Dev Variance Minimum Maximum Valid Missing
-------------------------------------------------------------------------------
AGE ----- ----- ----- 1.0000 10.0000 400 0
PAY 1.9675 0.8019 0.6431 1.0000 3.0000 400 0
Example 3#
file = getGAUSShome() $+ "examples/freqdata.dat";
// Calculate statistics on just AGE and PAY using numerical indices
vars = { 1, 2 };
{ vnam, mean, var, std, min, max, valid, mis } = dstat(file, vars);
After the above code,
------------------------------------------------------------------------------
Variable Mean Std Dev Variance Minimum Maximum Valid Missing
------------------------------------------------------------------------------
AGE ----- ----- ----- 1.0000 10.0000 400 0
PAY 1.9675 0.8019 0.6431 1.0000 3.0000 400 0
Example 4#
file = getGAUSShome() $+ "examples/freqdata.dat";
// Calculate statistics on just AGE and PAY using __miss
vars = { 1, 2 };
// Drop rows with missing values
__miss = 1;
{ vnam, mean, var, std, min, max, valid, mis } = dstat(file, vars);
After the above code,
------------------------------------------------------------------------------
Variable Mean Std Dev Variance Minimum Maximum Valid Missing
------------------------------------------------------------------------------
AGE 5.6784 2.9932 8.9593 1.0000 10.0000 398 2
PAY 1.9623 0.8006 0.6409 1.0000 3.0000 398 2
Example 5#
/*
** Calculate statistics using formula string and __miss
** Set up a formula string with all variables exclude "sex"
*/
vars = ". - sex";
// Drop rows with missing values
__miss = 1;
{ vnam, mean, var, std, min, max, valid, mis } = dstat(file, vars);
After the above code,
-----------------------------------------------------------------------------
Variable Mean Std Dev Variance Minimum Maximum Valid Missing
-----------------------------------------------------------------------------
AGE 5.6784 2.9932 8.9593 1.0000 10.0000 398 2
PAY 1.9623 0.8006 0.6409 1.0000 3.0000 398 2
WT 1.4713 0.3009 0.0906 1.0000 1.9900 398 2
Example 6#
Descriptive statistics on a matrix.
data = { 1 2, 3 4, 5 6, 7 8 };
call dstat("", data);
After the above code,
-----------------------------------------------------------------------------
Variable Mean Std Dev Variance Minimum Maximum Valid Missing
-----------------------------------------------------------------------------
X1 4 2.582 6.667 1 7 4 0
X2 5 2.582 6.667 2 8 4 0
Example 7#
Specify variable names.
// Note the use of the matrix concatenation operator, '|'
// instead of the string concatenation operator, `$|`,
// makes this a 2x1 character vector
__altnam = "ALPHA" | "BETA";
data = { 1 2, 3 4, 5 6, 7 8 };
call dstat("", data);
After the above code,
-----------------------------------------------------------------------------
Variable Mean Std Dev Variance Minimum Maximum Valid Missing
-----------------------------------------------------------------------------
ALPHA 4 2.582 6.667 1 7 4 0
BETA 5 2.582 6.667 2 8 4 0
Remarks#
1. If pairwise deletion is used, the minima and maxima will be the true values for the valid data. The means and standard deviations will be computed using the correct number of valid observations for each variable.
The supported dataset types are
CSV
,XLS
,XLSX
,HDF5
,FMT
,DAT
,DTA
.
For HDF5 file, the dataset must include file schema
and both file name and dataset name must be provided, e.g.
dstat("h5://C:/gauss/examples/testdata.h5/mydata", formula)
See also
Formula String
Source#
dstat.src