gdaDStatMat

Purpose

Computes descriptive statistics on a selection of columns from a matrix located in a GAUSS Data Archive.

Format

dout = gdaDStatMat(dc0, filename, gmat, colind, vnamevar)
Parameters:
  • dc0 (struct) –

    an instance of a dstatmtControl structure with the following members:

    dc0.altnames

    Kx1 string array of alternate variable names for the output. Default = "". If set, it must have the same number of rows as colind.

    dc0.maxbytes

    scalar, the maximum number of bytes to be read per iteration of the read loop. Default = 1e9.

    dc0.maxvec

    scalar, the largest number of elements allowed in any one matrix. Default = 20000.

    dc0.miss

    scalar, one of the following:

    0:

    There are no missing values (fastest).

    1:

    Listwise deletion: drop a row if any missing values occur in it.

    2:

    Pairwise deletion.

    Default = 0.

    dc0.output

    scalar, one of the following:

    0:

    Do not print output table.

    1:

    Print output table.

    Default = 1.

    dc0.row

    scalar, the number of rows of the matrix gmat to be read per iteration of the read loop. If 0 (default), the number of rows is calculated from dc0.maxbytes and dc0.maxvec.

  • filename (string) – name of data file.

  • gmat (string or scalar) – name or index of the matrix variable in the GAUSS Data Archive.

  • colind (Kx1 vector) – indices of the columns of gmat to use.

  • vnamevar (string or scalar) – name or index of the string array variable that contains the names of the columns of gmat.
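The dstatmtControl members are ordinary structure fields, so they can be changed after calling dstatmtControlCreate() and before passing dc0 to gdaDStatMat(). A minimal sketch; the value 1000 for dc0.row is an arbitrary illustration, not a recommended setting:

```gauss
// Start from the default settings
struct dstatmtControl dc0;
dc0 = dstatmtControlCreate;

// Listwise deletion: drop any row that contains a missing value
dc0.miss = 1;

// Do not print the output table; the results are still returned in dout
dc0.output = 0;

// Read 1000 rows of gmat per iteration instead of letting the
// number of rows be calculated from dc0.maxbytes and dc0.maxvec
// (1000 is an arbitrary illustrative value)
dc0.row = 1000;
```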

Returns:

dout (struct) –

instance of dstatmtOut struct with the following members:

dout.vnames

Kx1 string array, the names of the variables used in the statistics.

dout.mean

Kx1 vector, means.

dout.var

Kx1 vector, variances.

dout.std

Kx1 vector, standard deviations.

dout.min

Kx1 vector, minima.

dout.max

Kx1 vector, maxima.

dout.valid

Kx1 vector, the number of valid cases.

dout.missing

Kx1 vector, the number of missing cases.

dout.errcode

scalar, error code, 0 if successful, otherwise one of the following:

1:

No GDA indicated.

3:

Variable must be Nx1.

4:

Not implemented for complex data.

5:

Variable must be type matrix.

7:

Too many missing values, no data left after packing.

9:

altnames member of dstatmtControl structure wrong size.

11:

Data read error.
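Because gdaDStatMat() reports problems through dout.errcode, it is worth checking that member before using the other results. The sketch below deliberately passes two alternate names for three columns, which according to the table above should produce error code 9. The archive contents, and the names alt, "Alpha" and "Beta", are illustrative only:

```gauss
// Create (or overwrite) a small archive with a 10x5 matrix 'A'
ret = gdaCreate("myfile.gda", 1);
ret = gdaWrite("myfile.gda", rndn(10, 5), "A");

struct dstatmtControl dc0;
dc0 = dstatmtControlCreate;
dc0.output = 0;

// Deliberate mismatch: three columns requested, but only two
// alternate names supplied (hypothetical names)
colind = { 1, 2, 3 };
string alt = { "Alpha", "Beta" };
dc0.altnames = alt;

struct dstatmtOut dout;
dout = gdaDStatMat(dc0, "myfile.gda", "A", colind, 0);

// Check the error code before using any of the results
if dout.errcode /= 0;
    print "gdaDStatMat returned error code " dout.errcode;
else;
    print dout.mean;
endif;
```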

Examples

To run the examples below, first create a sample GAUSS Data Archive with the following code.

// Create an example GAUSS Data Archive
ret = gdaCreate("myfile.gda", 1);

// Add a variable 'A' which is a 10x5 random normal matrix
ret = gdaWrite("myfile.gda", rndn(10, 5), "A");

// Add a variable 'COLS' which is a 5x1 string array
string vnames = { "X1", "X2", "X3", "X4", "X5" };
ret = gdaWrite("myfile.gda", vnames, "COLS");

The code above creates a GAUSS Data Archive containing two variables: the matrix A, which holds the data, and the string array COLS, which holds the names of the columns of A (X1, X2, ...), i.e., the model variables.

The code below computes the statistics on each of the columns of the matrix A.

/*
** Declare instance of the
** dstatmtControl structure
*/
struct dstatmtControl dc0;
dc0 = dstatmtControlCreate;

// Indices of variables to evaluate
colind = { 1, 2, 3, 4, 5 };

// Declare output structure
struct dstatmtOut dout;
dout = gdaDStatMat(dc0, "myfile.gda", "A", colind, "COLS" );

The final input to gdaDStatMat above tells the function where to find the names for the columns of A. In this example, the COLS variable is referenced by name. Alternatively, you can reference it by index: since COLS is the second variable in the GAUSS Data Archive created at the start of this example, the following is equivalent to the last line above:

dout = gdaDStatMat(dc0, "myfile.gda", "A", colind, 2 );

To compute the statistics for only the first, third, and fifth columns of A:

colind = { 1, 3, 5 };
dout = gdaDStatMat(dc0, "myfile.gda", "A", colind, "COLS" );

Note that COLS still contains all of the variable names, i.e., X1, X2, X3, X4, X5; colind selects which of them are used. COLS should always contain the full list of names for every column of A.

Remarks

Set colind to a scalar 0 to use all of the columns in vnamevar.
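A minimal sketch of the colind = 0 shorthand, using an archive built the same way as in the Examples section. With five names in COLS, passing 0 should select the same columns as colind = { 1, 2, 3, 4, 5 }:

```gauss
// Create (or overwrite) the example archive
ret = gdaCreate("myfile.gda", 1);
ret = gdaWrite("myfile.gda", rndn(10, 5), "A");
string vnames = { "X1", "X2", "X3", "X4", "X5" };
ret = gdaWrite("myfile.gda", vnames, "COLS");

struct dstatmtControl dc0;
dc0 = dstatmtControlCreate;

// A scalar 0 for colind uses every column named in COLS
struct dstatmtOut dout;
dout = gdaDStatMat(dc0, "myfile.gda", "A", 0, "COLS");
```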

vnamevar must either reference an Mx1 string array variable containing variable names, where M is the number of columns in the dataset variable, or be set to a scalar 0. If vnamevar references an Mx1 string array variable, only the elements indicated by colind are used. If vnamevar is set to a scalar 0, the output variable names "X1", "X2", ..., "XK" are generated automatically, unless alternate variable names are set explicitly in the dc0.altnames member of the dstatmtControl structure.
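The sketch below shows both cases with vnamevar set to a scalar 0: first with automatically generated names, then with explicit names in dc0.altnames. The names "Income", "Age" and "Tenure" are hypothetical labels; the only requirement taken from the text above is that altnames has one row per element of colind:

```gauss
// Create (or overwrite) an archive holding only the data matrix 'A'
ret = gdaCreate("myfile.gda", 1);
ret = gdaWrite("myfile.gda", rndn(10, 5), "A");

struct dstatmtControl dc0;
dc0 = dstatmtControlCreate;
colind = { 1, 3, 5 };
struct dstatmtOut dout;

// vnamevar = 0: output names are generated automatically
dout = gdaDStatMat(dc0, "myfile.gda", "A", colind, 0);

// Same selection, but with explicit (hypothetical) output names;
// altnames must have the same number of rows as colind
string alt = { "Income", "Age", "Tenure" };
dc0.altnames = alt;
dout = gdaDStatMat(dc0, "myfile.gda", "A", colind, 0);
```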

If pairwise deletion is used, the minima and maxima will be the true values for the valid data. The means and standard deviations will be computed using the correct number of valid observations for each variable.
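To see the effect of pairwise deletion, the sketch below stores a 10x3 matrix with one missing value in the first column and one in the third, then sets dc0.miss = 2. Since each variable keeps its own valid rows, dout.valid should be 9, 10 and 9. The variable name "B" is an arbitrary choice for this illustration:

```gauss
// Create (or overwrite) an archive holding a 10x3 matrix 'B'
// with one missing value in column 1 and one in column 3
ret = gdaCreate("myfile.gda", 1);
x = rndn(10, 3);
x[2, 1] = miss(0, 0);
x[7, 3] = miss(0, 0);
ret = gdaWrite("myfile.gda", x, "B");

struct dstatmtControl dc0;
dc0 = dstatmtControlCreate;

// Pairwise deletion: each column is summarized over its own valid rows
dc0.miss = 2;

colind = { 1, 2, 3 };
struct dstatmtOut dout;
dout = gdaDStatMat(dc0, "myfile.gda", "B", colind, 0);

// Valid and missing counts side by side, one row per column
print dout.valid~dout.missing;
```

With listwise deletion (dc0.miss = 1), rows 2 and 7 would instead be dropped for every column.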

Source

gdadstat.src

See also

Functions gdaDStat(), dstatmtControlCreate()