loaddSA#

Purpose#

Loads data from a dataset into a string array. The supported dataset types are CSV, Excel (XLS, XLSX), HDF5, GAUSS Matrix (FMT), GAUSS Dataset (DAT), Stata (DTA) and SAS (SAS7BDAT, SAS7BCAT).

Format#

y = loaddSA(dataset[, varnames])#
Parameters:
  • dataset (string) – name of dataset.

  • varnames (string) –

    Formula string indicating which variable names to load from the dataset

    E.g ".", include all variables;

    E.g "Income + Limit ", include "Income" and "Limit";

    E.g ". - Cards", - means exclude "Cards".

Returns:

y (NxK string array) – of data.

Examples#

Load single string column from STATA dataset#

The variable make in the example dataset auto2.dta is a string variable containing the manufacturer and model of each car in the dataset. This example will load that variable into a GAUSS string array and print the first five observations.

// Create file name with full path
fname = getGAUSSHome("examples/auto2.dta");

// Load all rows from the variable 'make'
// in 'auto2.dta' into a string array
mk = loaddSA(fname, "make");

// Print the first five rows of 'mk'
print mk[1:5];

The above code will print the following ouptut:

  AMC Concord
    AMC Pacer
   AMC Spirit
Buick Century
Buick Electra

Load string and numeric variables from GAUSS dataset#

// Create file name with full path
fname = getGAUSSHome("examples/hsng.dat");

// Load all rows from the variables 'state' and 'popgrow'
x = loaddSA(fname, "state + popgrow");

// Print the first three rows of 'x'
print x[1:3,.];

The above code will print the following output:

AL     13.1
AK     32.8
AZ     53.1

Load and pre-process a variable from an Excel file#

You can use your own procedures to pre-process data loaded using formula string notation. This example creates a procedure to expand basketball position abbreviations into the full position name before returning the data.

new;
cls;

// Create procedure to expand basketball position
// abbreviations: C, F, G to Center, Forward, Guard
proc (1) = expandPos(s);
    local from, to;
    from = "C" $| "F" $| "G";
    to = "Center" $| "Forward" $| "Guard";
    retp(reclassify(s, from, to));
endp;

// Create file name with full path
dataset = getGAUSSHome("examples/nba_ht_wt.xls");

// Load three variables and use the 'expandPos' proc
// to pre-process the 'pos' variable before returning it
plr = loaddSA(dataset, "player + expandPos(pos) + height");

// Print the first four rows of the string array
print plr[1:4,.];

After the above code,

 Vitor Faverani     Center    83
  Avery Bradley      Guard    74
   Keith Bogans      Guard    77
Jared Sullinger    Forward    81

Remarks#

  • loaddSA() is the same as loadd(), except that it:

    • returns a string array.

    • only supports the formula string operators + and -.

  • Since loaddSA() will load the entire dataset at once, the dataset must be small enough to fit in memory.

  • If dataset is a null string or 0, the dataset temp.dat will be loaded.

  • The supported dataset types are CSV, Excel (XLS, XLSX), HDF5, GAUSS Matrix (FMT), GAUSS Dataset (DAT), Stata (DTA) and SAS (SAS7BDAT, SAS7BCAT).

  • Since GAUSS Matrix files (FMT) do not contain data type information, loaddSA() will assume that the entire contents of the file are numeric.

  • For HDF5 file, the dataset must include schema and both file name and dataset name must be provided, e.g.

loaddSA("h5://C:/gauss23/examples/testdata.h5/mydata").

Source#

saveload.src

See also#

See also

Formula String, csvReadSA(), getHeaders(), loadd(), saved()