loadd#
Purpose#
Loads data from a dataset. The supported dataset types are CSV, Excel (xls, xlsx), HDF5, GAUSS Matrix (fmt), GAUSS Dataset (dat), Stata (dta), and SAS (sas7bdat, sas7bcat). Existing dataframes are also supported.
Format#
- y = loadd(dataset[, varnames, ldCtl])#
- Parameters:
dataset (String or existing dataframe) –
filepath to the dataset on disk, URL, or existing dataframe.
If a URL is provided (with an http or https scheme), the dataset will be downloaded first. Since libcurl is used for all web operations, proxy settings can be set using the relevant libcurl environment variables (see https://curl.haxx.se/libcurl/c/CURLOPT_PROXY.html).
varnames (String) –
Optional, formula string indicating which variable names to load from the dataset.
E.g. ".", include all variables;
E.g. "Income + Limit", include "Income" and "Limit";
E.g. ". - Cards", include all variables except for "Cards".
ldctl (struct) –
Optional, instance of a LoadFileControl structure containing the following members:
ldctl.header_row
scalar, Specifies the row location of the variable name headers. Data loading row range will default to begin at first row after headers. Default = 1.
ldctl.row_range.first
scalar, Specifies the first row to begin loading data from. Default = 2 (or first row after headers).
ldctl.row_range.last
scalar, Specifies the last row to stop loading data from. Default = -1 (last row).
ldctl.xls.sheet
scalar, Specifies the XLS sheet number to be loaded. Valid only for XLS, XLSX files. Default = 1.
ldctl.csv.delimiter
string, Specifies the CSV delimiter. Valid only for CSV files. Default = ",".
ldctl.csv.quotechar
string, Specifies the CSV quotation character. Valid only for CSV files. Default = Double quotes.
ldctl.missing_vals_str
string, Specifies how missing values should be represented for string types. Default = " ".
- Returns:
y (NxK matrix) – data.
Examples#
Load all contents of a GAUSS dataset#
// Create file name with full path
file = getGAUSShome() $+ "examples/credit.dat";
// Load all rows from all columns of the dataset
y = loadd(file);
// Preview first 5 rows of the first 3 columns
head(y[., 1:3]);
After the above code, the following output should be printed to the Command window.
Income Limit Rating
14.891000 3606.0000 283.00000
106.02500 6645.0000 483.00000
104.59300 7075.0000 514.00000
148.92400 9504.0000 681.00000
55.882000 4897.0000 357.00000
Load specified variables from a dataset#
// Load all variables with a formula string
dat1 = loadd(file, "." );
// Load all observations of 'Balance' and 'Limit'
dat2 = loadd(file, "Balance + Limit" );
// Load all variables EXCEPT for 'Cards'
dat3 = loadd(file, ". - Cards" );
// Print first three rows of each matrix
print "All variables: " dat1[1:3, .];
print "Balance and Limit: " dat2[1:3, .];
print "All except Cards: " dat3[1:3, .];
After the above code, the following output should be printed:
All variables:
14.891 3606.00 283.00 2.0000 34.000 11.000 1.0000 1.0000 2.0000 3.0000 333.000
106.03 6645.00 483.00 3.0000 82.000 15.000 2.0000 2.0000 2.0000 2.0000 903.000
104.59 7075.00 514.00 4.0000 71.000 11.000 1.0000 1.0000 1.0000 2.0000 580.000
Balance and Limit:
333.000 3606.00
903.000 6645.00
580.000 7075.00
All except Cards:
14.8910 3606.00 283.00 34.000 11.000 1.0000 1.0000 2.0000 3.0000 333.000
106.025 6645.00 483.00 82.000 15.000 2.0000 2.0000 2.0000 2.0000 903.000
104.593 7075.00 514.00 71.000 11.000 1.0000 1.0000 1.0000 2.0000 580.000
Load all columns of a GAUSS matrix file, .fmt#
No variable names are stored in .fmt files. GAUSS allows the use of X1, X2, X3...XP to reference variables in a .fmt file.
// Create a matrix
x = rndn(10, 4);
// Save to a matrix file, 'x.fmt'
save x;
// Load all columns of 'x.fmt'
x_2 = loadd("x.fmt");
Load specified columns of a GAUSS matrix file, .fmt.#
// Create a matrix
x = rndn(10, 4);
// Save to a matrix file, 'x.fmt'
save x;
// Load columns 2 and 4 from 'x.fmt'
x_2 = loadd("x.fmt", "X2 + X4");
Load three specified variables from a Stata dataset, .dta.#
new;
cls;
dataset = getGAUSSHome("examples/detroit.dta");
// Create formula string specifying three variables to load
formula = "homicide + unemployment + hourly_earn";
y = loadd(dataset, formula);
print "The dataset used is ";; dataset;
print "The number of variables equals: ";; cols(y);
print "The number of observations equals: ";; rows(y);
After the above code, the following output should be printed:
The dataset used is C:\gauss23\examples\detroit.dta
The number of variables equals: 3.0000000
The number of observations equals: 13.000000
Loading different variable types#
The loadd() procedure has a built-in capability to detect four variable types: strings, dates, categories, and numbers. In most cases, no additional information needs to be provided for GAUSS to determine the data types. First, consider loading dates:
// Specify dataset
dataset = getGAUSSHome("examples/yellowstone.csv");
// Load the data
data = loadd(dataset);
// Preview dates and visits
head(data[., "Date" "Visits"]);
After the above code, the following output should be printed:
Date Visits
2016/01/01 30621.000
2015/01/01 28091.000
2014/01/01 26778.000
2013/01/01 24699.000
2012/01/01 24766.000
Note that no additional keywords were needed to load the dates. The types that are loaded can be confirmed using getColTypes():
getColTypes(data, "Date"$|"Visits");
type
date
number
As a second example, consider loading categorical variables from the file yarn.xlsx. Again, no additional keywords are needed:
// Specify dataset
dataset = getGAUSSHome("examples/yarn.xlsx");
// Load the data
data = loadd(dataset);
// Preview data
head(data);
// Check variable types
getColTypes(data);
yarn_length amplitude load cycles
low low low 674.00000
low low med 370.00000
low low high 292.00000
low med low 338.00000
low med med 266.00000
type
category
category
category
number
If you are not certain of the default type that GAUSS will load, the GAUSS Data Import window will provide a preview.
Advanced data loading options#
For advanced data loading options, a loadFileControl structure can be used. For example, consider modifying the row range that will be loaded:
// Create file name with full path
dataset = getGAUSSHome("examples/housing.csv");
// Declare ld_ctl to be an instance of a 'loadFileControl' structure
struct loadFileControl ld_ctl;
// Fill 'ld_ctl' with default settings
ld_ctl = loadFileControlCreate();
// Change the row range to load rows 9-21
ld_ctl.row_range.first = 9;
ld_ctl.row_range.last = 21;
// Pass the loadFileControl structure as the final input
// Note the use of the '.' operator to indicate that all variables should be loaded
housing = loadd(dataset, ".", ld_ctl);
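The other members of the control structure listed under Parameters are set in the same way. The following is a minimal sketch of changing the CSV delimiter. It assumes a hypothetical semicolon-delimited file named sales.csv in the current working directory; the file name and the variable name sales are illustrative only and are not part of the GAUSS examples.

```gauss
// Declare 'ld_ctl' and fill it with default settings
struct loadFileControl ld_ctl;
ld_ctl = loadFileControlCreate();

// Assumption: 'sales.csv' is a hypothetical file
// that separates fields with ';' instead of ','
ld_ctl.csv.delimiter = ";";

// Load all variables using the modified delimiter
sales = loadd("sales.csv", ".", ld_ctl);
```

Because the structure starts from the defaults returned by loadFileControlCreate(), only the members that differ from the defaults need to be assigned.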
Remarks#
Since loadd() will load the entire dataset at once, the dataset must be small enough to fit in memory. To read chunks of a dataset in an iterative manner, use dataopen() and readr().
If dataset is a null string or 0, the dataset temp.dat will be loaded.
To load a matrix file, use an .fmt extension on dataset.
The supported dataset types are CSV, Excel (XLS, XLSX), HDF5, GAUSS Matrix (FMT), GAUSS Dataset (DAT), Stata (DTA) and SAS (SAS7BDAT, SAS7BCAT).
For an HDF5 file, the dataset must include the schema, and both the file name and the dataset name must be provided, e.g.
loadd("h5://C:/gauss23/examples/testdata.h5/mydata").
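For datasets too large to load at once, the chunked approach using dataopen() and readr() mentioned in the first remark can be sketched as follows. This is an illustration only: it reuses the credit.dat example file, and the chunk size of 100 rows and the variable names are arbitrary choices.

```gauss
// Assumption: reuse the example dataset shipped with GAUSS
file = getGAUSSHome("examples/credit.dat");

// Open the dataset for reading and get a file handle
fh = dataopen(file, "read");

// Chunk size is an arbitrary choice for illustration
chunk_rows = 100;
n_read = 0;

do until eof(fh);
    // Read up to 'chunk_rows' rows; the final chunk may be smaller
    x = readr(fh, chunk_rows);

    // Process the chunk here; this sketch only counts the rows
    n_read = n_read + rows(x);
endo;

// Close the file handle
fh = close(fh);

print "Total rows read: " n_read;
```

Only one chunk is held in memory at a time, so the memory needed depends on the chunk size rather than on the size of the dataset.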
Source#
saveload.src
Globals#
__maxvec