loadd¶
Purpose¶
Loads data from a dataset. The supported dataset types are CSV, Excel (xlsx, xlsx), HDF5, GAUSS Matrix (fmt), GAUSS Dataset (dat), Stata (dta), and SAS (sas7bdat, sas7bcat). Existing dataframes are also supported.
Format¶
-
y =
loadd
(dataset[, varnames, ld_ctl])¶ - Parameters:
dataset (string or existing dataframe) –
filepath to the dataset on disk, URL, or existing dataframe.
If the a URL is provided (with http or https schema), the dataset will be downloaded first. Since libcurl is used for all web operations, various proxy settings can be set using the relevant libcurl environment variables (see https://curl.haxx.se/libcurl/c/CURLOPT_PROXY.html).
varnames (string) –
Formula string indicating which variable names to load from the dataset
E.g
"."
, include all variables;E.g
"Income + Limit "
, include"Income"
and"Limit"
;E.g
". - Cards"
, include all variables except for"Cards"
.ld_ctl –
Optional input. instance of the
loadFileControl
structure containing the following members:ctl.header_row
scalar, number of header row which contains variable names. Default = 1.
ctl.row_range.first
scalar, first row to start loading data from. Default = row after header.
ctl.row_range.last
scalar, last row to load data from. Default = last row of file.
ctl.missing_vals_str
string array, specifies values that should be treated as missing upon import. Default =
"."
and" "
.ctl.clusterVar
string, name of cluster group variable. Only valid if dataset and formula is specified.
ctl.load_intercept
scalar, indicator to load an intercept column with the data. Default = 0.
ctl.expand_categories
scalar, indicator to expand categorical variables as dummy variables. Default = 0.
ctl.csv.delimiter
string, specifies the delimiter used in .csv files. Default =
","
.ctl.csv.quotechar
string, tells GAUSS which text should be treated as a quote mark. Default =
""
.ctl.xls.sheet
scalar, Excel sheet number to be loaded. Default = 1.
- Returns:
y (NxK matrix) – data.
Examples¶
Load all contents of a GAUSS dataset¶
// Create file name with full path
file = getGAUSShome() $+ "examples/credit.dat";
// Load all rows from all columns of the dataset
y = loadd(file);
// Print the first three rows of 'y'
print y[1:3, .];
After the above code, the following ouptut should be printed to the Command window.
14.8910 3606.00 283.000 2.00000 34.0000
106.025 6645.00 483.000 3.00000 82.0000
104.593 7075.00 514.000 4.00000 71.0000
Load specified variables from a dataset¶
// Load all variables with a formula string
dat1 = loadd(file, "." );
// Load all observations of 'Balance' and 'Limit'
dat2 = loadd(file, "Balance + Limit" );
// Load all variables EXCEPT for 'Cards'
dat3 = loadd(file, ". - Cards" );
// Print first three rows of each matrix
print "All variables: " dat1[1:3, .];
print "Balance and Limit: " dat2[1:3, .];
print "All except Cards: " dat3[1:3, .];
After the above code,
All variables:
14.891 3606.00 283.00 2.0000 34.000 11.000 1.0000 1.0000 2.0000 3.0000 333.000
106.03 6645.00 483.00 3.0000 82.000 15.000 2.0000 2.0000 2.0000 2.0000 903.000
104.59 7075.00 514.00 4.0000 71.000 11.000 1.0000 1.0000 1.0000 2.0000 580.000
Balance and Limit:
333.000 3606.00
903.000 6645.00
580.000 7075.00
All except Cards:
14.8910 3606.00 283.00 34.000 11.000 1.0000 1.0000 2.0000 3.0000 333.000
106.025 6645.00 483.00 82.000 15.000 2.0000 2.0000 2.0000 2.0000 903.000
104.593 7075.00 514.00 71.000 11.000 1.0000 1.0000 1.0000 2.0000 580.000
Load all columns of a GAUSS matrix file, .fmt¶
No variable names are stored in .fmt
files. GAUSS allows the use of X1, X2, X2...XP
to reference variables in a .fmt
file.
// Create a matrix
x = rndn(10, 4);
// Save to a matrix file, 'x.fmt'
save x;
// Load all columns of 'x.fmt'
x_2 = loadd("x.fmt");
Load specified columns of a GAUSS matrix file, .fmt.¶
// Create a matrix
x = rndn(10, 4);
// Save to a matrix file, 'x.fmt'
save x;
// Load columns 2 and 4 from 'x.fmt'
x_2 = loadd("x.fmt", "X2 + X4");
Load three specified variables from a SAS dataset, .sas7bdat.¶
new;
cls;
dataset = getGAUSSHome() $+ "examples/detroit.dta";
// Create formula string specifying three variables to load
formula = "homicide + unemployment + hourly_earn";
y = loadd(dataset, formula);
print "The dataset use is ";; dataset;
print "The number of variables equals: ";; cols(y);
print "The number of observations equals: ";; rows(y);
After the above code,
The dataset use is C:\gauss23\examples\detroit.dta
The number of variables equals: 3.0000000
The number of observations equals: 13.000000
Load a string date from a .csv file and automatically convert it to a POSIX date/time (seconds since Jan 1, 1970).¶
dataset = getGAUSSHome() $+ "examples/yellowstone.csv";
// Create formula string specifying that the column 'Date'
// from 'yellowstone.csv' is a string column (by using $) and
// that it should be loaded as a date with the 'date' keyword
formula = "date($Date)";
// Load the date and convert to POSIX date/time format
dt_pos = loadd(dataset, formula);
// Convert the first 5 dates to a string 'Month day, Year'
// and print them
print posixToStrc(dt_pos[1:5], "%B %d, %Y");
After the above code,
January 01, 2016
January 01, 2015
January 01, 2014
January 01, 2013
January 01, 2012
Use loadfileControl
structure for advanced loading options.¶
// Create file name with full path
dataset = getGAUSSHome("examples/housing.csv");
// Pass the loadFileControl structure as the final input
// Note the use of the '.' operator to note that all variables should be loaded
housing_full = loadd(dataset, ".");
The first five rows of housing_full
are:
taxes beds baths new price size
3104.0000 4.0000000 2.0000000 0.0000000 279.90000 2048.0000
1173.0000 2.0000000 1.0000000 0.0000000 146.50000 912.00000
3076.0000 4.0000000 2.0000000 0.0000000 237.70000 1654.0000
1608.0000 3.0000000 2.0000000 0.0000000 200.00000 2068.0000
1454.0000 3.0000000 3.0000000 0.0000000 159.90000 1477.0000
The last five rows of the housing_full
are:
taxes beds baths new price size
990.00000 2.0000000 2.0000000 0.0000000 176.00000 1060.0000
3030.0000 3.0000000 2.0000000 0.0000000 196.50000 1730.0000
1580.0000 3.0000000 2.0000000 0.0000000 132.20000 1370.0000
1770.0000 3.0000000 2.0000000 0.0000000 88.400000 1560.0000
1430.0000 3.0000000 2.0000000 0.0000000 127.20000 1340.0000
Now we will only load rows 9 through 21:
// 1. Declare ld_ctl to be an instance of a 'loadFileControl' structure
struct loadFileControl ld_ctl;
// 2. Fill 'ld_ctl' with default settings
ld_ctl = loadFileControlCreate();
// 3. Change the row range to load rows 9-21
ld_ctl.row_range.first = 9;
ld_ctl.row_range.last = 21;
// Pass the loadFileControl structure as the final input
// Note the use of the '.' operator to note that all variables should be loaded
housing_short = loadd(dataset, ".", ld_ctl);
After the above code, last five rows of the housing_short
are:
taxes beds baths new price size
3002.0000 3.0000000 2.0000000 1.0000000 289.90000 2075.0000
6627.0000 5.0000000 4.0000000 0.0000000 587.00000 3990.0000
320.00000 3.0000000 2.0000000 0.0000000 70.000000 1160.0000
630.00000 3.0000000 2.0000000 0.0000000 64.500000 1220.0000
1780.0000 3.0000000 2.0000000 0.0000000 167.00000 1690.0000
The last five rows of housing_short
are:
taxes beds baths new price size
590.00000 2.0000000 1.0000000 0.0000000 70.000000 770.00000
1050.0000 3.0000000 2.0000000 0.0000000 85.000000 1410.0000
20.000000 3.0000000 1.0000000 0.0000000 22.500000 1060.0000
870.00000 2.0000000 2.0000000 0.0000000 90.000000 1300.0000
1320.0000 3.0000000 2.0000000 0.0000000 133.00000 1500.0000
Remarks¶
Since
loadd()
will load the entire dataset at once, the dataset must be small enough to fit in memory. To read chunks of a dataset in an iterative manner, usedataopen()
andreadr()
.If dataset is a null string or 0, the dataset
temp.dat
will be loaded.To load a matrix file, use an
.fmt
extension on dataset.The supported dataset types are
CSV
,Excel
(XLS, XLSX),HDF5
,GAUSS Matrix (FMT)
,GAUSS Dataset (DAT)
,Stata
(DTA) andSAS
(SAS7BDAT, SAS7BCAT).For
HDF5
file, the dataset must include schema and both file name and dataset name must be provided, e.g.
loadd("h5://C:/gauss23/examples/testdata.h5/mydata").
Source¶
saveload.src
Globals¶
__maxvec
See also¶
See also