loadd¶
Purpose¶
Loads data from a dataset. The supported dataset types are CSV, Excel (xls, xlsx), HDF5, GAUSS Matrix (fmt), GAUSS Dataset (dat), Stata (dta), and SAS (sas7bdat, sas7bcat). Existing dataframes are also supported.
Format¶
y = loadd(dataset[, varnames])¶
- Parameters
dataset (string or existing dataframe) –
File path to the dataset on disk, a URL, or an existing dataframe.
If a URL is provided (with an http or https scheme), the dataset will be downloaded first. Since libcurl is used for all web operations, proxy settings can be configured using the relevant libcurl environment variables (see https://curl.haxx.se/libcurl/c/CURLOPT_PROXY.html).
varnames (string) –
Formula string indicating which variables to load from the dataset. For example:
"." includes all variables;
"Income + Limit" includes "Income" and "Limit";
". - Cards" includes all variables except for "Cards".
- Returns
y (NxK matrix) – data.
Examples¶
Load all contents of a GAUSS dataset¶
// Create file name with full path
file = getGAUSShome() $+ "examples/credit.dat";
// Load all rows from all columns of the dataset
y = loadd(file);
// Print the first three rows of 'y'
print y[1:3, .];
After the above code, the following output should be printed to the Command window.
14.8910 3606.00 283.000 2.00000 34.0000
106.025 6645.00 483.000 3.00000 82.0000
104.593 7075.00 514.000 4.00000 71.0000
Load specified variables from a dataset¶
// Load all variables with a formula string
dat1 = loadd(file, "." );
// Load all observations of 'Balance' and 'Limit'
dat2 = loadd(file, "Balance + Limit" );
// Load all variables EXCEPT for 'Cards'
dat3 = loadd(file, ". - Cards" );
// Print first three rows of each matrix
print "All variables: " dat1[1:3, .];
print "Balance and Limit: " dat2[1:3, .];
print "All except Cards: " dat3[1:3, .];
After the above code, the following output is printed:
All variables:
14.891 3606.00 283.00 2.0000 34.000 11.000 1.0000 1.0000 2.0000 3.0000 333.000
106.03 6645.00 483.00 3.0000 82.000 15.000 2.0000 2.0000 2.0000 2.0000 903.000
104.59 7075.00 514.00 4.0000 71.000 11.000 1.0000 1.0000 1.0000 2.0000 580.000
Balance and Limit:
333.000 3606.00
903.000 6645.00
580.000 7075.00
All except Cards:
14.8910 3606.00 283.00 34.000 11.000 1.0000 1.0000 2.0000 3.0000 333.000
106.025 6645.00 483.00 82.000 15.000 2.0000 2.0000 2.0000 2.0000 903.000
104.593 7075.00 514.00 71.000 11.000 1.0000 1.0000 1.0000 2.0000 580.000
Load all columns of a GAUSS matrix file, .fmt¶
No variable names are stored in .fmt files. GAUSS allows the use of X1, X2, X3...XP to reference the columns of a .fmt file.
// Create a matrix
x = rndn(10, 4);
// Save to a matrix file, 'x.fmt'
save x;
// Load all columns of 'x.fmt'
x_2 = loadd("x.fmt");
Load specified columns of a GAUSS matrix file, .fmt¶
// Create a matrix
x = rndn(10, 4);
// Save to a matrix file, 'x.fmt'
save x;
// Load columns 2 and 4 from 'x.fmt'
x_2 = loadd("x.fmt", "X2 + X4");
Load three specified variables from a Stata dataset, .dta¶
new;
cls;
dataset = getGAUSSHome() $+ "examples/detroit.dta";
// Create formula string specifying three variables to load
formula = "homicide + unemployment + hourly_earn";
y = loadd(dataset, formula);
print "The dataset used is ";; dataset;
print "The number of variables equals: ";; cols(y);
print "The number of observations equals: ";; rows(y);
After the above code, the following output is printed:
The dataset used is C:\gauss23\examples\detroit.dta
The number of variables equals: 3.0000000
The number of observations equals: 13.000000
Load a string date from a .csv file and automatically convert it to a POSIX date/time (seconds since Jan 1, 1970).¶
dataset = getGAUSSHome() $+ "examples/yellowstone.csv";
// Create formula string specifying that the column 'Date'
// from 'yellowstone.csv' is a string column (by using $) and
// that it should be loaded as a date with the 'date' keyword
formula = "date($Date)";
// Load the date and convert to POSIX date/time format
dt_pos = loadd(dataset, formula);
// Convert the first 5 dates to a string 'Month day, Year'
// and print them
print posixToStrc(dt_pos[1:5], "%B %d, %Y");
After the above code, the following output is printed:
January 01, 2016
January 01, 2015
January 01, 2014
January 01, 2013
January 01, 2012
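As noted in the description of dataset, a URL can be passed in place of a file path. The sketch below shows what such a call might look like. The URL and the column names "price" and "quantity" are hypothetical placeholders, not files or variables shipped with GAUSS, so substitute a real address and real variable names before running it.

```gauss
// Hypothetical URL: replace with the address of a real CSV file
url = "https://example.com/data/mydata.csv";

// Download the file and load all of its variables
y = loadd(url);

// Load only two variables; 'price' and 'quantity' are
// hypothetical column names assumed to exist in the file
y2 = loadd(url, "price + quantity");
```

The formula string works the same way for a URL as for a local file, since the dataset is downloaded before it is loaded.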
Remarks¶
Since loadd() will load the entire dataset at once, the dataset must be small enough to fit in memory. To read chunks of a dataset in an iterative manner, use dataopen() and readr().
If dataset is a null string or 0, the dataset temp.dat will be loaded.
To load a matrix file, use an .fmt extension on dataset.
The supported dataset types are CSV, Excel (XLS, XLSX), HDF5, GAUSS Matrix (FMT), GAUSS Dataset (DAT), Stata (DTA) and SAS (SAS7BDAT, SAS7BCAT).
For an HDF5 file, dataset must include the schema, and both the file name and the dataset name must be provided, e.g. loadd("h5://C:/gauss23/examples/testdata.h5/mydata").
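For datasets too large to load in one call, the chunked approach mentioned above with dataopen() and readr() might look like the following minimal sketch. It assumes the credit.dat example dataset used earlier; the chunk size of 100 rows and the variable names fh, chunk_size and n_rows are arbitrary choices made for illustration.

```gauss
// Full path to the example dataset
fname = getGAUSSHome() $+ "examples/credit.dat";

// Open the dataset for reading; 'fh' is the file handle
fh = dataopen(fname, "read");

// Number of rows to read per iteration (arbitrary choice)
chunk_size = 100;

// Running count of the rows processed so far
n_rows = 0;

// 'eof' returns 1 once the end of the file has been reached
do until eof(fh);
    // Read up to 'chunk_size' rows from the current position
    x = readr(fh, chunk_size);

    // Process the chunk here; this sketch only counts rows
    n_rows = n_rows + rows(x);
endo;

// Close the file handle
fh = close(fh);

print "Rows read: ";; n_rows;
```

Only one chunk is held in memory at a time, so the per-chunk processing step is where sums, counts or other running statistics would be accumulated.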
Source¶
saveload.src
Globals¶
__maxvec
See also¶