selif#
Purpose#
Selects rows from a matrix, dataframe or string array, based upon a vector of 1’s and 0’s.
Format#
- y = selif(x, e)#
- Parameters:
x (NxK matrix, dataframe or string array) – data
e (vector) – Nx1 vector of 1’s and 0’s
- Returns:
y (MxK matrix, dataframe or string array) – consists of the rows of x for which there is a 1 in the corresponding row of e.
Examples#
Categorical dataframe examples#
This example selects all rows where the category label of the amplitude variable is equal to “med”.
// Get file name with full path
dataset = getGAUSSHome("examples/yarn.xlsx");
// Load 'cycles' as numeric variable and load
// 'amplitude' as a categorical variable
yarn = loadd(dataset, "cycles + cat(amplitude)");
// Print all variables from the first 5 rows
print yarn[1:5,.];
The above code will print the first five rows of the data before filtering with selif()
.
cycles amplitude
674.00000 low
370.00000 low
292.00000 low
338.00000 med
266.00000 med
// Select all rows of 'yarn' where the 'amplitude'
// variable is equal to "med"
yarn_med = selif(yarn, yarn[.,"amplitude"] .== "med");
// Print all variables from the first 5 rows
print yarn_med[1:5,.];
The first five rows of the filtered data look like this:
cycles amplitude
338.00000 med
266.00000 med
210.00000 med
1022.0000 med
620.00000 med
Dataframe example with dates#
// Get file name with full path
dataset = getGAUSSHome("examples/xle_daily.xlsx");
// Load 'Date' as a date variable and load
// 'Adj Close' and 'Volume' as numerical variables
xle = loadd(dataset, "date(Date) + Adj Close + Volume");
// Print descriptive statistics from the original data
call dstatmt(xle);
Note the range of the Date variable in the descriptive statistics below is from 2017 to 2018.
--------------------------------------------------------------------------------------
Variable Mean Std Dev Variance Minimum Maximum Valid Missing
--------------------------------------------------------------------------------------
Date ----- ----- ----- 2017-06-13 2018-06-13 253 0
Adj Close 68.44 4.79 22.94 60.36 78.91 253 0
Volume 1.431e+07 5.091e+06 2.592e+13 4.991e+06 3.517e+07 253 0
// Select all rows of 'xle' where the 'Date' variable
// is greater than or equal to July 1st, 2017 and less than
// August 1st, 2017
mask = xle[.,"Date"] .>= "2017-07-01" .and xle[.,"Date"] .< "2017-08-01";
xle_july_2017 = selif(xle, mask);
// Print descriptive statistics of the filtered data
call dstatmt(xle_july_2017);
After the call to selif()
, the range of our new data is confined to July of 2017.
------------------------------------------------------------------------------------
Variable Mean Std Dev Variance Minimum Maximum Valid Missing
------------------------------------------------------------------------------------
Date ----- ----- ----- 2017-07-03 2017-07-31 20 0
Adj Close 63.6 0.7824 0.6122 62.32 64.86 20 0
Volume 1.554e+07 4.02e+06 1.616e+13 1.006e+07 2.48e+07 20 0
Numerical matrix examples#
This example selects all rows of x in which the second column is greater than 100. This will set y equal to:
x = { 112 252,
99 119,
109 81,
184 111,
209 94 };
y = selif(x, x[., 2] .> 100);
This example selects all rows of x in which the second column is greater than 100. This will set y equal to:
112 252
99 119
184 111
x = { 0 10 20,
30 40 50,
60 70 80 };
e = (x[., 1] .> 0) .and (x[., 3] .< 100);
y = selif(x, e);
The resulting matrix y is:
30 40 50
60 70 80
All rows for which the element in column 1 is greater than 0 and the element in column 3 is less than 100 are placed into the matrix y.
Remarks#
The argument e will usually be generated by a logical expression using “dot” operators.
y will be a scalar missing if no rows are selected.
See also
Functions delif()
, scalmiss()