selif

Purpose

Selects rows from a matrix, dataframe or string array, based upon a vector of 1’s and 0’s.

Format

y = selif(x, e)
Parameters
  • x (NxK matrix, dataframe or string array) – data

  • e (vector) – Nx1 vector of 1’s and 0’s

Returns

y (MxK matrix, dataframe or string array) – consists of the rows of x for which there is a 1 in the corresponding row of e.

Examples

Categorical dataframe examples

This example selects all rows where the category label of the amplitude variable is equal to “med”.

// Get file name with full path
dataset = getGAUSSHome() $+ "examples/yarn.xlsx";

// Load 'cycles' as numeric variable and load
// 'amplitude' as a categorical variable
yarn = loadd(dataset, "cycles + cat(amplitude)");

// Print all variables from the first 5 rows
print yarn[1:5,.];

The above code will print the first five rows of the data before filtering with selif().

   cycles        amplitude
674.00000              low
370.00000              low
292.00000              low
338.00000              med
266.00000              med
// Select all rows of 'yarn' where the 'amplitude'
// variable is equal to "med"
yarn_med = selif(yarn, yarn[.,"amplitude"] .== "med");

// Print all variables from the first 5 rows
print yarn_med[1:5,.];

The first five rows of the filtered data look like this:

   cycles        amplitude
338.00000              med
266.00000              med
210.00000              med
1022.0000              med
620.00000              med

Dataframe example with dates

// Get file name with full path
dataset = getGAUSSHome() $+ "examples/xle_daily.xlsx";

// Load 'Date' as a date variable and load
// 'Adj Close' and 'Volume' as numerical variables
xle = loadd(dataset, "date(Date) + Adj Close + Volume");

// Print descriptive statistics from the original data
call dstatmt(xle);

Note the range of the Date variable in the descriptive statistics below is from 2017 to 2018.

--------------------------------------------------------------------------------------
Variable         Mean     Std Dev    Variance     Minimum     Maximum   Valid  Missing
--------------------------------------------------------------------------------------

Date            -----       -----       -----  2017-06-13  2018-06-13     253       0
Adj Close       68.44        4.79       22.94       60.36       78.91     253       0
Volume      1.431e+07   5.091e+06   2.592e+13   4.991e+06   3.517e+07     253       0
// Select all rows of 'xle' where the 'Date' variable
// is greater than or equal to July 1st, 2017 and less than
// August 1st, 2017
mask = xle[.,"Date"] .>= "2017-07-01" .and xle[.,"Date"] .< "2017-08-01";
xle_july_2017 = selif(xle, mask);

// Print descriptive statistics of the filtered data
call dstatmt(xle_july_2017);

After the call to selif(), the range of our new data is confined to July of 2017.

------------------------------------------------------------------------------------
Variable        Mean    Std Dev    Variance     Minimum     Maximum   Valid  Missing
------------------------------------------------------------------------------------

Date           -----      -----       -----  2017-07-03  2017-07-31      20        0
Adj Close       63.6     0.7824      0.6122       62.32       64.86      20        0
Volume     1.554e+07   4.02e+06   1.616e+13   1.006e+07    2.48e+07      20        0

Numerical matrix examples

This example selects all rows of x in which the second column is greater than 100. This will set y equal to:

x = { 112 252,
       99 119,
      109  81,
      184 111,
      209  94 };

 y = selif(x, x[., 2] .> 100);

This example selects all rows of x in which the second column is greater than 100. This will set y equal to:

112    252
 99    119
184    111
x = { 0 10 20,
     30 40 50,
     60 70 80 };

e = (x[., 1] .> 0) .and (x[., 3] .< 100);
y = selif(x, e);

The resulting matrix y is:

30 40 50
60 70 80

All rows for which the element in column 1 is greater than 0 and the element in column 3 is less than 100 are placed into the matrix y.

Remarks

The argument e will usually be generated by a logical expression using “dot” operators.

y will be a scalar missing if no rows are selected.

See also

Functions delif(), scalmiss()