# selif

## Purpose

Selects rows from a matrix, dataframe or string array, based upon a vector of 1’s and 0’s.

## Format

y = selif(x, e)

Parameters

• x (NxK matrix, dataframe or string array) – data

• e (vector) – Nx1 vector of 1’s and 0’s

Returns

• y (MxK matrix, dataframe or string array) – consists of the rows of x for which there is a 1 in the corresponding row of e.
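As a minimal sketch of the format above, the selection vector can also be written out by hand instead of being computed. The matrix values here are hypothetical and chosen only for illustration.

```gauss
// Hypothetical 3x2 matrix used only for illustration
x = { 1 2,
      3 4,
      5 6 };

// Selection vector: a 1 keeps the row, a 0 drops it
e = { 1, 0, 1 };

// y contains rows 1 and 3 of x, that is, the rows 1 2 and 5 6
y = selif(x, e);
print y;
```

Here N is 3 and M is 2, because e contains two 1's.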

## Examples

### Categorical dataframe examples

This example selects all rows where the category label of the amplitude variable is equal to “med”.

// Get file name with full path
dataset = getGAUSSHome() $+ "examples/yarn.xlsx";

// Load 'cycles' as numeric variable and load
// 'amplitude' as a categorical variable
yarn = loadd(dataset, "cycles + cat(amplitude)");

// Print all variables from the first 5 rows
print yarn[1:5,.];


The above code will print the first five rows of the data before filtering with selif().

          cycles        amplitude
       674.00000              low
       370.00000              low
       292.00000              low
       338.00000              med
       266.00000              med

// Select all rows of 'yarn' where the 'amplitude'
// variable is equal to "med"
yarn_med = selif(yarn, yarn[.,"amplitude"] .== "med");

// Print all variables from the first 5 rows
print yarn_med[1:5,.];


The first five rows of the filtered data look like this:

          cycles        amplitude
       338.00000              med
       266.00000              med
       210.00000              med
       1022.0000              med
       620.00000              med

### Dataframe example with dates

// Get file name with full path
dataset = getGAUSSHome() $+ "examples/xle_daily.xlsx";

// Load 'Date' as a date variable and load
// 'Adj Close' and 'Volume' as numerical variables
xle = loadd(dataset, "date(Date) + Adj Close + Volume");

// Print descriptive statistics from the original data
call dstatmt(xle);


Note that in the descriptive statistics below, the Date variable ranges from 2017-06-13 to 2018-06-13.

--------------------------------------------------------------------------------------
Variable         Mean     Std Dev    Variance     Minimum     Maximum   Valid  Missing
--------------------------------------------------------------------------------------

Date            -----       -----       -----  2017-06-13  2018-06-13     253       0
Adj Close       68.44        4.79       22.94       60.36       78.91     253       0
Volume      1.431e+07   5.091e+06   2.592e+13   4.991e+06   3.517e+07     253       0

// Select all rows of 'xle' where the 'Date' variable
// is greater than or equal to July 1st, 2017 and less than
// August 1st, 2017
mask = xle[.,"Date"] .>= "2017-07-01" .and xle[.,"Date"] .< "2017-08-01";
xle_july_2017 = selif(xle, mask);

// Print descriptive statistics of the filtered data
call dstatmt(xle_july_2017);


After the call to selif(), the range of our new data is confined to July of 2017.

------------------------------------------------------------------------------------
Variable        Mean    Std Dev    Variance     Minimum     Maximum   Valid  Missing
------------------------------------------------------------------------------------

Date           -----      -----       -----  2017-07-03  2017-07-31      20        0
Adj Close       63.6     0.7824      0.6122       62.32       64.86      20        0
Volume     1.554e+07   4.02e+06   1.616e+13   1.006e+07    2.48e+07      20        0


### Numerical matrix examples

This example selects all rows of x in which the second column is greater than 100.

x = { 112 252,
       99 119,
      109  81,
      184 111,
      209  94 };

y = selif(x, x[., 2] .> 100);


The resulting matrix y is:

112    252
 99    119
184    111

x = {  0 10 20,
      30 40 50,
      60 70 80 };

e = (x[., 1] .> 0) .and (x[., 3] .< 100);
y = selif(x, e);


The resulting matrix y is:

30 40 50
60 70 80


All rows for which the element in column 1 is greater than 0 and the element in column 3 is less than 100 are placed into the matrix y.

## Remarks

The argument e will usually be generated by a logical expression using element-by-element (“dot”) operators such as .>, .== and .and.
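Dot operators can be combined to build more involved selection vectors. The sketch below uses .or to keep rows that satisfy either of two conditions; the matrix values are hypothetical and serve only as an illustration.

```gauss
// Hypothetical 4x2 matrix used only for illustration
x = { 1 10,
      2 20,
      3 30,
      4 40 };

// Keep rows where column 1 equals 1 OR column 2 is at least 30.
// e evaluates to the 4x1 vector { 1, 0, 1, 1 }
e = (x[., 1] .== 1) .or (x[., 2] .>= 30);

// y contains rows 1, 3 and 4 of x
y = selif(x, e);
print y;
```

The parentheses around each comparison make the grouping explicit, so the result does not depend on operator precedence.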

If no rows are selected, y will be a scalar missing value.
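Because an empty selection returns a scalar missing rather than a matrix, code that uses the result may want to check for that case first. One possible way to do so, sketched here with hypothetical data, is the scalmiss function, which returns 1 when its argument is a scalar missing value.

```gauss
// Hypothetical 2x2 matrix used only for illustration
x = { 1 2,
      3 4 };

// No row has a first column greater than 10, so nothing is selected
y = selif(x, x[., 1] .> 10);

// Guard against the empty-selection case before using y
if scalmiss(y);
    print "No rows selected";
else;
    print y;
endif;
```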