startsWith

Purpose

Returns a 1 if a string starts with a specified pattern.

Format

mask = startsWith(str, pat)
Parameters:
  • str (Nx1 string array or dataframe of type category or string) – The data to be searched.

  • pat (String or dataframe of type category or string) – The pattern to search for in the beginning of str.

Returns:

mask (Nx1 vector) – A matrix of the same size as str with a 1 in any element that starts with the value of pat, otherwise 0.

Examples

Example 1

The following example searches for all observations of the variable make in the auto2.dta dataset that starts with "Buick" .

// Load 3 variables from the dataset
fname = getGAUSSHome("examples/auto2.dta");
auto = loadd(fname, "make + price + mpg");

// Specify pattern to search for
pat = "Buick";

// Find all makes that include 'Buick'
mask = startsWith(auto[., "make"], pat);

// Select observations if the corresponding
// row of mask equals 1.
auto_buicks = selif(auto, mask);

print auto_buicks;

This prints the following:

         make            price              mpg
Buick Century        4816.0000        20.000000
Buick Electra        7827.0000        15.000000
Buick LeSabre        5788.0000        18.000000
   Buick Opel        4453.0000        26.000000
  Buick Regal        5189.0000        20.000000
Buick Riviera        10372.000        16.000000
Buick Skylark        4082.0000        19.000000

Example 2: Select rows based on the starting text from 2 columns

In this example, we will select all rows where the first columns starts with Buick and the second column starts with Ave.

// Load 2 variables from the dataset
fname = getGAUSSHome("examples/auto2.dta");
auto = loadd(fname, "make + rep78");

// Select the first 7 observations
auto = auto[1:7,.];

print auto;
         make            rep78
  AMC Concord          Average
    AMC Pacer          Average
   AMC Spirit                .
Buick Century          Average
Buick Electra             Good
Buick LeSabre          Average
   Buick Opel                .

This time our pattern input needs to be a 1x2 string array with one search pattern for each column.

// Specify one string to search
// for in each column
pat = "Buick" $~ "Ave";

// Find all makes that include 'Buick'
// and all rep78's that include 'Ave'.
mask = startsWith(auto, pat);

print mask;
0.0000000        1.0000000
0.0000000        1.0000000
0.0000000        0.0000000
1.0000000        1.0000000
1.0000000        0.0000000
1.0000000        1.0000000
1.0000000        0.0000000

As we can see above, our mask contains two columns that tell us which observations matched our search. Before we can use selif() to select the matching rows, we need to convert mask to a column vector with a 1 in the case where both columns matched. We will do that by summing across the rows and then using the dot equality operator to see which rows were summed to equal two.

mask2 = sumr(mask) .== 2;

// Seliect 'Buick' observations
// that are in average condition
avg_buicks = selif(auto, mask2);

print avg_buicks;
         make            rep78
Buick Century          Average
Buick LeSabre          Average

See also

Functions strindx(), strsect(), strtrim()