strsplit#

Purpose#

Splits a string into individual tokens.

Format#

sa = strsplit(str[, sep])#
Parameters:
  • str (string or Nx1 string array) – data to be split.

  • sep (string) – Optional. The separator at which the input string is split into individual tokens. If omitted, the default delimiters described under Remarks are used.

Returns:

sa (1xK or NxK string array) – original string split into individual tokens.

Examples#

Dates#

dt = "1977/04/03";
dt_split = strsplit(dt, "/");

After the code above, dt_split will be a 1x3 string array with the following contents:

"1977"    "04"    "03"

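The tokens returned above are strings, so they usually need to be converted before any numeric work. A minimal sketch, assuming the GAUSS function strtof() is available and converts a string array to a numeric matrix of the same dimensions (it is not part of strsplit itself); the names dt_num, yr, mo and dy are illustrative only:

```gauss
dt = "1977/04/03";
dt_split = strsplit(dt, "/");

// Assumption: strtof() converts the 1x3 string array to a 1x3 numeric matrix
dt_num = strtof(dt_split);

// Pick out the individual components by position
yr = dt_num[1];
mo = dt_num[2];
dy = dt_num[3];

print "year = " yr;
print "month = " mo;
print "day = " dy;
```

Under that assumption, yr, mo and dy should hold 1977, 4 and 3 respectively.
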
Comma-separated list of variables#

vars = "CPI,PPI,Employment,Oil:Brent blend,Oil:WTI";
vars = strsplit(vars, ",");

After the code above, vars will be a 1x5 string array with the following contents:

"CPI"    "PPI"    "Employment"    "Oil:Brent blend"    "Oil:WTI"

String array with supplied separator#

// Create a 3x1 string array
string dow_str = { "apple:technology",
                   "goldman sachs:finance",
                   "home depot:retail" };

// Split 'dow_str' into a 3x2 string array
dow_sa = strsplit(dow_str, ":");

The above code sets dow_sa to be equal to:

"apple"           "technology"
"goldman sachs"   "finance"
"home depot"      "retail"

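Because the result is an ordinary NxK string array, each field can be pulled out as a column with standard indexing. A short sketch; the variable names names and sectors are illustrative only:

```gauss
// Create a 3x1 string array and split it into a 3x2 string array
string dow_str = { "apple:technology",
                   "goldman sachs:finance",
                   "home depot:retail" };
dow_sa = strsplit(dow_str, ":");

// First column: company names (3x1 string array)
names = dow_sa[., 1];

// Second column: sectors (3x1 string array)
sectors = dow_sa[., 2];

// Report the dimensions of the split result
print "rows = " rows(dow_sa);
print "cols = " cols(dow_sa);

// Second company name and its sector
print "names[2] = " names[2];
print "sectors[2] = " sectors[2];
```
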
String array without supplied separator#

When no separator is supplied, tokens that contain spaces may be grouped with single quotes, like this:

ss = "classification 'scientific taxonomy'";
ss2 = strsplit(ss);

print "ss2[1] = " ss2[1];
print "ss2[2] = " ss2[2];

Because 'scientific taxonomy' is enclosed in single quotes, it is kept as one token and the quotes are dropped. The output from the above code is:

ss2[1] = classification
ss2[2] = scientific taxonomy

String array with multi-character delimiter#

ss = "h5://example.h5";
ss2 = strsplit(ss, "://");

print "ss2[1] = " ss2[1];
print "ss2[2] = " ss2[2];

The output from the above code is:

ss2[1] = h5
ss2[2] = example.h5

Remarks#

Case 1: No supplied separator#

If strsplit() is called with only one input (i.e. a separator is not passed in as the second argument), each of the following characters is treated as a delimiter:

Character          ASCII code
space              32
tab                9
comma              44
newline            10
carriage return    13

The input string will be split at each occurrence of ANY of the separators listed in the table above. For example:

s = "alpha 1,beta 2,gamma 3";
sa = strsplit(s);

will return a 1x6 string array with the following contents:

"alpha"    "1"       "beta"    "2"    "gamma"    "3"

When no separator is supplied, tokens that contain delimiters must be enclosed in single quotes, double quotes, or parentheses. Tokens enclosed in single or double quotes do NOT retain the quotes in the output. Tokens enclosed in parentheses DO retain the parentheses in the output. Parentheses cannot be nested.
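
The quoting rules can be illustrated with one input that mixes a plain token, a quoted token and a parenthesized token. This is a sketch with a made-up input string; the results noted in the comments are what the rules stated here imply, not captured output:

```gauss
// Hypothetical input: plain token, single-quoted token, parenthesized token
s = "alpha 'beta gamma' (delta epsilon)";
sa = strsplit(s);

// Per the rules above, sa should be a 1x3 string array:
//   sa[1] is alpha
//   sa[2] is beta gamma       (quotes dropped)
//   sa[3] is (delta epsilon)  (parentheses kept)
print "number of tokens = " cols(sa);
print "sa[2] = " sa[2];
print "sa[3] = " sa[3];
```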

Case 2: Supplied separator#

If a separator is passed to strsplit(), the input string will be split into individual tokens at each instance of the specified separator. Only the supplied separator is used to separate the tokens; the default delimiters listed under Case 1 are ignored. The separator may be a single character or, as in the "://" example above, a multi-character string. Any remaining whitespace is preserved. For example:

strsplit("alpha 1,beta 2,gamma 3", ",");

will return a 1x3 string array with the following contents:

"alpha 1"    "beta 2"   "gamma 3"

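Since the whitespace inside each token is preserved, a token can be split again with a different separator. A minimal sketch of such a two-stage split; the names pairs and kv are illustrative only:

```gauss
// Stage 1: split on commas only, giving a 1x3 string array
pairs = strsplit("alpha 1,beta 2,gamma 3", ",");

for i(1, cols(pairs), 1);
    // Stage 2: split each "name value" pair on the space it still contains
    kv = strsplit(pairs[i], " ");

    print "name = " kv[1];
    print "value = " kv[2];
endfor;
```
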
When the input is a string array, rows with fewer tokens than the longest row will be padded on the right with empty strings. For example:

string s  = { "1982-04-19", "1994-06" };
strsplit(s, "-");

will return:

"1982"    "04"       "19"
"1994"    "06"         ""

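Padded cells can be detected by comparing them with the empty string. A short sketch using the string comparison operator $==, applied to the same input as above:

```gauss
string s = { "1982-04-19", "1994-06" };
sa = strsplit(s, "-");

// Check each row for a padded (empty) day field in column 3
for i(1, rows(sa), 1);
    if sa[i, 3] $== "";
        print "row " i " has no day field";
    else;
        print "row " i " day = " sa[i, 3];
    endif;
endfor;
```
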
See also

Functions strsplitPad()