outerJoin

Purpose

Performs a left outer join on two matrices based upon user-specified key columns.

Format

C = outerJoin(A, ca, B, cb[, type])
Parameters:
  • A (matrix or dataframe) – data to join

  • ca (scalar, vector or string) – key column indices or names from A

  • B (matrix or dataframe) – matrix to join with A

  • cb (scalar, vector or string) – key column indices or names from B

  • type (string) –

    Optional argument, the type of join to perform. Default = "left". Options are:

    ”left”

    Left outer join. Result is A merged with matched data from B. Unmatched rows contains missing values for columns of B.

    ”full”

    Full outer join. Result is A merged with B. Unmatched rows from either side contain missing values for columns of the opposing symbol.

Returns:

C (matrix or dataframe) – result of join of A and B

Examples

Basic example

A = { 1 1.1,
      2 2.2,
      3 3.3 };

B = { 1  9,
      3 27,
      8 15 };

// Perform left outer-join, using the first
// columns of 'A' and 'B' as key columns
C = outerJoin(A, 1, B, 1);

After the code above, C equals:

1.0000000        1.1000000        9.0000000
2.0000000        2.2000000                .
3.0000000        3.3000000        27.000000

Continuing from the above example, if we perform a full outer join:

D = outerJoin(A, 1, B, 1, "full");

then unmatched data from B is also included, and D equals:

1.0000000        1.1000000        9.0000000
2.0000000        2.2000000                .
3.0000000        3.3000000        27.000000
8.0000000                .        15.000000

Join on two columns

A = { 1 3.1 12 0.5,
      3 1.2 15 0.6,
      5 4.4 19 1.1,
      2 6.9 11 0.9 };

B = { 7 20 0.3 5,
      2 12 1.1 1,
      9 15 0.1 3 };

a_keys = { 1, 3 };
b_keys = { 4, 2 };

// Perform a left-outer join on A and B, based on matches
// from the 1st column of A with the 4th column of B
// and the 3rd column of A with the 2nd column of B
C = outerJoin(A, a_keys, B, b_keys);

After the code above, C equals:

1 3.1 12 0.5 2 1.1
3 1.2 15 0.6 9 0.1
5 4.4 19 1.1 .   .
2 6.9 11 0.9 .   .

Join dataframes based on a date variable

// Load Tbill and US recession data
tbill_3mo = loadd(getGAUSSHome() $+ "examples/tbill_3mo.xlsx", "date($obs_date) + tbill_3m");
USREC = loadd(getGAUSSHome() $+ "examples/USREC.csv",  "date($DATE) + USREC");

// Perform left outer-join
tbill_rec = outerJoin(tbill_3mo, "obs_date", USREC, "DATE");

print tbill_rec[1:12,.];

will print out:

  obs_date         tbill_3m            USREC
1982-01-01        12.920000        1.0000000
1982-02-01        14.280000        1.0000000
1982-03-01        13.310000        1.0000000
1982-04-01        13.340000        1.0000000
1982-05-01        12.710000        1.0000000
1982-06-01        13.080000        1.0000000
1982-07-01        11.860000        1.0000000
1982-08-01        9.0000000        1.0000000
1982-09-01        8.1900000        1.0000000
1982-10-01        7.9700000        1.0000000
1982-11-01        8.3500000        1.0000000
1982-12-01        8.2000000        0.0000000

Remarks

By default, outerJoin() performs a left outer-join, retaining only the key columns from the first input matrix.

The first columns of the output matrix C will be the columns of A in the same order as in A. The remaining columns of C will be the columns of B with the key columns removed.

See also

Functions innerJoin()