Chapter 3: Simple Linear Regression: Estimation of an Optimal Hedge Portfolio

Example 1: Estimate Univariate Regression of Spot on Futures

This example demonstrates how to compute ordinary least squares (OLS) estimates of the equation:

\[\text{Spot} = \alpha + \beta_1\text{Futures} + \epsilon\]
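
For reference, in this one-regressor case the OLS estimates have a standard closed form. The notation below is introduced here for illustration and does not appear in the GAUSS output: \(S_t\) and \(F_t\) are the spot and futures observations, bars denote sample means, and \(T\) is the number of observations.

\[\hat{\beta}_1 = \frac{\sum_{t=1}^{T}(F_t - \bar{F})(S_t - \bar{S})}{\sum_{t=1}^{T}(F_t - \bar{F})^2}, \qquad \hat{\alpha} = \bar{S} - \hat{\beta}_1\bar{F}\]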

Getting Started

To run this example on your own, you will need to install the BrooksEcoFinLib package, which houses all examples and associated data.

How to

Step One: Loading data

To start, load the relevant variables from the dataset using loadd() and a formula string.

To replicate this example, we will load the following variables:

  • Date

  • Spot

  • Futures

// Create file name with full path
data_set = getGAUSSHome() $+ "pkgs/BrooksEcoFinLib/examples/Sandphedge.csv";

// Use formula string to specify the variables to load and to tell
// GAUSS that Date is a date variable
data = loadd(data_set, "date(Date) + Spot + Futures");

// Print the first 5 observations of all columns of our data
head(data);

will print the following:

      Date             Spot          Futures
1979-09-01        947.28003        954.50000
1979-10-01        914.62000        924.00000
1979-11-01        955.40002        955.00000
1979-12-01        970.42999        979.25000
1980-01-01        980.28003        987.75000

Since CSV files do not keep track of variable types, we surround the name of our date variable in date() so that GAUSS treats it as a date variable.

The date variable is in a standard date format that GAUSS detects automatically. When you need to read an uncommon date format, you can specify it in your formula string. You can read more about this in the Programmatic Data Import section of the GAUSS Data Management Guide.

Step Two: Perform OLS estimation

We pass the dataframe, data, and a formula string to the olsmt() procedure to perform the estimation and print an output table. The call keyword tells GAUSS to discard any values the procedure returns, so it just prints the report.

// Perform OLS estimation and print report
call olsmt(data, "Spot ~ Futures");

will print the following:

Valid cases:                   247      Dependent variable:                Spot
Missing cases:                   0      Deletion method:                   None
Total SS:             47960127.957      Degrees of freedom:                 245
R-squared:                   1.000      Rbar-squared:                     1.000
Residual SS:             11692.797      Std error of est:                 6.908
F(1,245):              1004666.955      Probability of F:                 0.000

                         Standard                 Prob   Standardized  Cor with
Variable     Estimate      Error      t-value     >|t|     Estimate    Dep Var

CONSTANT     -2.83784     1.48897     -1.9059     0.058       ---         ---
Futures       1.00161 0.000999277     1002.33     0.000    0.999878    0.999878
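
As a quick check on how the entries of the report relate to one another, several of them can be reproduced by hand using standard OLS identities. The arithmetic below uses only the numbers printed above, so differences in the last digit are due to rounding.

\[t_{\text{Futures}} = \frac{1.00161}{0.000999277} \approx 1002.33, \qquad t_{\text{CONSTANT}} = \frac{-2.83784}{1.48897} \approx -1.9059\]

\[R^2 = 1 - \frac{\text{Residual SS}}{\text{Total SS}} = 1 - \frac{11692.797}{47960127.957} \approx 0.99976\]

\[\text{Std error of est} = \sqrt{\frac{\text{Residual SS}}{\text{Degrees of freedom}}} = \sqrt{\frac{11692.797}{245}} \approx 6.908\]

With a single regressor, the square root of R-squared equals the absolute correlation between the dependent variable and the regressor. Here that is approximately 0.999878, which matches the Cor with Dep Var column. The report rounds R-squared to 1.000.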

Example 2: Estimate Univariate Regression of Spot Returns on Futures Returns

This example demonstrates how to transform the variables into logarithmic returns and estimate the equation:

\[\text{Ret_Spot} = \alpha + \beta_1\text{Ret_Futures} + \epsilon\]

Getting Started

To run this example on your own, first complete the data loading steps from Example 1.

How to

Step One: Compute log returns

Our first step is to define the procedure we will use to compute the log returns and apply it to our data. Our blog post, The Basics of GAUSS Procedures, explains everything you need to know to understand this procedure.

// Define procedure to compute log returns
proc (1) = lnDiff(x);
    local x_diff;

    // Compute log returns, in percent
    x_diff = 100 * ln(x ./ lagn(x, 1));

    // Remove all rows with missing values
    x_diff = packr(x_diff);

    // Return the log returns to the caller
    retp(x_diff);
endp;

// Create new dataframe that contains the log difference of our variables
ret_data = lnDiff(data[., "Spot" "Futures"]);
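
In equation form, lnDiff() computes the continuously compounded return in percent. The symbol \(P_t\) is shorthand, introduced here for illustration, for the Spot or Futures value in period \(t\).

\[r_t = 100 \times \ln\left(\frac{P_t}{P_{t-1}}\right)\]

For instance, using the first two Spot observations printed in Example 1:

\[r_{1979\text{-}10} = 100 \times \ln\left(\frac{914.62}{947.28003}\right) \approx -3.51\]

Because the first observation has no predecessor, its return is missing and packr() removes that row. This is why the 247 observations in the original data yield 246 returns.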

Step Two: Change variable names

Next, we rename the variables to show that they now hold returns. We could have combined this with the previous step, but we will do each step separately for clarity.

// Create a 2x1 string array using the string concatenation operator
names = "ret_spot" $| "ret_futures";

// Set variable names
ret_data = dfname(ret_data, names);

Step Three: Compute descriptive statistics

We can compute descriptive statistics on our new dataframe with the dstatmt() procedure as shown below.

// Compute descriptive statistics and print them
call dstatmt(ret_data);

will print the following:

Variable            Mean     Std Dev      Variance     Minimum     Maximum     Valid Missing

ret_spot          0.4168       4.333         18.78      -18.56       10.23       246    0
ret_futures        0.414       4.419         19.53      -18.94       10.39       246    0

Step Four: Estimate linear model on return data

Finally, we regress ret_spot on ret_futures.

// Estimate the linear model and print the results
call olsmt(ret_data, "ret_spot ~ ret_futures");

will print the following:

Valid cases:                   246      Dependent variable:            ret_spot
Missing cases:                   0      Deletion method:                   None
Total SS:                 4600.534      Degrees of freedom:                 244
R-squared:                   0.989      Rbar-squared:                     0.989
Residual SS:                51.684      Std error of est:                 0.460
F(1,244):                21474.923      Probability of F:                 0.000

                            Standard                 Prob   Standardized  Cor with
Variable        Estimate      Error      t-value     >|t|     Estimate    Dep Var

CONSTANT       0.0130773   0.0294729    0.443707     0.658       ---         ---
ret_futures     0.975077  0.00665385     146.543     0.000    0.994367    0.994367
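
The slope of this returns regression is the quantity usually interpreted as the minimum-variance hedge ratio, which is the sense in which the regression estimates an optimal hedge portfolio. This is the standard textbook interpretation rather than something olsmt() reports. The symbols below are introduced here for illustration: \(\hat{h}\) is the hedge ratio, \(\hat{\rho}\) is the sample correlation between the two return series, and \(\hat{\sigma}_s\) and \(\hat{\sigma}_f\) are the sample standard deviations of ret_spot and ret_futures. With a single regressor, the slope equals the correlation scaled by the ratio of the standard deviations, so it can be roughly reproduced from the rounded values in the descriptive statistics and the Cor with Dep Var column:

\[\hat{h} = \hat{\beta}_1 = \hat{\rho}\,\frac{\hat{\sigma}_s}{\hat{\sigma}_f} \approx 0.994367 \times \frac{4.333}{4.419} \approx 0.975\]

Under this interpretation, a long spot position would be hedged by shorting roughly 0.975 units of futures per unit of spot held.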