Chapter 3: Simple Linear Regression: Estimation of an Optimal Hedge Portfolio
================================================================================
Example 1: Estimate Univariate Regression of Spot on Futures
--------------------------------------------------------------
This example demonstrates how to compute ordinary least squares (OLS) estimates of the equation:
.. math:: \text{Spot} = \alpha + \beta_1\text{Futures} + \epsilon
Getting Started
++++++++++++++++++++++++++++++++++++++++++
To run this example on your own you will need to install the BrooksEconFinLib package. This package houses all examples and associated data.
How to
++++++++++++++++++++++++++++++++++++++++++
Step One: Loading data
^^^^^^^^^^^^^^^^^^^^^^^^^^^
To start, load the relevant variables from the dataset using :func:`loadd` and a `formula string `_.
To replicate this example, we will load the following variables:
* Date
* Spot
* Futures
::
// Create file name with full path
data_set = getGAUSSHome() $+ "pkgs/BrooksEcoFinLib/examples/Sandphedge.csv";
// Use formula string to specify the variables to load and to tell
// GAUSS that Date is a date variable
data = loadd(data_set, "date(Date) + Spot + Futures");
// Print the first 5 observations of all columns of our data
head(data);
::
Date Spot Futures
1979-09-01 947.28003 954.50000
1979-10-01 914.62000 924.00000
1979-11-01 955.40002 955.00000
1979-12-01 970.42999 979.25000
1980-01-01 980.28003 987.75000
Since CSV files do not keep track of variable types, we surround the name of our date variable in ``date()`` so that GAUSS treats it as a date variable.
The date variable is in a standard date format that GAUSS figures out automatically. For the cases when you need to read an uncommon date format, GAUSS allows you to specify it in your formula string. You can read more about this in the Programmatic Data Import section of the GAUSS Data Management Guide.
Step Two: Perform OLS estimation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We pass the dataframe, ``data`` and a formula string to the :func:`olsmt` procedure to perform the estimation and print an output table. The :keyword:`call` keyword tells GAUSS to not return any data, so it just prints the report.
::
// Perform OLS estimation and print report
call olsmt(data, "Spot ~ Futures");
::
Valid cases: 247 Dependent variable: Spot
Missing cases: 0 Deletion method: None
Total SS: 47960127.957 Degrees of freedom: 245
R-squared: 1.000 Rbar-squared: 1.000
Residual SS: 11692.797 Std error of est: 6.908
F(1,245): 1004666.955 Probability of F: 0.000
Standard Prob Standardized Cor with
Variable Estimate Error t-value >|t| Estimate Dep Var
-------------------------------------------------------------------------------
CONSTANT -2.83784 1.48897 -1.9059 0.058 --- ---
Futures 1.00161 0.000999277 1002.33 0.000 0.999878 0.999878
Example 2: Estimate Univariate Regression Spot and Futures Returns
--------------------------------------------------------------------
This example demonstrates how to transform the variables into logarithmic returns and estimate the equation:
.. math:: \text{Ret_Spot} = \alpha + \beta_1\text{Ret_Futures} + \epsilon
Getting Started
++++++++++++++++++++++++++++++++++++++++++
To run this example on your own you will need to follow the data loading steps from the above example.
How to
++++++++++++++++++++++++++++++++++++++++++
Step One: Compute log returns
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Our first step is to define the procedure we will use to compute the log returns and apply it to our data. Our blog, `The Basics of GAUSS Procedures `_ explains everything you need to know to understand this procedure.
::
// Define procedure to compute log returns
proc (1) = lnDiff(x);
local x_diff;
// Compute log returns
x_diff = 100 * ln(x ./ lagn(x, 1));
// Remove all rows with missing values
x_diff = packr(x_diff);
retp(x_diff);
endp;
// Create new dataframe that contains the log difference of our variables
ret_data = lnDiff(data[., "Spot" "Futures"]);
Step Two: Change variable names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We could have combined this with the previous step, but we will do each step separately for clarity.
::
// Create a 2x1 string array using the string concatenation operator
names = "ret_spot" $| "ret_futures";
// Set variable names
ret_data = dfname(ret_data, names);
Step Three: Compute descriptive statistics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We can compute descriptive statistics on our new dataframe with the :func:`dstatmt` procedure as shown below.
::
// Compute descriptive statistics and print them
call dstatmt(ret_data);
will print the following:
::
--------------------------------------------------------------------------------------------
Variable Mean Std Dev Variance Minimum Maximum Valid Missing
--------------------------------------------------------------------------------------------
ret_spot 0.4168 4.333 18.78 -18.56 10.23 246 0
ret_futures 0.414 4.419 19.53 -18.94 10.39 246 0
Step Four: Estimate linear model on return data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Finally, we regress ``ret_spot`` on ``ret_futures``.
::
// Estimate the linear model and print the results
call olsmt(ret_data, "ret_spot ~ ret_futures");
will print the following:
::
Valid cases: 246 Dependent variable: ret_spot
Missing cases: 0 Deletion method: None
Total SS: 4600.534 Degrees of freedom: 244
R-squared: 0.989 Rbar-squared: 0.989
Residual SS: 51.684 Std error of est: 0.460
F(1,244): 21474.923 Probability of F: 0.000
Standard Prob Standardized Cor with
Variable Estimate Error t-value >|t| Estimate Dep Var
----------------------------------------------------------------------------------
CONSTANT 0.0130773 0.0294729 0.443707 0.658 --- ---
ret_futures 0.975077 0.00665385 146.543 0.000 0.994367 0.994367