Chapter 3: Simple Linear Regression: Estimation of an Optimal Hedge Portfolio
================================================================================

Example 1: Estimate Univariate Regression of Spot on Futures
--------------------------------------------------------------

This example demonstrates how to compute ordinary least squares (OLS) estimates of the equation:

.. math:: \text{Spot} = \alpha + \beta_1\text{Futures} + \epsilon

Getting Started
++++++++++++++++++++++++++++++++++++++++++

To run this example on your own, you will need to install the BrooksEcoFinLib package. This package houses all examples and associated data.

How to
++++++++++++++++++++++++++++++++++++++++++

Step One: Loading data
^^^^^^^^^^^^^^^^^^^^^^^^^^^

To start, load the relevant variables from the dataset using :func:`loadd` and a formula string. To replicate this example, we will load the following variables:

* Date
* Spot
* Futures

::

    // Create file name with full path
    data_set = getGAUSSHome() $+ "pkgs/BrooksEcoFinLib/examples/Sandphedge.csv";

    // Use formula string to specify the variables to load and to tell
    // GAUSS that Date is a date variable
    data = loadd(data_set, "date(Date) + Spot + Futures");

    // Print the first 5 observations of all columns of our data
    head(data);

::

          Date             Spot          Futures
    1979-09-01        947.28003        954.50000
    1979-10-01        914.62000        924.00000
    1979-11-01        955.40002        955.00000
    1979-12-01        970.42999        979.25000
    1980-01-01        980.28003        987.75000

Since CSV files do not keep track of variable types, we wrap the name of our date variable in ``date()`` so that GAUSS treats it as a date variable. The dates here are in a standard format that GAUSS detects automatically. When you need to read an uncommon date format, GAUSS allows you to specify it in your formula string. You can read more about this in the Programmatic Data Import section of the GAUSS Data Management Guide.

Step Two: Perform OLS estimation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We pass the dataframe, ``data``, and a formula string to the :func:`olsmt` procedure to perform the estimation and print an output table. The :keyword:`call` keyword tells GAUSS to discard the procedure's return values, so only the report is printed.

::

    // Perform OLS estimation and print report
    call olsmt(data, "Spot ~ Futures");

::

    Valid cases:                   247      Dependent variable:                Spot
    Missing cases:                   0      Deletion method:                   None
    Total SS:             47960127.957      Degrees of freedom:                 245
    R-squared:                   1.000      Rbar-squared:                     1.000
    Residual SS:             11692.797      Std error of est:                 6.908
    F(1,245):              1004666.955      Probability of F:                 0.000

                             Standard                 Prob   Standardized  Cor with
    Variable     Estimate      Error      t-value     >|t|     Estimate    Dep Var
    -------------------------------------------------------------------------------
    CONSTANT     -2.83784     1.48897     -1.9059     0.058       ---         ---
    Futures       1.00161 0.000999277     1002.33     0.000    0.999878    0.999878

Example 2: Estimate Univariate Regression of Spot Returns on Futures Returns
--------------------------------------------------------------------------------

This example demonstrates how to transform the variables into logarithmic returns and estimate the equation:

.. math:: \text{Ret_Spot} = \alpha + \beta_1\text{Ret_Futures} + \epsilon

Getting Started
++++++++++++++++++++++++++++++++++++++++++

To run this example on your own, you will need to follow the data loading steps from the example above.

How to
++++++++++++++++++++++++++++++++++++++++++

Step One: Compute log returns
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Our first step is to define the procedure we will use to compute the log returns and apply it to our data. Our blog post, *The Basics of GAUSS Procedures*, explains everything you need to know to understand this procedure.

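For reference, the transformation used in this step can be written as follows, where :math:`p_t` is our own notation (not a name used in the code) for the spot or futures price at observation :math:`t`:

.. math:: r_t = 100 \times \ln\left(\frac{p_t}{p_{t-1}}\right)

The first observation has no lagged value, so its return is missing and is dropped. This is why the return series has 246 valid cases rather than the 247 in the price data.
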
::

    // Define procedure to compute log returns
    proc (1) = lnDiff(x);
        local x_diff;

        // Compute log returns
        x_diff = 100 * ln(x ./ lagn(x, 1));

        // Remove all rows with missing values
        x_diff = packr(x_diff);

        retp(x_diff);
    endp;

    // Create new dataframe that contains the log difference of our variables
    ret_data = lnDiff(data[., "Spot" "Futures"]);

Step Two: Change variable names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We could have combined this with the previous step, but we will do each step separately for clarity.

::

    // Create a 2x1 string array using the string concatenation operator
    names = "ret_spot" $| "ret_futures";

    // Set variable names
    ret_data = dfname(ret_data, names);

Step Three: Compute descriptive statistics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We can compute descriptive statistics on our new dataframe with the :func:`dstatmt` procedure as shown below.

::

    // Compute descriptive statistics and print them
    call dstatmt(ret_data);

This will print the following:

::

    --------------------------------------------------------------------------------------------
    Variable           Mean     Std Dev      Variance     Minimum     Maximum     Valid  Missing
    --------------------------------------------------------------------------------------------
    ret_spot         0.4168       4.333         18.78      -18.56       10.23       246        0
    ret_futures       0.414       4.419         19.53      -18.94       10.39       246        0

Step Four: Estimate linear model on return data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, we regress ``ret_spot`` on ``ret_futures``.

::

    // Estimate the linear model and print the results
    call olsmt(ret_data, "ret_spot ~ ret_futures");

This will print the following:

::

    Valid cases:                   246      Dependent variable:            ret_spot
    Missing cases:                   0      Deletion method:                   None
    Total SS:                 4600.534      Degrees of freedom:                 244
    R-squared:                   0.989      Rbar-squared:                     0.989
    Residual SS:                51.684      Std error of est:                 0.460
    F(1,244):                21474.923      Probability of F:                 0.000

                                Standard                 Prob   Standardized  Cor with
    Variable        Estimate      Error      t-value     >|t|     Estimate    Dep Var
    ----------------------------------------------------------------------------------
    CONSTANT       0.0130773   0.0294729    0.443707     0.658       ---         ---
    ret_futures     0.975077  0.00665385     146.543     0.000    0.994367    0.994367
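
As a rough cross-check of the slope, recall that in a regression with a single regressor the OLS slope equals the sample correlation between the two variables times the ratio of their standard deviations. The sketch below plugs in the rounded values printed in the two tables above. The symbols :math:`r`, :math:`s_{\text{spot}}` and :math:`s_{\text{fut}}` are our own notation, not names from the GAUSS output:

.. math::

    \hat{\beta}_1 = r \cdot \frac{s_{\text{spot}}}{s_{\text{fut}}} \approx 0.994367 \times \frac{4.333}{4.419} \approx 0.975

This agrees with the reported estimate of 0.975077 up to rounding in the printed standard deviations. In the hedging setting named in the chapter title, this slope is commonly read as the minimum-variance hedge ratio, that is, roughly 0.975 units of futures per unit of spot exposure.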