May 13

# Shrinkage Methods for Linear Regression

By discarding some of the predictors and keeping only a subset of the original inputs, you may obtain a model that is more interpretable. It may also achieve a lower prediction error on new data, because it is less likely to overfit the training dataset.

In all of these techniques, multiple models are compared to find the best one. To do so, you can apply the following method:
• Divide the dataset into a training set and a validation set.
• Fit each candidate model on the training set.
• Evaluate a cost function for each model on the validation set and select the model with the smallest value – the cost function can be, for example, the residual sum of squares.
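The selection procedure above can be sketched as follows. This is a minimal Python example (the article's own code is MATLAB); the synthetic dataset, the candidate lambda grid, and the closed-form ridge fit are all illustrative assumptions, not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 100 samples, 5 predictors
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Step 1: divide into a training set and a validation set
X_train, X_val = X[:70], X[70:]
y_train, y_val = y[:70], y[70:]

def fit_ridge(X, y, lam):
    """Ridge fit via the normal equations: solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Step 2: fit one model per candidate lambda on the training set
lambdas = [0.01, 0.1, 1.0, 10.0]
betas = {lam: fit_ridge(X_train, y_train, lam) for lam in lambdas}

# Step 3: compute the residual sum of squares on the validation set
# and keep the model with the smallest value
rss = {lam: np.sum((y_val - X_val @ b) ** 2) for lam, b in betas.items()}
best_lam = min(rss, key=rss.get)
print("best lambda:", best_lam)
```

The same pattern works for the lasso or any other shrinkage method: only the fitting step changes.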

Ridge regression imposes a penalty on the squared size of the coefficients as follows, where you choose the best model by comparing fits for different values of $\lambda$:
$\beta^{ridge} = \mathrm{argmin}_\beta \left( \sum_{i=1}^N (y_i - f(x_i))^2 + \lambda \sum_{j=1}^p \beta_j^2 \right)$
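A useful fact worth noting: for a linear model with centered inputs (so the intercept is handled separately), the ridge problem has a closed-form solution, obtained by setting the gradient of the penalized residual sum of squares to zero:

$\hat{\beta}^{ridge} = (X^T X + \lambda I)^{-1} X^T y$

Gradient descent, as in the MATLAB code below, reaches the same solution iteratively and scales better when the number of predictors is large.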

Lasso regression imposes a penalty on the absolute value of the coefficients as follows, where you choose the best model by comparing fits for different values of $\lambda$:
$\beta^{lasso} = \mathrm{argmin}_\beta \left( \sum_{i=1}^N (y_i - f(x_i))^2 + \lambda \sum_{j=1}^p |\beta_j| \right)$
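Unlike ridge, the lasso has no closed-form solution in general, but it can be fit with cyclic coordinate descent, where each one-dimensional update is a soft-thresholding step. This is a Python sketch (the article's code is MATLAB); the synthetic data and the choice of $\lambda$ are illustrative assumptions. A side effect of the absolute-value penalty is that some coefficients are driven exactly to zero, which is what makes the lasso perform variable selection.

```python
import numpy as np

def soft_threshold(z, gamma):
    # S(z, gamma) = sign(z) * max(|z| - gamma, 0)
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def fit_lasso(X, y, lam, n_iter=300):
    """Cyclic coordinate descent for sum((y - X b)^2) + lam * sum(|b_j|)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with predictor j removed from the current fit
            r_j = y - X @ beta + X[:, j] * beta[j]
            # The 1-D minimizer is a soft-thresholded least-squares step;
            # the threshold is lam/2 because the RSS term here is not halved
            beta[j] = soft_threshold(X[:, j] @ r_j, lam / 2.0) / col_sq[j]
    return beta

# Hypothetical synthetic data: only predictors 0 and 2 truly matter
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

beta = fit_lasso(X, y, lam=20.0)
print("estimated coefficients:", np.round(beta, 2))
```

With this $\lambda$, the irrelevant coefficients come out exactly zero, while the two true predictors stay in the model (shrunk slightly toward zero).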

You may also derive your own shrinkage method from the following template, where you choose the best model by comparing fits for different values of $\lambda$ and $q$:
$\beta^{shrink} = \mathrm{argmin}_\beta \left( \sum_{i=1}^N (y_i - f(x_i))^2 + \lambda \sum_{j=1}^p |\beta_j|^q \right)$

MATLAB – Gradient Descent Method for the Ridge Regression
[sourcecode language="matlab"]
% Initialize (myInputs_Normalized and myOutput are assumed to be defined)
X = myInputs_Normalized;
Y = myOutput;

% Add a column of ones for the linear intercept
X = [ones(length(Y), 1) X];

alpha = 0.03;                 % learning rate
iterations = 100;             % number of gradient descent steps
beta = zeros(size(X,2), 1);   % initial coefficients
lambda = 0.3;                 % ridge penalty