 Apr11

# Linear Regression

A linear regression model assumes that the regression function E(Y|X) is linear in the inputs. Linear models are simple and often give an adequate, interpretable description of how the inputs affect the output. To build a more flexible model, one can first transform the inputs, for example with log or power functions, and then fit a linear model on top of these transformed inputs.
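As a minimal sketch of this idea, assuming a single positive input `x` and made-up coefficients, the snippet below builds transformed features (`log(x)` and `x.^2`) and fits an ordinary linear model on them with `Phi \ y` (MATLAB's least-squares solve). The variable names and the data-generating coefficients are hypothetical. The fitted curve is nonlinear in `x`, but the model is still linear in the betas.

[sourcecode language="matlab"]
% Hypothetical example: one positive input x with a nonlinear relationship
x = linspace(1, 10, 50)';                  % 50 sample points (column vector)
y = 2 + 3*log(x) - 0.1*x.^2;               % noise-free target built from known betas

% Transform the input, then fit a linear model on the transformed features
Phi = [ones(size(x)) log(x) x.^2];         % design matrix: intercept, log(x), x^2
beta = Phi \ y;                            % least-squares fit, linear in beta

disp(beta')                                % approximately [2 3 -0.1]
[/sourcecode]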

Given an input vector $X = (X_1, \dots, X_p)$ and an output $Y$, the linear model assumes that the conditional expectation $E(Y|X)$ has the form:
$E(Y|X) = \beta_0 + \sum_{j=1}^p X_j \beta_j$

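If we prepend a column of ones to X (so that $\beta_0$ becomes the coefficient of a constant input) and stack the coefficients into $\beta = (\beta_0, \beta_1, \dots, \beta_p)^T$, this can be written compactly as:
$E(Y|X) = f(X) = X\beta$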
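A small numerical check of this rewrite, using made-up values for `X`, `beta0` and `b` (all hypothetical): prepending a column of ones to X and stacking $\beta_0$ on top of the other coefficients gives the same predictions as the explicit intercept-plus-sum form.

[sourcecode language="matlab"]
% Hypothetical data: N = 3 observations, p = 2 inputs
X = [1 2; 3 4; 5 6];
beta0 = 0.5;                       % intercept beta_0
b = [2; -1];                       % slope coefficients beta_1 .. beta_p

% Explicit form: beta_0 + sum_j X_j * beta_j
EY_explicit = beta0 + X * b;

% Compact form: add a column of ones and fold beta_0 into beta
X1 = [ones(size(X, 1), 1) X];
beta = [beta0; b];
EY_compact = X1 * beta;

disp([EY_explicit EY_compact])     % both columns are [0.5; 2.5; 4.5]
[/sourcecode]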

The betas are unknown parameters that linearly connect the inputs to the output. The most common way to estimate them from a training set $(x_1, y_1), \dots, (x_N, y_N)$ is least squares, which picks the betas that minimize the residual sum of squares:
$RSS(\beta) = \sum_{i=1}^N (y_i - f(x_i))^2$, i.e. $\hat{\beta} = \arg\min_\beta \sum_{i=1}^N (y_i - f(x_i))^2$
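For reference, here is the short derivation behind the normal equation used in the first script. Writing the N training inputs as the rows of a matrix $X$ (including the column of ones) and the outputs as a vector $y$, the RSS and its gradient are:
$RSS(\beta) = (y - X\beta)^T (y - X\beta)$
$\frac{\partial RSS}{\partial \beta} = -2X^T(y - X\beta)$
Setting the gradient to zero gives the normal equation $X^T X \beta = X^T y$, and, assuming $X^T X$ is invertible:
$\hat{\beta} = (X^T X)^{-1} X^T y$
When $X^T X$ is singular, replacing the inverse with the pseudo-inverse (MATLAB's `pinv`) yields the minimum-norm least-squares solution instead.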

The MATLAB scripts below solve this problem numerically.

MATLAB – Normal Equation Method
[sourcecode language="matlab"]
% Initialize: myInputs is an N-by-p matrix, myOutput an N-by-1 vector
X = myInputs;
Y = myOutput;

% Add a column of ones for the linear intercept
X = [ones(size(X, 1), 1) X];

% Calculate the parameters from the normal equation
beta = pinv(X' * X) * X' * Y;

% Estimate the output based on inputs
EY = X * beta;
[/sourcecode]
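To try the normal-equation script without real data, one can generate synthetic inputs with known coefficients and check that they are recovered. The sizes, coefficients and noise level below are made up for illustration. The backslash operator `X \ Y` is shown alongside `pinv` as a swapped-in alternative: it solves the same least-squares problem without explicitly forming `X' * X`, which tends to be numerically more robust.

[sourcecode language="matlab"]
% Synthetic data with known (hypothetical) coefficients
rng(0);                                % reproducible random numbers
N = 200;                               % number of observations
myInputs = randn(N, 2);                % two random input features
trueBeta = [1; 2; -3];                 % [intercept; beta_1; beta_2]
myOutput = [ones(N, 1) myInputs] * trueBeta + 0.01 * randn(N, 1);

% Normal equation, as in the script above
X = [ones(size(myInputs, 1), 1) myInputs];
Y = myOutput;
betaNormal = pinv(X' * X) * X' * Y;

% Backslash: QR-based least squares on X directly
betaBackslash = X \ Y;

disp([trueBeta betaNormal betaBackslash])   % all three columns should be close
[/sourcecode]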

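MATLAB – Gradient Descent Method

[sourcecode language="matlab"]
% Initialize
X = myInputs_Normalized;
Y = myOutput;

% Add a column of ones for the linear intercept
X = [ones(size(X, 1), 1) X];

alpha = 0.03;                      % learning rate
iterations = 100;                  % number of gradient descent steps
beta = zeros(size(X, 2), 1);       % start from all-zero parameters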