# Linear Regression

A linear regression model assumes that the regression function is linear in the inputs. Linear models are quite simple and often provide an adequate description of how the inputs affect the output. To build a more complex model, one can first transform the inputs, for example with log or power functions, and fit a linear model on top of these transformations.

Given an input vector X from which we want to predict an output Y, a linear model estimates Y through the conditional expectation E(Y|X) such that:

[latex]E(Y|X) = \beta_0 + \sum_{j=1}^p X_j \beta_j[/latex]

This can be rewritten in matrix form if we prepend a column of ones to X:

[latex]E(Y|X) = f(X) = X \beta[/latex]

The betas are unknown parameters that linearly connect the inputs to the output. Given a training dataset, a common way to estimate them is to minimize the residual sum of squares (RSS):

[latex]RSS(\beta) = \sum_{i=1}^N (y_i - f(x_i))^2[/latex] or [latex]\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^N (y_i - f(x_i))^2[/latex]
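Setting the gradient of the RSS with respect to the betas to zero yields a closed-form solution known as the normal equation (assuming the matrix [latex]X^T X[/latex] is invertible; the script below uses pinv, which also handles the singular case):

[latex]\frac{\partial RSS}{\partial \beta} = -2 X^T (Y - X \beta) = 0 \Rightarrow \beta = (X^T X)^{-1} X^T Y[/latex]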

Below are two MATLAB scripts that solve this minimization computationally:

MATLAB – Normal Equation Method

[sourcecode language="matlab"]
% Initialize
X = myInputs;
Y = myOutput;

% Add a column of ones for the intercept term
X = [ones(length(Y), 1) X];

% Compute the parameters from the normal equation
beta = pinv(X' * X) * X' * Y;

% Estimate the output based on the inputs
EY = X * beta;
[/sourcecode]

MATLAB – Gradient Descent Method

[sourcecode language="matlab"]
% Initialize (inputs should be normalized for gradient descent)
X = myInputs_Normalized;
Y = myOutput;

% Add a column of ones for the intercept term
X = [ones(length(Y), 1) X];

% Initialize gradient descent parameters
alpha = 0.03;
iterations = 100;
beta = zeros(size(X, 2), 1);

% Run gradient descent
for iteration = 1:iterations
    oldBeta = beta;
    for dim = 1:length(beta)
        beta(dim) = oldBeta(dim) - alpha / length(Y) * sum((X * oldBeta - Y) .* X(:, dim));
    end
end

% Estimate the output based on the inputs
EY = X * beta;
[/sourcecode]
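Each pass of the inner loop above applies the gradient descent update rule, obtained by differentiating the RSS with respect to a single coefficient (the factor 2 from the derivative of the square is conventionally absorbed into alpha, and the 1/N averages the gradient over the N training samples, matching the alpha / length(Y) scaling in the code):

[latex]\beta_j := \beta_j - \frac{\alpha}{N} \sum_{i=1}^N (f(x_i) - y_i) \, x_{ij}[/latex]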

When running gradient descent, you may want to compute the cost function at each iteration to check that alpha is properly set, i.e. that the cost decreases at every step. If the cost increases or oscillates, try a smaller alpha.
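A common convention, consistent with the update used in the script above, is to monitor the RSS averaged over the N samples and halved so that its gradient takes a clean form:

[latex]J(\beta) = \frac{1}{2N} \sum_{i=1}^N (f(x_i) - y_i)^2[/latex]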