Jun 11

# Logistic Regression

Logistic regression is a type of regression used to predict a categorical outcome. It is widely used in biostatistics, where binary responses occur frequently, such as whether or not a patient has cancer.

To keep the outcome between 0 and 1, we apply the logistic function (also called the sigmoid function):
$g(z) = \frac{1}{1 + \exp(-z)}$

The logistic regression hypothesis is then defined as:
$h_\beta(x) = g(\beta^T x)$

Logistic regression models are usually fit by maximum likelihood. The cost function we want to minimize is the negative log-likelihood, averaged over the $m$ training examples:
$J(\beta) = \frac{1}{m} \sum_{i=1}^m \left[ -y_i \log(h_\beta(x_i)) - (1 - y_i) \log(1 - h_\beta(x_i)) \right]$

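Minimizing $J$ means solving the following equation, which has no closed-form solution and is therefore solved numerically, for example with gradient descent:
$\frac{dJ(\beta)}{d\beta} = \frac{1}{m} \sum_{i=1}^m x_i (h_\beta(x_i) - y_i) = 0$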
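
For readers who want the intermediate step, here is a short sketch of where this gradient comes from. It relies on the standard identity for the derivative of the logistic function, and uses $h_i$ as shorthand (introduced here, not part of the notation above) for $h_\beta(x_i)$:
$g'(z) = g(z)(1 - g(z))$
$\frac{\partial h_i}{\partial \beta} = h_i (1 - h_i) x_i$
$\frac{\partial}{\partial \beta} \left[ -y_i \log(h_i) - (1 - y_i) \log(1 - h_i) \right] = -y_i (1 - h_i) x_i + (1 - y_i) h_i x_i = (h_i - y_i) x_i$
Averaging the last line over the $m$ examples gives the expression above.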

MATLAB – Logistic Regression Function
[sourcecode language="matlab"]
function g = logit(z)
% Logistic (sigmoid) function, applied element-wise to z
g = 1 ./ (1 + exp(-z));
end
[/sourcecode]
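
As a quick sanity check, the snippet below evaluates the same formula at a few points. To keep it self-contained it uses an anonymous function named `sigmoid`, which is a name chosen here for illustration. The function above is called `logit` in this post, although that name more commonly refers to the inverse of the sigmoid.
[sourcecode language="matlab"]
% Same formula as the function above, written as an anonymous function
sigmoid = @(z) 1 ./ (1 + exp(-z));

% A zero input sits exactly in the middle of the (0, 1) range
pZero = sigmoid(0);

% The function is applied element-wise, so vectors work too
p = sigmoid([-3 0 3]);

% Symmetry: g(-z) = 1 - g(z)
symmetryGap = sigmoid(-3) + sigmoid(3) - 1;
[/sourcecode]
Large negative inputs are pushed toward 0 and large positive inputs toward 1, which is what allows the output to be read as a probability.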

MATLAB – Gradient Descent Method
[sourcecode language="matlab"]
% Initialize (myInputs_Normalized and myOutput stand for your own data)
X = myInputs_Normalized;
Y = myOutput;
m = length(Y);

% Add a column of ones for the linear intercept
X = [ones(m, 1) X];

% Initialize gradient descent parameters
alpha = 0.03;
iterations = 100;
beta = zeros(size(X,2), 1);

% Run gradient descent
for iteration = 1:iterations
    oldBeta = beta;
    % Update every coefficient from the previous estimate (simultaneous update)
    for dim = 1:length(beta)
        beta(dim) = oldBeta(dim) - alpha / m * sum((logit(X * oldBeta) - Y) .* X(:,dim));
    end
end

% Estimate the output based on inputs
EY = logit(X * beta);
[/sourcecode]
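
The inner loop over coefficients can also be written as a single matrix product, and the cost $J(\beta)$ can be recorded at each iteration to check that the descent is converging. The following is a minimal sketch under a few assumptions: the toy data are made up purely for illustration, and `sigmoid`, `cost`, `J` and `predicted` are names introduced here rather than part of the code above.
[sourcecode language="matlab"]
% Toy data, made up for illustration: one feature and binary labels
x = [-2; -1; -0.5; 0.5; 1; 2];
y = [0; 0; 1; 0; 1; 1];
m = length(y);

% Add a column of ones for the linear intercept
X = [ones(m, 1) x];

% Logistic function and cost J(beta), as defined in the text
sigmoid = @(z) 1 ./ (1 + exp(-z));
cost = @(b) -(1 / m) * sum(y .* log(sigmoid(X * b)) + (1 - y) .* log(1 - sigmoid(X * b)));

% Same gradient descent parameters as above
alpha = 0.03;
iterations = 100;
beta = zeros(size(X, 2), 1);
J = zeros(iterations, 1);

for iteration = 1:iterations
    % X' * (h - y) / m is the gradient of J, so this updates all coefficients at once
    beta = beta - (alpha / m) * (X' * (sigmoid(X * beta) - y));
    J(iteration) = cost(beta);
end

% Turn the estimated probabilities into classes using a 0.5 threshold
predicted = sigmoid(X * beta) >= 0.5;
[/sourcecode]
If `J` does not decrease from one iteration to the next, the step size `alpha` is probably too large. The 0.5 threshold is a common default rather than a requirement, and it can be moved when the two kinds of error have different costs.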