Implementing Logistic Regression from Scratch

Motivating Problem: Binary Classification

Binary classification involves categorizing data into one of two classes. Logistic regression is well suited to this task because it predicts the probability that a given input belongs to a class, which is useful in scenarios such as email spam detection and disease diagnosis.

What is Logistic Regression?

  • Logistic regression is a supervised machine learning algorithm used for classification. In this post, I implement binary logistic regression trained with stochastic gradient descent.
  • The goal of logistic regression is to find the best boundary line (or hyperplane) between classes. The boundary line is defined by weights w and a bias b.
  • Binary means that there are only two classes we are trying to discriminate between.
  • Logistic refers to the logistic (sigmoid) function, which maps any real number to a value between 0 and 1. This is our activation function.
  • To train the model, we first define a cost function we aim to minimise. The cost function is the average of the losses over all the training cases. In our case, we use log loss (written out, together with its gradients, just after this list).
  • We use gradient descent to find the values of w and b that minimise the cost for the given training set.
  • To perform gradient descent, we first make an initial guess for the values of w and b (small random numbers). We then incrementally take small steps downhill, in the direction of the negative gradient of the cost function, with the step size controlled by the learning rate alpha.
  • In this implementation, I use stochastic gradient descent, which means the gradient is calculated for one randomly selected training case at a time.
  • We stop training if convergence is reached or if we reach a maximum number of iterations without convergence.
  • The model with its newly trained w and b can now be used to make predictions.
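
For a single training example (x, y), the loss and updates referred to above are the standard logistic regression formulas:

\hat{y} = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

\mathcal{L}(y, \hat{y}) = -\big[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\big]

\frac{\partial \mathcal{L}}{\partial w_j} = (\hat{y} - y)\, x_j, \qquad \frac{\partial \mathcal{L}}{\partial b} = \hat{y} - y

w_j \leftarrow w_j - \alpha\,(\hat{y} - y)\, x_j, \qquad b \leftarrow b - \alpha\,(\hat{y} - y)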

Define the Logistic Function

The logistic function, or sigmoid function, is defined as follows:

import numpy as np

# define the logistic / sigmoid function
def logistic_function(z):
    return 1 / (1 + np.exp(-z))

This function maps any real-valued number into the (0, 1) interval, so its output can be interpreted as a probability.
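
As a quick sanity check using the function just defined: large negative inputs map close to 0, an input of 0 maps to exactly 0.5, and large positive inputs map close to 1.

z = np.array([-5.0, 0.0, 5.0])
print(logistic_function(z))  # approximately [0.0067, 0.5, 0.9933]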

Define Logistic Regression with Stochastic Gradient Descent (SGD)

Stochastic gradient descent is a variation of the gradient descent algorithm that updates the parameters using one training example per iteration, rather than the full training set. Here's how we implement it:

def logistic_regression_stochastic_gradient_descent(X, y):
    # hyperparameters
    alpha = 0.01             # learning rate
    max_iterations = 2000000 # maximum number of iterations
    threshold = 1e-4         # convergence threshold
    N, n_features = X.shape  # number of examples and input features
    np.random.seed(0)        # set random seed for reproducibility

    # initialise w and b to valid initial values (small random values)
    w = np.random.normal(0, 0.01, (n_features, 1))
    b = np.random.normal(0, 0.01)

    # initialise variables related to the stopping criteria
    stopping = False
    J_running = 0
    J_running_prev = np.inf  # avoids a spurious convergence test on the first pass
    iteration = 0

    while not stopping:  # start training

        # select an example at random from the training set
        random_case = np.random.randint(N)
        x_i = X[random_case, :].reshape(1, -1)
        y_i = y[random_case].reshape(-1, 1)

        # forward propagation stage
        y_hat = logistic_function(np.dot(x_i, w) + b)  # predicted probability for x_i
        J_current = -np.sum(y_i * np.log(y_hat) + (1 - y_i) * np.log(1 - y_hat))  # log loss for this example

        # gradient descent stage
        error = (y_hat - y_i).item()  # scalar prediction error (y_hat - y_i)
        for j in range(n_features):
            delta_w_j = error * x_i[0, j]  # gradient of the loss w.r.t. the j-th weight
            w[j, 0] -= alpha * delta_w_j   # update the j-th weight
        delta_b = error                    # gradient of the loss w.r.t. the bias

        # update the bias
        b -= alpha * delta_b

        # check stopping criteria
        iteration += 1
        J_running += J_current

        if iteration > max_iterations:  # failed to converge
            stopping = True
            print("Stopping: Reached maximum iterations without convergence.")

        if iteration % N == 0:  # test for convergence on the running cost over the last N updates
            if abs(J_running - J_running_prev) < threshold:
                stopping = True
                print("Stopping: Converged.")
            J_running_prev = J_running
            J_running = 0  # reset the running cost for the next N iterations

    return w, b  # return the trained weights and bias
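
As a usage sketch, here is how the trained model might be applied end-to-end. The synthetic two-feature dataset below is purely illustrative; any (N, n_features) array X with 0/1 labels y would work the same way.

# generate a small synthetic dataset: two Gaussian blobs, one per class
np.random.seed(1)
X_class0 = np.random.normal(loc=-2.0, scale=1.0, size=(50, 2))
X_class1 = np.random.normal(loc=2.0, scale=1.0, size=(50, 2))
X = np.vstack([X_class0, X_class1])
y = np.concatenate([np.zeros(50), np.ones(50)])

# train the model
w, b = logistic_regression_stochastic_gradient_descent(X, y)

# predict probabilities and threshold at 0.5 to get class labels
probabilities = logistic_function(np.dot(X, w) + b).ravel()
predictions = (probabilities >= 0.5).astype(int)
print("training accuracy:", np.mean(predictions == y))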