Implementing Logistic Regression from Scratch

Motivating Problem: Binary Classification

Binary classification involves categorizing data into one of two classes. Logistic regression is well suited to this task because it predicts the probability that a given input belongs to a class, which is useful in scenarios such as email spam detection and disease diagnosis.

What is Logistic Regression?

  • Logistic regression is a supervised machine learning algorithm used for classification. In this post, I implement binary logistic regression trained with stochastic gradient descent.
  • The goal of logistic regression is to find the best boundary line (or hyperplane) between classes. The boundary line is defined by weights w and a bias b.
  • Binary means that there are only two classes we are trying to discriminate between.
  • Logistic refers to the logistic (sigmoid) function, which maps any real number to a value between 0 and 1. This is our activation function.
  • To train the model, we first define a cost function we aim to minimise. The cost function is the average of the losses over all the training cases. In our case, we use log loss (written out, together with its gradients, just after this list).
  • We use gradient descent to find the values of w and b that minimise the cost for the given training set.
  • To perform gradient descent, we first make an initial guess for the values of w and b (small random numbers). We then incrementally take small steps downhill, in the direction of the negative gradient of the cost function, with the step size controlled by the learning rate alpha.
  • In this implementation, I use stochastic gradient descent, which means the gradient is calculated for one randomly selected training case at a time.
  • We stop training if convergence is reached or if we reach a maximum number of iterations without convergence.
  • The model with its newly trained w and b can now be used to make predictions.
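
For a single training example (x, y), the loss and updates referred to above are the standard logistic regression formulas:

\hat{y} = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

\mathcal{L}(y, \hat{y}) = -\big[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\big]

\frac{\partial \mathcal{L}}{\partial w_j} = (\hat{y} - y)\, x_j, \qquad \frac{\partial \mathcal{L}}{\partial b} = \hat{y} - y

w_j \leftarrow w_j - \alpha\,(\hat{y} - y)\, x_j, \qquad b \leftarrow b - \alpha\,(\hat{y} - y)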

Define the Logistic Function

The logistic function, or sigmoid function, is defined as follows:

import numpy as np

# define the logistic / sigmoid function
def logistic_function(z):
    return 1 / (1 + np.exp(-z))

This function maps any real-valued number into the (0, 1) interval, so its output can be interpreted as a probability.
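
As a quick sanity check using the function just defined: large negative inputs map close to 0, an input of 0 maps to exactly 0.5, and large positive inputs map close to 1.

z = np.array([-5.0, 0.0, 5.0])
print(logistic_function(z))  # approximately [0.0067, 0.5, 0.9933]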

Define Logistic Regression with Stochastic Gradient Descent (SGD)

Stochastic gradient descent is a variation of the gradient descent algorithm that updates the parameters using one training example per iteration, rather than the full training set. Here's how we implement it:

def logistic_regression_stochastic_gradient_descent(X, y):
    # hyperparameters
    alpha = 0.01             # learning rate
    max_iterations = 2000000 # maximum number of iterations
    threshold = 1e-4         # convergence threshold
    N, n_features = X.shape  # number of examples and input features
    np.random.seed(0)        # set random seed for reproducibility

    # initialise w and b to valid initial values (small random values)
    w = np.random.normal(0, 0.01, (n_features, 1))
    b = np.random.normal(0, 0.01)

    # initialise variables related to the stopping criteria
    stopping = False
    J_running = 0
    J_running_prev = np.inf  # avoids a spurious convergence test on the first pass
    iteration = 0

    while not stopping:  # start training

        # select an example at random from the training set
        random_case = np.random.randint(N)
        x_i = X[random_case, :].reshape(1, -1)
        y_i = y[random_case].reshape(-1, 1)

        # forward propagation stage
        y_hat = logistic_function(np.dot(x_i, w) + b)  # predicted probability for x_i
        J_current = -np.sum(y_i * np.log(y_hat) + (1 - y_i) * np.log(1 - y_hat))  # log loss for this example

        # gradient descent stage
        error = (y_hat - y_i).item()  # scalar prediction error (y_hat - y_i)
        for j in range(n_features):
            delta_w_j = error * x_i[0, j]  # gradient of the loss w.r.t. the j-th weight
            w[j, 0] -= alpha * delta_w_j   # update the j-th weight
        delta_b = error                    # gradient of the loss w.r.t. the bias

        # update the bias
        b -= alpha * delta_b

        # check stopping criteria
        iteration += 1
        J_running += J_current

        if iteration > max_iterations:  # failed to converge
            stopping = True
            print("Stopping: Reached maximum iterations without convergence.")

        if iteration % N == 0:  # test for convergence on the running cost over the last N updates
            if abs(J_running - J_running_prev) < threshold:
                stopping = True
                print("Stopping: Converged.")
            J_running_prev = J_running
            J_running = 0  # reset the running cost for the next N iterations

    return w, b  # return the trained weights and bias
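
As a usage sketch, here is how the trained model might be applied end-to-end. The synthetic two-feature dataset below is purely illustrative; any (N, n_features) array X with 0/1 labels y would work the same way.

# generate a small synthetic dataset: two Gaussian blobs, one per class
np.random.seed(1)
X_class0 = np.random.normal(loc=-2.0, scale=1.0, size=(50, 2))
X_class1 = np.random.normal(loc=2.0, scale=1.0, size=(50, 2))
X = np.vstack([X_class0, X_class1])
y = np.concatenate([np.zeros(50), np.ones(50)])

# train the model
w, b = logistic_regression_stochastic_gradient_descent(X, y)

# predict probabilities and threshold at 0.5 to get class labels
probabilities = logistic_function(np.dot(X, w) + b).ravel()
predictions = (probabilities >= 0.5).astype(int)
print("training accuracy:", np.mean(predictions == y))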