Motivating Problem: Binary Classification
Binary classification involves categorizing data into one of two classes. Logistic regression is well suited to this task because it predicts the probability that a given input belongs to the positive class, which is particularly useful in scenarios such as email spam detection and disease diagnosis.
What is Logistic Regression?
- Logistic regression is a supervised machine learning algorithm used for classification. In this post, I implement binary logistic regression trained with stochastic gradient descent.
- The goal of logistic regression is to find the best boundary line (or hyperplane) between classes. The boundary line is defined by weights w and a bias b.
- Binary means that there are only two classes we are trying to discriminate between.
- Logistic refers to the sigmoid (logistic) function, which maps any real number to a value between 0 and 1. This is our activation function.
- To train the model, we first define a cost function we aim to minimise. The cost function is the average of the losses for all the training cases. In our case, we use log loss.
- We use gradient descent to find the values of w and b that minimise the cost for the given training set.
- To perform gradient descent, we first make an initial guess for the values of w and b (small random numbers). We then incrementally take small steps downhill, in the direction of the negative gradient of the cost function, with the step size controlled by the learning rate alpha (see the short sketch after this list).
- In this implementation, I use stochastic gradient descent, which means the gradient is calculated for one randomly selected training case at a time.
- We stop training if convergence is reached or if we reach a maximum number of iterations without convergence.
- The model with its newly trained w and b can now be used to make predictions.
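Before looking at the full implementation, here is a minimal sketch of the quantities involved for a single training example: the log loss and its gradients with respect to w and b. The helper name and array shapes are illustrative assumptions, not part of the implementation below.
import numpy as np

# illustrative sketch (assumed helper): log loss and gradients for one example
# x_i: feature vector of shape (n_features,), y_i: label 0 or 1
# w: weights of shape (n_features,), b: scalar bias
def single_example_loss_and_gradients(x_i, y_i, w, b):
    y_hat = 1 / (1 + np.exp(-(np.dot(x_i, w) + b)))                # predicted probability
    loss = -(y_i * np.log(y_hat) + (1 - y_i) * np.log(1 - y_hat))  # log loss for this example
    grad_w = (y_hat - y_i) * x_i                                   # gradient w.r.t. each weight
    grad_b = y_hat - y_i                                           # gradient w.r.t. the bias
    return loss, grad_w, grad_b
A single SGD step then subtracts alpha times these gradients from w and b; the full implementation below repeats this, one randomly chosen example at a time.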
Define the Logistic Function
The logistic function, or sigmoid function, is defined as follows:
# import NumPy for array operations
import numpy as np

# define the logistic / sigmoid function
def logistic_function(z):
    return 1 / (1 + np.exp(-z))
This function maps any real-valued number into the (0, 1) interval, making its output suitable for interpretation as a probability.
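As a quick sanity check, here are a few example inputs and approximate outputs; this is just an illustrative usage sketch.
print(logistic_function(0))    # 0.5
print(logistic_function(4))    # ~0.982
print(logistic_function(-4))   # ~0.018
print(logistic_function(np.array([-2.0, 0.0, 2.0])))  # element-wise, all values in (0, 1)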
Define Logistic Regression with Stochastic Gradient Descent (SGD)
Stochastic gradient descent is a variation of the gradient descent algorithm that processes one training example per iteration. Here's how we implement it:
def logistic_regression_stochastic_gradient_descent(X, y):
    # parameters
    alpha = 0.01              # learning rate
    max_iterations = 2000000  # maximum number of iterations
    threshold = 1e-4          # convergence threshold
    N, n_features = X.shape   # number of examples and input features
    np.random.seed(0)         # set random seed for reproducibility

    # initialise w and b to valid initial values (small random values)
    w = np.random.normal(0, 0.01, (n_features, 1))
    b = np.random.normal(0, 0.01)

    # initialise variables related to the stopping criteria
    stopping = False
    J_running = 0
    J_running_prev = 0
    iteration = 0

    while not stopping:  # start training
        # select an example at random from the training set
        random_case = np.random.randint(N)
        x_i = X[random_case, :].reshape(1, -1)
        y_i = y[random_case].reshape(-1, 1)

        # forward propagation stage
        y_hat = logistic_function(np.dot(x_i, w) + b)  # calculate y_hat from x_i, w, b
        J_current = -np.sum(y_i * np.log(y_hat) + (1 - y_i) * np.log(1 - y_hat))  # log loss for this example

        # gradient descent stage
        error = (y_hat - y_i).item()  # scalar prediction error for this example
        for j in range(n_features):
            delta_w_j = error * x_i[0, j]  # gradient for the j-th feature
            w[j, 0] -= alpha * delta_w_j   # update the j-th weight
        delta_b = error                    # gradient for the bias
        b -= alpha * delta_b               # update the bias

        # check stopping criteria
        iteration += 1
        J_running += J_current
        if iteration > max_iterations:  # failed to converge
            stopping = True
            print("Stopping: Reached maximum iterations without convergence.")
        if iteration % N == 0:  # test for convergence on the last N examples
            if abs(J_running - J_running_prev) < threshold:
                stopping = True
            J_running_prev = J_running
            J_running = 0  # reset the running cost for the next N iterations

    return w, b  # return the trained weights and bias
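To tie it together, here is a minimal usage sketch on a small synthetic dataset; the two Gaussian clusters and the 0.5 decision threshold are illustrative assumptions, not part of the implementation above.
# illustrative example: train on a tiny synthetic dataset and make predictions
np.random.seed(1)
X0 = np.random.normal(-3, 0.75, (50, 2))  # class 0 cluster
X1 = np.random.normal(3, 0.75, (50, 2))   # class 1 cluster
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = logistic_regression_stochastic_gradient_descent(X, y)

probabilities = logistic_function(np.dot(X, w) + b)  # predicted probabilities
predictions = (probabilities >= 0.5).astype(int)     # class labels at a 0.5 threshold
print("Training accuracy:", np.mean(predictions.flatten() == y))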