Thursday, July 25, 2019

Stanford's Machine Learning by Andrew Ng - Stochastic Gradient Descent




Batch gradient descent is very slow for very large datasets. Luckily an alternative exists: stochastic gradient descent, also called incremental gradient descent. Rather than going over the entire dataset for each update, the algorithm looks at one training example at a time and then takes a small step towards the minimum. The mathematical formula is the same, but the loop is different.

$$ \theta_j := \theta_j - \alpha\left(\theta^T x^{(i)} - y^{(i)}\right)x_j^{(i)} $$
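
For comparison, batch gradient descent sums this gradient over all $m$ training examples before taking a single step, which is exactly what makes it slow on large datasets:

$$ \theta_j := \theta_j - \alpha\sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)x_j^{(i)} $$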


Here θ is the vector of weights, x^(i) the input variables of the i-th training example and y^(i) its target. I tried to put this into a Python implementation as well.


import numpy as np
import math


def StochasticGradientDescent(trainingset, learningRate):
    # The last column of the training set holds the target, the rest are the inputs.
    weights = len(trainingset.transpose()) - 1
    theta = np.zeros(weights)
    for trainingExample in trainingset:
        nextTheta = [0 for number in theta]
        for index, parameter in enumerate(theta):
            xi, yi = trainingExample[0:weights], trainingExample[weights]
            # Update rule for a single training example.
            nextTheta[index] = parameter - learningRate*(theta.dot(xi) - yi)*xi[index]
        theta = np.array(nextTheta)
    return theta


def RMS_Error(controlData, theta):
    parameters = max(theta.shape)
    # print(parameters, theta, controlData)  # debug output
    rms_error = (theta.dot(controlData.transpose()[0:parameters]) - controlData.transpose()[parameters])**2
    n = len(rms_error)
    error = math.sqrt(np.sum(rms_error)/n)
    return error
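
As a quick sanity check, the two functions above can be run on a small synthetic dataset; the layout below (a bias column, then the feature, then the target in the last column) is just an assumption matching how the code indexes each row.

# Hypothetical sanity check: fit y = 2*x + 1 on noiseless synthetic data.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=1000)
data = np.column_stack([np.ones_like(x), x, 2*x + 1])  # bias, feature, target

theta = StochasticGradientDescent(data, learningRate=0.01)
print(theta)                   # should end up close to (1, 2)
print(RMS_Error(data, theta))  # should be small for this synthetic set
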
For my not-too-big dataset, stochastic gradient descent's root mean square error wasn't too bad compared to batch gradient descent. For the cubic fit it actually got a lower one, weirdly, but it was higher for the quadratic and linear fits.

RMS error for linear corruption perception = 24247.5; theta = (-2.7, 70.6, 45); learning rate = 1e-4
RMS error for quadratic corruption perception = 15570; theta = (-2.52e-4, 4.66, 9.449e-2); learning rate = 1e-8
RMS error for cubic corruption perception = 9895; theta = (-2.979e-5, 0.113, -8.34e-4); learning rate = 1e-11


He also talks about a non-iterative way to optimize the thetas, namely the normal equations. I tried to implement it using NumPy, but my RMS error is off the charts compared to gradient descent, so I think there's a mistake somewhere.

$$ \theta = (X^TX)^{-1}X^T\vec{y} $$

def normalEquation(trainingset):
    # Same column convention as before: features first, target in the last column.
    weights = len(trainingset.transpose()) - 1
    x = trainingset.transpose()[:weights].transpose()
    y = trainingset.transpose()[weights].transpose()
    XTX = np.linalg.inv(np.matrix(np.dot(x.transpose(), x), dtype='float'))
    XTY = np.dot(x.transpose(), y)
    # Note: because XTX is an np.matrix, this dot product returns a 1 x n matrix,
    # so theta comes back 2-D; that shape mismatch may be where the RMS error blows up.
    theta = XTY.dot(XTX).getA()
    return theta
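
Since (X^T X)θ = X^T y can also be solved directly without forming the inverse, a minimal sketch like the one below (plain arrays and np.linalg.solve instead of np.matrix, same column layout assumed) might help track down where the discrepancy comes from:

def normalEquationSolve(trainingset):
    # Features in the first columns, target in the last one, as above.
    weights = len(trainingset.transpose()) - 1
    x = trainingset[:, :weights].astype(float)
    y = trainingset[:, weights].astype(float)
    # Solve (X^T X) theta = X^T y; keeps theta a plain 1-D array.
    return np.linalg.solve(x.T.dot(x), x.T.dot(y))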
