Thursday, July 25, 2019

Stanford's Machine Learning by Andrew Ng - Stochastic Gradient Descent




Batch gradient descent is very slow for very large datasets. Luckily an alternative exists: stochastic gradient descent, also called incremental gradient descent. Rather than going over the entire dataset for each update, the algorithm looks at one training example at a time and then takes a small step towards the minimum. The mathematical formula is the same, but the loop is different.

$$ \theta_j := \theta_j - \alpha\left(\theta^T x^{(i)} - y^{(i)}\right)x_j^{(i)} $$
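
For comparison, batch gradient descent sums this gradient over all $m$ training examples before taking a single step, which is exactly what makes it slow on large datasets:

$$ \theta_j := \theta_j - \alpha\sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)x_j^{(i)} $$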


Here θ is the vector of weights, x^(i) the input variables of the i-th training example and y^(i) its target. I tried to put this into a Python implementation as well.


import numpy as np
import math


def StochasticGradientDescent(trainingset, learningRate):
    # The last column of the training set holds the target, the rest are the inputs.
    weights = len(trainingset.transpose()) - 1
    theta = np.zeros(weights)
    for trainingExample in trainingset:
        nextTheta = [0 for number in theta]
        for index, parameter in enumerate(theta):
            xi, yi = trainingExample[0:weights], trainingExample[weights]
            # Update rule for a single training example.
            nextTheta[index] = parameter - learningRate*(theta.dot(xi) - yi)*xi[index]
        theta = np.array(nextTheta)
    return theta


def RMS_Error(controlData, theta):
    parameters = max(theta.shape)
    # print(parameters, theta, controlData)  # debug output
    rms_error = (theta.dot(controlData.transpose()[0:parameters]) - controlData.transpose()[parameters])**2
    n = len(rms_error)
    error = math.sqrt(np.sum(rms_error)/n)
    return error
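
As a quick sanity check, the two functions above can be run on a small synthetic dataset; the layout below (a bias column, then the feature, then the target in the last column) is just an assumption matching how the code indexes each row.

# Hypothetical sanity check: fit y = 2*x + 1 on noiseless synthetic data.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=1000)
data = np.column_stack([np.ones_like(x), x, 2*x + 1])  # bias, feature, target

theta = StochasticGradientDescent(data, learningRate=0.01)
print(theta)                   # should end up close to (1, 2)
print(RMS_Error(data, theta))  # should be small for this synthetic set
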
For my not-too-big dataset, stochastic gradient descent's root mean square error wasn't too bad compared to batch gradient descent. For the cubic fit it actually got a lower one, weirdly, but it was higher for the quadratic and linear fits.

RMS error for linear corruption perception = 24247.5; theta = (-2.7, 70.6, 45); learning rate = 1e-4
RMS error for quadratic corruption perception = 15570; theta = (-2.52e-4, 4.66, 9.449e-2); learning rate = 1e-8
RMS error for cubic corruption perception = 9895; theta = (-2.979e-5, 0.113, -8.34e-4); learning rate = 1e-11


He also talks about a non-iterative way to optimize the thetas, namely the normal equations. I tried to implement it using NumPy, but my RMS error is off the charts compared to gradient descent, so I think there's a mistake somewhere.

$$ \theta = (X^TX)^{-1}X^T\vec{y} $$

def normalEquation(trainingset):
    # Same column convention as before: features first, target in the last column.
    weights = len(trainingset.transpose()) - 1
    x = trainingset.transpose()[:weights].transpose()
    y = trainingset.transpose()[weights].transpose()
    XTX = np.linalg.inv(np.matrix(np.dot(x.transpose(), x), dtype='float'))
    XTY = np.dot(x.transpose(), y)
    # Note: because XTX is an np.matrix, this dot product returns a 1 x n matrix,
    # so theta comes back 2-D; that shape mismatch may be where the RMS error blows up.
    theta = XTY.dot(XTX).getA()
    return theta
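
Since (X^T X)θ = X^T y can also be solved directly without forming the inverse, a minimal sketch like the one below (plain arrays and np.linalg.solve instead of np.matrix, same column layout assumed) might help track down where the discrepancy comes from:

def normalEquationSolve(trainingset):
    # Features in the first columns, target in the last one, as above.
    weights = len(trainingset.transpose()) - 1
    x = trainingset[:, :weights].astype(float)
    y = trainingset[:, weights].astype(float)
    # Solve (X^T X) theta = X^T y; keeps theta a plain 1-D array.
    return np.linalg.solve(x.T.dot(x), x.T.dot(y))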
