Monday, July 29, 2019

Stanford's Machine Learning by Andrew Ng - Locally Weighted Regression



In addition to the size of the data set, the number of parameters also influences how well the algorithm performs. With too few parameters one underfits the data: the data might not follow a straight line, yet a one-parameter model can only give a linear relation, so the fit isn't very good. With as many parameters as data points, the fit goes straight through every data point but can be way off anywhere else, which results in overfitting.
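To make this concrete, here is a minimal sketch (my own example, not from the lecture) that fits the same noisy, non-linear data once with a straight line and once with as many parameters as data points, using numpy.polyfit:

    import numpy as np

    # Hypothetical illustration: the underlying curve is not a line.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 3, 8)
    y = np.sin(x) + rng.normal(scale=0.1, size=len(x))

    underfit = np.polyfit(x, y, deg=1)          # slope + intercept: misses the curvature
    overfit = np.polyfit(x, y, deg=len(x) - 1)  # as many parameters as points: hits every point

    x_new = 3.5                                  # a point outside the training range
    print(np.polyval(underfit, x_new))           # roughly follows the overall trend
    print(np.polyval(overfit, x_new))            # can be wildly off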


Instead of using too many parameters, one might weight data points near the value you want to predict more heavily than data points far away, using a weight factor w so that data points that are likely to be more relevant to the prediction count more.

$$ \theta_j := \theta_j - \alpha\sum_{i=1}^{m} w_i\left(\theta^T x_i - y_i\right)x_{i,j} $$

$$ w_i = \exp\left(-\frac{(x_i-x)^2}{2\tau^2}\right) $$

Here x is the point we want to predict at, and the bandwidth τ (the width parameter in the code below) controls how quickly the weight falls off with distance.
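As a quick sanity check on the weight function (my own sketch, with an assumed bandwidth of 1): points close to the query point get a weight near 1, points far away get a weight near 0.

    import numpy as np

    def weight(x_i, x, width=1.0):
        return np.exp(-((x_i - x) ** 2) / (2 * width ** 2))

    x = 5.0                                   # the value we want to predict at
    for x_i in (4.9, 4.0, 1.0):
        print(x_i, weight(x_i, x))            # ~0.995, ~0.607, ~0.0003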

Combining this with the batch gradient descent formula results in the following algorithm:

    import numpy as np

    def locallyWeightedRegression(trainingset, learningRate, mean, iteration=1000, width=1):
        # Number of parameters: one per feature column (the last column holds the targets).
        weights = len(trainingset.transpose()) - 1
        theta = np.zeros(weights)
        for i in range(iteration):
            paras = theta
            nextTheta = np.zeros(len(paras))
            for index, parameter in enumerate(paras):
                # Split the training set into feature vectors and targets.
                xi, yi = trainingset.transpose()[0:3], trainingset.transpose()[3]
                # The query point, repeated once for every training example.
                x = np.full((len(trainingset), 3), mean)
                # Gaussian-style weight: examples far from the query point count less.
                w = xi[index] - x.transpose()[index]
                weightFactor = -(w ** 2) / (2 * width ** 2)
                costForThisVector = learningRate * np.exp(weightFactor.astype(float)) * (theta.dot(xi) - yi) * xi[index]
                nextTheta[index] = parameter - np.sum(costForThisVector) / len(trainingset)
            theta = np.array(nextTheta)
        return theta
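A possible way to call this, assuming (as the code above does) that each row of the training set is three feature values followed by a target, and that mean is the query point the prediction is weighted towards. The data and hyperparameters here are my own, for illustration only.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 3))
    y = 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=50)
    trainingset = np.column_stack([X, y])

    query = np.array([0.5, -0.2, 0.1])
    theta = locallyWeightedRegression(trainingset, learningRate=0.1, mean=query,
                                      iteration=1000, width=1)
    print(theta, theta.dot(query))   # local parameters and the prediction at the query point
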
Until now, I calculated the root mean square error on every fourth data example and used the other three quarters as training examples. That would not work very well here. Either I need to run a new regression for every test point, which is computationally expensive, or I create a massive error by running the algorithm only once and evaluating it at test points it wasn't fit to.
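A sketch of the first (expensive) option, reusing the function above and assuming the test set has the same row layout as the training set:

    import numpy as np

    # Re-fit the locally weighted regression at every held-out example,
    # then compute the root mean square error over the test set.
    def rmse_locally_weighted(trainingset, testset, learningRate=0.1, width=1):
        errors = []
        for row in testset:
            query, target = row[:3], row[3]
            theta = locallyWeightedRegression(trainingset, learningRate, mean=query, width=width)
            errors.append((theta.dot(query) - target) ** 2)
        return np.sqrt(np.mean(errors))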
