The first part of Exercise 5.1 asks us to implement a regularized version of linear regression.
Adding a regularization parameter can prevent over-fitting when fitting a high-order polynomial.
Plot the data:
    x <- read.table("ex5Linx.dat")
    y <- read.table("ex5Liny.dat")

    x <- x[,1]
    y <- y[,1]

    require(ggplot2)
    d <- data.frame(x=x,y=y)
    p <- ggplot(d, aes(x,y)) + geom_point(colour="red", size=3)
I will fit a 5th-order polynomial. The hypothesis is:
\( h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 + \theta_5 x^5 \)
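In matrix form, the hypothesis is a product of a design matrix (one row per sample, one column per power of x) and the parameter vector. The snippet below is a minimal sketch of that idea; `x_demo` and `theta_demo` are made-up values for illustration, not the exercise data or a fitted result.

```r
# Hypothetical inputs; the real data comes from ex5Linx.dat.
x_demo <- c(-1, 0, 0.5)
degree <- 5

# Row i holds x_i^0, x_i^1, ..., x_i^5; the first column is the intercept term.
X_demo <- outer(x_demo, 0:degree, "^")

# Arbitrary illustrative parameters theta_0 ... theta_5.
theta_demo <- c(1, 2, 0, 0, 0, -1)

# h_theta(x) evaluated for every row at once.
h_demo <- as.vector(X_demo %*% theta_demo)
print(h_demo)
```

Each entry of `h_demo` is the polynomial evaluated at the matching entry of `x_demo`.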
The idea of regularization is to impose Occam's razor on the solution: penalizing large \( \theta \) values scales them down, so the higher-order features contribute only a little to the fit.
For that, the cost function is defined as:
\( J(\theta) = \frac{1}{2m} [\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^n \theta_j^2] \)
Gradient Descent with regularization parameters:
First, I implemented a gradient descent algorithm to find \( \theta \).
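Gradient descent only needs the partial derivatives of J. Differentiating the cost function term by term gives the expressions below, where \( x_j^{(i)} \) denotes the j-th feature of the i-th sample (here the j-th power of \( x^{(i)} \), with \( x_0^{(i)} = 1 \)). The intercept \( \theta_0 \) carries no penalty because the regularization sum starts at j = 1.
\( \frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_0^{(i)} \)
\( \frac{\partial J}{\partial \theta_j} = \frac{1}{m} [\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} + \lambda \theta_j], \quad j = 1, \dots, n \)
Every iteration then applies \( \theta_j := \theta_j - \alpha \frac{\partial J}{\partial \theta_j} \) to all parameters simultaneously, with learning rate \( \alpha \).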
    ## polynomial features x^0 ... x^degree
    mapFeature <- function(x, degree=5) {
        return(sapply(0:degree, function(i) x^i))
    }

    ## hypothesis function (theta is a 1 x n row matrix)
    h <- function(theta, x) {
        #sapply(1:m, function(i) theta %*% x[i,])
        toReturn <- x %*% t(theta)
        return(toReturn)
    }

    ## cost function; the penalty sits inside the 1/(2m) factor and skips theta_0
    J <- function(theta, x, y, lambda=1) {
        m <- length(y)
        r <- theta^2
        r[1] <- 0
        j <- 1/(2*m) * (sum((h(theta, x)-y)^2) + lambda*sum(r))
        return(j)
    }

    gradDescent <- function(theta, x, y, alpha=0.1, niter=1000, lambda=1) {
        m <- length(y)
        for (i in 1:niter) {
            tt <- theta
            tt[1] <- 0   # do not regularize the intercept
            dj <- 1/m * (t(h(theta,x)-y) %*% x + lambda * tt)
            theta <- theta - alpha * dj
        }
        return(theta)
    }
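Gradient descent never calls J, but the cost function is still handy for a gradient check: the analytic gradient should match a finite-difference approximation of J. The following is a sketch under assumptions, not part of the exercise. The names `cost`, `grad` and `num_grad` are my own, theta is a plain numeric vector rather than a row matrix, and the data is synthetic.

```r
# Polynomial features x^0 ... x^degree.
map_feature <- function(x, degree = 5) outer(x, 0:degree, "^")

# Regularized cost; the intercept theta[1] is not penalized.
cost <- function(theta, X, y, lambda = 1) {
  m <- length(y)
  resid <- as.vector(X %*% theta) - y
  (sum(resid^2) + lambda * sum(theta[-1]^2)) / (2 * m)
}

# Analytic gradient of the cost.
grad <- function(theta, X, y, lambda = 1) {
  m <- length(y)
  resid <- as.vector(X %*% theta) - y
  penal <- theta
  penal[1] <- 0
  as.vector(t(X) %*% resid + lambda * penal) / m
}

# Central finite differences of the cost, one coordinate at a time.
num_grad <- function(theta, X, y, lambda = 1, eps = 1e-6) {
  sapply(seq_along(theta), function(j) {
    step <- replace(numeric(length(theta)), j, eps)
    (cost(theta + step, X, y, lambda) - cost(theta - step, X, y, lambda)) / (2 * eps)
  })
}

# Made-up data and parameters, for illustration only.
x_chk <- seq(-1, 1, length.out = 7)
y_chk <- sin(pi * x_chk) + x_chk^2
X_chk <- map_feature(x_chk)
theta_chk <- c(0.5, -1, 2, 0.3, -0.7, 1.1)

max_diff <- max(abs(grad(theta_chk, X_chk, y_chk) - num_grad(theta_chk, X_chk, y_chk)))
print(max_diff)
```

A `max_diff` close to zero suggests the gradient formula and the cost agree; a large value usually points to a misplaced factor or a penalized intercept.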
Fitting the data with the gradient descent algorithm:
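    x <- mapFeature(x)
    theta <- matrix(rep(0,6), nrow=1)
    theta <- gradDescent(theta, x, y)

    x.test <- seq(-1,1, 0.001)
    y.test <- mapFeature(x.test) %*% t(theta)

    ## pass the curve as its own data frame; recent ggplot2 versions reject
    ## aesthetics whose length differs from the plot data
    p+geom_line(data=data.frame(x=x.test, y=as.vector(y.test)), colour="blue")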
As shown above, the fitted model follows the data well.
Normal Equation with regularization parameters:
The exercise also asks for the normal equation with the regularization term added, that is:
\( \theta = (X^T X + \lambda \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix} )^{-1} (X^T y) \)
    ## normal equations
    normEq <- function(x,y, lambda) {
        n <- ncol(x)
        ## extra regularization terms
        r <- lambda * diag(n)
        r[1,1] <- 0
        theta <- solve(t(x) %*% x + r) %*% t(x) %*% y
        return(theta)
    }
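Because the normal equation and gradient descent minimize the same regularized cost, the two should agree once gradient descent has converged. The sketch below checks this on synthetic data. The function names are my own, theta is a plain vector, and I use 20000 iterations rather than the default 1000 as a cautious assumption about convergence speed.

```r
# Polynomial features x^0 ... x^degree.
map_feature <- function(x, degree = 5) outer(x, 0:degree, "^")

# Closed-form solution; solve(A, b) avoids forming the inverse explicitly.
norm_eq <- function(X, y, lambda) {
  R <- lambda * diag(ncol(X))
  R[1, 1] <- 0  # the intercept is not penalized
  as.vector(solve(t(X) %*% X + R, t(X) %*% y))
}

# Batch gradient descent on the same regularized cost.
grad_descent <- function(X, y, lambda, alpha = 0.1, niter = 20000) {
  m <- length(y)
  theta <- numeric(ncol(X))
  for (i in seq_len(niter)) {
    penal <- theta
    penal[1] <- 0
    resid <- as.vector(X %*% theta) - y
    theta <- theta - alpha * as.vector(t(X) %*% resid + lambda * penal) / m
  }
  theta
}

# Synthetic data, for illustration only.
x_syn <- seq(-1, 1, length.out = 7)
y_syn <- sin(pi * x_syn) + x_syn^2
X_syn <- map_feature(x_syn)

theta_ne <- norm_eq(X_syn, y_syn, lambda = 1)
theta_gd <- grad_descent(X_syn, y_syn, lambda = 1)

# Largest coordinate-wise disagreement between the two solutions.
gap <- max(abs(theta_ne - theta_gd))
print(gap)
```

If `gap` stays large, the usual suspects are too few iterations or a learning rate that is too small for the problem.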
I tried 3 different lambda values to see how they influence the fit.
    library(reshape2)  # assumed source of melt(); it is not loaded elsewhere in this post
    lambda <- c(0,1,10)
    theta <- sapply(lambda, normEq, x=x, y=y)

    x.test <- seq(-1,1, 0.001)
    yy <- sapply(1:3, function(i) mapFeature(x.test) %*% theta[,i])
    yy <- melt(yy)
    yy[,1] <- rep(x.test, 3)
    colnames(yy) <- c("X", "lambda", "Y")
    yy$lambda=factor(yy$lambda, labels=unique(lambda))

    p+geom_line(data=yy,aes(X,Y, group=lambda, colour=lambda))

With lambda=0, the curve passes very close to the original points (the red line), which is a clear sign of over-fitting.
As lambda increases, the fit becomes looser and the model generalizes better, which prevents over-fitting.
The figure also suggests that when lambda is too large, the model will under-fit.
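The same trade-off can be seen in numbers: as lambda grows, the training error rises while the size of the penalized coefficients shrinks. The sketch below illustrates this on synthetic data with my own helper names, so the values it prints are not the results for the exercise data.

```r
# Polynomial features x^0 ... x^degree.
map_feature <- function(x, degree = 5) outer(x, 0:degree, "^")

# Regularized normal equation; the intercept is not penalized.
norm_eq <- function(X, y, lambda) {
  R <- lambda * diag(ncol(X))
  R[1, 1] <- 0
  as.vector(solve(t(X) %*% X + R, t(X) %*% y))
}

# Synthetic data, for illustration only.
x_syn <- seq(-1, 1, length.out = 15)
y_syn <- sin(pi * x_syn) + exp(x_syn)
X_syn <- map_feature(x_syn)

lambdas <- c(0, 1, 10)

# One column of fitted parameters per lambda value (a 6 x 3 matrix).
thetas <- sapply(lambdas, function(l) norm_eq(X_syn, y_syn, l))

# Residual sum of squares on the training points.
train_rss <- apply(thetas, 2, function(th) sum((X_syn %*% th - y_syn)^2))

# Sum of squared coefficients, excluding the intercept.
penalty <- colSums(thetas[-1, , drop = FALSE]^2)

print(data.frame(lambda = lambdas, train_rss = train_rss, penalty = penalty))
```

A rising `train_rss` by itself does not show under-fitting. Judging that properly needs held-out data, which this sketch does not include.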
Reference:
Machine Learning Course
Exercise 5



I'm having trouble replicating your results. In your code shown I have two questions: (1) when is the cost function, J, called? and (2) shouldn't the "i:niter" in gradDescent function be "1:niter"?
andy
Reply:
October 26th, 2011 at 3:26 am
I forgot to mention: I really enjoy your Open Classroom posts!
ygc
Reply:
October 26th, 2011 at 9:48 am
1. yes, the cost function is defined as J.
2. you are right, sorry for the typo.
I am glad that you find it useful.
andy
Reply:
October 26th, 2011 at 10:22 am
But shouldn't J be called somewhere in the code?
ygc
Reply:
October 26th, 2011 at 10:26 am
Actually, the derivative of J is needed, not J itself.
You can see it as the variable *dj* in the function *gradDescent*.
Running your code as posted I get "Error: object 'm' not found" and "Error: object 'lambda' not found" if I set m to some dummy value -- so my question really is where are those variables set, or initialized, besides in *J*?
ygc
Reply:
October 27th, 2011 at 12:41 pm
Corrected, please run it again.
The issue was caused by the definition of the function *gradDescent*.
I forgot to set the values.
andy
Reply:
October 28th, 2011 at 3:11 am
Thanks! It works fine now. Sorry for the pestering
In the gradDescent function definition, shouldn't it say tt[1] <- 0 instead of tt[0] <- 0?
ygc
Reply:
January 29th, 2012 at 12:40 am
yes.