## Machine Learning Ex3 - Multivariate Linear Regression

Part 1. Finding alpha.
The first question to resolve in Exercise 3 is to pick a good learning rate alpha.

This requires making an initial selection, running gradient descent and observing how the cost function evolves.

I tested alpha values ranging from 0.01 to 1 (0.01, 0.03, 0.1, 0.3 and 1), running 50 iterations of gradient descent for each.

## preparing data input.
x <- read.table("ex3x.dat", header=FALSE)
y <- read.table("ex3y.dat", header=FALSE)

# normalize features using Z-score.
x[,1] <- (x[,1] - mean(x[,1]))/sd(x[,1])
x[,2] <- (x[,2] - mean(x[,2]))/sd(x[,2])

# add the intercept column x0 = 1.
x <- cbind(x0=rep(1, nrow(x)), x)
x <- as.matrix(x)

## gradient descent algorithm (one update step).
gradDescent_internal <- function(theta, x, y, m, alpha) {
    h <- drop(x %*% theta)      # predictions for all m examples
    j <- t(h-y) %*% x
    grad <- 1/m * j
    theta <- t(theta) - alpha * grad
    theta <- t(theta)
    return(theta)
}

## cost function: J = 1/(2m) * sum of squared errors.
J <- function(theta, x, y, m) {
    h <- drop(x %*% theta)
    j <- sum((h-y)^2)/(2*m)
    return(j)
}

## calculate cost function J for every iteration at specific alpha value.
testLearningRate <- function(x, y, alpha, niter=50) {
    j <- rep(0, niter)
    m <- nrow(x)
    theta <- matrix(rep(0, ncol(x)), ncol=1)
    for (i in 1:niter) {
        theta <- gradDescent_internal(theta, x, y, m, alpha)
        j[i] <- J(theta, x, y, m)
    }
    return(j)
}

## test learning rate.
alpha <- c(0.01, 0.03, 0.1, 0.3, 1)
xxx <- sapply(alpha, testLearningRate, x=x, y=y)
colnames(xxx) <- as.character(alpha)

library(ggplot2)
library(reshape2)   # provides melt()
xxx <- melt(xxx)
names(xxx) <- c("niter", "alpha", "J")
p <- ggplot(xxx, aes(x=niter, y=J))
p + geom_line(aes(colour=factor(alpha))) +
    xlab("Number of iterations") +
    ylab("Cost J")

Of the values tested, alpha = 1 seems to be the best, i.e. the one whose cost curve converges fastest.
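The update in gradDescent_internal can also be written as a single matrix expression. The following is only a minimal sketch under stated assumptions: it uses synthetic data rather than the exercise files, and the names `cost` and `grad_descent` are made up for illustration.

```r
# Vectorized batch gradient descent on synthetic data.
# The data and the function names are illustrative assumptions,
# not the exercise files or the code above.
set.seed(1)
m <- 50
X <- cbind(1, scale(matrix(rnorm(m * 2), ncol = 2)))  # intercept + z-scored features
y <- X %*% c(3, 2, -1) + rnorm(m, sd = 0.1)           # known coefficients plus noise

# J(theta) = 1/(2m) * sum of squared residuals
cost <- function(theta, X, y) {
  r <- X %*% theta - y
  sum(r^2) / (2 * nrow(X))
}

# Run niter update steps and record the cost after each one.
grad_descent <- function(X, y, alpha, niter = 50) {
  theta <- matrix(0, ncol(X), 1)
  J <- numeric(niter)
  for (i in seq_len(niter)) {
    # theta := theta - alpha/m * X'(X theta - y)
    theta <- theta - alpha / nrow(X) * crossprod(X, X %*% theta - y)
    J[i] <- cost(theta, X, y)
  }
  list(theta = theta, J = J)
}

fit <- grad_descent(X, y, alpha = 0.3)

# Final cost after 50 iterations for each candidate learning rate.
sapply(c(0.01, 0.03, 0.1, 0.3, 1), function(a) grad_descent(X, y, a)$J[50])
```

Comparing the final costs (or plotting the `J` vectors) gives the same kind of picture as the ggplot figure: small values of alpha are still far from the minimum after 50 iterations.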

Part 2. Normal Equations.
The cost function:
$J(\theta) = \frac{1}{2m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2$

can be written in matrix notation:
$J(\theta) = \frac{1}{2m} (X\theta - \vec{y})^{T} (X\theta - \vec{y})$

This function measures how large the error of our model is with respect to the data.
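As a quick sanity check, the cost can be evaluated by hand on a tiny data set. This is a minimal sketch with made-up numbers; the function name `cost` is an assumption for illustration. It computes the squared length of the residual vector X theta - y, divided by 2m.

```r
# Cost J(theta) computed from the residual vector X %*% theta - y.
# Tiny hand-checkable data set, for illustration only.
cost <- function(theta, X, y) {
  r <- X %*% theta - y
  as.numeric(crossprod(r)) / (2 * nrow(X))
}

X <- cbind(1, c(1, 2, 3))   # intercept column plus one feature
y <- c(1, 2, 3)

cost(c(0, 1), X, y)   # perfect fit y = x, so the cost is 0
cost(c(0, 0), X, y)   # (1 + 4 + 9) / (2 * 3) = 14/6
```

A model that reproduces the data exactly has zero cost, and the cost grows with the squared residuals.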

To minimize it, we can calculate its partial derivative with respect to each parameter, set it to 0 and solve for theta:
$\frac{\partial}{\partial \theta_j} J(\theta) = 0 \quad \text{for every } j$
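The intermediate step can be written out for all components at once. Here $X$ is taken to be the design matrix whose rows are the examples $x^{(i)}$ (including the leading 1), and $X^{T}X$ is assumed to be invertible. The gradient of the cost is

$\nabla_\theta J(\theta) = \frac{1}{m} X^{T} (X\theta - \vec{y})$

and setting it to zero gives the linear system

$X^{T}X\,\theta = X^{T}\vec{y}$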

Then, the value of theta is obtained in closed form:
$\theta = (X^{T} X)^{-1} X^{T} \vec{y}$

That can be easily implemented by:

## loading data as in Part 1, but without feature normalization...
x <- cbind(x0=rep(1, nrow(x)), x)
x <- as.matrix(x)
y <- y[,1]

## using normal equation to calculate theta.
theta <- solve(t(x) %*% x) %*% t(x) %*% y


The result is:

> theta
              [,1]
theta_0 89597.9095
theta_1   139.2107
theta_2 -8738.0191
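One way to check a normal-equation implementation is to compare it with R's built-in lm(), which fits the same least-squares model. The sketch below is an illustration under stated assumptions: the data are synthetic stand-ins (the coefficients are only loosely inspired by the result above) and the feature names x1 and x2 are hypothetical.

```r
# Synthetic stand-in for the exercise data; all values are made up.
set.seed(42)
m <- 40
x1 <- runif(m, 800, 4000)
x2 <- sample(1:5, m, replace = TRUE)
y <- 90000 + 140 * x1 - 8700 * x2 + rnorm(m, sd = 5000)

X <- cbind(x0 = 1, x1 = x1, x2 = x2)   # design matrix with intercept column

# Solve (X'X) theta = X'y as a linear system instead of forming the inverse.
theta <- solve(crossprod(X), crossprod(X, y))

# lm() should give the same coefficients up to rounding error.
fit <- lm(y ~ x1 + x2)
cbind(normal_eq = drop(theta), lm = coef(fit))

# Prediction for a hypothetical example with x1 = 1650 and x2 = 3.
drop(c(1, 1650, 3) %*% theta)
```

Passing both matrices to solve() avoids computing the explicit inverse, which is usually a little more stable numerically. The result is the same theta as the formula above.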