Machine Learning Ex4 - Logistic Regression

Exercise 4 required implementing Logistic Regression using Newton's Method.

The dataset contains the scores of 80 students on two exams; 40 of the students were admitted to college and the other 40 were not. We need to build a binary classification model that estimates the probability of college admission based on a student's scores on these two exams.

Plot the data

x <- read.table("ex4x.dat",header=F, stringsAsFactors=F)
x <- cbind(rep(1, nrow(x)), x)
colnames(x) <- c("X0", "Exam1", "Exam2")
x <- as.matrix(x)
 
y <- read.table("ex4y.dat",header=F, stringsAsFactors=F)
y <- y[,1]
 
## plotting data
d <- data.frame">data.frame(x,
                y = factor(y,
                levels=c(0,1),
                labels=c("Not admitted","Admitted" )
                )
                )
 
require(ggplot2)
p <- ggplot(d, aes(x=Exam1, y=Exam2)) +
    geom_point(aes(shape=y, colour=y)) +
    xlab("Exam 1 score") +
    ylab("Exam 2 score")
p   ## display the scatter plot


Logistic Regression


We first need to define our hypothesis function, which returns values in [0, 1] and is therefore suitable for binary classification.

\(h_\theta(x) = g(\theta^T x) = \frac{1}{ 1 + e ^{- \theta^T x} }\)

The function g is the sigmoid function, and the function h returns the probability that y = 1:

\( h_\theta(x) = P (y=1 | x; \theta) \)
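As a quick illustration (not part of the original exercise), we can plot the sigmoid to see that it squashes any real input into the interval (0, 1):

## illustrative sketch: the sigmoid maps the real line into (0, 1)
curve(1 / (1 + exp(-x)), from = -10, to = 10,
      xlab = "z", ylab = "g(z)", main = "Sigmoid function")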

What we need is to compute \( \theta \), which determines the hypothesis function.

As in linear regression, we define a cost function that measures how poorly the hypothesis function fits the sample data at a given \( \theta \).

The cost function was defined as:
\(J(\theta) = \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]\)

To determine the most suitable hypothesis function, we need to find the \( \theta \) value that minimizes \(J(\theta)\). This can be achieved by Newton's method, which finds a root of the derivative of the cost function.

\( \theta \) is then updated iteratively by:
\(\theta^{(t+1)} = \theta^{(t)} - H^{-1} \nabla_{\theta}J\)

The gradient and the Hessian are defined as:
\(\nabla_{\theta}J = \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}\)
\(H = \frac{1}{m} \sum_{i=1}^m \left[ h_\theta(x^{(i)}) \left(1 - h_\theta(x^{(i)})\right) x^{(i)} (x^{(i)})^T \right]\)

The above equations were implemented using R:

### Newton's Method
## sigmoid function
g <- function(z) {
    1/(1+exp(-z))
}
 
## hypothesis function
h <- function(theta, x) {
    g(x %*% theta)
}
 
## cost function
J <- function(theta, x, y) {
    m <- length(y)
    s <- sapply(1:m, function(i)
                y[i]*log(h(theta,x[i,])) + (1-y[i])*log(1-h(theta,x[i,]))
                )
    j <- -1/m * sum(s)
    return(j)
}
 
 
## gradient
grad <- function(theta, x, y) {
    m <- length(y)
    g <- 1/m * t(x) %*% (h(theta,x)-y)
    return(g)
}
 
## Hessian
Hessian <- function(theta, x) {
    m <- nrow(x)
    p <- as.vector(h(theta, x))
    ## H = 1/m * X' diag(p * (1 - p)) X
    H <- 1/m * t(x) %*% diag(p * (1 - p)) %*% x
    return(H)
}

The first question is to determine how many iterations are needed until convergence.

## run 10 Newton iterations, starting from theta = 0,
## and record the cost J after each update
theta <- rep(0, ncol(x))
j <- rep(0, 10)
for (i in 1:10) {
    theta <- theta - solve(Hessian(theta,x)) %*% grad(theta,x,y)
    j[i] <- J(theta,x,y)
}
 
ggplot()+
    aes(x=1:10,y=j)+
    geom_point(colour="red")+
    geom_path()+xlab("Iteration")+
    ylab("Cost J")


As illustrated in the figure above, Newton's method converges very quickly; only 4-5 iterations were needed.
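If we did not want to fix the number of iterations in advance, a simple alternative (a sketch, not part of the original exercise) is to iterate until the Newton update to \( \theta \) becomes negligible:

## sketch: iterate until the Newton step is negligibly small
theta <- rep(0, ncol(x))
repeat {
    delta <- solve(Hessian(theta, x)) %*% grad(theta, x, y)
    theta <- theta - delta
    if (max(abs(delta)) < 1e-8) break
}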

The second question: what is the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted?

> (1 - g(c(1, 20, 80) %*% theta))* 100
         [,1]
[1,] 64.24722

According to our model, the probability that this student will not be admitted is about 64%.

Finally, we derive the decision boundary from \( \theta^T x = 0 \) and visualize the fit:

## decision boundary: theta0 + theta1 * x1 + theta2 * x2 = 0
x1 <- c(min(x[,2]), max(x[,2]))
x2 <- -1/theta[3,] * (theta[2,]*x1 + theta[1,])
## slope and intercept of the boundary line
a <- (x2[2]-x2[1])/(x1[2]-x1[1])
b <- x2[2]-a*x1[2]
p + geom_abline(slope=a, intercept=b)
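As an optional sanity check (not part of the original exercise), the coefficients obtained by Newton's method can be compared with those from R's built-in glm(), which fits the same logistic regression model:

## sanity check: fit the same model with glm() and compare coefficients
fit <- glm(y ~ Exam1 + Exam2, data = d, family = binomial)
coef(fit)   ## should agree closely with theta from Newton's method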

Fantastic.

References:
Machine Learning Course
Exercise 4
