Tag Archives: R - Page 8

Single variable optimization

Optimization means to seek minima or maxima of a funtion within a given defined domain.

If a function reach its maxima or minima, the derivative at that point is approaching to 0. If we apply Newton-Raphson method for root finding to f', we can get the optimizing f.

?View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
f2df < - function(fun) {
	fun.list = as.list(fun)
	var <- names(fun.list[1])
	fun.exp = fun.list[[2]] 
	diff.fun = D(fun.exp, var) 
	df = list(x=0, diff.fun) 
	df = as.function(df) 
	return(df)
}
 
newton <- function(fun, x0, tol=1e-7, niter=100) { 
	df = f2df(fun) 
	for (i in 1:niter) { 
		x = x0 - fun(x0)/df(x0) 
		if (abs(fun(x)) < tol) 
			return(x) 
		x0 = x      
	} 
	stop("exceeded allowed number of iterations") 
}
 
newton_optimize <- function(fun, x0, tol=1e-7, niter=100) {
	df <- f2df(fun)
	x = newton(df, x0, tol, niter)
	ddf <- f2df(df)
	if (ddf(x) > 0) {
		cat ("minima:\t", x, "\n")
	} else {
		cat ("maxima:\t", x, "\n")
	}
	return(x)
}

Read more »

one-dimensional integrals

The foundamental idea of numerical integration is to estimate the area of the region in the xy-plane bounded by the graph of function f(x). The integral was esimated by divide x to small intervals, then add all the small approximations to give a total approximation.

Numerical integration can be done by trapezoidal rule, simpson's rule and quadrature rules. R has a built-in function integrate, which performs adaptive quadrature.

Trapezoidal rule works by approximating the region under the graph f(x) as a trapezoid and calculating its area.

?View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
trapezoid < - function(fun, a, b, n=100) {
	# numerical integral of fun from a to b
	# using the trapezoid rule with n subdivisions
	# assume a < b and n is a positive integer
	h <- (b-a)/n
	x <- seq(a, b, by=h)
	y <- fun(x)
	s <- h * (y[1]/2 + sum(y[2:n]) + y[n+1]/2)
	return(s)
}

Simpson's rule subdivides the interval [a,b] into n subintervals, where n is even, then on each consecutive pairs of subintervals, it approximates the behaviour of f(x) by a parabola (polynomial of degree 2) rather than by the straight lines used in the trapezoidal rule.

?View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
11
12
13
simpson < - function(fun, a, b, n=100) {
	# numerical integral using Simpson's rule
	# assume a < b and n is an even positive integer
	h <- (b-a)/n
	x <- seq(a, b, by=h)
	if (n == 2) {
		s <- fun(x[1]) + 4*fun(x[2]) +fun(x[3])
	} else {
		s <- fun(x[1]) + fun(x[n+1]) + 2*sum(fun(x[seq(2,n,by=2)])) + 4 *sum(fun(x[seq(3,n-1, by=2)]))
	}
	s <- s*h/3
	return(s)
}

Read more »

Project Euler -- Problem 187

http://projecteuler.net/index.php?section=problems&id=187

A composite is a number containing at least two prime factors. For example, 15 = 3 × 5; 9 = 3 × 3; 12 = 2 × 2 × 3.

There are ten composites below thirty containing precisely two, not necessarily distinct, prime factors: 4, 6, 9, 10, 14, 15, 21, 22, 25, 26.

How many composite integers, n < 10^(8), have precisely two, not necessarily distinct, prime factors?

library(gmp)
bign <- 10^8
n <- 1:floor((bign -1)/2)
p=1:10000
p = p[as.logical(isprime(p))]
for (i in p) {
	n <- n[as.logical(n %% i)]
}
idx <- as.logical(isprime(n))
allp <- c(p,n[idx])
count <- 0
for (i in allp[allp < sqrt(bign)]) {
	count <- count + length(allp[allp < bign/i])
	allp = allp[-1]
}
print(count)

----
[1] 17427258
user system elapsed
449.67 72.40 522.87

Not fast enough...

Root finding

Numerical root finding methods use iteration, producing a sequence of numbers that hopefully converge towards a limits which is a root. In this post, only focus four basic algorithm on root finding, and covers bisection method, fixed point method, Newton-Raphson method, and secant method.

The simplest root finding algorithms is the bisection method. It works when f is a continuous function and it requires previous knowledge of two initial gueeses, u and v, such that f(u) and f(v) have opposite signs. This method is reliable, but converges slowly. For detail, see http://ygc.name/2008/11/25/bisect-to-solve-equation/ .

Root finding can be reduced to the problem of finding fixed points of the function g(x) = c*f(x) +x, where c is a non-zero constant. It is clearly that f(a) = 0 if and only if g(a) = a. This is the so called fixed point algorithm.

fixedpoint <- function(fun, x0, tol=1e-07, niter=500){
	## fixed-point algorithm to find x such that fun(x) == x
	## assume that fun is a function of a single variable
	## x0 is the initial guess at the fixed point
 
	xold <- x0
	xnew <- fun(xold)
	for (i in 1:niter) {
		xold <- xnew
		xnew <- fun(xold)
		if ( abs((xnew-xold)) < tol )
			return(xnew)
		}
	stop("exceeded allowed number of iterations")
}

Read more »

bubble chart by using ggplot2

The visualization represented by Hans Rosling's TED talk was very impressive. FlowingData provides a tutorial on making bubble chart in R. I try to create bubble chart by using ggplot2.

With the dataset provided by FlowingData,The bubble chart was made by the following code.

crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState2008.csv", header=TRUE, sep="\t")
p <- ggplot(crime, aes(murder,burglary,size=population, label=state))
p <- p+geom_point(colour="red") +scale_area(to=c(1,20))+geom_text(size=3)
p + xlab("Murders per 1,000 population") + ylab("Burglaries per 1,000")

Here is what it looks like.