Tag Archives: R - Page 8

one-dimensional integrals

The foundamental idea of numerical integration is to estimate the area of the region in the xy-plane bounded by the graph of function f(x). The integral was esimated by divide x to small intervals, then add all the small approximations to give a total approximation.

Numerical integration can be done by trapezoidal rule, simpson's rule and quadrature rules. R has a built-in function integrate, which performs adaptive quadrature.

Trapezoidal rule works by approximating the region under the graph f(x) as a trapezoid and calculating its area.

?View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
trapezoid < - function(fun, a, b, n=100) {
	# numerical integral of fun from a to b
	# using the trapezoid rule with n subdivisions
	# assume a < b and n is a positive integer
	h <- (b-a)/n
	x <- seq(a, b, by=h)
	y <- fun(x)
	s <- h * (y[1]/2 + sum(y[2:n]) + y[n+1]/2)
	return(s)
}

Simpson's rule subdivides the interval [a,b] into n subintervals, where n is even, then on each consecutive pairs of subintervals, it approximates the behaviour of f(x) by a parabola (polynomial of degree 2) rather than by the straight lines used in the trapezoidal rule.

?View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
11
12
13
simpson < - function(fun, a, b, n=100) {
	# numerical integral using Simpson's rule
	# assume a < b and n is an even positive integer
	h <- (b-a)/n
	x <- seq(a, b, by=h)
	if (n == 2) {
		s <- fun(x[1]) + 4*fun(x[2]) +fun(x[3])
	} else {
		s <- fun(x[1]) + fun(x[n+1]) + 2*sum(fun(x[seq(2,n,by=2)])) + 4 *sum(fun(x[seq(3,n-1, by=2)]))
	}
	s <- s*h/3
	return(s)
}

Read more »

Project Euler -- Problem 187

http://projecteuler.net/index.php?section=problems&id=187

A composite is a number containing at least two prime factors. For example, 15 = 3 × 5; 9 = 3 × 3; 12 = 2 × 2 × 3.

There are ten composites below thirty containing precisely two, not necessarily distinct, prime factors: 4, 6, 9, 10, 14, 15, 21, 22, 25, 26.

How many composite integers, n < 10^(8), have precisely two, not necessarily distinct, prime factors?

library(gmp)
bign <- 10^8
n <- 1:floor((bign -1)/2)
p=1:10000
p = p[as.logical(isprime(p))]
for (i in p) {
	n <- n[as.logical(n %% i)]
}
idx <- as.logical(isprime(n))
allp <- c(p,n[idx])
count <- 0
for (i in allp[allp < sqrt(bign)]) {
	count <- count + length(allp[allp < bign/i])
	allp = allp[-1]
}
print(count)

----
[1] 17427258
user system elapsed
449.67 72.40 522.87

Not fast enough...

Root finding

Numerical root finding methods use iteration, producing a sequence of numbers that hopefully converge towards a limits which is a root. In this post, only focus four basic algorithm on root finding, and covers bisection method, fixed point method, Newton-Raphson method, and secant method.

The simplest root finding algorithms is the bisection method. It works when f is a continuous function and it requires previous knowledge of two initial gueeses, u and v, such that f(u) and f(v) have opposite signs. This method is reliable, but converges slowly. For detail, see http://ygc.name/2008/11/25/bisect-to-solve-equation/ .

Root finding can be reduced to the problem of finding fixed points of the function g(x) = c*f(x) +x, where c is a non-zero constant. It is clearly that f(a) = 0 if and only if g(a) = a. This is the so called fixed point algorithm.

fixedpoint <- function(fun, x0, tol=1e-07, niter=500){
	## fixed-point algorithm to find x such that fun(x) == x
	## assume that fun is a function of a single variable
	## x0 is the initial guess at the fixed point
 
	xold <- x0
	xnew <- fun(xold)
	for (i in 1:niter) {
		xold <- xnew
		xnew <- fun(xold)
		if ( abs((xnew-xold)) < tol )
			return(xnew)
		}
	stop("exceeded allowed number of iterations")
}

Read more »

bubble chart by using ggplot2

The visualization represented by Hans Rosling's TED talk was very impressive. FlowingData provides a tutorial on making bubble chart in R. I try to create bubble chart by using ggplot2.

With the dataset provided by FlowingData,The bubble chart was made by the following code.

crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState2008.csv", header=TRUE, sep="\t")
p <- ggplot(crime, aes(murder,burglary,size=population, label=state))
p <- p+geom_point(colour="red") +scale_area(to=c(1,20))+geom_text(size=3)
p + xlab("Murders per 1,000 population") + ylab("Burglaries per 1,000")

Here is what it looks like.

The avalanche of publications mentioning GO

Gene Ontology is the de facto standard for annotation of gene products. It has been widely used in biological data mining, and I believe it will play more central role in the future.

Publications mentioning GO was collected and deposited in GO ftp, and can be accessed (ftp://ftp.geneontology.org/go/doc/).

I count the number of publicans by year, and draw a histogram, which showed that the growing trend was remarkable.

> gopub <- read.delim("ftp://ftp.geneontology.org/go/doc/biblio-data.txt")
> dim(gopub)
[1] 3626   10
> p <- ggplot(gopub, aes(year))
> p + geom_histogram(aes(y=..count..)) + opts(title = "Publications mentioning GO")