Tag Archives: R

clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters

Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler, that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species: humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under the Artistic-2.0 License within the Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html

Yu, Guangchuang, Li-Gen Wang, Yanyan Han, and Qing-Yu He. “clusterProfiler: An R Package for Comparing Biological Themes Among Gene Clusters.” OMICS: A Journal of Integrative Biology 16, no. 5 (May 2012): 284–287.
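
As a rough illustration of what the enrichment analysis of a gene cluster computes, the sketch below runs a one-sided hypergeometric test in base R, which is a common way to frame over-representation. It does not call clusterProfiler, the counts are made-up numbers chosen for illustration, and the variable names are assumptions rather than anything taken from the package.

```r
# Hypothetical counts, invented for illustration only.
background   <- 20000  # genes in the background set
annotated    <- 200    # background genes annotated with the term
cluster_size <- 100    # genes in the cluster of interest
overlap      <- 5      # cluster genes annotated with the term

# P(X >= overlap) when cluster_size genes are drawn without replacement
# from a background that contains `annotated` term members.
p_value <- phyper(overlap - 1, annotated, background - annotated,
                  cluster_size, lower.tail = FALSE)
print(p_value)
```

A small p-value suggests the term appears in the cluster more often than chance alone would explain; a real analysis would also correct for testing many terms at once.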

draw Chinese character Jiong using ggplot2

The Chinese character Jiong (囧) is now widely used to express feelings such as annoyance, shock, embarrassment, awkwardness, and scorn.

The plot of the function y=1/(x^2-1) looks very similar to this symbol.

I use ggplot2 to draw the symbol of Jiong.

It looks like:

The function line itself is very easy to draw, but polishing the graph so that it resembles the character takes a few tricks.

The source code for generating this plot can be found on GitHub.
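
For readers who only want the function line, here is a minimal sketch. It is not the script from the repository: the data frame name jiong, the plotting range, and the cut-off of 5 on the y axis are assumptions, and none of the polishing that turns the curve into the character is attempted. The curve is split into three branches at x = -1 and x = 1 so that the line is not drawn across the asymptotes.

```r
# Sample y = 1/(x^2 - 1) on [-3, 3].
jiong <- data.frame(x = seq(-3, 3, length.out = 601))
jiong$y <- 1 / (jiong$x^2 - 1)

# Label the three branches separated by the asymptotes at x = -1 and x = 1.
jiong$branch <- cut(jiong$x, breaks = c(-Inf, -1, 1, Inf))

# Drop the points that blow up near the asymptotes.
jiong <- jiong[is.finite(jiong$y) & abs(jiong$y) <= 5, ]

# Draw only when ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
    library(ggplot2)
    p <- ggplot(jiong, aes(x = x, y = y, group = branch)) +
        geom_line() +
        coord_cartesian(ylim = c(-5, 5)) +
        theme_bw()
    print(p)
}
```

The middle branch dips below the axis like the mouth of the character, and the two outer branches form the eyebrows.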

ML-Class Ex 7 – kMeans clustering

The K-means algorithm is a method to automatically cluster similar data examples together.

The intuition behind K-means is an iterative procedure that starts by guessing the initial centroids, and then refines this guess by repeatedly assigning examples to their closest centroids and then recomputing the centroids based on the assignments.
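
The two alternating steps described above can be sketched in base R as follows. This is a minimal illustration, not the exercise's implementation: the function name run_kmeans, its arguments, the fixed iteration count, and the toy data are all assumptions. The initial centroids are taken from two rows of the data so that the result is reproducible; in practice they are usually picked at random.

```r
run_kmeans <- function(X, centroids, max_iter = 10) {
    K <- nrow(centroids)
    for (iter in seq_len(max_iter)) {
        # Assignment step: squared distance from every example to every centroid,
        # then the index of the closest centroid for each example.
        d <- sapply(seq_len(K), function(k) rowSums(sweep(X, 2, centroids[k, ])^2))
        idx <- apply(d, 1, which.min)
        # Update step: move each centroid to the mean of its assigned examples.
        for (k in seq_len(K)) {
            if (any(idx == k)) {
                centroids[k, ] <- colMeans(X[idx == k, , drop = FALSE])
            }
        }
    }
    list(centroids = centroids, idx = idx)
}

# Toy data: two well-separated groups of three points each.
X <- rbind(c(1, 1), c(1.2, 0.8), c(0.9, 1.1),
           c(5, 5), c(5.1, 4.9), c(4.8, 5.2))
fit <- run_kmeans(X, centroids = X[c(1, 4), , drop = FALSE])
print(fit$idx)
print(fit$centroids)
```

A centroid that loses all of its examples is left where it is in this sketch; other choices, such as re-seeding it, are equally reasonable.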

This algorithm was implemented as follows:
Continue reading “ML-Class Ex 7 – kMeans clustering” »

project euler – problem 49

The arithmetic sequence, 1487, 4817, 8147, in which each of the terms increases by 3330, is unusual in two ways: (i) each of the three terms are prime, and, (ii) each of the 4-digit numbers are permutations of one another.

There are no arithmetic sequences made up of three 1-, 2-, or 3-digit primes, exhibiting this property, but there is one other 4-digit increasing sequence.

What 12-digit number do you form by concatenating the three terms in this sequence?
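
Before searching, it can help to check the properties on the example sequence from the statement. The following base R sketch does that; the helpers is_prime and digit_key are hypothetical names introduced here, and the trial-division primality test is only meant for small numbers like these.

```r
# Trial-division primality test, adequate for 4-digit numbers.
is_prime <- function(n) {
    n > 1 && all(n %% seq_len(floor(sqrt(n)))[-1] != 0)
}

# Sorted digits as a string; permutations of each other share the same key.
digit_key <- function(n) {
    paste(sort(strsplit(as.character(n), "")[[1]]), collapse = "")
}

terms <- c(1487, 4817, 8147)
all_prime     <- all(sapply(terms, is_prime))
same_digits   <- length(unique(sapply(terms, digit_key))) == 1
constant_step <- length(unique(diff(terms))) == 1
print(c(all_prime, same_digits, constant_step))
```

Comparing sorted digit strings also handles repeated digits correctly, which a plain set comparison of digits would not.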
# Four-digit primes in descending order, so each candidate is the largest term.
n <- 9999:1000
prime <- n[gmp::isprime(n) != 0]
# Sorted digits of each prime; permutations of one another share the same key.
key <- sapply(prime, function(p) paste(sort(strsplit(as.character(p), split="")[[1]]), collapse=""))

flag <- 0
for (i in seq_along(prime)) {
    maxP <- prime[i]

    # Smaller primes that are digit permutations of maxP.
    idx <- which(key == key[i] & prime < maxP)
    if (length(idx) >= 2) {
        sel <- prime[idx]
        diff <- maxP-sel
        for (j in seq_along(diff)) {
            # sel[j] is the middle term if another prime lies twice as far below maxP.
            m <- which(diff == 2* diff[j])
            if (length(m) >= 1) {
                minP <- sel[m[1]]
                midP <- sel[j]
                flag <- 1
                break
            }
        }
    }
    if(flag) {
        break
    }
}

ans <- paste(c(minP, midP, maxP), collapse="")
print(ans)

> system.time(source("problem49.R"))
[1] "296962999629"
   user  system elapsed
   0.25    0.00    0.25

project euler – problem 47

The first two consecutive numbers to have two distinct prime factors are:

14 = 2 × 7
15 = 3 × 5

The first three consecutive numbers to have three distinct prime factors are:

644 = 2² × 7 × 23
645 = 3 × 5 × 43
646 = 2 × 17 × 19.

Find the first four consecutive integers to have four distinct prime factors. What is the first of these numbers?

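# Distinct prime factors of n, found by trial division.
getFactor <- function(n) {
    f <- c()
    i <- 2
    while (i * i <= n) {
        if (n %% i == 0) {
            f <- c(f, i)
            # Strip every power of i so that only primes can divide n later.
            while (n %% i == 0) {
                n <- n/i
            }
        }
        i <- i + 1
    }
    # Whatever remains above 1 is a single prime factor.
    if (n > 1) {
        f <- c(f, n)
    }
    return(f)
}

i <- 4
n <- 10^(i-1)

while(TRUE) {
    flag <- 0
    for (j in 0:(i-1)) {
        f <- getFactor(n+j)
        # n+j ends the run unless it has exactly i distinct prime factors.
        if (length(f) != i)
            break
        if (j == i-1)
            flag <- 1
    }
    if (flag == 1) {
        print(n)
        break
    }
    # Every run starting between n and n+j would contain n+j, so skip past it.
    n <- n+j+1
}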

When i = 2, the program prints 14, and when i = 3, it prints 644.
Nothing in the program is hard-coded: i can be set to any value to find the first of i consecutive integers that each have i distinct prime factors, which is the property problem 47 asks for.

> system.time(source("Problem47.R"))
[1] 134043
   user  system elapsed
  43.22    0.00   43.28
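
Most of that running time goes into factoring each number separately. As a possible alternative, here is a sketch that swaps trial division for a sieve counting the distinct prime factors of every number up to a limit in one pass. The function name first_run and the default limit of 200000 are assumptions; the limit only has to be large enough to contain the answer.

```r
first_run <- function(k, limit = 200000) {
    # omega[m] will hold the number of distinct prime factors of m.
    omega <- integer(limit)
    for (p in 2:limit) {
        # A count of zero means no smaller prime divides p, so p is prime.
        if (omega[p] == 0) {
            idx <- seq(p, limit, by = p)
            omega[idx] <- omega[idx] + 1L
        }
    }
    hit <- omega == k
    # A run starts at m when m, m+1, ..., m+k-1 are all hits.
    n_start <- limit - k + 1
    ok <- rep(TRUE, n_start)
    for (j in 0:(k - 1)) {
        ok <- ok & hit[seq_len(n_start) + j]
    }
    which(ok)[1]   # NA when no run exists below the limit
}

print(first_run(2))  # 14
print(first_run(3))  # 644
print(first_run(4))  # 134043
```

The trade-off is memory for speed: the sieve keeps one integer per number up to the limit, and the limit has to be guessed in advance.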