proper use of GOSemSim

One day, I was looking for R packages that can analyze PPI (protein-protein interactions), and after some searching I found the ppiPre package on CRAN.


The functionality of this package is not impressive, and I already knew of some related work, including a webserver whose authors contacted me about the usage of GOSemSim while they were developing it.

What made me curious is that the ppiPre package can calculate GO semantic similarity and supports 20 species, exactly like GOSemSim. I opened the source tarball and was surprised to find that its source code for semantic similarity calculation was copied entirely from GOSemSim.

GOSemSim was first released in 2008 in Bioconductor 2.4 (at that time, the devel version) and published in Bioinformatics in 2010. After comparing the sources, I found that the code in ppiPre was copied from GOSemSim version 1.6.8, which was released in 2010 in Bioconductor 2.6.

hello yosemite


Installing an OS is painful: you need to re-install all your software and configure it the way you want. We don't want to waste time doing this. To avoid this dirty job, we perform an upgrade install instead of a clean install. We all know from experience that upgrading Windows sucks, and the same goes for OS X. All the issues you had in the old system remain, and sometimes new issues are introduced in the process of upgrading. The system will also be slower compared to a clean install.

Life is short. We want a clean system without wasting time setting up software, but is that possible? The answer is yes and no. First, you should have two partitions, one for the OS and the other for your data. Only in this way can you format the system partition and keep your data untouched.

When I was an undergraduate student, I installed both FreeBSD and Debian on my PC. They were configured to share the same home partition, so that I didn't have to configure software on both systems. In OS X, the home directory is not located at the traditional path /home/userName, but at /Users/userName. You should move your home directory from the system partition to your data partition.

Yearly Topic Trend in PubMed

I found an R script on R-bloggers that can be used to track PubMed trends. The script needs the Perl package TGen-EUtils to perform queries, and that package is no longer available.

It's not difficult to query PubMed records in R. We can use the RCurl package to fetch records and the XML package to parse them, as shown on Stack Overflow.

Before I wrote my own script, I found a well-written package, RISmed, that provides many functions for accessing the NCBI databases.

I wrote a wrapper function called getPubmedTrend, which imports EUtilsSummary and QueryCount from RISmed, to track PubMed trends. Another function, plotPubmedTrend, was also implemented for visualizing the trend. These two functions are available in my toy package, yplots.

SIR Model of Epidemics

The SIR model divides the population into three compartments: Susceptible, Infected and Recovered. If the disease dynamics fit the SIR model, then individuals flow in one direction only, from the susceptible group to the infected group and then to the recovered group. All individuals are assumed to be identical in terms of their susceptibility to infection, their infectiousness if infected, and the mixing behaviour associated with disease transmission.

We define:

\(S_t\) = the number of susceptible individuals at time t

\(I_t\) = the number of infected individuals at time t

\(R_t\) = the number of recovered individuals at time t

Suppose that on average every infected individual contacts \(\gamma\) persons, and a proportion \(\kappa\) of these \(\gamma\) persons becomes infected. Then on average \(\beta = \gamma \times \kappa\) persons will be infected by each infected individual.

So \(I_t\) infected individuals would infect \(\beta I_t\) individuals. Since not all people are susceptible, this number should be multiplied by the proportion of susceptible individuals, \(S_t/N\), where \(N\) is the total population size. Therefore, \(I_t\) infected individuals will infect \(\beta \frac{S_t}{N} I_t\) individuals.

Another parameter, \(\alpha\), describes the proportion of infected individuals who recover in one time period. That is, on average it takes \(1/\alpha\) periods for an infected person to recover.
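
To make these quantities concrete, here is a minimal Python sketch of a discrete-time SIR simulation. It assumes the usual update rule implied by the definitions above: in each period \(\beta \frac{S_t}{N} I_t\) individuals move from susceptible to infected, and \(\alpha I_t\) individuals move from infected to recovered. The function name `simulate_sir` and all parameter values are hypothetical, chosen only for illustration.

```python
def simulate_sir(s0, i0, r0, beta, alpha, steps):
    """Iterate a discrete-time SIR update for `steps` periods.

    Assumed update rule (a sketch based on the definitions in the text):
      new infections = beta * S_t / N * I_t
      new recoveries = alpha * I_t
    """
    n = s0 + i0 + r0  # total population N, constant over time
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s / n * i
        new_recoveries = alpha * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical values: each infected person contacts gamma = 10 people per
# period and infects a proportion kappa = 0.05 of them, so beta = 0.5.
gamma, kappa = 10, 0.05
beta = gamma * kappa
history = simulate_sir(s0=990, i0=10, r0=0, beta=beta, alpha=0.2, steps=100)

peak_period = max(range(len(history)), key=lambda t: history[t][1])
print("infections peak at period", peak_period)
```

With \(\alpha = 0.2\), an infected person takes on average \(1/\alpha = 5\) periods to recover. The total \(S_t + I_t + R_t\) stays equal to \(N\) at every step, which is a quick sanity check on any implementation.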

multiple annotation in ChIPseeker

Nearest gene annotation

Almost all annotation software calculates the distance of a peak to the nearest TSS and assigns the peak to that gene. This can be misleading, as a binding site might be located between the start sites of two different genes, or hit different genes that share the same TSS location in the genome.

The function annotatePeak provides an option to assign genes using a maximum distance cutoff, so that all genes within this distance are reported for each peak.
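
The difference between the two strategies can be illustrated with a small Python sketch. This is not ChIPseeker's implementation; the helper names `nearest_gene` and `genes_within`, the gene names, and the coordinates are all hypothetical, and strand and chromosome are ignored for simplicity.

```python
def nearest_gene(peak_pos, tss_by_gene):
    """Return the single gene whose TSS is closest to the peak.

    Ties are broken by dictionary order, so one gene is silently dropped
    when two genes share the same TSS.
    """
    return min(tss_by_gene, key=lambda gene: abs(tss_by_gene[gene] - peak_pos))


def genes_within(peak_pos, tss_by_gene, max_dist):
    """Return every gene whose TSS lies within max_dist of the peak."""
    return sorted(
        gene for gene, tss in tss_by_gene.items()
        if abs(tss - peak_pos) <= max_dist
    )


# Hypothetical TSS positions: geneA and geneB share a TSS, and the peak
# falls between that shared TSS and the TSS of geneC.
tss_by_gene = {"geneA": 1000, "geneB": 1000, "geneC": 6000}
peak_pos = 3400

print(nearest_gene(peak_pos, tss_by_gene))
print(genes_within(peak_pos, tss_by_gene, max_dist=3000))
```

Here the nearest-gene rule reports only one of the three plausible targets, while the distance cutoff reports all of them.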
