Category Archives: Biology

ggtree in Bioconductor 3.1

I am very glad that ggtree is now available via Bioconductor. This is my 6th Bioconductor package.

ggtree now supports parsing output files from BEAST, PAML, HYPHY, EPA and PPLACER, and can annotate phylogenetic trees directly using plot methods.

ggtree can now be installed directly from Bioconductor.

Find out more, and check out the vignette.

viewing and annotating phylogenetic tree with ggtree

When I needed to annotate nucleotide substitutions on a phylogenetic tree, I found that the available software is designed to display trees rather than annotate them. Some programs support annotating the tree with specific data such as bootstrap values, but they are restricted to a few supported data types. It is hard, if not impossible, to inject user-specific data.


proper use of GOSemSim

One day I was looking for R packages that can analyze protein-protein interactions (PPI), and after some searching I found the ppiPre package on CRAN.


The functionality of this package is not impressive, and I already knew of some related work, including a web server whose authors contacted me about the usage of GOSemSim while they were developing it.

What made me curious is that the ppiPre package can calculate GO semantic similarity and supports 20 species, exactly like GOSemSim. I opened the source tarball and was surprised to find that its code for semantic similarity calculation was copied wholesale from GOSemSim.

GOSemSim was first released in 2008 in Bioconductor 2.4 (the devel version at that time) and published in Bioinformatics in 2010. After comparing the sources, I found that the code in ppiPre was copied from GOSemSim version 1.6.8, which was released in 2010 with Bioconductor 2.6.


Yearly Topic Trend in PubMed

I found an R script on R-bloggers that can be used to track PubMed trends. The script needs the Perl package TGen-EUtils to perform queries, and that package is no longer available.

It's not difficult to query PubMed records in R. We can use the RCurl package to fetch the records and the XML package to parse them, as shown on Stack Overflow.

Before writing my own script, I found that there is a well-written package, RISmed, that provides many functions for accessing the NCBI databases.

I wrote a wrapper function called getPubmedTrend, which imports EUtilsSummary and QueryCount from RISmed, to track PubMed trends. Another function, plotPubmedTrend, was implemented to visualize the trend. These two functions are available in my toy package, yplots.

SIR Model of Epidemics

The SIR model divides the population into three compartments: Susceptible, Infected and Recovered. If the disease dynamics fit the SIR model, individuals flow in one direction only, from the susceptible group to the infected group and then to the recovered group. All individuals are assumed to be identical in terms of their susceptibility to infection, their infectiousness once infected, and the mixing behaviour associated with disease transmission.

We define:

S_t = the number of susceptible individuals at time t

I_t = the number of infected individuals at time t

R_t = the number of recovered individuals at time t

Suppose that on average every infected individual contacts \gamma people, and a fraction \kappa of these \gamma people become infected. Then on average \beta = \gamma \times \kappa people are infected by each infected individual.

So I_t infected individuals would infect \beta I_t individuals. Since not everyone is susceptible, this number should be multiplied by the proportion of susceptible individuals, \frac{S_t}{N}, where N is the total population size. Therefore, I_t infected individuals will infect \beta \frac{S_t}{N} I_t individuals.

Another parameter, \alpha, describes the proportion of infected individuals who recover in one time period. That is, on average it takes 1/\alpha periods for an infected person to recover.
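The definitions above can be turned into a small simulation. What follows is only a sketch, written in Python for brevity. It assumes the usual discrete-time SIR recursion, which this excerpt does not spell out: each period, \beta \frac{S_t}{N} I_t individuals move from susceptible to infected and \alpha I_t move from infected to recovered. The function name `simulate_sir` and all parameter values are made up for illustration.

```python
def simulate_sir(s0, i0, r0, beta, alpha, steps):
    """Iterate an assumed discrete-time SIR update for `steps` periods.

    beta  : average number of new infections caused by one infected
            individual per period in a fully susceptible population
    alpha : proportion of infected individuals who recover per period
    Returns a list of (S_t, I_t, R_t) tuples for t = 0..steps.
    """
    n = s0 + i0 + r0  # total population N, constant in this model
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * (s / n) * i  # beta * (S_t / N) * I_t
        new_recoveries = alpha * i           # alpha * I_t
        # Individuals only move S -> I -> R, so the total stays N.
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history
```

For example, `simulate_sir(990, 10, 0, beta=0.5, alpha=0.1, steps=100)` returns the trajectory for a population of 1000 with 10 initial infections. The sketch assumes \beta \le 1 and \alpha \le 1 so that no compartment can become negative within a single step.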
