## SIR Model of Epidemics

The SIR model divides the population into three compartments: Susceptible, Infected and Recovered. If the disease dynamics fit the SIR model, individuals flow in one direction only: from the susceptible group to the infected group, and then to the recovered group. All individuals are assumed to be identical in terms of their susceptibility to infection, their infectiousness if infected, and the mixing behaviour associated with disease transmission.

We define:

$S_t$ = the number of susceptible individuals at time t

$I_t$ = the number of infected individuals at time t

$R_t$ = the number of recovered individuals at time t

Suppose that, on average, every infected individual contacts $\gamma$ people in a time period, and a fraction $\kappa$ of these $\gamma$ people become infected. Then, on average, each infected individual infects $\beta = \gamma \times \kappa$ people.

So $I_t$ infected individuals would infect $\beta I_t$ individuals. Since not everyone is susceptible, this number must be multiplied by the proportion of susceptible individuals, $S_t / N$, where $N$ is the total population size. Therefore, $I_t$ infected individuals will infect $\beta \frac{S_t}{N} I_t$ individuals.

Another parameter, $\alpha$, describes the fraction of infected individuals who recover in a time period. That is, on average it takes $1/\alpha$ periods for an infected person to recover.
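The definitions above suggest a simple discrete-time update: each period, $\beta \frac{S_t}{N} I_t$ individuals move from susceptible to infected, and $\alpha I_t$ move from infected to recovered. The following is a minimal sketch of that update, assuming a constant population $N = S_t + I_t + R_t$. The function name `simulate_sir` and the parameter values are illustrative assumptions, not fitted to any real outbreak.

```python
def simulate_sir(s0, i0, r0, beta, alpha, steps):
    """Iterate the discrete-time SIR update for `steps` periods.

    Returns a list of (S_t, I_t, R_t) tuples for t = 0..steps.
    """
    n = s0 + i0 + r0  # total population N, assumed constant
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * (s / n) * i  # beta * (S_t / N) * I_t
        new_recoveries = alpha * i           # alpha * I_t
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative values only: beta = 0.3, alpha = 0.1 (recovery takes 10 periods on average).
    trajectory = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, alpha=0.1, steps=100)
    for t in (0, 25, 50, 100):
        s, i, r = trajectory[t]
        print(f"t={t:3d}  S={s:8.1f}  I={i:8.1f}  R={r:8.1f}")
```

Because every individual who leaves one compartment enters the next, $S_t + I_t + R_t$ stays equal to $N$ throughout the simulation.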

## multiple annotation in ChIPseeker

### Nearest gene annotation

Almost all annotation software calculates the distance from a peak to the nearest TSS and assigns the peak to that gene. This can be misleading, as binding sites might be located between the start sites of two different genes, or hit different genes that share the same TSS location in the genome.

The function annotatePeak provides an option to assign genes using a maximum distance cutoff; all genes within this distance are reported for each peak.
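To make the difference concrete, here is a small sketch contrasting nearest-TSS assignment with reporting every gene within a distance cutoff. It is not ChIPseeker's implementation or API: the `Gene` type, the function names and the coordinates are hypothetical, and the sketch ignores strand and assumes all features lie on one chromosome.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    tss: int  # TSS coordinate, same chromosome as the peak (assumption)


def nearest_gene(peak_pos, genes):
    """Return the single gene whose TSS is closest to the peak."""
    return min(genes, key=lambda g: abs(g.tss - peak_pos))


def genes_within(peak_pos, genes, max_dist):
    """Return every gene whose TSS lies within max_dist of the peak, nearest first."""
    hits = [g for g in genes if abs(g.tss - peak_pos) <= max_dist]
    return sorted(hits, key=lambda g: abs(g.tss - peak_pos))


# Hypothetical example: geneB and geneC share a TSS, and the peak sits
# between the start sites of geneA and geneB/geneC.
genes = [Gene("geneA", 1000), Gene("geneB", 3000), Gene("geneC", 3000), Gene("geneD", 50000)]
peak = 2100

print(nearest_gene(peak, genes).name)                      # only one gene is reported
print([g.name for g in genes_within(peak, genes, 1500)])   # all genes inside the cutoff
```

With nearest-TSS assignment only one of the tied genes is reported, and geneA is dropped even though the peak lies almost as close to it. The cutoff-based variant keeps all three.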

## hierarchical clustering with mlass

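UPGMA is actually hierarchical clustering with average linkage. Speaking of clustering, I have to vent a little: years ago, someone at Jinan University was going to run a bioinformatics course and asked me to lecture on cluster analysis. I prepared my slides, but the course ended before my turn came. In all my years I had never been stood up like that! So the talk I prepared, Cluster Analysis and its Applications, was never given.

I had implemented kmeans before, so that was ready to use. For hierarchical clustering, I originally wanted to explain it by walking through the hclust function, but it turns out that hclust in R actually calls Fortran code. I can't read such an antique language anyway, so I wrote my own.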
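For readers who want to see what average linkage means in practice, here is a naive sketch of agglomerative clustering that merges, at each step, the two clusters with the smallest mean pairwise distance. It is not the mlass implementation and not what hclust does internally. The function names and the toy data are assumptions for illustration, and the cubic-or-worse running time is only suitable for small inputs.

```python
from itertools import combinations


def mean_distance(a, b, points, dist):
    """Average of all pairwise distances between members of clusters a and b."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def average_linkage(points, dist):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of indices into `points`.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest average distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: mean_distance(pair[0], pair[1], points, dist),
        )
        merges.append((a, b, mean_distance(a, b, points, dist)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


if __name__ == "__main__":
    # Toy one-dimensional data with absolute difference as the distance.
    data = [0.0, 1.0, 5.0, 6.0, 20.0]
    for a, b, d in average_linkage(data, lambda x, y: abs(x - y)):
        print(a, b, d)
```

The merge history is the information a dendrogram is drawn from: the order of merges gives the tree shape, and the distances give the branch heights.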

## install 454 GS Data Analysis Software on ubuntu

As usual, Roche's installer is a catastrophe: they only provide rpm packages of the software for the 454 GS FLX (version 2.9). Although the package contains setup.sh, the script is useless since it is actually a binary payload.

I ran setup.sh, and it threw an error about not finding /sbin/lspci. In Debian-derived distributions, the lspci command is located in the /bin folder. This issue is easy to solve by adding a soft link at /sbin/lspci.

The second error message that popped up said: "Error: Could not execute command: type rocks 2>&1". I solved it with the command sudo ln -s /bin/true /bin/rocks.

The third error was a lack of the libraries zlib.i386, libXi.i386, libXtst.i386, and libXaw.i386.
Since my OS is 64-bit Ubuntu 14.04 LTS, I used sudo apt-get install ia32-libs to install all the 32-bit compatibility libraries.

The fourth error was weird: it couldn't find /bin/sh, which is available on all Unix-like systems. Debian links sh to dash, while most Linux distributions link it to bash, so I changed the link to bash, but the error persisted.

I couldn't figure out how to solve the fourth error. I also tried installing the rpm packages with the rpm -ivh command, but the error didn't change.

## insertion size

fragment                  ========================================

PE reads      R1--------->                     <---------R2