# Tag Archives: R

## SIR Model of Epidemics

The SIR model divides the population to three compartments: Susceptible, Infected and Recovered. If the disease dynamic fits the SIR model, then the flow of individuals is one direction from the susceptible group to infected group and then to the recovered group. All individuals are assumed to be identical in terms of their susceptibility to infection, infectiousness if infected and mixing behaviour associated with disease transmission.

We defined:

$S_t$ = the number of susceptible individuals at time t

$I_t$ = the number of infected individuals at time t

$R_t$ = the number of recovered individuals at time t

Suppose on average every infected individual will contact $\gamma$ person, and $\kappa$ percent of these $\gamma$ person will be infected. Then on average there are $\beta = \gamma \times \kappa$ person will be infected an infected individual.

So with infected number $I_t$ , they will infected $\beta I_t$ individuals. Since not all people are susceptible, this number should multiple to the percentage of susceptible individuals. Therefore, $I_t$ infected individuals will infect $\beta \frac{S_t}{N} I_t$ individuals.

Another parameter $\alpha$ describes the percentage of infected individuals to recover in a time period. That is on average, it takes $1/\alpha$ periods for an infected person to recover.

m4s0n501

## multiple annotation in ChIPseeker

### Nearest gene annotation

Almost all annotation software calculate the distance of a peak to the nearest TSS and assign the peak to that gene. This can be misleading, as binding sites might be located between two start sites of different genes or hit different genes which have the same TSS location in the genome.

The function annotatePeak provides option to assign genes with a max distance cutoff and all genes within this distance were reported for each peak.

## why clusterProfiler fails

Recently, there are some comments said that sometimes clusterProfiler failed in KEGG enrichment analysis.

kaji331 compared cluserProfiler with GeneAnswers and found that clusterProfiler gives larger p values. The result forces me to test it.

?View Code RSPLUS
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  require(GeneAnswers) data('humanGeneInput') y < - geneAnswersBuilder(humanGeneInput, 'org.Hs.eg.db', categoryType='KEGG', testType='hyperG', pvalueT=0.1, geneExpressionProfile=humanExpr, verbose=FALSE) yy <- y@enrichmentInfo   require(clusterProfiler) x <- enrichKEGG(humanGeneInput$GeneID, pvalueCutoff=0.2, qvalueCutoff=0.2, minGSSize=1) xx <- summary(x) id <- sub("hsa", "", xx$ID) idx <- id %in% rownames(yy)   p.clusterProfiler <- xx$pvalue[idx] p.GeneAnswers <- yy[id[idx],]$"p value" > cor(p.clusterProfiler, p.GeneAnswers) [1] 0.9996165 > p.clusterProfiler - p.GeneAnswers [1] 1.029789e-04 -3.588252e-05 -4.623010e-05 1.079117e-04 -1.075746e-04 [6] -1.077398e-04 -3.774637e-04 -2.849278e-04 -4.197993e-04 7.588155e-04 [11] -3.702141e-04 2.314721e-03 -5.695641e-04 -5.940830e-04 -4.923697e-04 [16] -5.560738e-04 -5.884079e-04 2.011138e-03

Here, I used the dataset, humanGeneInput, provided by GeneAnswers. There are 19 pathways have p values below 0.1 by GeneAnswers and 18 pathways have p values below 0.1 by clusterProfiler. 18 of them are the same and p values are highly correlated with very small differences.

## enrichment map

In PLOB's QQ group, someone asked how to change the color of enrichment map in Cytoscape. I am very curious how enrichment map can helps to interpret enrichment results. It took me 2 hours to implement it using R and I am surprised that the enrichment map is better than anticipated.

Now in the development version of clusterProfiler, DOSE, and ReactomePA, you can use enrichmap function to generate the enrichment map of enrichment results obtained by hypergeometric test or gene set enrichment analysis.

## visualization methods in ChIPseeker

After two weeks developed, I have added/updated some plot functions in ChIPseeker (version >=1.0.1).

### ChIP peaks over Chromosomes

?View Code RSPLUS
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23  > files=getSampleFiles() > peak=readPeakFile(files[[4]]) > peak GRanges object with 1331 ranges and 2 metadata columns: seqnames ranges strand | V4 V5 | [1] chr1 [ 815092, 817883] * | MACS_peak_1 295.76 [2] chr1 [1243287, 1244338] * | MACS_peak_2 63.19 [3] chr1 [2979976, 2981228] * | MACS_peak_3 100.16 [4] chr1 [3566181, 3567876] * | MACS_peak_4 558.89 [5] chr1 [3816545, 3818111] * | MACS_peak_5 57.57 ... ... ... ... ... ... ... [1327] chrX [135244782, 135245821] * | MACS_peak_1327 55.54 [1328] chrX [139171963, 139173506] * | MACS_peak_1328 270.19 [1329] chrX [139583953, 139586126] * | MACS_peak_1329 918.73 [1330] chrX [139592001, 139593238] * | MACS_peak_1330 210.88 [1331] chrY [ 13845133, 13845777] * | MACS_peak_1331 58.39 --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX chrY NA NA NA NA NA NA ... NA NA NA NA NA NA > covplot(peak, weightCol="V5")