ChIPseeker for ChIP peak annotation

ChIPpeakAnno WAS the only R package for ChIP peak annotation. I used it to annotate peaks in a recent study.

I found that it does not take the strand of genes into account. I reported the bug to the authors, but they were reluctant to change it.

So I decided to develop my own package, ChIPseeker, and it's now available in Bioconductor.

boxplot

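Many people in biology know only one kind of plot, the bar chart, and only one statistical test, the t-test. At Jinan University I have met far too many students who cannot even run a t-test, cannot tell SEM from SD, and do not understand what the few simple parameters of a t-test mean. I started writing my statistics notes precisely because I did not want to keep explaining the t-test to students over and over.

The bar plot is as ubiquitous as the t-test. Bar plots are well suited to count data and proportions. Proportions can also be shown with a pie chart, but a bar chart is the better choice, because the human eye is good at comparing heights, not arcs.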

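Most of the time, though, biological data are not simple counts. For measurement data, many people reach for the familiar bar plot to show the distribution: bar height for the mean, plus an error bar. Shown this way, the data convey very little information. A box plot provides much more information about the distribution and presents the data better, but many people may only know how to draw bar plots in Excel.

The 2013 articles in Nature Methods contained 100 bar plots and only 20 box plots, which shows how far box plot users are outnumbered by bar plot users. So NPG got fed up and wrote two columns, "Points of View: Bar charts and box plots" and "Points of Significance: Visualizing samples with box plots", and also published "BoxPlotR: a web tool for generation of box plots" to make box plots easy for everyone. That such a simple web tool can get into Nature Methods really makes one green with envy.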
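To make the difference concrete, here is a minimal sketch in base R. The two groups are simulated, not real measurements, and the group names are made up for illustration. It shows what a bar plot with error bars reduces each group to (a mean and an SD or SEM) next to what a box plot displays (the five-number summary, with extreme points drawn individually).

```r
# Simulated measurements for two hypothetical groups (NOT real data).
set.seed(42)
control   <- rnorm(30, mean = 10, sd = 2)
treatment <- c(rnorm(25, mean = 12, sd = 2), 20, 21, 22, 23, 24)  # a few extreme values

values <- c(control, treatment)
group  <- factor(rep(c("control", "treatment"),
                     times = c(length(control), length(treatment))))

# A bar plot with error bars keeps only a mean and an SD (or SEM) per group.
group_mean <- tapply(values, group, mean)
group_sd   <- tapply(values, group, sd)
group_sem  <- group_sd / sqrt(tapply(values, group, length))

# A box plot is built on the five-number summary:
# minimum, lower hinge, median, upper hinge, maximum.
five_num <- tapply(values, group, fivenum)

# Draw both side by side.
op   <- par(mfrow = c(1, 2))
mids <- barplot(group_mean, ylim = c(0, max(values)), main = "bar plot: mean + SD")
arrows(mids, group_mean - group_sd, mids, group_mean + group_sd,
       angle = 90, code = 3, length = 0.1)
boxplot(values ~ group, main = "box plot")
par(op)
```

Note that `group_sem` is always smaller than `group_sd` for the same data, which is why it matters which of the two an error bar represents.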

old habits die hard

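[Screenshot 2014-01-23 01.10.07]
Back in January 2011 I sent a group email to the lab's QQ group announcing that IPI was closing. It has now been closed for three years, and its home page is still frozen at the moment it shut down.
[Screenshot 2014-01-23 00.03.17]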

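I have stressed again and again, in emails and at lab meetings, that database searches should switch to UniProt. Yet to this day plenty of people are still using IPI. It is scary to think about: the lab seems to go a hundred years without updating its data.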

Bug of R package ChIPpeakAnno

I used the R package ChIPpeakAnno to annotate peaks and found that it handles the DNA strand incorrectly. Perhaps the developers come from a computer science background rather than a biology one.

> require(ChIPpeakAnno)
> packageVersion("ChIPpeakAnno")
[1] '2.10.0'
> peak <- RangedData(space="chr1", IRanges(24736757, 24737528))
> data(TSS.human.GRCh37)
> ap <- annotatePeakInBatch(peak, Annotation=TSS.human.GRCh37)
> ap
RangedData with 1 row and 9 value columns across 1 space
                     space               ranges |        peak      strand
                  <factor>            <IRanges> | <character> <character>
1 ENSG00000001461        1 [24736757, 24737528] |           1           +
                          feature start_position end_position insideFeature
                      <character>      <numeric>    <numeric>   <character>
1 ENSG00000001461 ENSG00000001461       24742284     24799466      upstream
                  distancetoFeature shortestDistance fromOverlappingOrNearest
                          <numeric>        <numeric>              <character>
1 ENSG00000001461             -5527             4756             NearestStart

In this example, I defined a peak spanning chr1:24736757 to chr1:24737528 and annotated it with the ChIPpeakAnno package.

It reports that the nearest gene is ENSG00000001461, whose gene symbol is NIPAL3.

> require(org.Hs.eg.db)
> gene.ChIPpeakAnno <- select(org.Hs.eg.db, keys=ap$feature, keytype="ENSEMBL", columns=c("ENSEMBL", "ENTREZID", "SYMBOL"))
> gene.ChIPpeakAnno
          ENSEMBL ENTREZID SYMBOL
1 ENSG00000001461    57185 NIPAL3

When I looked at the peak in the Genome Browser, however, I found that the nearest gene is STPG1.
[Screenshot 2014-01-13 22.00.46]
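Why strand matters for "nearest gene" can be sketched in a few lines of base R. This illustrates the general point and does not reproduce ChIPpeakAnno's internals. The peak and the NIPAL3 coordinates come from the output above, while `geneB` is a hypothetical minus-strand gene with made-up coordinates, and `dist_to_peak` is a helper defined here only for the example. For a gene on the + strand the TSS is its start coordinate; for a gene on the - strand it is its end coordinate.

```r
# Peak from the example above.
peak_start <- 24736757
peak_end   <- 24737528

# NIPAL3 coordinates are taken from the annotation output above.
# "geneB" is a HYPOTHETICAL minus-strand gene; its coordinates are made up.
genes <- data.frame(
  gene   = c("NIPAL3", "geneB"),
  start  = c(24742284, 24683000),
  end    = c(24799466, 24741000),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

# Strand-aware TSS: start for "+" genes, end for "-" genes.
genes$tss <- ifelse(genes$strand == "+", genes$start, genes$end)

# Distance from the peak interval to a position (0 if the position is inside the peak).
dist_to_peak <- function(pos) pmax(peak_start - pos, pos - peak_end, 0)

genes$dist_strand_aware <- dist_to_peak(genes$tss)
genes$dist_start_only   <- dist_to_peak(genes$start)  # treats start as TSS, ignoring strand

nearest_aware <- genes$gene[which.min(genes$dist_strand_aware)]
nearest_naive <- genes$gene[which.min(genes$dist_start_only)]

print(genes)
```

With these numbers the two rules disagree: measuring to the start coordinate picks NIPAL3 (4756 bp away, matching `shortestDistance` above), while measuring to the strand-aware TSS picks the hypothetical minus-strand gene.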

local blast

I was asked to set up a local BLAST for the lab. On Debian, BLAST can be installed directly with apt, and it turned out to be easy.

root@jz:/ssd/genomes# apt-get install ncbi-blast+
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  ncbi-blast+
0 upgraded, 1 newly installed, 0 to remove and 26 not upgraded.
Need to get 11.2 MB of archives.
After this operation, 32.8 MB of additional disk space will be used.
Get:1 http://ftp.hk.debian.org/debian/ wheezy/main ncbi-blast+ amd64 2.2.26-3 [11.2 MB]
Fetched 11.2 MB in 1min 16s (146 kB/s)
Selecting previously unselected package ncbi-blast+.
(Reading database ... 252681 files and directories currently installed.)
Unpacking ncbi-blast+ (from .../ncbi-blast+_2.2.26-3_amd64.deb) ...
Processing triggers for man-db ...
Setting up ncbi-blast+ (2.2.26-3) ...

Before the program can be used for sequence alignment, the database files have to be built:

root@jz:/ssd/genomes/blast/db# makeblastdb -in ../../hg19.fa -out hg19 -dbtype nucl
Building a new DB, current time: 11/21/2013 16:03:05
New DB name:   hg19
New DB title:  ../../hg19.fa
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 25 sequences in 27.7084 seconds.

That's it. BLAST is now available on the lab server.