Tag Archives: R

phylomoji with ggtree

If you search for the hashtag #phylomoji on Twitter, you can find many creative phylogenetic trees built from emoji.



Now with ggtree, you can play #phylomoji in R.

Comparison of clusterProfiler and GSEA-P

Thanks to @mevers for raising this issue with me and for his efforts in benchmarking clusterProfiler.

He pointed out two issues:

  • outputs from gseGO and GSEA-P overlap poorly.
  • p-values from gseGO are generally smaller and show little variation.

GSEA takes two inputs: a ranked gene list and a gene set collection.
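To make the two inputs concrete, here is a minimal base-R sketch of the weighted running-sum enrichment score used by GSEA, with hits weighted by |statistic|^p (p = 1). The function name `gsea_es` and the toy data are hypothetical, and this is not the code of either clusterProfiler or GSEA-P; it only shows how one ranked list and one gene set combine into a score, so a different gene set collection necessarily produces different scores.

```r
# Minimal sketch of the GSEA running-sum enrichment score (ES).
# geneList: named numeric vector of gene-level statistics (the ranked list)
# geneSet:  character vector of gene names (one member of a collection)
# p:        exponent for weighting hits by |statistic| (p = 1 is the usual choice)
gsea_es <- function(geneList, geneSet, p = 1) {
  geneList <- sort(geneList, decreasing = TRUE)   # rank genes from top to bottom
  hits <- names(geneList) %in% geneSet            # which ranked genes are in the set
  nHit <- sum(hits)
  nMiss <- length(geneList) - nHit
  if (nHit == 0 || nMiss == 0) stop("gene set must partially overlap the gene list")
  w <- abs(geneList)^p
  hitStep <- ifelse(hits, w / sum(w[hits]), 0)    # weighted increments at hits
  missStep <- ifelse(hits, 0, 1 / nMiss)          # equal decrements at misses
  running <- cumsum(hitStep - missStep)           # running enrichment score
  unname(running[which.max(abs(running))])        # maximum deviation from zero
}

# A toy ranked list and a toy gene set collection (named list of gene sets)
geneList <- c(a = 3, b = 2, c = 1, d = -1, e = -2)
geneSets <- list(top = c("a", "b"), bottom = c("d", "e"))
es <- sapply(geneSets, gsea_es, geneList = geneList)
es
```

Significance is then assessed by permutation, a step left out of this sketch.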

First of all, the gene set collections used by the two tools are very different. The GMT file used in his test, c5.cc.v5.0.symbols.gmt, is a tiny subset of GO CC, whereas clusterProfiler used the entire GO CC corpus.

For instance, with his gene list as input, clusterProfiler annotates 195 genes to ribosome, while GSEA-P (using c5.cc.v5.0.symbols.gmt) annotates only 38.

As the gene set collections are so different, I don't believe the comparison can produce meaningful results.

The first step should be extending clusterProfiler to support GMT files as gene set annotation. We can then use identical inputs (both the gene list and the gene sets), and benchmarking will become valuable for detecting issues attributable exclusively to the implementation of the GSEA algorithm.
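Reading a GMT file is straightforward: each line holds a gene set name, a description and then the member genes, all tab-separated. Below is a minimal base-R sketch that turns such a file into a two-column term/gene table, the layout current versions of clusterProfiler's `enricher()` and `GSEA()` accept through their `TERM2GENE` argument. `read_gmt_sketch` and the toy gene sets are hypothetical.

```r
# Minimal base-R sketch: read a GMT file into a two-column data frame
# (term, gene). read_gmt_sketch is a hypothetical helper name.
read_gmt_sketch <- function(file) {
  lines <- readLines(file)
  lines <- lines[nzchar(lines)]                     # skip empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  do.call(rbind, lapply(fields, function(f) {
    genes <- f[-(1:2)]                              # drop name and description
    genes <- genes[nzchar(genes)]
    data.frame(term = rep(f[1], length(genes)), gene = genes,
               stringsAsFactors = FALSE)
  }))
}

# Toy GMT content written to a temporary file
gmt <- tempfile(fileext = ".gmt")
writeLines(c("RIBOSOME\tna\tRPL3\tRPL4\tRPS6",
             "NUCLEUS\tna\tTP53\tMYC"), gmt)
term2gene <- read_gmt_sketch(gmt)
table(term2gene$term)
```

With the same table fed to both tools, any remaining differences would point at the GSEA implementation rather than at the annotation.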

use simplify to remove redundancy of enriched GO terms

To simplify enriched GO results, we can use a slim version of GO and analyze it with the enricher function.
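Over-representation analysis of the kind enricher runs tests each term with a one-sided hypergeometric test. The sketch below uses made-up counts, and `ora_pvalue` is a hypothetical helper, not part of clusterProfiler or DOSE.

```r
# One-sided hypergeometric test for a single term (over-representation).
# N: genes in the background universe
# M: background genes annotated to the term
# n: genes in the input (e.g. differentially expressed) list
# k: input genes annotated to the term
ora_pvalue <- function(k, M, n, N) {
  # P(X >= k) where X ~ Hypergeometric(M successes, N - M failures, n draws)
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

p <- ora_pvalue(k = 5, M = 10, n = 20, N = 100)
p
```

The same value comes from a one-sided Fisher's exact test on the corresponding 2x2 table; across many terms the p-values would then be adjusted, e.g. with `p.adjust()`.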

Another strategy is to use GOSemSim to calculate the semantic similarity of GO terms and remove highly similar terms, keeping one representative term. To make this feature available to clusterProfiler users, I developed a simplify method that removes redundant GO terms from the output of the enrichGO function.

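library(clusterProfiler)
library(org.Hs.eg.db)                         # human OrgDb annotation used by enrichGO
data(geneList, package="DOSE")                # named numeric vector, Entrez IDs as names
de <- names(geneList)[abs(geneList) > 2]      # genes with |geneList| > 2
bp <- enrichGO(de, OrgDb=org.Hs.eg.db, ont="BP")
bp2 <- simplify(bp, cutoff=0.7, by="p.adjust", select_fun=min)  # keep lowest p.adjust among similar terms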

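The redundancy-removal idea can be illustrated in base R with a toy similarity matrix standing in for GOSemSim scores. This greedy variant (keep the most significant term, then drop any later term too similar to one already kept) is an assumption for illustration and not necessarily identical to clusterProfiler's simplify; `simplify_sketch` and the term IDs are hypothetical.

```r
# Greedy redundancy removal over a term-by-term similarity matrix.
# sim:    symmetric matrix of semantic similarities (rows/cols named by term)
# padj:   named vector of adjusted p-values, same terms as sim
# cutoff: terms more similar than this to an already-kept term are dropped
simplify_sketch <- function(sim, padj, cutoff = 0.7) {
  stopifnot(setequal(rownames(sim), names(padj)))
  ord <- names(sort(padj))                  # most significant term first
  keep <- character(0)
  for (term in ord) {
    if (length(keep) == 0 || all(sim[term, keep] <= cutoff)) {
      keep <- c(keep, term)                 # not redundant with anything kept
    }
  }
  keep
}

terms <- c("GO:A", "GO:B", "GO:C")          # hypothetical term IDs
sim <- matrix(c(1.0, 0.9, 0.2,
                0.9, 1.0, 0.3,
                0.2, 0.3, 1.0), nrow = 3, dimnames = list(terms, terms))
padj <- c("GO:A" = 0.01, "GO:B" = 0.001, "GO:C" = 0.05)
simplify_sketch(sim, padj)
```

Here GO:A is dropped because it is highly similar to the more significant GO:B, while GO:C survives.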

[BioC 3.2] NEWS of my BioC packages

In the BioC 3.2 release, all my packages, including GOSemSim, clusterProfiler, DOSE, ReactomePA, and ChIPseeker, switched from Sweave to R Markdown for their vignettes.


For consistency between GOSemSim and clusterProfiler, 'worm' has been deprecated in favor of 'celegans'. As usual, the information content data was updated.


Enrichment results may contain very general (less informative) terms that we do not want to use. In this release, we provide a dropGO function that can drop selected GO terms or GO terms at a specific level. It can be applied to output from both enrichGO and compareCluster. This was a feature request from @ahorvath.
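Conceptually, dropping terms is a filter on the result table, either by ID or by GO level. In the base-R sketch below the per-term levels are supplied by hand as a hypothetical input (a real implementation has to derive them from the GO graph), and `drop_terms_sketch` and the toy terms are hypothetical.

```r
# Sketch of dropping uninformative terms from an enrichment result table.
# res:        data frame with an ID column (one row per enriched term)
# term:       IDs to drop
# level:      levels to drop
# term_level: named vector giving each term's level (hypothetical input)
drop_terms_sketch <- function(res, term = NULL, level = NULL, term_level = NULL) {
  drop <- res$ID %in% term
  if (!is.null(level)) {
    stopifnot(!is.null(term_level))
    drop <- drop | (term_level[res$ID] %in% level)
  }
  res[!drop, , drop = FALSE]
}

res <- data.frame(ID = c("T1", "T2", "T3"),
                  Description = c("very general", "intermediate", "specific"),
                  stringsAsFactors = FALSE)
term_levels <- c(T1 = 1, T2 = 2, T3 = 5)      # hypothetical level of each term
drop_terms_sketch(res, level = 1, term_level = term_levels)   # drops T1
drop_terms_sketch(res, term = "T3")                           # drops T3
```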

Another feature request was to visualize GO enrichment results with the GO topology. I implemented the plotGOgraph function by extending topGO to support output of both enrichGO and gseGO.

dotplot was another feature request; it was implemented in DOSE as a general function for visualizing enrichment results, and clusterProfiler imports this function.

The merge_result function was implemented to merge enrichment results so that they can be visualized together for comparison. This function was developed for comparing functional enrichment in the GTEx paper. An example comparing results from clusterProfiler and DAVID can be found on GitHub.
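The idea of merging is to stack the result tables and record which source each row came from, so that one plot can group rows by source. The real merge_result works on enrichResult objects; this base-R sketch only illustrates the idea on plain data frames, and `merge_results_sketch` and the toy tables are hypothetical.

```r
# Sketch: stack several enrichment result tables into one long table with a
# Cluster column recording the source of each row.
merge_results_sketch <- function(results) {
  stopifnot(!is.null(names(results)), all(nzchar(names(results))))
  merged <- do.call(rbind, lapply(names(results), function(nm) {
    cbind(Cluster = nm, results[[nm]], stringsAsFactors = FALSE)
  }))
  rownames(merged) <- NULL
  merged
}

cp <- data.frame(ID = c("GO:A", "GO:B"), p.adjust = c(0.001, 0.02),
                 stringsAsFactors = FALSE)
david <- data.frame(ID = c("GO:B", "GO:C"), p.adjust = c(0.005, 0.04),
                    stringsAsFactors = FALSE)
merged <- merge_results_sketch(list(clusterProfiler = cp, DAVID = david))
merged
```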

A section, 'Functional analysis of NGS data', was added to the vignette. The blog post illustrated using the enricher and GSEA functions to analyze user-defined annotations.

ChIPseq data mining with ChIPseeker

ChIP-seq is rapidly becoming a common technique, and a large number of datasets are available in the public domain. Results from individual experiments provide only a limited understanding of chromatin interactions, as many factors cooperate to regulate transcription. Unlike other tools that are designed for a single dataset, ChIPseeker is designed to compare profiles of ChIP-seq datasets at different levels.

We provide functions to compare the profiles of peaks binding to TSS regions, their annotations, and their enriched functional profiles. More importantly, ChIPseeker incorporates statistical testing of the co-occurrence of different ChIP-seq datasets, which can be used to identify co-factors.

> library(ChIPseeker)
> ff=getSampleFiles()
> x = enrichPeakOverlap(ff[[5]], unlist(ff[1:4]), nShuffle=10000, pAdjustMethod="BH", chainFile=NULL)
>> permutation test of peak overlap...		 2015-09-24 14:23:43
  |======================================================================| 100%
> x
                                                      qSample
ARmo_0M    GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz
ARmo_1nM   GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz
ARmo_100nM GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz
CBX6_BF    GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz
                                                      tSample qLen tLen N_OL
ARmo_0M                       GSM1174480_ARmo_0M_peaks.bed.gz 1663  812    0
ARmo_1nM                     GSM1174481_ARmo_1nM_peaks.bed.gz 1663 2296    8
ARmo_100nM                 GSM1174482_ARmo_100nM_peaks.bed.gz 1663 1359    3
CBX6_BF    GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz 1663 1331  968
               pvalue   p.adjust
ARmo_0M    0.88901110 0.88901110
ARmo_1nM   0.15118488 0.30236976
ARmo_100nM 0.37296270 0.49728360
CBX6_BF    0.00009999 0.00039996

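The permutation idea behind enrichPeakOverlap can be sketched in base R for a single chromosome: count how many query peaks overlap a target peak, then compare that count with counts obtained after randomly relocating the query peaks. This is a simplified assumption (one chromosome, uniform placement, no gaps) and not ChIPseeker's code; the helper names and toy peaks are hypothetical. The printed p-values above are consistent with the add-one estimate (b + 1)/(nShuffle + 1), e.g. 1/10001 ≈ 0.00009999 for CBX6_BF, and the p.adjust column can be reproduced with the BH method.

```r
# Count query intervals [qs, qe] that overlap at least one target interval [ts, te]
count_overlaps <- function(qs, qe, ts, te) {
  sum(vapply(seq_along(qs),
             function(i) any(qs[i] <= te & qe[i] >= ts),
             logical(1)))
}

# Permutation test: relocate query peaks uniformly at random, keeping widths
perm_overlap_test <- function(qs, qe, ts, te, chrom_len, nShuffle = 1000) {
  obs <- count_overlaps(qs, qe, ts, te)
  widths <- qe - qs
  null <- replicate(nShuffle, {
    starts <- sample.int(chrom_len - max(widths), length(qs), replace = TRUE)
    count_overlaps(starts, starts + widths, ts, te)
  })
  # add-one estimate: the observed configuration counts as one permutation
  list(N_OL = obs, pvalue = (sum(null >= obs) + 1) / (nShuffle + 1))
}

set.seed(42)
t_start <- c(1000, 20000, 45000, 70000, 90000)   # toy target peaks
t_end <- t_start + 500
res <- perm_overlap_test(t_start + 100, t_end + 100, t_start, t_end,
                         chrom_len = 1e5, nShuffle = 200)
res

# BH adjustment of the four p-values printed above
p.adjust(c(0.88901110, 0.15118488, 0.37296270, 0.00009999), method = "BH")
```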
