If you search for the hashtag #phylomoji on Twitter, you can find many creative phylogenetic trees constructed with emoji.
Now with ggtree, you can play #phylomoji in R.
Read more »
Thanks to @mevers for raising the issue with me and for his efforts in benchmarking clusterProfiler.
He pointed out two issues:
- outputs from gseGO and GSEA-P overlap poorly.
- p-values from gseGO are generally smaller and don't show much variation.
A GSEA analysis takes two inputs: a ranked gene list and a gene set collection.
First of all, the gene set collections are very different. The GMT file used in his test is c5.cc.v5.0.symbols.gmt, which is a tiny subset of GO CC, while clusterProfiler used the whole GO CC corpus.
For instance, with his gene list as input, clusterProfiler annotates 195 genes as ribosome, while GSEA-P (using c5.cc.v5.0.symbols.gmt) only annotates 38 genes.
As the gene set collections are so different, I don't believe the comparison can produce any valuable results.
The first step should be extending clusterProfiler to support GMT files as gene set annotation. We could then use identical input (both gene list and gene sets), and benchmarking would be valuable for detecting issues that are attributable exclusively to the implementation of the GSEA algorithm.
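To make the idea of a GMT file as gene set annotation concrete, here is a minimal base R sketch that turns GMT-formatted lines (set name, description, then genes, all tab-separated) into a two-column table of term and gene. The helper name read_gmt_sketch and the toy gene sets are assumptions for illustration only; this is not clusterProfiler's implementation.

```r
# Sketch: parse GMT-formatted text into a two-column term-to-gene table.
# `read_gmt_sketch` is a hypothetical helper, not a clusterProfiler function.
read_gmt_sketch <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # GMT layout: set name, description, then one gene per remaining field
  genes <- lapply(fields, function(f) f[-(1:2)])
  data.frame(
    term = rep(vapply(fields, `[`, character(1), 1), lengths(genes)),
    gene = unlist(genes),
    stringsAsFactors = FALSE
  )
}

# Toy input (made up); a real file would be read with readLines() first
gmt_lines <- c(
  "SET_A\tdescription A\tGENE1\tGENE2\tGENE3",
  "SET_B\tdescription B\tGENE2\tGENE4"
)
term2gene <- read_gmt_sketch(gmt_lines)
print(term2gene)
```

With both tools reading the same table, any remaining disagreement could be attributed to the algorithm rather than to the annotation.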
Read more »
To simplify enriched GO results, we can use the slim version of GO and analyze it with the enricher function.
Another strategy is to use GOSemSim to calculate the similarity of GO terms and remove highly similar terms, keeping one representative term. To make this feature available to clusterProfiler users, I developed a simplify method to remove redundant GO terms from the output of the enrichGO function.
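library(clusterProfiler)
data(geneList, package="DOSE")  # example gene list shipped with DOSE
de <- names(geneList)[abs(geneList) > 2]  # genes whose absolute value exceeds 2
bp <- enrichGO(de, ont="BP")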
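The redundancy-removal strategy described above can be sketched in base R as a greedy filter over a term-by-term similarity matrix. This is only an illustration under stated assumptions: simplify_sketch is a hypothetical helper, the 0.7 cutoff and the choice of the most significant term as representative are assumptions, and the similarity values are made-up toy numbers rather than GOSemSim output.

```r
# Sketch of greedy redundancy removal; `simplify_sketch` is a hypothetical
# helper and the similarity matrix below is made-up toy data.
simplify_sketch <- function(pvalues, sim, cutoff = 0.7) {
  # Visit terms from most to least significant
  ordered_terms <- names(sort(pvalues))
  kept <- character(0)
  for (term in ordered_terms) {
    # Keep the term only if it is not too similar to any term already kept
    if (all(sim[term, kept] < cutoff)) {
      kept <- c(kept, term)
    }
  }
  kept
}

terms <- c("T1", "T2", "T3", "T4")
pvalues <- c(T1 = 0.001, T2 = 0.002, T3 = 0.01, T4 = 0.03)
sim <- matrix(
  c(1.0, 0.9, 0.2, 0.1,
    0.9, 1.0, 0.3, 0.2,
    0.2, 0.3, 1.0, 0.8,
    0.1, 0.2, 0.8, 1.0),
  nrow = 4, dimnames = list(terms, terms)
)
kept_terms <- simplify_sketch(pvalues, sim)
print(kept_terms)
```

In this toy case T2 is dropped because it is too similar to T1, and T4 because it is too similar to T3.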
Read more »
In the BioC 3.2 release, all my packages, including GOSemSim, clusterProfiler, DOSE, ReactomePA, and ChIPseeker, switched from Sweave to R Markdown for package vignettes.
To keep GOSemSim and clusterProfiler consistent, 'worm' was deprecated; use 'celegans' instead. As usual, the information content data was updated.
Enrichment results may contain terms that are very general (less informative) and that we do not want to use. In this release, we provide the dropGO function, which can be used to drop selected GO terms or a specific level of GO terms. It can be applied to output from both enrichGO and compareCluster. This was a feature request from @ahorvath.
Another feature request was to visualize GO enrichment results with GO topology. I implemented the plotGOgraph function by extending topGO to support output from both enrichGO and gseGO.
dotplot was another feature request and was implemented in DOSE as a general function for visualizing enrichment results. clusterProfiler imports this function.
The merge_result function was implemented for merging enrichment results so that they can be visualized together for comparison. This function was developed for comparing functional enrichment for the GTEx paper. An example comparing results from clusterProfiler and DAVID can be found on GitHub.
A section, 'Functional analysis of NGS data', was added to the vignette. The blog post illustrated using the enricher and GSEA functions to analyze user-defined annotation.
Read more »
ChIP-seq is rapidly becoming a common technique, and a large number of datasets are available in the public domain. Results from individual experiments provide a limited understanding of chromatin interactions, as many factors cooperate to regulate transcription. Unlike other tools that are designed for a single dataset, ChIPseeker is designed for comparing profiles of ChIP-seq datasets at different levels.
We provide functions to compare profiles of peaks binding to TSS regions, annotation, and enriched functional profiles. More importantly, ChIPseeker incorporates statistical testing of co-occurrence of different ChIP-seq datasets and can be used to identify co-factors.
> x = enrichPeakOverlap(ff[], unlist(ff[1:4]), nShuffle=10000, pAdjustMethod="BH", chainFile=NULL)
>> permutation test of peak overlap... 2015-09-24 14:23:43
           tSample                                            qLen tLen N_OL
ARmo_0M    GSM1174480_ARmo_0M_peaks.bed.gz                    1663  812    0
ARmo_1nM   GSM1174481_ARmo_1nM_peaks.bed.gz                   1663 2296    8
ARmo_100nM GSM1174482_ARmo_100nM_peaks.bed.gz                 1663 1359    3
CBX6_BF    GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz 1663 1331  968
               pvalue   p.adjust
ARmo_0M    0.88901110 0.88901110
ARmo_1nM   0.15118488 0.30236976
ARmo_100nM 0.37296270 0.49728360
CBX6_BF    0.00009999 0.00039996
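As a quick check of how the last two numeric columns in the output above relate, the following base R sketch applies the Benjamini-Hochberg adjustment (matching pAdjustMethod="BH" in the call) to the first of those columns. The final lines rest on an assumption that is not stated in the output: that a permutation p-value is computed as (k + 1) / (nShuffle + 1), which would make 1 / 10001 the smallest reportable value for nShuffle=10000.

```r
# Values copied from the first numeric column of the last block above
pvalue <- c(ARmo_0M = 0.88901110, ARmo_1nM = 0.15118488,
            ARmo_100nM = 0.37296270, CBX6_BF = 0.00009999)

# Benjamini-Hochberg adjustment with base R's p.adjust()
p_adjust <- p.adjust(pvalue, method = "BH")
print(round(p_adjust, 8))

# Assumption: p = (k + 1) / (nShuffle + 1); with k = 0 and nShuffle = 10000
# this gives the smallest p-value a run of this size could report
smallest_possible <- 1 / (10000 + 1)
print(round(smallest_possible, 8))
```

The adjusted values should agree with the second column, and the CBX6_BF p-value is consistent with no shuffled set overlapping as much as the observed one.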
Read more »