| sbeaMethods {EnrichmentBrowser} | R Documentation |
This is the main function for the enrichment analysis of gene sets. It implements and wraps existing implementations of several frequently used methods and allows a flexible inspection of resulting gene set rankings.
sbeaMethods()

sbea(method = EnrichmentBrowser::sbeaMethods(), se, gs, alpha = 0.05,
     perm = 1000, padj.method = "none", out.file = NULL, browse = FALSE, ...)
method |
Set-based enrichment analysis method. Currently, the following set-based enrichment analysis methods are supported: ‘ora’, ‘safe’, ‘gsea’, ‘padog’, ‘roast’, ‘camera’, ‘gsa’, ‘gsva’, ‘globaltest’, ‘samgs’, ‘ebm’, and ‘mgsa’. For basic ora also set 'perm=0'. Default is ‘ora’. This can also be the name of a user-defined function implementing set-based enrichment. See Details. |
se |
Expression dataset. An object of class
Additional optional annotations:
|
gs |
Gene sets. Either a list of gene sets (character vectors of gene IDs) or a text file in GMT format storing all gene sets under investigation. |
alpha |
Statistical significance level. Defaults to 0.05. |
perm |
Number of permutations of the sample group assignments. Defaults to 1000. For basic ora set 'perm=0'. Using method="gsea" and 'perm=0' invokes the permutation approximation from the npGSEA package. |
padj.method |
Method for adjusting nominal gene set p-values for multiple testing. For available methods see the man page of the stats function |
out.file |
Optional output file the gene set ranking will be written to. |
browse |
Logical. Should results be displayed in the browser for interactive exploration? Defaults to FALSE. |
... |
Additional arguments passed to individual sbea methods. This currently includes, for ORA and MGSA:
|
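As a hedged illustration of what 'padj.method' controls: assuming the stats function referred to in the 'padj.method' entry is p.adjust (an assumption; the function name is not given on this page), adjusting nominal gene set p-values can be sketched in base R as follows. The p-values and set names are made up for illustration.

```r
# Hypothetical nominal p-values for four gene sets (illustrative only)
ps <- c(setA = 0.001, setB = 0.01, setC = 0.03, setD = 0.5)

# Method names accepted by stats::p.adjust
p.adjust.methods

# Leave the p-values unadjusted ("none"), or apply
# Benjamini-Hochberg correction ("BH")
ps.none <- p.adjust(ps, method = "none")
ps.bh <- p.adjust(ps, method = "BH")
```

With "none", the nominal p-values are returned unchanged, which matches the default 'padj.method = "none"' shown in the usage above.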
'ora': overrepresentation analysis, simple and frequently used test based on the hypergeometric distribution (see Goeman and Buhlmann, 2007, for a critical review).
'safe': significance analysis of function and expression, a generalization of ORA that includes other test statistics, e.g. Wilcoxon's rank sum, and allows estimating the significance of gene sets by sample permutation; implemented in the safe package (Barry et al., 2005).
'gsea': gene set enrichment analysis, frequently used and widely accepted, uses a Kolmogorov-Smirnov statistic to test whether the ranks of the p-values of genes in a gene set resemble a uniform distribution (Subramanian et al., 2005).
'padog': pathway analysis with down-weighting of overlapping genes, incorporates gene weights to favor genes appearing in few pathways versus genes that appear in many pathways; implemented in the PADOG package.
'roast': rotation gene set test, uses rotation instead of permutation for assessment of gene set significance; implemented in the limma and edgeR packages for microarray and RNA-seq data, respectively.
'camera': correlation adjusted mean rank gene set test, accounts for inter-gene correlations as implemented in the limma and edgeR packages for microarray and RNA-seq data, respectively.
'gsa': gene set analysis, differs from GSEA by using the maxmean statistic, i.e. the mean of the positive or negative part of gene scores in the gene set; implemented in the GSA package.
'gsva': gene set variation analysis, transforms the data from a gene by sample matrix to a gene set by sample matrix, thereby allowing the evaluation of gene set enrichment for each sample; implemented in the GSVA package.
'globaltest': global testing of groups of genes, general test of groups of genes for association with a response variable; implemented in the globaltest package.
'samgs': significance analysis of microarrays on gene sets, extends the SAM method for single genes to gene set analysis (Dinu et al., 2007).
'ebm': empirical Brown's method, combines p-values of genes in a gene set using Brown's method to combine p-values from dependent tests; implemented in the EmpiricalBrownsMethod package.
'mgsa': model-based gene set analysis, Bayesian modeling approach taking set overlap into account by working on all sets simultaneously, thereby reducing the number of redundant sets; implemented in the mgsa package.
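To make the maxmean statistic mentioned for 'gsa' concrete, here is a minimal base R sketch for a single gene set. It is not the GSA package implementation; the function name maxmean, the example scores, and the convention of returning a signed value are assumptions made for illustration.

```r
# Maxmean statistic for one gene set, given per-gene scores
# (e.g. t-statistics). A sketch under stated assumptions,
# not the GSA package implementation.
maxmean <- function(scores) {
    pos <- mean(pmax(scores, 0))   # mean of the positive parts
    neg <- mean(pmax(-scores, 0))  # mean of the negative parts (as magnitudes)
    # assumed convention: report the dominating part with its sign
    if (pos >= neg) pos else -neg
}

# Hypothetical gene scores for a set of four genes
z <- c(2, -1, 3, 0)
maxmean(z)
```

Both means are taken over all genes in the set, so a few strongly positive scores are not diluted by averaging them against negative ones.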
It is also possible to use additional set-based enrichment methods. This requires implementing a function that takes 'se', 'gs', 'alpha', and 'perm' as arguments and returns a numeric vector 'ps' storing the resulting p-value for each gene set in 'gs'. This vector must be named accordingly (i.e. names(ps) == names(gs)). See examples.
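As a sketch of such a user-defined method, the following computes hypergeometric overrepresentation p-values in base R. The helper oraPvalues, the wrapper myORA, and the toy data are hypothetical names introduced here; the wrapper assumes that 'se' is a SummarizedExperiment-like object whose rowData has an ADJ.PVAL column, as in the examples of this page. The wrapper is only defined, not run, so the block needs no packages to execute.

```r
# Hypergeometric p-value per gene set, from a named vector of
# adjusted per-gene p-values (hypothetical helper, base R only)
oraPvalues <- function(adj.p, gs, alpha = 0.05) {
    universe <- names(adj.p)
    sig <- universe[which(adj.p < alpha)]  # which() drops NA p-values
    ps <- vapply(gs, function(s) {
        s <- intersect(s, universe)        # restrict set to measured genes
        k <- length(intersect(s, sig))     # significant genes in the set
        # P(X >= k) when drawing length(s) genes from the universe
        phyper(k - 1, length(sig), length(universe) - length(sig),
               length(s), lower.tail = FALSE)
    }, numeric(1))
    names(ps) <- names(gs)
    ps
}

# Wrapper with the signature required for a user-defined method;
# assumes rowData(se)$ADJ.PVAL exists. 'perm' is accepted but unused.
myORA <- function(se, gs, alpha, perm) {
    adj.p <- SummarizedExperiment::rowData(se)$ADJ.PVAL
    names(adj.p) <- rownames(se)
    oraPvalues(adj.p, gs, alpha)
}

# Toy check on made-up data: 10 genes, the first 3 significant
adj.toy <- setNames(c(rep(0.01, 3), rep(0.5, 7)), paste0("g", 1:10))
gs.toy <- list(set1 = c("g1", "g2", "g3"), set2 = c("g4", "g5", "g6"))
ps.toy <- oraPvalues(adj.toy, gs.toy)
```

Under these assumptions, the wrapper could be passed as sbea(method=myORA, se=se, gs=gs), in the same way as the dummy function in the examples.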
sbeaMethods: a character vector of currently supported methods;
sbea: if 'out.file' is NULL, an enrichment analysis result object that can be explored in detail by calling eaBrowse and from which a flat gene set ranking can be extracted by calling gsRanking. If 'out.file' is given, the ranking is written to the specified file.
Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>
Goeman and Buhlmann (2007) Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics, 23:980-7.
Barry et al. (2005) Significance Analysis of Function and Expression. Bioinformatics, 21:1943-9.
Subramanian et al. (2005) Gene Set Enrichment Analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA, 102:15545-50.
Dinu et al. (2007) Improving gene set analysis of microarray data by SAM-GS. BMC Bioinformatics, 8:242.
Input: readSE, probe2gene
getGenesets to retrieve gene sets from databases such as GO and KEGG.
Output: gsRanking to retrieve the ranked list of gene sets.
eaBrowse for exploration of resulting gene sets.
Other: nbea to perform network-based enrichment analysis.
combResults to combine results from different methods.
# currently supported methods
sbeaMethods()
# (1) expression data:
# simulated expression values of 100 genes
# in two sample groups of 6 samples each
se <- makeExampleData(what="SE")
se <- deAna(se)
# (2) gene sets:
# draw 10 gene sets with 15-25 genes
gs <- makeExampleData(what="gs", gnames=names(se))
# (3) make 2 artificially enriched sets:
sig.genes <- names(se)[rowData(se)$ADJ.PVAL < 0.1]
gs[[1]] <- sample(sig.genes, length(gs[[1]]))
gs[[2]] <- sample(sig.genes, length(gs[[2]]))
# (4) performing the enrichment analysis
ea.res <- sbea(method="ora", se=se, gs=gs, perm=0)
# (5) result visualization and exploration
gsRanking(ea.res)
# using your own tailored function as enrichment method
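dummySBEA <- function(se, gs, alpha, perm)
{
    # draw 5 random "significant" p-values and random non-significant
    # p-values for the remaining sets (assumes at least 5 gene sets)
    sig.ps <- sample(seq(0, 0.05, length.out=1000), 5)
    nsig.ps <- sample(seq(0.1, 1, length.out=1000), length(gs)-5)
    ps <- sample(c(sig.ps, nsig.ps), length(gs))
    names(ps) <- names(gs)
    return(ps)
}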
ea.res2 <- sbea(method=dummySBEA, se=se, gs=gs)
gsRanking(ea.res2)