| gsva {GSVA} | R Documentation |
Estimates GSVA enrichment scores.
## S4 method for signature 'ExpressionSet,list'
gsva(expr, gset.idx.list, annotation,
method=c("gsva", "ssgsea", "zscore", "plage"),
kcdf=c("Gaussian", "Poisson", "none"),
abs.ranking=FALSE,
min.sz=1,
max.sz=Inf,
parallel.sz=0,
parallel.type="SOCK",
mx.diff=TRUE,
tau=switch(method, gsva=1, ssgsea=0.25, NA),
ssgsea.norm=TRUE,
verbose=TRUE)
## S4 method for signature 'ExpressionSet,GeneSetCollection'
gsva(expr, gset.idx.list, annotation,
method=c("gsva", "ssgsea", "zscore", "plage"),
kcdf=c("Gaussian", "Poisson", "none"),
abs.ranking=FALSE,
min.sz=1,
max.sz=Inf,
parallel.sz=0,
parallel.type="SOCK",
mx.diff=TRUE,
tau=switch(method, gsva=1, ssgsea=0.25, NA),
ssgsea.norm=TRUE,
verbose=TRUE)
## S4 method for signature 'matrix,GeneSetCollection'
gsva(expr, gset.idx.list, annotation,
method=c("gsva", "ssgsea", "zscore", "plage"),
kcdf=c("Gaussian", "Poisson", "none"),
abs.ranking=FALSE,
min.sz=1,
max.sz=Inf,
parallel.sz=0,
parallel.type="SOCK",
mx.diff=TRUE,
tau=switch(method, gsva=1, ssgsea=0.25, NA),
ssgsea.norm=TRUE,
verbose=TRUE)
## S4 method for signature 'matrix,list'
gsva(expr, gset.idx.list, annotation,
method=c("gsva", "ssgsea", "zscore", "plage"),
kcdf=c("Gaussian", "Poisson", "none"),
abs.ranking=FALSE,
min.sz=1,
max.sz=Inf,
parallel.sz=0,
parallel.type="SOCK",
mx.diff=TRUE,
tau=switch(method, gsva=1, ssgsea=0.25, NA),
ssgsea.norm=TRUE,
verbose=TRUE)
expr |
Gene expression data which can be given either as an |
gset.idx.list |
Gene sets provided either as a |
annotation |
In the case of calling |
method |
Method to employ in the estimation of gene-set enrichment scores per sample. By default
this is set to |
kcdf |
Character string denoting the kernel to use during the non-parametric estimation of the
cumulative distribution function of expression levels across samples when |
abs.ranking |
Flag used only when |
min.sz |
Minimum size of the resulting gene sets. |
max.sz |
Maximum size of the resulting gene sets. |
parallel.sz |
Number of processors to use when doing the calculations in parallel.
This requires to previously load either the |
parallel.type |
Type of cluster architecture when using |
mx.diff |
Offers two approaches to calculate the enrichment statistic (ES)
from the KS random walk statistic. |
tau |
Exponent defining the weight of the tail in the random walk performed by both the |
ssgsea.norm |
Logical, set to |
verbose |
Gives information about each calculation step. Default: |
GSVA assesses the relative enrichment of gene sets across samples using a non-parametric approach. Conceptually, GSVA transforms a p-gene by n-sample gene expression matrix into a g-geneset by n-sample pathway enrichment matrix. This facilitates many forms of statistical analysis in the 'space' of pathways rather than genes, providing a higher level of interpretability.
The gsva() function first maps the identifiers in the gene sets to the
identifiers in the input expression data leading to a filtered collection of
gene sets. This collection can be further filtered to require a minimun and/or
maximum size of the gene sets for which we want to calculate GSVA enrichment
scores, by using the arguments min.sz and max.sz.
A gene-set by sample matrix of GSVA enrichment scores.
J. Guinney and R. Castelo
Barbie, D.A. et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462(5):108-112, 2009.
Hänzelmann, S., Castelo, R. and Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14:7, 2013.
Lee, E. et al. Inferring pathway activity toward precise disease classification. PLoS Comp Biol, 4(11):e1000217, 2008.
Tomfohr, J. et al. Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics, 6:225, 2005.
filterGeneSets
computeGeneSetsOverlap
library(limma)
p <- 10 ## number of genes
n <- 30 ## number of samples
nGrp1 <- 15 ## number of samples in group 1
nGrp2 <- n - nGrp1 ## number of samples in group 2
## consider three disjoint gene sets
geneSets <- list(set1=paste("g", 1:3, sep=""),
set2=paste("g", 4:6, sep=""),
set3=paste("g", 7:10, sep=""))
## sample data from a normal distribution with mean 0 and st.dev. 1
y <- matrix(rnorm(n*p), nrow=p, ncol=n,
dimnames=list(paste("g", 1:p, sep="") , paste("s", 1:n, sep="")))
## genes in set1 are expressed at higher levels in the last 'nGrp1+1' to 'n' samples
y[geneSets$set1, (nGrp1+1):n] <- y[geneSets$set1, (nGrp1+1):n] + 2
## build design matrix
design <- cbind(sampleGroup1=1, sampleGroup2vs1=c(rep(0, nGrp1), rep(1, nGrp2)))
## fit linear model
fit <- lmFit(y, design)
## estimate moderated t-statistics
fit <- eBayes(fit)
## genes in set1 are differentially expressed
topTable(fit, coef="sampleGroup2vs1")
## estimate GSVA enrichment scores for the three sets
gsva_es <- gsva(y, geneSets, mx.diff=1)
## fit the same linear model now to the GSVA enrichment scores
fit <- lmFit(gsva_es, design)
## estimate moderated t-statistics
fit <- eBayes(fit)
## set1 is differentially expressed
topTable(fit, coef="sampleGroup2vs1")