| getGenesets {EnrichmentBrowser} | R Documentation |
Functionality for retrieving gene sets for an organism under investigation from databases such as GO and KEGG. Parsing and writing a list of gene sets from/to a flat text file in GMT format is also supported.
The GMT (Gene Matrix Transposed) file format is a tab delimited file format that describes gene sets. In the GMT format, each row represents a gene set. Each gene set is described by a name, a description, and the genes in the gene set. See references.
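The row layout described above can be illustrated with a minimal base R sketch. The set names, descriptions, and gene IDs are made up, and `parse.gmt.sketch` is a hypothetical helper written for this illustration; it is not part of EnrichmentBrowser and is not necessarily how getGenesets parses GMT files.

```r
# Toy GMT content: <name> TAB <description> TAB <gene1> TAB <gene2> ...
# Set names and gene IDs are made up for illustration.
gmt.lines <- c(
    "setA\tfirst toy set\t1234\t5678\t9012",
    "setB\tsecond toy set\t5678\t3456"
)

# Write to a temporary file so the sketch is self-contained
toy.gmt <- tempfile(fileext = ".gmt")
writeLines(gmt.lines, toy.gmt)

# Hypothetical helper (not part of EnrichmentBrowser): parse a GMT file
# into a named list of character vectors of gene IDs
parse.gmt.sketch <- function(file)
{
    # one character vector of tab-separated fields per row
    fields <- strsplit(readLines(file), "\t", fixed = TRUE)
    # drop the first two fields (name and description), keep the genes
    gs <- lapply(fields, function(f) f[-(1:2)])
    # the first field of each row is the gene set name
    names(gs) <- vapply(fields, function(f) f[1], character(1))
    gs
}

toy.gs <- parse.gmt.sketch(toy.gmt)
```

The result has the same shape as the `gs` argument of writeGMT: a named list in which each element is a character vector of gene IDs.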
getGenesets(org, db = c("go", "kegg"), cache = TRUE,
go.onto = c("BP", "MF", "CC"), go.mode = c("GO.db", "biomart"),
return.type = c("list", "GeneSetCollection"))
writeGMT(gs, gmt.file)
org |
An organism in (KEGG) three letter code, e.g. ‘hsa’ for ‘Homo sapiens’. Alternatively, this can also be a text file storing gene sets in GMT format. See details. |
db |
Database from which gene sets should be retrieved. Currently, either 'go' (default) or 'kegg'. |
cache |
Logical. Should a locally cached version be used if available? Defaults to TRUE. |
go.onto |
Character. Specifies one of the three GO ontologies: 'BP' (biological process), 'MF' (molecular function), 'CC' (cellular component). Defaults to 'BP'. |
go.mode |
Character. Determines how the gene sets are retrieved. This can be either 'GO.db' or 'biomart'. The 'GO.db' mode creates the gene sets based on BioC annotation packages, which is fast but does not necessarily provide the most up-to-date mapping. In addition, this option is only available for the model organisms currently supported in BioC. The 'biomart' mode downloads the mapping from BioMart, which can be time-consuming but allows selecting from a larger range of organisms and contains the latest mappings. Defaults to 'GO.db'. |
return.type |
Character. Determines whether gene sets are returned as a simple list of gene sets (each being a character vector of gene IDs), or as an object of class GeneSetCollection. |
gs |
A list of gene sets (character vectors of gene IDs). |
gmt.file |
Gene set file in GMT format. See details. |
For getGenesets: a list of gene sets (vectors of gene IDs), or a GeneSetCollection if requested via return.type.
For writeGMT: none, writes to file.
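Assuming the list return type, the result can be handled with ordinary base R list operations. The following is a minimal sketch on made-up toy data; the set names, gene IDs, and size thresholds are arbitrary choices for illustration and do not come from the package.

```r
# Toy gene sets in the shape described above: a named list of character
# vectors of gene IDs (names and IDs are made up for illustration)
toy.gs <- list(
    setA = c("1234", "5678", "9012"),
    setB = c("5678"),
    setC = c("1111", "2222", "3333", "4444")
)

# Number of genes per set
gs.sizes <- lengths(toy.gs)

# Keep only sets within an arbitrarily chosen size range
min.size <- 2
max.size <- 3
filtered.gs <- toy.gs[gs.sizes >= min.size & gs.sizes <= max.size]

# All distinct gene IDs covered by the remaining sets
all.genes <- unique(unlist(filtered.gs, use.names = FALSE))
```

Restricting gene sets to a size range like this is a common preprocessing step before enrichment analysis; suitable thresholds depend on the analysis at hand.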
Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>
KEGG Organism code http://www.genome.jp/kegg/catalog/org_list.html
GMT file format http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats
annFUN for general GO2gene mapping used in the 'GO.db' mode, and the biomaRt package for general queries to BioMart.
keggList and keggLink for accessing the KEGG REST server.
# (1) Typical usage for gene set enrichment analysis with GO:
# Biological process terms based on BioC annotation (for human)
go.gs <- getGenesets(org="hsa", db="go")
# equivalent to:
# go.gs <- getGenesets(org="hsa", db="go", go.onto="BP", go.mode="GO.db")
# Alternatively:
# downloading from BioMart
# this may take a few minutes ...
go.gs <- getGenesets(org="hsa", db="go", go.mode="biomart")
# (2) Defining gene sets according to KEGG
kegg.gs <- getGenesets(org="hsa", db="kegg")
# (3) parsing gene sets from GMT
gmt.file <- system.file("extdata/hsa_kegg_gs.gmt", package="EnrichmentBrowser")
gs <- getGenesets(gmt.file)
# (4) writing gene sets to file
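# use a temporary file to avoid overwriting the GMT file shipped with the package
out.file <- tempfile(fileext=".gmt")
writeGMT(gs, out.file)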