tximeta {tximeta}    R Documentation
tximeta leverages the hash signature of the Salmon or Sailfish index,
in addition to a number of core Bioconductor packages (GenomicFeatures,
ensembldb, GenomeInfoDb, BiocFileCache), to automatically populate metadata
without additional effort from the user. Note that this package
is in "beta" / under development.
tximeta(coldata, type = "salmon", txOut = TRUE, skipMeta = FALSE, cleanDuplicateTxps = FALSE, ...)
coldata | a data.frame with at least two columns, files (paths to the quantification files) and names (sample names); any other columns will propagate to the object. If …
type | what quantifier was used (see …)
txOut | whether to output transcript-level data.
skipMeta | whether to skip metadata generation (e.g., to avoid errors when not connected to the internet). This calls …
cleanDuplicateTxps | whether to try to clean duplicate transcripts (exact sequence duplicates) by replacing the transcript names that do not appear in the GTF with those that do
... | arguments passed to …
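The cleanDuplicateTxps behavior can be pictured with a small base-R sketch. It is illustrative only and is not tximeta's implementation. The helper renameDuplicates and its input format are hypothetical: a list of duplicate-name groups, plus the vector of transcript names that appear in the GTF. The sketch also assumes each duplicate group contributes one name to the quantified transcripts.

```r
# Hypothetical sketch of the idea behind cleanDuplicateTxps (not tximeta's code).
# txNames:   transcript names as reported by the quantifier
# dupGroups: list of character vectors, each a set of exact-sequence duplicates
# gtfTxps:   transcript names that appear in the GTF
renameDuplicates <- function(txNames, dupGroups, gtfTxps) {
  for (grp in dupGroups) {
    inGtf <- grp[grp %in% gtfTxps]
    if (length(inGtf) == 0) next  # no GTF name available to swap in
    # names in this group that are absent from the GTF get a GTF name instead
    # (if several group members are in the GTF, the first is chosen arbitrarily)
    notInGtf <- txNames %in% grp & !(txNames %in% gtfTxps)
    txNames[notInGtf] <- inGtf[1]
  }
  txNames
}

renameDuplicates(c("tx1", "tx2b", "tx3"),
                 dupGroups = list(c("tx2a", "tx2b")),
                 gtfTxps = c("tx1", "tx2a", "tx3"))
# returns c("tx1", "tx2a", "tx3")
```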
Most of the code in tximeta is devoted to adding metadata and transcript ranges
when the quantification was performed with Salmon or Sailfish. However,
tximeta can be used with any quantification type that is supported
by tximport, in which case it returns a non-ranged SummarizedExperiment.
tximeta checks the hash signature of the index against a database
of known transcriptomes (this database is under construction) or a locally stored
linkedTxome (see makeLinkedTxome). It then automatically populates,
e.g., the transcript locations, the transcriptome release,
the genome with correct chromosome lengths, etc. This allows automatic
and correct summarization of transcript-level quantifications to the gene level
via summarizeToGene, without the need to manually build
a tx2gene table.
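As a rough illustration of gene-level summarization, the sketch below uses base R's rowsum to sum transcript counts per gene. The count matrix and the tx2gene vector are hypothetical. With tximeta, the mapping comes from the annotation located via the index hash, and summarizeToGene also handles other assays such as abundance. This sketch covers counts only.

```r
# Hypothetical transcript-level counts: 3 transcripts x 2 samples
txCounts <- matrix(c(10, 5, 3,
                     20, 0, 7),
                   nrow = 3,
                   dimnames = list(c("txA1", "txA2", "txB1"), c("s1", "s2")))

# Hypothetical transcript-to-gene mapping (tximeta derives this from the annotation)
tx2gene <- c(txA1 = "geneA", txA2 = "geneA", txB1 = "geneB")

# Sum rows that belong to the same gene
geneCounts <- rowsum(txCounts, group = tx2gene[rownames(txCounts)])
geneCounts
# geneA: s1 = 15, s2 = 20; geneB: s1 = 3, s2 = 7
```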
On the first run, tximeta will ask where the BiocFileCache for
this package should be kept: either a default location or a temporary
directory. At any point, the user can specify a location using
setTximetaBFC, and this choice will be saved for future sessions.
Multiple users can point to the same BiocFileCache, so that
transcript databases (TxDb) associated with certain Salmon or Sailfish indices
and linkedTxomes can be accessed by different users without additional
effort or time spent downloading/building the relevant TxDb.
To allow multiple users to read and write to the same location, give the BiocFileCache directory group write permissions (g+w).
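The following sketch shows one way to add group write permission from within R. It assumes a POSIX file system; on Windows, Sys.chmod has limited effect. The helper addGroupWrite and the directory path are hypothetical. The sketch changes only the directory itself, so files already inside an existing cache would also need g+w, for example via chmod -R g+w in a shell.

```r
# Hypothetical helper: add the group-write bit (octal 020) to a directory's mode
addGroupWrite <- function(path) {
  current <- file.info(path)$mode
  newMode <- as.octmode(bitwOr(as.integer(current), strtoi("020", base = 8L)))
  # use_umask = FALSE so the process umask does not strip the new bit
  Sys.chmod(path, mode = newMode, use_umask = FALSE)
  invisible(newMode)
}

sharedDir <- file.path(tempdir(), "shared_bfc")  # hypothetical shared location
dir.create(sharedDir, showWarnings = FALSE)
addGroupWrite(sharedDir)
# then, with tximeta loaded, point the cache there: setTximetaBFC(sharedDir)
```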
A SummarizedExperiment with metadata on the rowRanges.
If the hash signature in the Salmon or Sailfish index does not match
any known transcriptome or any locally saved linkedTxome,
tximeta returns a non-ranged SummarizedExperiment.
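The match-or-fall-back behavior can be sketched as a simple table lookup. This is a conceptual sketch, not tximeta's code. The table knownTxomes, its placeholder hash values, and the helper lookupTxome are all hypothetical stand-ins for the database of known transcriptomes and locally saved linkedTxomes.

```r
# Placeholder table standing in for known transcriptomes / linkedTxomes
knownTxomes <- data.frame(
  hash  = c("hash_1", "hash_2"),
  txome = c("txome_1", "txome_2"),
  stringsAsFactors = FALSE
)

# Return the matching row, or NULL when the index hash is unknown
lookupTxome <- function(indexHash, txomeTbl) {
  idx <- match(indexHash, txomeTbl$hash)
  if (is.na(idx)) return(NULL)  # no match: caller falls back to a non-ranged result
  txomeTbl[idx, , drop = FALSE]
}

lookupTxome("hash_2", knownTxomes)  # one-row data.frame for "txome_2"
lookupTxome("hash_9", knownTxomes)  # NULL
```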
library(tximeta)
# point to a Salmon quantification file:
dir <- system.file("extdata/salmon_dm", package="tximportData")
files <- file.path(dir, "SRR1197474_cdna", "quant.sf.gz")
coldata <- data.frame(files, names="SRR1197474", condition="A", stringsAsFactors=FALSE)
# normally we would just run the following, which downloads the appropriate metadata
# se <- tximeta(coldata)
# for this example, we instead point to a local path where the GTF can be found
# by making a linkedTxome:
dir <- system.file("extdata", package="tximeta")
indexDir <- file.path(dir, "Drosophila_melanogaster.BDGP6.cdna.v92_salmon_0.10.2")
fastaFTP <- "ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/cdna/Drosophila_melanogaster.BDGP6.cdna.all.fa.gz"
dir2 <- system.file("extdata/salmon_dm", package="tximportData")
gtfPath <- file.path(dir2,"Drosophila_melanogaster.BDGP6.92.gtf.gz")
makeLinkedTxome(indexDir=indexDir, source="Ensembl", organism="Drosophila melanogaster",
release="92", genome="BDGP6", fasta=fastaFTP, gtf=gtfPath, write=FALSE)
se <- tximeta(coldata)
# to clear the entire linkedTxome table
# (don't run unless you want to clear this table!)
# library(BiocFileCache)
# bfcloc <- getTximetaBFC()
# bfc <- BiocFileCache(bfcloc)
# bfcremove(bfc, bfcquery(bfc, "linkedTxomeTbl")$rid)