| computeFeaturesCage {ORFik} | R Documentation |
Normally you should not call this function directly; use [computeFeatures()] instead.
computeFeaturesCage(grl, RFP, RNA = NULL, Gtf = NULL, tx = NULL, fiveUTRs = NULL, cds = NULL, threeUTRs = NULL, faFile = NULL, riboStart = 26, riboStop = 34, extension = NULL, orfFeatures = TRUE, cageFiveUTRs = NULL, includeNonVarying = TRUE, grl.is.sorted = FALSE)
grl |
a GRangesList object, for example of ORFs (see orfFeatures) |
RFP |
Ribo-seq reads as a GAlignments, GRanges or GRangesList object |
RNA |
RNA-seq reads as a GAlignments, GRanges or GRangesList object |
Gtf |
a TxDb object created from a GTF file |
tx |
a GRangesList of transcripts, normally obtained from exonsBy(Gtf, by = "tx", use.names = TRUE). Only add this if you are not including the Gtf argument. You do not need to reassign these to the CAGE peaks; the function does that for you. |
fiveUTRs |
5' UTRs as a GRangesList; these must be the original, unchanged 5' UTRs |
cds |
a GRangesList of coding sequences |
threeUTRs |
a GRangesList of transcript 3' UTRs, normally obtained from threeUTRsByTranscript(Gtf, use.names = TRUE) |
faFile |
a FaFile or BSgenome from the fasta file, see ?FaFile |
riboStart |
usually 26, the start of the floss interval, see ?floss |
riboStop |
usually 34, the end of the floss interval |
extension |
a numeric/integer; must be set. Set it to 0 if you did not use CAGE. If you used CAGE to change the TSSs when finding the ORFs, the standard CAGE extension is 1000. |
orfFeatures |
a logical: is grl a list of ORFs? Must be assigned. |
cageFiveUTRs |
a GRangesList; if you used CAGE data to extend the 5' UTRs, include them here. The extension argument must match the extension used for these. |
includeNonVarying |
a logical (default TRUE); if TRUE, include all features that do not depend on Ribo-seq or RNA-seq data, that is: Kozak, fractionLengths, distORFCDS, isInFrame, isOverlapping and rankInTx |
grl.is.sorted |
a logical (default FALSE); set this to TRUE for a speed-up if you know that grl is sorted. |
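The constraints stated in the argument descriptions above (extension must be set explicitly, orfFeatures must be a logical, and extension should match the one used for cageFiveUTRs) can be checked before the call. The following is a minimal sketch in base R. `checkCageArgs` is a hypothetical helper and not part of ORFik, and the warning condition is an assumption about a likely mistake, not a rule that ORFik is known to enforce.

```r
# Hypothetical pre-flight check; NOT an ORFik function.
checkCageArgs <- function(extension = NULL, orfFeatures = TRUE,
                          cageFiveUTRs = NULL) {
  # extension defaults to NULL, so it has to be given explicitly
  if (is.null(extension) || !is.numeric(extension) ||
      length(extension) != 1L || is.na(extension) || extension < 0) {
    stop("extension must be a single non-negative number; ",
         "use 0 if CAGE was not used")
  }
  if (!is.logical(orfFeatures) || length(orfFeatures) != 1L ||
      is.na(orfFeatures)) {
    stop("orfFeatures must be TRUE or FALSE")
  }
  # Assumption: CAGE-extended 5' UTRs together with extension = 0
  # is probably a mistake, so warn about it.
  if (!is.null(cageFiveUTRs) && extension == 0) {
    warning("cageFiveUTRs supplied but extension is 0")
  }
  invisible(TRUE)
}

checkCageArgs(extension = 0, orfFeatures = TRUE)     # no CAGE used
checkCageArgs(extension = 1000, orfFeatures = TRUE)  # standard CAGE extension
```

Calling `checkCageArgs()` without an extension stops with an error, which mirrors the requirement that extension be set.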
A specialized version for when you used CAGE data and do not have a new TxDb with reassigned leaders, transcripts and gene starts. If you do have a TxDb with CAGE reassignments, use computeFeatures instead. Each feature has a link to an article describing it; try ?floss.
A data.table with scores; each column is one score type, and the column names are the names of the scores, e.g. [floss()] or [fpkm()].
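Because a data.table is also a data.frame, ordinary column access works on the returned table. The sketch below uses a placeholder data.frame, `scores`, standing in for a real result. The column names `floss` and `fpkm` and all values are invented for illustration, and the row-per-element layout is an assumption.

```r
# Placeholder standing in for the returned table; all values are invented.
scores <- data.frame(floss = c(0.12, 0.45, 0.30),
                     fpkm  = c(10.5, 0.0, 3.2))

names(scores)   # one column per score type
nrow(scores)    # assumed: one row per element of grl

# Rank the rows by one score, highest first
ranked <- order(scores$fpkm, decreasing = TRUE)
scores[ranked, ]
```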
Other features: computeFeatures, disengagementScore, distToCds, entropy, floss, fpkm_calc, fpkm, fractionLength, insideOutsideORF, isInFrame, isOverlapping, kozakSequenceScore, orfScore, rankOrder, ribosomeReleaseScore, ribosomeStallingScore, subsetCoverage, translationalEff
# a small example without cage-seq data:
# we will find ORFs in the 5' utrs
# and then calculate features on them
## Not run:
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19")) {
  library(GenomicFeatures)
  # Get the gtf txdb file
  txdbFile <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                          package = "GenomicFeatures")
  txdb <- loadDb(txdbFile)
  # Extract sequences of fiveUTRs.
  fiveUTRs <- fiveUTRsByTranscript(txdb, use.names = TRUE)[1:10]
  faFile <- BSgenome.Hsapiens.UCSC.hg19::Hsapiens
  # need to suppress warning because of bug in GenomicFeatures, will
  # be fixed soon.
  tx_seqs <- suppressWarnings(extractTranscriptSeqs(faFile, fiveUTRs))
  # Find all ORFs on those transcripts and get their genomic coordinates
  fiveUTR_ORFs <- findMapORFs(fiveUTRs, tx_seqs)
  unlistedORFs <- unlistGrl(fiveUTR_ORFs)
  # group GRanges by ORFs instead of Transcripts
  fiveUTR_ORFs <- groupGRangesBy(unlistedORFs, unlistedORFs$names)
  # make some toy ribo seq and rna seq data
  starts <- unlistGrl(ORFik:::firstExonPerGroup(fiveUTR_ORFs))
  RFP <- promoters(starts, upstream = 0, downstream = 1)
  score(RFP) <- rep(29, length(RFP)) # the original read widths
  # set RNA seq to duplicate transcripts
  RNA <- unlistGrl(exonsBy(txdb, by = "tx", use.names = TRUE))
  cageNotUsed <- 0 # used to inform that no cage was used
  computeFeaturesCage(grl = fiveUTR_ORFs, orfFeatures = TRUE, RFP = RFP,
                      RNA = RNA, Gtf = txdb, faFile = faFile,
                      extension = cageNotUsed)
}
# See vignettes for more examples
## End(Not run)