| mergePairsByID {dada2} | R Documentation |
This function attempts to merge each pair of denoised forward and reverse reads,
rejecting any which do not sufficiently overlap
or which contain too many (>0 by default) mismatches in the overlap region.
Note: This function does not assume that the fastq files
for the forward and reverse reads were in the same order.
If they are already in the same order, use mergePairs.
mergePairsByID(dadaF, derepF, srF, dadaR, derepR, srR, minOverlap = 20,
maxMismatch = 0, returnRejects = FALSE, idRegExpr = c("\\s.+$", ""),
includeCol = character(0), justConcatenate = FALSE, verbose = FALSE)
dadaF |
(Required). A |
derepF |
(Required). A |
srF |
(Required). The trimmed and filtered forward reads
that you used as input for |
dadaR |
(Required). A |
derepR |
(Required). A |
srR |
(Required). See srF description, but in this case provide for the reverse reads. |
minOverlap |
(Optional). A |
maxMismatch |
(Optional). A |
returnRejects |
(Optional).
A |
idRegExpr |
(Optional).
A length 2 |
includeCol |
(Optional). |
justConcatenate |
(Optional).
NOT CURRENTLY SUPPORTED.
|
verbose |
(Optional). |
Not yet implemented: Use of the concatenate option will result in concatenating forward and reverse reads without attempting a merge/alignment step.
A data.frame with a row for each unique pairing of forward/reverse denoised sequences,
and the following columns:
$abundance: Number of reads corresponding to this forward/reverse combination.
$sequence: The merged sequence.
$forward: The index of the forward denoised sequence.
$reverse: The index of the reverse denoised sequence.
$nmatch: Number of matches nts in the overlap region.
$nmismatch: Number of mismatches in the overlap region.
$nindel: Number of indels in the overlap region.
$prefer: The sequence used for the overlap region. 1=forward; 2=reverse.
$accept: TRUE if overlap between forward and reverse denoised sequences was at least
minOverlap and had at most maxMismatch differences. FALSE otherwise.
$...: Additional columns specified in propagateCol.
# For the following example files, there are two ways to merge denoised directions.
# Because the read sequences are in order, `mergePairs()` works.
# `mergePairsByID` always works,
# because it uses the read IDs to match denoised pairs.
exFileF = system.file("extdata", "sam1F.fastq.gz", package="dada2")
exFileR = system.file("extdata", "sam1R.fastq.gz", package="dada2")
srF = ShortRead::readFastq(exFileF)
srR = ShortRead::readFastq(exFileR)
derepF = derepFastq(exFileF)
derepR = derepFastq(exFileR)
dadaF <- dada(derepF, err=tperr1, errorEstimationFunction=loessErrfun, selfConsist=TRUE)
dadaR <- dada(derepR, err=tperr1, errorEstimationFunction=loessErrfun, selfConsist=TRUE)
# Run and compare
ex1time = system.time({
ex1 <- mergePairs(dadaF, derepF, dadaR, derepR, verbose = TRUE)
ex1 <- data.table::data.table(ex1)
})
ex1time
# The new function, based on read IDs.
ex2time = system.time({
ex2 = dada2:::mergePairsByID(dadaF = dadaF, derepF = derepF, srF = srF,
dadaR = dadaR, derepR = derepR, srR = srR, verbose = TRUE)
})
ex2time
# Compare results (should be identical)
ex2[(accept)]
data.table::setkey(ex2, sequence)
ex2[(accept), list(abundance = sum(abundance)), by = sequence]
# Same sequence set (exactly)
setequal(x = ex1$sequence,
y = ex2[(accept)]$sequence)
# Test concatenation functionality
ex1cattime = system.time({
ex1cat <- mergePairs(dadaF, derepF, dadaR, derepR, justConcatenate = TRUE, verbose = TRUE)
sapply(ex1cat, class)
# need to convert to a character
ex1cat$sequence <- unlist(ex1cat$sequence)
ex1cat <- data.table::data.table(ex1cat)
})
ex1cattime
ex2cattime = system.time({
ex2cat <- dada2:::mergePairsByID(dadaF = dadaF, derepF = derepF, srF = srF,
dadaR = dadaR, derepR = derepR, srR = srR,
justConcatenate = TRUE, verbose = TRUE)
})
ex2cattime
ex2cat[(accept)]
# Compare results (should be identical)
data.table::setkey(ex1cat, sequence)
ex1cat[(accept), list(abundance = sum(abundance)), by = sequence]
data.table::setkey(ex2cat, sequence)
ex2cat[(accept), list(abundance = sum(abundance)), by = sequence]
# Same sequence set (exactly)
setequal(x = ex1cat$sequence,
y = ex2cat$sequence)
intersect(x = ex1cat$sequence,
y = ex2cat$sequence)
ex1cat[, nchar(sequence)]
ex2cat[, nchar(sequence)]