| generateBedReport {epialleleR} | R Documentation |
'generateBedReport', 'generateAmpliconReport', 'generateCaptureReport' – these functions match BAM reads to the set of genomic locations and return the fraction of reads with an average methylation level passing an arbitrary threshold.
generateAmpliconReport(
bam,
bed,
report.file = NULL,
zero.based.bed = FALSE,
match.tolerance = 1,
threshold.reads = TRUE,
threshold.context = c("CG", "CHG", "CHH", "CxG", "CX"),
min.context.sites = 2,
min.context.beta = 0.5,
max.outofcontext.beta = 0.1,
min.mapq = 0,
min.baseq = 0,
skip.duplicates = FALSE,
nthreads = 0,
gzip = FALSE,
verbose = TRUE
)
generateCaptureReport(
bam,
bed,
report.file = NULL,
zero.based.bed = FALSE,
match.min.overlap = 1,
threshold.reads = TRUE,
threshold.context = c("CG", "CHG", "CHH", "CxG", "CX"),
min.context.sites = 2,
min.context.beta = 0.5,
max.outofcontext.beta = 0.1,
min.mapq = 0,
min.baseq = 0,
skip.duplicates = FALSE,
nthreads = 0,
gzip = FALSE,
verbose = TRUE
)
generateBedReport(
bam,
bed,
report.file = NULL,
zero.based.bed = FALSE,
bed.type = c("amplicon", "capture"),
match.tolerance = 1,
match.min.overlap = 1,
threshold.reads = TRUE,
threshold.context = c("CG", "CHG", "CHH", "CxG", "CX"),
min.context.sites = 2,
min.context.beta = 0.5,
max.outofcontext.beta = 0.1,
min.mapq = 0,
min.baseq = 0,
skip.duplicates = FALSE,
nthreads = 1,
gzip = FALSE,
verbose = TRUE
)
bam |
BAM file location string OR preprocessed output of
|
bed |
Browser Extensible Data (BED) file location string OR object of
class |
report.file |
file location string to write the BED report. If NULL
(the default) then report is returned as a
|
zero.based.bed |
boolean defining if BED coordinates are zero based (default: FALSE). |
match.tolerance |
integer for the largest difference between read's and
BED |
threshold.reads |
boolean defining if sequence reads should be thresholded before counting reads belonging to variant epialleles (default: TRUE). Disabling thresholding is possible but makes no sense in this context as all the reads will be assigned to the variant epiallele, which will result in VEF==1 (in such case 'NA' VEF values are returned in order to avoid confusion). |
threshold.context |
string defining cytosine methylation context used for thresholding the reads:
This option has no effect when read thresholding is disabled. |
min.context.sites |
non-negative integer for minimum number of cytosines within the 'threshold.context' (default: 2). Reads containing fewer within-the-context cytosines are considered completely unmethylated (thus belonging to the reference epiallele). This option has no effect when read thresholding is disabled. |
min.context.beta |
real number in the range [0;1] (default: 0.5). Reads with average beta value for within-the-context cytosines below this threshold are considered completely unmethylated (thus belonging to the reference epiallele). This option has no effect when read thresholding is disabled. |
max.outofcontext.beta |
real number in the range [0;1] (default: 0.1). Reads with average beta value for out-of-context cytosines above this threshold are considered completely unmethylated (thus belonging to the reference epiallele). This option has no effect when read thresholding is disabled. |
min.mapq |
non-negative integer threshold for minimum read mapping quality (default: 0). Option has no effect if preprocessed BAM data was supplied as an input. |
min.baseq |
non-negative integer threshold for minimum nucleotide base quality (default: 0). Option has no effect if preprocessed BAM data was supplied as an input. |
skip.duplicates |
boolean defining if duplicate aligned reads should be skipped (default: FALSE). Option has no effect if preprocessed BAM data was supplied as an input OR duplicate reads were not marked by alignment software. |
nthreads |
non-negative integer for the number of HTSlib threads to be used during BAM file decompression (default: 1). 2 threads make sense for the files larger than 100 MB. Option has no effect if preprocessed BAM data was supplied as an input. |
gzip |
boolean to compress the report (default: FALSE). |
verbose |
boolean to report progress and timings (default: TRUE). |
match.min.overlap |
integer for the smallest overlap between read's and
BED |
bed.type |
character string for the type of assay that was used to produce sequencing reads:
|
Functions report hypermethylated variant epiallele frequencies (VEF) per genomic region of interest using BAM and BED files as input. Reads (for paired-end sequencing alignment files - read pairs as a single entity) are matched to genomic locations by exact coordinates ('generateAmpliconReport' or 'generateBedReport' with an option bed.type="amplicon") or minimum overlap ('generateCaptureReport' or 'generateBedReport' with an option bed.type="capture") – the former to be used for amplicon-based NGS data, while the latter – for the capture-based NGS data. The function's logic is explained below.
Let's suppose we have a BAM file with four reads, all mapped to the "+" strand of chromosome 1, positions 1-16. The genomic range is supplied as a parameter 'bed = as("chr1:1-100", "GRanges")'. Assuming the default values for the thresholding parameters (threshold.reads = TRUE, threshold.context = "CG", min.context.sites = 2, min.context.beta = 0.5, max.outofcontext.beta = 0.1), the input and results will look as following:
| methylation string | threshold | explained |
| ...Z..x+.h..x..h. | below | min.context.sites < 2 (only one zZ base) |
| ...Z..z.h..x..h. | above | pass all criteria |
| ...Z..z.h..X..h. | below | max.outofcontext.beta > 0.1 (1XH / 3xXhH = 0.33) |
| ...Z..z.h..z-.h. | below | min.context.beta < 0.5 (1Z / 3zZ = 0.33) |
Only the second read will satisfy all of the thresholding criteria, leading to the following BED report (given that all reads map to chr1:+:1-16):
| seqnames | start | end | width | strand | nreads+ | nreads- | VEF |
| chr1 | 1 | 100 | 100 | * | 4 | 0 | 0.25 |
data.table object containing VEF report for
BED GRanges or NULL if report.file was
specified. If BAM file contains reads that would not match to any of BED
GRanges, the last row in the report will
contain information on such reads (with seqnames, start and end equal to NA).
The report columns are:
seqnames – reference sequence name
start – start of genomic region
end – end of genomic region
width – width of genomic region
strand – strand
... – other columns that were present in BED or metadata columns of
GRanges object
nreads+ – number of reads (pairs) mapped to the forward ("+") strand
nreads- – number of reads (pairs) mapped to the reverse ("-") strand
VEF – frequency of reads passing the threshold
preprocessBam for preloading BAM data,
generateCytosineReport for methylation statistics at the level
of individual cytosines, generateVcfReport for evaluating
epiallele-SNV associations, generateBedEcdf for analysing the
distribution of per-read beta values, and 'epialleleR' vignettes for the
description of usage and sample data.
GRanges class for working with genomic ranges,
seqlevelsStyle function for getting or setting
the seqlevels style.
# amplicon data
amplicon.bam <- system.file("extdata", "amplicon010meth.bam",
package="epialleleR")
amplicon.bed <- system.file("extdata", "amplicon.bed",
package="epialleleR")
amplicon.report <- generateAmpliconReport(bam=amplicon.bam,
bed=amplicon.bed)
# capture NGS
capture.bam <- system.file("extdata", "capture.bam",
package="epialleleR")
capture.bed <- system.file("extdata", "capture.bed",
package="epialleleR")
capture.report <- generateCaptureReport(bam=capture.bam, bed=capture.bed)
# generateAmpliconReport and generateCaptureReport are just aliases
# of the generateBedReport
bed.report <- generateBedReport(bam=capture.bam, bed=capture.bed,
bed.type="capture")
identical(capture.report, bed.report)