| getClusters {wavClusteR} | R Documentation |
Identifies clusters using either the mini-rank norm (MRN) algorithm (default
and recommended to achieve highest sensitivity) or via a continuous wavelet
transform (CWT) based approach. The former employs thresholding of
background coverage differences and finds the optimal cluster boundaries by
exhaustively evaluating all putative clusters using a rank-based approach.
This method has higher sensitivity and an approximately 10-fold faster
running time than the CWT-based cluster identification algorithm. The
latter, maintained for compatibility with wavClusteR, computes the
CWT on a 1 kb window of the coverage function centered at a high-confidence
substitution site, and identifies cluster boundaries by extending away from
peak positions.
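The thresholding idea behind the MRN approach can be illustrated with a toy sketch in base R. This is not the package's implementation: the coverage vector, the `noise` value and the variable names are made up for illustration, and the rank-based selection of the optimal boundaries is not reproduced. The sketch only shows how coverage differences above a noise threshold could yield candidate boundaries, and how every start/end pairing gives a putative cluster to be evaluated.

```r
# Toy coverage vector (hypothetical data): background of about 1 read,
# with one enriched region that rises and falls in two steps.
cov <- c(1, 1, 0, 1, 5, 5, 9, 9, 9, 5, 5, 1, 0, 1)

# Hypothetical noise threshold: coverage changes of at most this size
# are treated as background fluctuations.
noise <- 2

# Difference in coverage between consecutive positions.
d <- diff(cov)

# A rise larger than the threshold marks a candidate start (the first
# position after the rise); a drop larger than the threshold marks a
# candidate end (the last position before the drop).
starts <- which(d > noise) + 1
ends <- which(d < -noise)

# Enumerate all putative clusters as start/end pairs with start <= end.
putative <- expand.grid(start = starts, end = ends)
putative <- putative[putative$start <= putative$end, ]
print(putative)
```

On this toy vector, two candidate starts and two candidate ends are found, giving four putative clusters. In the actual function these candidates would then be scored to pick the optimal boundaries, a step this sketch leaves out.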
getClusters(highConfSub, coverage, sortedBam, method = 'mrn', cores = 1, threshold, step = 1, snr = 3)
highConfSub |
GRanges object containing high-confidence substitution sites as returned by the getHighConfSub function |
coverage |
An Rle object containing the coverage at each genomic position as returned by a call to coverage |
sortedBam |
a GRanges object containing all aligned reads, including read sequence (qseq) and MD tag (MD), as returned by the readSortedBam function |
method |
a character, either set to "mrn" or to "cwt" to compute clusters using the mini-rank norm or the wavelet transform-based algorithm, respectively. Default is "mrn" (recommended). |
cores |
integer, the number of cores to be used for parallel evaluation. Default is 1. |
threshold |
numeric, if |
step |
numeric, if |
snr |
numeric, if |
GRanges object containing the identified cluster boundaries.
Clusters returned by this function need to be further merged by the
function filterClusters, which also computes all relevant cluster
statistics.
Federico Comoglio and Cem Sievers
William Constantine and Donald Percival (2011), wmtsa: Wavelet Methods for Time Series Analysis, http://CRAN.R-project.org/package=wmtsa
Sievers C, Schlumpf T, Sawarkar R, Comoglio F and Paro R. (2012) Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data, Nucleic Acids Res. 40(20):e160. doi: 10.1093/nar/gks697
Comoglio F, Sievers C and Paro R (2015) Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data, BMC Bioinformatics 16, 32.
getHighConfSub, filterClusters
filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
example <- readSortedBam( filename = filename )
countTable <- getAllSub( example, minCov = 10, cores = 1 )
highConfSub <- getHighConfSub( countTable, supportStart = 0.2, supportEnd = 0.7, substitution = "TC" )
coverage <- coverage( example )
clusters <- getClusters( highConfSub = highConfSub,
                         coverage = coverage,
                         sortedBam = example,
                         method = 'mrn',
                         cores = 1,
                         threshold = 2 )