| readAligned {ShortRead} | R Documentation |
readAligned reads all aligned read files in the directory
dirPath whose file names match pattern,
returning a compact internal representation of the alignments,
sequences, and quality scores in the files. Methods read all files into a
single R object; a typical use is to restrict input to a single
aligned read file.
readAligned(dirPath, pattern=character(0), ...)
dirPath |
A character vector (or other object; see methods defined on this generic) giving the directory path (relative or absolute) of aligned read files to be input. |
pattern |
The (grep-style) pattern describing file
names to be read. The default (character(0)) results in
(attempted) input of all files in the directory. |
... |
Additional arguments, used by methods. Most methods
implement filter=srFilter(), allowing objects of
SRFilter to selectively return aligned reads. |
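The effect of a grep-style pattern can be previewed with base R before any files are read. The sketch below assumes that pattern is matched as an ordinary regular expression against file names; the file names are hypothetical and serve only as illustration, and the handling of character(0) emulates the documented default rather than reproducing the package's internals.

```r
## Hypothetical file names, for illustration only
files <- c("s_2_export.txt", "s_5_0001_align.txt",
           "s_5_0001_realign.txt", "s_5_0002_realign.txt")

## grep-style (regular expression) selection of file names
selected <- grep("s_5_.*_realign.txt", files, value=TRUE)
selected

## The documented default, character(0), selects every file;
## that behaviour is emulated explicitly here (an assumption)
pattern <- character(0)
all_selected <- if (length(pattern) == 0L) {
    files
} else {
    grep(pattern, files, value=TRUE)
}
all_selected
```

Checking a pattern this way makes it easier to confirm that input is restricted to the intended file or files.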
There is no standard aligned read file format; methods parse particular file types.
The readAligned,character-method interprets file types based
on an additional type argument. Supported types are:
type="SolexaExport"
This type parses .*_export.txt files following the
documentation in the Solexa Genome Alignment software manual,
version 0.3.0. These files consist of the following columns;
consult Solexa documentation for precise descriptions. If parsed,
values can be retrieved from AlignedRead as
follows:
alignData, alignData, alignData, alignData, alignData, sread, quality, chromosome, position, strand, alignQuality, alignData
Paired read columns are not interpreted. The resulting
AlignedRead object does not contain a
meaningful id; instead, use information from
alignData to identify reads.
Different interfaces to reading alignment files are described in
SolexaPath and SolexaSet.
type="SolexaPrealign"
type="SolexaAlign"
type="SolexaRealign"
These types parse s_L_TTTT_prealign.txt,
s_L_TTTT_align.txt or s_L_TTTT_realign.txt files
produced by default and eland analyses. From the Solexa
documentation, align corresponds to unfiltered first-pass
alignments, prealign adjusts alignments for error rates
(when available), realign filters alignments to exclude
clusters failing to pass quality criteria.
Because base quality scores are not stored with alignments, the
object returned by readAligned scores all base qualities as
-32.
If parsed, values can be retrieved from
AlignedRead as follows:
sread, alignQuality, alignData, position, strand, readXStringColumns, alignData
type="MAQMap", records=-1L
Parses map files produced by MAQ. See details in the next section. The
records option determines how many lines are read;
-1L (the default) means that all records are input.
type="MAQMapShort", records=-1L
As type="MAQMap", but for map files made with Maq prior to version 0.7.0. (These files
use a different maximum read length [64 instead of 128], and are hence
incompatible with newer Maq map files.)
type="MAQMapview"
Parses alignment files created by MAQ's ‘mapview’ command. Interpretation of columns is based on the description in the MAQ manual, specifically
...each line consists of read name, chromosome, position,
strand, insert size from the outer coordinates of a pair,
paired flag, mapping quality, single-end mapping quality,
alternative mapping quality, number of mismatches of the
best hit, sum of qualities of mismatched bases of the best
hit, number of 0-mismatch hits of the first 24bp, number
of 1-mismatch hits of the first 24bp on the reference,
length of the read, read sequence and its quality.
The read name, read sequence, and quality are read as
XStringSet objects. Chromosome and strand are read as
factors. Position and mapping quality are both
numeric. These fields are mapped to their corresponding
representation in AlignedRead objects.
Number of mismatches of the best hit, sum of qualities of mismatched
bases of the best hit, number of 0-mismatch hits of the first 24bp,
number of 1-mismatch hits of the first 24bp are represented in the
AlignedRead object as components of alignData.
Remaining fields are currently ignored.
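The column layout quoted from the MAQ manual can be made concrete with a base R sketch that does not use ShortRead. It assumes the mapview output is tab-delimited; the column labels are hypothetical names paraphrased from the manual excerpt, and the record is made up for illustration, not real MAQ output.

```r
## Hypothetical labels for the 16 columns described in the MAQ manual
cols <- c("readName", "chromosome", "position", "strand", "insertSize",
          "pairedFlag", "mapQuality", "seMapQuality", "altMapQuality",
          "nMismatchBestHit", "mismatchQualitySum", "nExactMatch24",
          "nOneMismatch24", "readLength", "sequence", "quality")

## A made-up record (not real MAQ output), assumed to be tab-delimited
line <- paste("read_1", "chr1.fa", "100", "+", "0", "0", "30", "30", "0",
              "1", "20", "1", "0", "8", "ACGTACGT", "IIIIIIII", sep="\t")

## Chromosome and strand as factors, numeric fields as numeric,
## name, sequence and quality as character
rec <- read.table(text=line, sep="\t", col.names=cols, quote="",
                  comment.char="",
                  colClasses=c("character", "factor", "numeric", "factor",
                               rep("numeric", 10), "character", "character"))

## Fields described above as mapped to AlignedRead representations
core <- rec[c("readName", "sequence", "quality", "chromosome",
              "position", "strand", "mapQuality")]

## Fields described above as components of alignData
extra <- rec[c("nMismatchBestHit", "mismatchQualitySum",
               "nExactMatch24", "nOneMismatch24")]
str(core)
str(extra)
```

The split into core and extra mirrors the description above; the columns in neither group correspond to the fields that are currently ignored.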
A single R object (e.g., AlignedRead) containing
alignments, sequences and qualities of all files in dirPath
matching pattern. The order in which files are read is not
guaranteed.
Martin Morgan <mtmorgan@fhcrc.org>, Simon Anders <anders@ebi.ac.uk> (MAQ map)
An AlignedRead object.
The MAQ reference manual, http://maq.sourceforge.net/maq-manpage.shtml#5, 3 May, 2008
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
ap <- analysisPath(sp)
## ELAND_EXTENDED
readAligned(ap, "s_2_export.txt", "SolexaExport")
## PhageAlign
readAligned(ap, "s_5_.*_realign.txt", "SolexaRealign")
## MAQ
dirPath <- system.file('extdata', 'maq', package='ShortRead')
list.files(dirPath)
## First line
readLines(list.files(dirPath, full.names=TRUE)[[1]], 1)
countLines(dirPath)
## two files collapse into one
readAligned(dirPath, type="MAQMapview")
## select only chr1-5.fa, '+' strand
filt <- compose(chromosomeFilter("chr[1-5].fa"),
strandFilter("+"))
readAligned(sp, "s_2_export.txt", filter=filt)