readAligned            package:ShortRead            R Documentation

_R_e_a_d _a_l_i_g_n_e_d _r_e_a_d_s _a_n_d _t_h_e_i_r _q_u_a_l_i_t_y _s_c_o_r_e_s _i_n_t_o _R _r_e_p_r_e_s_e_n_t_a_t_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     'readAligned' reads all aligned read files in the directory
     'dirPath' whose file names match 'pattern', returning a compact
     internal representation of the alignments, sequences, and quality
     scores in the files. Methods read all files into a single R
     object; a typical use is to restrict input to a single aligned
     read file.

_U_s_a_g_e:

     readAligned(dirPath, pattern=character(0), ...)

_A_r_g_u_m_e_n_t_s:

 dirPath: A character vector (or other object; see methods defined on
          this generic) giving the directory path (relative or
          absolute) of aligned read files to be input.

 pattern: The ('grep'-style) pattern describing file names to be read.
          The default ('character(0)') results in (attempted) input of
          all files in the directory.

     ...: Additional arguments, used by methods. Most methods implement
          'filter=srFilter()', allowing objects of class 'SRFilter' to
          selectively return aligned reads.
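
     As a rough way to preview which files a given 'pattern' would
     select, the sketch below uses base R's 'list.files()' on a
     scratch directory. It assumes that file selection behaves like
     'list.files(dirPath, pattern)', which the text above does not
     state; the directory and file names are hypothetical.

```r
## Hypothetical file names in a fresh scratch directory; none of
## these files come from ShortRead.
d <- tempfile("pattern-demo")
dir.create(d)
fls <- c("s_1_export.txt", "s_2_export.txt", "s_2_realign.txt")
invisible(file.create(file.path(d, fls)))

## A pattern naming one file selects just that file
one <- list.files(d, pattern="s_2_export.txt")

## A broader regular expression selects every export file
exports <- list.files(d, pattern="_export\\.txt$")

## Assumed stand-in for the default: no pattern lists everything
all_files <- list.files(d)
```

     Because the pattern is a regular expression rather than a shell
     glob, a literal '.' is best escaped when an exact match matters.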

_D_e_t_a_i_l_s:

     There is no standard aligned read file format; methods parse
     particular file types.

     The 'readAligned,character-method' interprets file types based on
     an additional 'type' argument. Supported types are:


   '_t_y_p_e="_S_o_l_e_x_a_E_x_p_o_r_t"' This type parses '.*_export.txt' files
        following the documentation in the Solexa Genome Alignment
        software manual, version 0.3.0. These files consist of the
        following columns; consult Solexa documentation for precise
        descriptions. If parsed, values can be retrieved from
        'AlignedRead' as follows:

        _M_a_c_h_i_n_e Ignored

        _R_u_n _n_u_m_b_e_r stored in 'alignData'

        _L_a_n_e stored in 'alignData'

        _T_i_l_e stored in 'alignData'

        _X stored in 'alignData'

        _Y stored in 'alignData'

        _I_n_d_e_x _s_t_r_i_n_g Ignored

        _R_e_a_d _n_u_m_b_e_r Ignored

        _R_e_a_d 'sread'

        _Q_u_a_l_i_t_y 'quality'

        _M_a_t_c_h _c_h_r_o_m_o_s_o_m_e 'chromosome'

        _M_a_t_c_h _c_o_n_t_i_g Ignored

        _M_a_t_c_h _p_o_s_i_t_i_o_n 'position'

        _M_a_t_c_h _s_t_r_a_n_d 'strand'

        _M_a_t_c_h _d_e_s_c_r_i_p_t_i_o_n Ignored

        _S_i_n_g_l_e-_r_e_a_d _a_l_i_g_n_m_e_n_t _s_c_o_r_e 'alignQuality'

        _P_a_i_r_e_d-_r_e_a_d _a_l_i_g_n_m_e_n_t _s_c_o_r_e Ignored

        _P_a_r_t_n_e_r _c_h_r_o_m_o_s_o_m_e Ignored

        _P_a_r_t_n_e_r _c_o_n_t_i_g Ignored

        _P_a_r_t_n_e_r _o_f_f_s_e_t Ignored

        _P_a_r_t_n_e_r _s_t_r_a_n_d Ignored

        _F_i_l_t_e_r_i_n_g 'alignData'

        Paired read columns are not interpreted.  The resulting
        'AlignedRead' object does _not_ contain a meaningful 'id';
        instead, use information from 'alignData' to identify reads.

        Different interfaces to reading alignment files are described
        in 'SolexaPath' and 'SolexaSet'.


   '_t_y_p_e="_S_o_l_e_x_a_P_r_e_a_l_i_g_n"' See SolexaRealign

   '_t_y_p_e="_S_o_l_e_x_a_A_l_i_g_n"' See SolexaRealign

   '_t_y_p_e="_S_o_l_e_x_a_R_e_a_l_i_g_n"' These types parse 's_L_TTTT_prealign.txt',
        's_L_TTTT_align.txt' or 's_L_TTTT_realign.txt' files produced
        by default and eland analyses. From the Solexa documentation,
        'align' corresponds to unfiltered first-pass alignements,
        'prealign' adjusts alignments for error rates (when available),
        'realign' filters alignments to exclude clusters failing to
        pass quality criteria.

        Because base quality scores are not stored with alignments, the
        object returned by 'readAligned' scores all base qualities as
        '-32'.

        If parsed, values can be retrieved from 'AlignedRead' as
        follows:

        _S_e_q_u_e_n_c_e stored in 'sread'

        _B_e_s_t _s_c_o_r_e stored in 'alignQuality'

        _N_u_m_b_e_r _o_f _h_i_t_s stored in 'alignData'

        _T_a_r_g_e_t _p_o_s_i_t_i_o_n stored in 'position'

        _S_t_r_a_n_d stored in 'strand'

        _T_a_r_g_e_t _s_e_q_u_e_n_c_e Ignored; parse using 'readXStringColumns'

        _N_e_x_t _b_e_s_t _s_c_o_r_e stored in 'alignData'


   '_t_y_p_e="_M_A_Q_M_a_p", _r_e_c_o_r_d_s=-_1_L' Parse binary 'map' files produced by
        MAQ. See details in the next section. The 'records' option
        determines how many lines are read; '-1L' (the default) means
        that all records are input.

   '_t_y_p_e="_M_A_Q_M_a_p_S_h_o_r_t", _r_e_c_o_r_d_s=-_1_L' The same as 'type="MAQMap"' but
        for map files made with Maq prior to version 0.7.0. (These
        files use a different maximum read length [64 instead of 128],
        and are hence incompatible with newer Maq map files.)

   '_t_y_p_e="_M_A_Q_M_a_p_v_i_e_w"' Parse alignment files created by MAQ's mapiew
        command. Interpretation of columns is based on the description
        in the MAQ manual, specifically



                ...each line consists of read name, chromosome,
        position,
                strand, insert size from the outer coordinates of a
        pair,
                paired flag, mapping quality, single-end mapping
        quality,
                alternative mapping quality, number of mismatches of
        the
                best hit, sum of qualities of mismatched bases of the
        best
                hit, number of 0-mismatch hits of the first 24bp,
        number
                of 1-mismatch hits of the first 24bp on the reference,
                length of the read, read sequence and its quality.

        The read name, read sequence, and quality are read as
        'XStringSet' objects. Chromosome and strand are read as
        'factor's. Position and mapping quality are both 'numeric'.
        These fields are mapped to their corresponding representation
        in 'AlignedRead' objects.

        Number of mismatches of the best hit, sum of qualities of
        mismatched bases of the best hit, number of 0-mismatch hits of
        the first 24bp, number of 1-mismatch hits of the first 24bp are
        represented in the 'AlignedRead' object as components of
        'alignData'.

        Remaining fields are currently ignored.
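
     The column-to-accessor mappings listed above can be explored
     interactively. The following is a minimal sketch, assuming the
     ShortRead package is installed and that its bundled 'extdata'
     directory contains the 's_2_export.txt' file used in the Examples
     section; the variable name 'aln' is chosen here for illustration.

```r
library(ShortRead)

## Locate the example Solexa analysis directory shipped with ShortRead
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
aln <- readAligned(analysisPath(sp), "s_2_export.txt", "SolexaExport")

## Each accessor returns one component of the AlignedRead object
sread(aln)         # read sequences
quality(aln)       # base quality scores
chromosome(aln)    # match chromosome
position(aln)      # match position
strand(aln)        # match strand
alignQuality(aln)  # single-read alignment score
alignData(aln)     # remaining per-read columns (run, lane, tile, ...)
```

     All components describe the same reads, so they share the length
     reported by 'length(aln)'.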


_V_a_l_u_e:

     A single R object (e.g., 'AlignedRead') containing alignments,
     sequences and qualities of all files in 'dirPath' matching
     'pattern'. There is no guarantee of order in which files are read.
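
     To illustrate how several files end up in one returned object, the
     sketch below reads the MAQ mapview files bundled with ShortRead
     and prints file and record counts side by side. It assumes the
     package is installed; if each input line holds one alignment the
     two totals would be expected to agree, but that is an assumption
     rather than something stated above.

```r
library(ShortRead)

dirPath <- system.file("extdata", "maq", package="ShortRead")

## Per-file line counts, one element per file in the directory
n <- countLines(dirPath)

## All files in the directory are combined into a single object
aln <- readAligned(dirPath, type="MAQMapview")

length(n)    # number of input files
sum(n)       # total lines across those files
length(aln)  # number of alignments in the combined object
```

     Since the order in which files are read is not guaranteed, code
     that depends on record order should sort the result explicitly.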

_A_u_t_h_o_r(_s):

     Martin Morgan <mtmorgan@fhcrc.org>, Simon Anders
     <anders@ebi.ac.uk> (MAQ map)

_S_e_e _A_l_s_o:

     A 'AlignedRead' object.

     The MAQ reference manual, <URL:
     http://maq.sourceforge.net/maq-manpage.shtml#5>, 3 May, 2008

_E_x_a_m_p_l_e_s:

     sp <- SolexaPath(system.file("extdata", package="ShortRead"))
     ap <- analysisPath(sp)
     ## ELAND_EXTENDED
     readAligned(ap, "s_2_export.txt", "SolexaExport")
     ## PhageAlign
     readAligned(ap, "s_5_.*_realign.txt", "SolexaRealign")

     ## MAQ
     dirPath <- system.file('extdata', 'maq', package='ShortRead')
     list.files(dirPath)
     ## First line
     readLines(list.files(dirPath, full.names=TRUE)[[1]], 1)
     countLines(dirPath)
     ## two files collapse into one
     readAligned(dirPath, type="MAQMapview")

     ## select only chr1-5.fa, '+' strand
     filt <- compose(chromosomeFilter("chr[1-5].fa"),
                     strandFilter("+"))
     readAligned(sp, "s_2_export.txt", filter=filt)

