trimLRPatterns          package:Biostrings          R Documentation

_T_r_i_m _F_l_a_n_k_i_n_g _P_a_t_t_e_r_n_s _f_r_o_m _S_e_q_u_e_n_c_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     The 'trimLRPatterns' function trims left and/or right flanking
     patterns from sequences.

_U_s_a_g_e:

       trimLRPatterns(Lpattern = "", Rpattern = "", subject,
                      max.Lmismatch = 0, max.Rmismatch = 0,
                      with.Lindels = FALSE, with.Rindels = FALSE,
                      Lfixed = TRUE, Rfixed = TRUE, ranges = FALSE)

_A_r_g_u_m_e_n_t_s:

Lpattern: The left part of the pattern. 

Rpattern: The right part of the pattern. 

 subject: An XString or XStringSet object containing the target
          sequence(s). 

max.Lmismatch: Either an integer vector of length 'nLp =
          nchar(Lpattern)' whose elements 'max.Lmismatch[i]' represent
          the maximum number of acceptable mismatching letters when
          aligning 'substring(Lpattern, nLp - i + 1, nLp)' with
          'substring(subject, 1, i)', or a single numeric value in '(0,
          1)' that represents a constant maximum mismatch rate for each
          of the 'nLp' alignments. Negative numbers in integer vector
          inputs are used to prevent trimming at the i-th location. If
          an integer vector input has 'length(max.Lmismatch) < nLp',
          then 'max.Lmismatch' will be augmented with enough -1's at
          the beginning of the vector to bring it up to length 'nLp'.

          If non-zero, an inexact matching algorithm is used (see the
          'matchPattern' function for more information). 
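
          The alignments described above can be sketched in base R. This
          is only an illustration, not the Biostrings implementation:
          'countMismatches' is a hypothetical helper, the subject is a
          plain character string, and the padding step follows the rule
          stated above for short integer vectors.

```r
## Hypothetical helper (not part of Biostrings): number of mismatching
## letters between two strings of equal length.
countMismatches <- function(a, b) {
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

Lpattern <- "TTCTGCTTG"
subject  <- "TGCTTGACGGCAGATCGG"   # plain character string, for illustration
nLp <- nchar(Lpattern)

## Alignment i pairs the last i letters of Lpattern with the first i
## letters of subject.
i <- 1:nLp
Lsuffix <- substring(Lpattern, nLp - i + 1, nLp)
Sprefix <- substring(subject, 1, i)
mismatches <- mapply(countMismatches, Lsuffix, Sprefix, USE.NAMES = FALSE)

## A short integer vector is padded with -1's at the beginning, so here
## only the two longest alignments (i = 8 and i = 9) stay eligible.
shortMax  <- c(0L, 0L)
paddedMax <- c(rep(-1L, nLp - length(shortMax)), shortMax)

## Alignments whose mismatch count is within the per-length limit.
eligibleExact  <- which(mismatches <= rep(0L, nLp))
eligiblePadded <- which(mismatches <= paddedMax)

data.frame(i, Lsuffix, Sprefix, mismatches, paddedMax)
```

          In this sketch the exact-match limits leave alignments 2 and 6
          eligible, while the padded vector leaves none, because the
          negative entries rule out every shorter overlap.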

max.Rmismatch: Either an integer vector of length 'nRp =
          nchar(Rpattern)' whose elements 'max.Rmismatch[i]' represent
          the maximum number of acceptable mismatching letters when
          aligning 'substring(Rpattern, 1, i)' with 'substring(subject,
          nS - i + 1, nS)', where 'nS = nchar(subject)', or a single
          numeric value in '(0, 1)' that represents a constant maximum
          mismatch rate for each of the 'nRp' alignments. Negative
          numbers in integer vector
          inputs are used to prevent trimming at the i-th location. If
          an integer vector input has 'length(max.Rmismatch) < nRp',
          then 'max.Rmismatch' will be augmented with enough -1's at
          the beginning of the vector to bring it up to length 'nRp'.

          If non-zero, an inexact matching algorithm is used (see the
          'matchPattern' function for more information). 
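
          A base-R sketch of the right-flank alignments, under the
          assumption that the right flank mirrors the left one: the
          first i letters of 'Rpattern' are compared with the last i
          letters of 'subject' (the Examples point this way, since
          their subject ends with the start of 'Rpattern').
          'countMismatches' and 'nS' are names introduced here for
          illustration only.

```r
## Hypothetical helper (not part of Biostrings): number of mismatching
## letters between two strings of equal length.
countMismatches <- function(a, b) {
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

Rpattern <- "GATCGGAAG"
subject  <- "TTCTGCTTGACGTGATCGGA"   # plain character string, for illustration
nRp <- nchar(Rpattern)
nS  <- nchar(subject)

## Assumed convention: alignment i pairs the first i letters of Rpattern
## with the last i letters of subject.
i <- 1:nRp
Rprefix <- substring(Rpattern, 1, i)
Ssuffix <- substring(subject, nS - i + 1, nS)
mismatches <- mapply(countMismatches, Rprefix, Ssuffix, USE.NAMES = FALSE)

data.frame(i, Rprefix, Ssuffix, mismatches)
```

          Under this assumption, the overlaps of length 2 and 7 are the
          only ones with zero mismatches for this subject.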

with.Lindels: If 'TRUE' then indels are allowed in the left part of the
          pattern. In that case 'max.Lmismatch' is interpreted as the
          maximum "edit distance" allowed in the left part of the
          pattern.

          See the 'with.indels' argument of the 'matchPattern' function
          for more information. 

with.Rindels: Same as 'with.Lindels' but for the right part of the
          pattern. 

  Lfixed: An 'Lfixed' value other than the default ('TRUE') can only
          be used with a DNAString or RNAString subject.

          With 'Lfixed=FALSE', ambiguities (i.e. letters from the IUPAC
          Extended Genetic Alphabet (see 'IUPAC_CODE_MAP') that are not
          from the base alphabet) in the left pattern _and_ in the
          subject are interpreted as wildcards i.e. they match any
          letter that they stand for.

          See the 'fixed' argument of the 'matchPattern' function for
          more information. 
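
          The wildcard idea can be illustrated with a small base-R
          sketch. The 'iupac' vector below is a hand-written excerpt of
          the ambiguity codes and 'standsFor' is a hypothetical helper;
          neither is part of Biostrings, which provides the full table
          as 'IUPAC_CODE_MAP'. The sketch only covers an ambiguity
          letter in the pattern facing a base letter in the subject.

```r
## Assumption: a small hand-written excerpt of the IUPAC ambiguity codes,
## for illustration only.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "AG", Y = "CT", N = "ACGT")

## Hypothetical helper: TRUE if pattern letter p can stand for base s.
standsFor <- function(p, s) {
    s %in% strsplit(iupac[[p]], "")[[1]]
}

standsFor("N", "G")   # N stands for any base
standsFor("R", "G")   # R stands for A or G
standsFor("Y", "G")   # Y stands for C or T, so it cannot stand for G
```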

  Rfixed: Same as 'Lfixed' but for the right part of the pattern. 

  ranges: If 'TRUE', then return the ranges to use to trim 'subject'.
          If 'FALSE', then return the trimmed 'subject'. 

_V_a_l_u_e:

     A new XString or XStringSet object with the flanking patterns
     within the specified edit distances removed. If 'ranges = TRUE',
     the ranges to use to trim 'subject' are returned instead.

_A_u_t_h_o_r(_s):

     P. Aboyoun

_S_e_e _A_l_s_o:

     'matchPattern', 'matchLRPatterns', match-utils, XString-class,
     XStringSet-class

_E_x_a_m_p_l_e_s:

       Lpattern <- "TTCTGCTTG"
       Rpattern <- "GATCGGAAG"
       subject <- DNAString("TTCTGCTTGACGTGATCGGA")
       subjectSet <- DNAStringSet(c("TGCTTGACGGCAGATCGG", "TTCTGCTTGGATCGGAAG"))

       ## Only allow for perfect matches on the flanks
       trimLRPatterns(Lpattern = Lpattern, subject = subject)
       trimLRPatterns(Rpattern = Rpattern, subject = subject)
       trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet)

       ## Allow for perfect matches on the flanking overlaps
       trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                      max.Lmismatch = rep(0, 9), max.Rmismatch = rep(0, 9))

       ## Allow for mismatches on the flanks
       trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subject,
                      max.Lmismatch = 0.2, max.Rmismatch = 0.2)
       maxMismatches <- as.integer(0.2 * 1:9)
       maxMismatches
       trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                      max.Lmismatch = maxMismatches, max.Rmismatch = maxMismatches)

       ## Produce ranges that can be an input into other functions
       trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                      max.Lmismatch = rep(0, 9), max.Rmismatch = rep(0, 9),
                      ranges = TRUE)
       trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subject,
                      max.Lmismatch = 0.2, max.Rmismatch = 0.2, ranges = TRUE)

