reverseComplement         package:Biostrings         R Documentation

_S_e_q_u_e_n_c_e _r_e_v_e_r_s_i_n_g _a_n_d _c_o_m_p_l_e_m_e_n_t_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Use these functions for reversing sequences and/or complementing
     DNA or RNA sequences.

_U_s_a_g_e:

       ## S4 method for signature 'character':
       reverse(x, ...)
       ## S4 method for signature 'XString':
       reverse(x, ...)
       complement(x, ...)
       reverseComplement(x, ...)

_A_r_g_u_m_e_n_t_s:

       x: A character vector, or an XString, XStringSet, XStringViews
          or MaskedXString object for 'reverse'.

          A DNAString, RNAString, DNAStringSet, RNAStringSet,
          XStringViews (with DNAString or RNAString subject),
          MaskedDNAString or MaskedRNAString object for 'complement'
          and 'reverseComplement'. 

     ...: Additional arguments to be passed to or from methods. 

_D_e_t_a_i_l_s:

     Given an XString object 'x', 'reverse(x)' returns an object of the
     same XString base type as 'x' in which the letters of 'x' appear in
     reverse order.

     If 'x' is a DNAString or RNAString object, 'complement(x)' returns
     an object where each base in 'x' is "complemented", i.e. A, C, G, T
     in a DNAString object are replaced by T, G, C, A respectively and
     A, C, G, U in an RNAString object are replaced by U, G, C, A
     respectively.

     Letters belonging to the "IUPAC extended genetic alphabet" are
     also replaced by their complement (M <-> K, R <-> Y, S <-> S, V
     <-> B, W <-> W, H <-> D, N <-> N). The gap ('"-"') and hard
     masking ('"+"') letters are left unchanged.

     'reverseComplement(x)' is equivalent to 'reverse(complement(x))'
     but is faster and more memory efficient.

_V_a_l_u_e:

     An object of the same class and length as the original object.

_S_e_e _A_l_s_o:

     DNAString-class, RNAString-class, DNAStringSet-class,
     RNAStringSet-class, XStringViews-class, MaskedXString-class,
     'chartr', 'findPalindromes'

_E_x_a_m_p_l_e_s:

       ## ---------------------------------------------------------------------
       ## A. SOME SIMPLE EXAMPLES
       ## ---------------------------------------------------------------------

       x <- DNAString("ACGT-YN-")
       reverseComplement(x)
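
       ## A quick check of the equivalence stated in Details. This is only a
       ## sketch: 'rc1' and 'rc2' are illustrative names, and the comparison
       ## is done on plain character strings to keep it simple.
       rc1 <- reverseComplement(x)
       rc2 <- reverse(complement(x))
       as.character(rc1) == as.character(rc2)
       ## Applying reverseComplement() twice should give back the original
       ## sequence, gap letters included:
       as.character(reverseComplement(rc1)) == as.character(x)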

       library(drosophila2probe)
       probes <- DNAStringSet(drosophila2probe$sequence)
       probes
       alphabetFrequency(probes, collapse=TRUE)
       rcprobes <- reverseComplement(probes)
       rcprobes
       alphabetFrequency(rcprobes, collapse=TRUE)
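
       ## Reverse complementing swaps the A/T counts and the C/G counts, so
       ## the two frequency tables above should mirror each other. A small
       ## sketch ('af' and 'rcaf' are illustrative names):
       af <- alphabetFrequency(probes, collapse=TRUE)
       rcaf <- alphabetFrequency(rcprobes, collapse=TRUE)
       af[c("A", "C", "G", "T")]
       rcaf[c("T", "G", "C", "A")]  # same counts, read in complemented order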

       ## ---------------------------------------------------------------------
       ## B. OBTAINING THE MISMATCH PROBES OF A CHIP
       ## ---------------------------------------------------------------------

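       ## Turn perfect match (PM) probes into mismatch (MM) probes by
       ## complementing the base at position 13 of each probe.
       pm2mm <- function(probes)
       {
           probes <- DNAStringSet(probes)
           subseq(probes, start=13, end=13) <- complement(subseq(probes, start=13, end=13))
           probes
       }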
       mmprobes <- pm2mm(probes)
       mmprobes
       alphabetFrequency(mmprobes, collapse=TRUE)
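
       ## A sanity check on the result, as a sketch only ('pm13' and 'mm13'
       ## are illustrative names): the probe widths should be unchanged and
       ## every base at position 13 should be paired with its complement.
       pm13 <- as.character(subseq(probes, start=13, end=13))
       mm13 <- as.character(subseq(mmprobes, start=13, end=13))
       table(pm13, mm13)  # cross-tabulates each PM base against its MM base
       all(width(mmprobes) == width(probes))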

       ## ---------------------------------------------------------------------
       ## C. SEARCHING THE MINUS STRAND OF A CHROMOSOME
       ## ---------------------------------------------------------------------
       ## Applying reverseComplement() to the pattern before calling
       ## matchPattern() is the recommended way of searching hits on the
       ## minus strand of a chromosome.

       library(BSgenome.Dmelanogaster.UCSC.dm3)
       chrX <- Dmelanogaster$chrX
       pattern <- DNAString("ACCAACNNGGTTG")
       matchPattern(pattern, chrX, fixed=FALSE)  # 3 hits on strand +
       rcpattern <- reverseComplement(pattern)
       rcpattern
       m0 <- matchPattern(rcpattern, chrX, fixed=FALSE)
       m0  # 5 hits on strand -

       ## Applying reverseComplement() to the subject instead of the pattern is not
       ## a good idea for 2 reasons:
       ## (1) Chromosome sequences are generally big and sometimes very big
       ##     so computing the reverse complement of the positive strand will
       ##     take time and memory proportional to its length.
       chrXminus <- reverseComplement(chrX)  # needs to allocate 22M of memory!
       chrXminus
       ## (2) Chromosome locations are generally given relative to the positive
       ##     strand, even for features located on the negative strand, so after
       ##     doing this:
       m1 <- matchPattern(pattern, chrXminus, fixed=FALSE)
       ##     the start/end of the matches are now relative to the negative strand.
       ##     You need to apply reverseComplement() again on the result if you want
       ##     them to be relative to the positive strand:
       m2 <- reverseComplement(m1)  # allocates 22M of memory, again!
       ##     and finally apply rev() to sort the matches from left to right
       ##     (5' to 3' direction) like in m0:
       m3 <- rev(m2) # same as m0, finally!
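
       ##     The positive-strand coordinates can also be derived by plain
       ##     arithmetic, without the second reverseComplement() call. This is
       ##     a sketch, assuming a match at (start, end) on the reverse
       ##     complemented sequence maps back to (length - end + 1,
       ##     length - start + 1) on the original; 'plus_start' and 'plus_end'
       ##     are illustrative names:
       plus_start <- rev(length(chrX) - end(m1) + 1L)
       plus_end <- rev(length(chrX) - start(m1) + 1L)
       all(plus_start == start(m0)) && all(plus_end == end(m0))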

       ## WARNING: Before you try the example below on human chromosome 1, be aware
       ## that it will require the allocation of about 500Mb of memory!
       if (interactive()) {
         library(BSgenome.Hsapiens.UCSC.hg18)
         chr1 <- Hsapiens$chr1
         matchPattern(pattern, reverseComplement(chr1), fixed=FALSE)  # DON'T DO THIS!
         matchPattern(reverseComplement(pattern), chr1, fixed=FALSE)  # DO THIS INSTEAD
       }

