reverseComplement         package:Biostrings         R Documentation

_S_e_q_u_e_n_c_e _r_e_v_e_r_s_i_n_g _a_n_d _c_o_m_p_l_e_m_e_n_t_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Use these functions to reverse a sequence and/or to complement a
     DNA or RNA sequence.

_U_s_a_g_e:

       reverse(x, ...)
       complement(x, ...)
       reverseComplement(x, ...)

_A_r_g_u_m_e_n_t_s:

       x: An IRanges, NormalIRanges, MaskCollection, XString,
          XStringSet, XStringViews or MaskedXString object for
          'reverse'.

          A DNAString, RNAString, DNAStringSet, RNAStringSet,
          XStringViews (with DNAString or RNAString subject),
          MaskedDNAString or MaskedRNAString object for 'complement'
          and 'reverseComplement'. 

     ...: Additional arguments to be passed to or from methods. 

_D_e_t_a_i_l_s:

     Given an XString object 'x', 'reverse(x)' returns an object of the
     same XString subtype as 'x' with the letters of 'x' in reverse
     order.

     If 'x' is a DNAString or RNAString object, 'complement(x)' returns
     an object where each base in 'x' is "complemented", i.e. A, C, G, T
     in a DNAString object are replaced by T, G, C, A respectively, and
     A, C, G, U in an RNAString object are replaced by U, G, C, A
     respectively.

     Letters belonging to the "IUPAC extended genetic alphabet" are
     also replaced by their complement (M <-> K, R <-> Y, S <-> S, V
     <-> B, W <-> W, H <-> D, N <-> N). The gap ("-") and hard masking
     ("+") letters are unchanged.

     'reverseComplement(x)' is equivalent to 'reverse(complement(x))'
     but is faster and more memory efficient.

_V_a_l_u_e:

     An object of the same class and length as the original object.

_S_e_e _A_l_s_o:

     IRanges-class, NormalIRanges-class, MaskCollection-class,
     DNAString-class, RNAString-class, DNAStringSet-class,
     RNAStringSet-class, XStringViews-class, MaskedXString-class,
     'strrev', 'chartr', 'findPalindromes'

_E_x_a_m_p_l_e_s:

       ## ---------------------------------------------------------------------
       ## A. SIMPLE EXAMPLES
       ## ---------------------------------------------------------------------

       x <- DNAString("ACGT-YN-")
       reverseComplement(x)
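
       ## A quick check of the equivalence stated in Details, sketched on the
       ## small sequence above. The expected strings were worked out by hand
       ## from the complement table; the names rc1 and rc2 are illustrative.
       rc1 <- reverseComplement(x)
       rc2 <- reverse(complement(x))
       stopifnot(identical(as.character(rc1), as.character(rc2)))
       as.character(rc1)  # "-NR-ACGT" (gaps unchanged, Y <-> R, N <-> N)

       ## The same functions on an RNAString use U in place of T:
       as.character(complement(RNAString("ACGU")))  # "UGCA"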

       library(drosophila2probe)
       x <- DNAStringSet(drosophila2probe$sequence)
       x
       alphabetFrequency(x, collapse=TRUE)
       rcx <- reverseComplement(x)
       rcx
       alphabetFrequency(rcx, collapse=TRUE)
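
       ## For illustration only: a base-R sketch of the DNA letter mapping
       ## described in Details, applied to a plain character string with
       ## chartr(). The helper name rc_sketch is hypothetical and is not part
       ## of Biostrings; it is not how reverseComplement() is implemented.
       rc_sketch <- function(s) {
         ## Complement each letter. "-" and "+" are absent from the table,
         ## so chartr() leaves them unchanged.
         comp <- chartr("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN", s)
         ## Reverse the order of the letters.
         paste(rev(strsplit(comp, "")[[1]]), collapse="")
       }
       rc_sketch("ACGT-YN-")  # "-NR-ACGT"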

       ## ---------------------------------------------------------------------
       ## B. SEARCHING THE REVERSE STRAND OF A CHROMOSOME
       ## ---------------------------------------------------------------------
       ## Applying reverseComplement() to the pattern before calling
       ## matchPattern() is the recommended way to search for hits on the
       ## reverse strand of a chromosome.

       library(BSgenome.Dmelanogaster.UCSC.dm3)
       chrX <- Dmelanogaster$chrX
       chrX
       alphabetFrequency(chrX)  # 90100 N's

       ## Activate "assembly gaps" and "RepeatMasker" masks:
       active(masks(chrX))[1:2] <- TRUE
       chrX
       alphabetFrequency(chrX)  # no more N's

       pattern <- DNAString("ACCAACNNGGTTG")
       matchPattern(pattern, chrX, fixed=FALSE)  # 3 hits on strand +
       rcpattern <- reverseComplement(pattern)
       rcpattern
       m0 <- matchPattern(rcpattern, chrX, fixed=FALSE) # 5 hits on strand -

       ## Applying reverseComplement() to the subject instead of the pattern is not
       ## a good idea for 2 reasons:
       ## (1) Chromosome sequences are generally big and sometimes very big
       ##     so computing the reverse complement of the positive strand will
       ##     take time and memory proportional to its length.
       chrXminus <- reverseComplement(chrX)  # needs to allocate 22M of memory!
       chrXminus
       ## (2) Chromosome locations are generally given relative to the positive
       ##     strand, even for features located in the negative strand, so after
       ##     doing this:
       m1 <- matchPattern(pattern, chrXminus, fixed=FALSE)
       ##     the start/end of the matches are now relative to the negative strand.
       ##     You need to apply reverseComplement() again on the result if you want
       ##     them to be relative to the positive strand:
       m2 <- reverseComplement(m1)  # allocates 22M of memory, again!
       ##     and finally apply rev() to sort the matches from left to right
       ##     (5' to 3' direction) like in m0:
       m3 <- rev(m2) # same as m0, finally!
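
       ## The coordinate arithmetic behind reason (2), sketched on a toy
       ## subject (the sequences and the names toy_subject, toy_pattern, n,
       ## hit_minus and hit_plus are made up for illustration). A match at
       ## [start, end] on the reverse complement of a subject of length n
       ## corresponds to [n - end + 1, n - start + 1] on the positive strand.
       toy_subject <- DNAString("TTGGACAAAAAA")
       toy_pattern <- DNAString("GTCC")
       n <- length(toy_subject)
       hit_minus <- matchPattern(toy_pattern, reverseComplement(toy_subject))
       c(start(hit_minus), end(hit_minus))                  # 7 10
       c(n - end(hit_minus) + 1, n - start(hit_minus) + 1)  # 3 6
       ## Searching with the reverse-complemented pattern gives the positive
       ## strand coordinates directly:
       hit_plus <- matchPattern(reverseComplement(toy_pattern), toy_subject)
       c(start(hit_plus), end(hit_plus))                    # 3 6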

       ## WARNING: Before you try the example below on human chromosome 1, be aware
       ## that it will require the allocation of about 500 MB of memory!
       if (interactive()) {
         library(BSgenome.Hsapiens.UCSC.hg18)
         chr1 <- Hsapiens$chr1
         ## fixed=FALSE so that the N's in the pattern act as wildcards, as above
         matchPattern(pattern, reverseComplement(chr1), fixed=FALSE)  # DON'T DO THIS!
         matchPattern(reverseComplement(pattern), chr1, fixed=FALSE)  # DO THIS INSTEAD
       }

