injectHardMask          package:Biostrings          R Documentation

_I_n_j_e_c_t_i_n_g _a _h_a_r_d _m_a_s_k _i_n _a _s_e_q_u_e_n_c_e

_D_e_s_c_r_i_p_t_i_o_n:

     'injectHardMask' allows the user to "fill" the masked regions of a
     sequence with an arbitrary letter (typically the '"+"' letter).

_U_s_a_g_e:

       injectHardMask(x, letter="+")

_A_r_g_u_m_e_n_t_s:

       x: A MaskedXString or XStringViews object. 

  letter: A single letter. 

_D_e_t_a_i_l_s:

     The name of the 'injectHardMask' function was chosen because of
     the primary use that it is intended for: converting a pile of
     active "soft masks" into a "hard mask". Here the pile of active
     "soft masks" refers to the active masks that have been put on top
     of a sequence. In Biostrings, the original sequence and the masks
     defined on top of it are bundled together in one of the dedicated
     containers for this: the MaskedBString, MaskedDNAString,
     MaskedRNAString and MaskedAAString containers (this is the
     MaskedXString family of containers). The original sequence is
     always stored unmodified in a MaskedXString object so no
     information is lost. This allows the user to activate/deactivate
     masks without having to worry about losing the letters that are in
     the regions that are masked/unmasked. It also allows better
     memory management, since the original sequence never needs to be
     copied, even when the set of active/inactive masks changes.

     However, there are situations where the user might want to
     _really_ get rid of the letters that are in some particular
     regions by replacing them with a junk letter (e.g. '"+"') that is
     guaranteed not to interfere with the analysis currently being
     done. For example, it's very likely that a set of motifs or short
     reads will not contain the '"+"' letter (this can easily be
     checked), so they will never hit the regions filled with '"+"'.
     In a way, the regions filled with '"+"' behave as if they were
     masked; this kind of masking is called "hard masking".

     Some important differences between "soft" and "hard" masking:

        * 'injectHardMask' creates a (modified) copy of the original
          sequence. Using "soft masking" does not.

        * A function that is "mask aware", like 'alphabetFrequency' or
          'matchPattern', will really skip the masked regions when
          "soft masking" is used, i.e. it will not walk through the
          regions that are under active masks. This might lead to some
          speed improvements when a high percentage of the original
          sequence is masked. With "hard masking", the entire sequence
          is walked through.

        * Matches cannot span over masked regions with "soft masking".
          With "hard masking" they can.


_V_a_l_u_e:

     An XString object of the same length as the original object 'x' if
     'x' is a MaskedXString object, or of the same length as
     'subject(x)' if it's an XStringViews object.

_A_u_t_h_o_r(_s):

     H. Pagès

_S_e_e _A_l_s_o:

     'maskMotif', MaskedXString-class, 'replaceLetterAt', 'chartr',
     XString, XStringViews-class

_E_x_a_m_p_l_e_s:

       ## ---------------------------------------------------------------------
       ## A. WITH AN XStringViews OBJECT
       ## ---------------------------------------------------------------------
       v2 <- Views("abCDefgHIJK", start=c(8, 3), end=c(14, 4))
       injectHardMask(v2)
       injectHardMask(v2, letter="=")
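       ## A hedged check, not part of the original examples: per the Value
       ## section, the result should be an XString of the same length as
       ## subject(v2). 'hm2' is just an illustrative name.
       hm2 <- injectHardMask(v2)
       stopifnot(is(hm2, "XString"), length(hm2) == length(subject(v2)))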

       ## ---------------------------------------------------------------------
       ## B. WITH A MaskedXString OBJECT
       ## ---------------------------------------------------------------------
       mask0 <- Mask(mask.width=29, start=c(3, 10, 25), width=c(6, 8, 5))
       x <- DNAString("ACACAACTAGATAGNACTNNGAGAGACGC")
       masks(x) <- mask0
       x
       subject <- injectHardMask(x)
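       ## A hedged sanity check, not part of the original examples: the hard
       ## mask should keep the sequence length and put one "+" on every
       ## position that was under an active mask (here 6 + 8 + 5 = 19).
       stopifnot(length(subject) == length(unmasked(x)))
       stopifnot(alphabetFrequency(subject)[["+"]] == maskedwidth(x))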

       ## Matches can span over masked regions with "hard masking":
       matchPattern("ACggggggA", subject, max.mismatch=6)
       ## but not with "soft masking":
       matchPattern("ACggggggA", x, max.mismatch=6)
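
       ## ---------------------------------------------------------------------
       ## C. CHECKING THAT THE JUNK LETTER IS SAFE (ILLUSTRATIVE SKETCH)
       ## ---------------------------------------------------------------------
       ## Not part of the original examples. 'motifs' is a hypothetical set
       ## of patterns, and 'x' and 'subject' are the objects from section B.
       motifs <- c("ACGT", "TAGA", "GAGAGA")
       ## None of the motifs should contain the junk letter:
       stopifnot(!any(grepl("+", motifs, fixed=TRUE)))
       ## "GAGAGA" occurs in the original sequence (still stored in 'x') but
       ## overlaps a filled region, so exact matching should no longer find
       ## it in the hard-masked copy:
       countPattern("GAGAGA", unmasked(x))
       countPattern("GAGAGA", subject)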

