MaskCollection-class       package:Biostrings       R Documentation

_M_a_s_k_C_o_l_l_e_c_t_i_o_n _o_b_j_e_c_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     The MaskCollection class is a container for storing a collection
     of masks that can be used to mask regions in a sequence.

_D_e_t_a_i_l_s:

     In the context of the Biostrings package, a mask is a set of
     regions in a sequence that need to be excluded from some
     computation. For example, when calling 'alphabetFrequency' or
     'matchPattern' on a chromosome sequence, you might want to exclude
     some regions like the centromere or the repeat regions. This can
     be achieved by putting one or several masks on the sequence before
     calling the function on it.

     A MaskCollection object is a vector-like object that represents
     such a set of masks. Like standard R vectors, it has a "length"
     which is the number of masks contained in it. But unlike standard
     R vectors, it also has a "width" which determines the length of
     the sequences it can be "put on". For example, a MaskCollection
     object of width 20000 can only be put on an XString object of
     20000 letters.

     Each mask in a MaskCollection object 'x' is just a finite set of
     integers that are >= 1 and <= 'width(x)'. When "put on" a
     sequence, these integers indicate the positions of the letters to
     mask. Internally, each mask is represented by a NormalIRanges
     object.
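
     As a rough illustration of the description above, the sketch below
     builds a single mask of width 20 and expands its NormalIRanges
     representation into the underlying set of masked positions. The
     variable names ('m', 'nir', 'pos') are chosen for this example
     only, and the expansion via 'Map' is just one possible way to list
     the positions.

```r
suppressPackageStartupMessages(library(Biostrings))

## A single mask of width 20 masking positions 3-5 and 10-12.
m <- Mask(mask.width=20, start=c(3, 10), end=c(5, 12))
width(m)   # length of the sequences 'm' can be put on

## The internal representation of the (only) mask in 'm'.
nir <- m[[1]]
nir

## Expand the ranges into the finite set of masked positions.
pos <- unlist(Map(seq, start(nir), end(nir)))
pos
all(pos >= 1 & pos <= width(m))
```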

_B_a_s_i_c _a_c_c_e_s_o_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is a MaskCollection object.


      'length(x)': The number of masks in 'x'.

      'width(x)': The common width of all the masks in 'x'. This
          determines the length of the sequences that 'x' can be "put
          on".

      'active(x)': A logical vector of the same length as 'x' where
          each element indicates whether the corresponding mask is
          active or not.

      'names(x)': 'NULL' or a character vector of the same length as
          'x'.

      'nir_list(x)': A list of the same length as 'x', where each
          element is a NormalIRanges object representing a mask in 'x'.
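
     The accessors listed above can be tried on a small collection. This
     is a minimal sketch; the two masks and their ranges are made up for
     the example.

```r
suppressPackageStartupMessages(library(Biostrings))

## Two masks of the same width, combined into one collection.
x <- append(Mask(mask.width=29, start=c(11, 25), width=c(5, 2)),
            Mask(mask.width=29, start=3, width=5))

length(x)     # number of masks
width(x)      # common width of the masks
active(x)     # one logical value per mask
names(x)      # NULL or a character vector
nir_list(x)   # one NormalIRanges object per mask
```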


_C_o_n_s_t_r_u_c_t_o_r:


      'Mask(mask.width, start=NULL, end=NULL, width=NULL)': Return a
          single mask (i.e. a MaskCollection object of length 1) of
          width 'mask.width' (a single integer >= 1) and masking the
          ranges of positions specified by 'start', 'end' and 'width'.
          See the 'IRanges' constructor ('?IRanges') for how 'start',
          'end' and 'width' can be specified. Note that the returned
          mask is active and unnamed.
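
     As with 'IRanges', the same ranges can be given either as
     'start'/'width' or as 'start'/'end'. The sketch below, using
     made-up ranges, builds the same mask both ways and compares the
     results.

```r
suppressPackageStartupMessages(library(Biostrings))

## Same masked ranges (11-15 and 25-26), specified in two ways.
a <- Mask(mask.width=29, start=c(11, 25), width=c(5, 2))
b <- Mask(mask.width=29, start=c(11, 25), end=c(15, 26))

## Both are MaskCollection objects of length 1 with an active mask.
length(a)
active(a)

## The underlying ranges agree.
identical(start(a[[1]]), start(b[[1]]))
identical(end(a[[1]]), end(b[[1]]))
```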


_O_t_h_e_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is a MaskCollection object.


      'isEmpty(x)': Return a logical vector of the same length as 'x',
          indicating, for each mask in 'x', whether it's empty or not.

      'max(x)': The greatest (or last, or rightmost) masked position
          for each mask. This is a numeric vector of the same length as
          'x'.

      'min(x)': The smallest (or first, or leftmost) masked position
          for each mask. This is a numeric vector of the same length as
          'x'.

      'maskedwidth(x)': The number of masked positions for each mask.
          This is an integer vector of the same length as 'x' where all
          values are >= 0 and <= 'width(x)'.

      'maskedratio(x)': The fraction of masked positions for each mask,
          that is, 'maskedwidth(x) / width(x)'.
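
     The per-mask summaries above can be checked by hand on a small
     collection. In this sketch (ranges made up for the example) the
     first mask covers 11-15, 25-26 and 28-29, and the second covers 7-8
     and 12-15.

```r
suppressPackageStartupMessages(library(Biostrings))

x <- append(Mask(mask.width=29, start=c(11, 25, 28), width=c(5, 2, 2)),
            Mask(mask.width=29, start=c(7, 12), width=c(2, 4)))

isEmpty(x)       # neither mask is empty
min(x)           # leftmost masked position of each mask
max(x)           # rightmost masked position of each mask
maskedwidth(x)   # 5+2+2 and 2+4 masked positions
maskedratio(x)   # maskedwidth(x) / width(x)
```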


_S_u_b_s_e_t_t_i_n_g _a_n_d _a_p_p_e_n_d_i_n_g:

     In the code snippets below, 'x' is a MaskCollection object.


      'x[i]': Return a new MaskCollection object made of the selected
          masks. 'i' can be a numeric vector, a logical vector, 'NULL'
          or missing.

      'append(x, values, after=length(x))': Add masks to 'x'.

      'x[[i]]': Extract the i-th mask as a NormalIRanges object.
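
     A short sketch of the difference between 'x[i]' and 'x[[i]]': the
     former keeps the result a MaskCollection, the latter returns the
     underlying ranges of a single mask. The masks reuse the ranges from
     the Examples section; the names 'm1', 'm2', 'm3' are local to this
     sketch.

```r
suppressPackageStartupMessages(library(Biostrings))

m1 <- Mask(mask.width=29, start=c(11, 25, 28), width=c(5, 2, 2))
m2 <- Mask(mask.width=29, start=c(3, 10, 27), width=c(5, 8, 1))
m3 <- Mask(mask.width=29, start=c(7, 12), width=c(2, 4))

## Appending masks of the same width one after the other.
x <- append(append(m1, m2), m3)
length(x)

## Subsetting returns a MaskCollection object.
x[c(1, 3)]
x[-2]

## Double-bracket extraction returns a NormalIRanges object.
x[[2]]
start(x[[2]])
```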


_O_t_h_e_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is a MaskCollection object.


      'narrow(x, start=NA, end=NA, width=NA, use.names=TRUE)': Narrow
          the masks in 'x'.

      'reduce(x)': Return a MaskCollection object of length 1 made of
          the union (or merging, or collapsing) of all the active masks
          in 'x'.

      'gaps(x)': Invert the masks in 'x', that is, replace each mask by
          the set of positions between 1 and 'width(x)' that it does
          not mask.
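
     The following sketch illustrates 'gaps' and 'narrow' on a
     two-mask collection (ranges made up for the example). It assumes
     that narrowing with 'start=8' keeps only positions 8 to 29 and
     re-expresses the remaining masked ranges relative to that window;
     the comments state what is expected under that assumption.

```r
suppressPackageStartupMessages(library(Biostrings))

x <- append(Mask(mask.width=29, start=c(11, 25, 28), width=c(5, 2, 2)),
            Mask(mask.width=29, start=c(3, 10, 27), width=c(5, 8, 1)))

## Inverting: masked and unmasked positions together cover 1..width(x).
g <- gaps(x)
maskedwidth(x) + maskedwidth(g)

## Narrowing to positions 8..29: the width shrinks from 29 to 22.
nx <- narrow(x, start=8)
width(nx)

## Second mask after narrowing: range 3-7 is dropped, and 10-17 and
## 27-27 are expected to be shifted by 7 positions to the left.
nx[[2]]
```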


_A_u_t_h_o_r(_s):

     H. Pages

_S_e_e _A_l_s_o:

     MaskedXString-class, 'maskMotif', 'alphabetFrequency', 'reverse',
     'matchPattern', NormalIRanges-class

_E_x_a_m_p_l_e_s:

       ## Making a MaskCollection object:
       mask1 <- Mask(mask.width=29, start=c(11, 25, 28), width=c(5, 2, 2))
       mask2 <- Mask(mask.width=29, start=c(3, 10, 27), width=c(5, 8, 1))
       mask3 <- Mask(mask.width=29, start=c(7, 12), width=c(2, 4))
       mymasks <- append(append(mask1, mask2), mask3)
       mymasks
       length(mymasks)
       width(mymasks)
       reduce(mymasks)
       gaps(mymasks)

       ## Putting a MaskCollection object on a sequence:
       x <- DNAString("ACACAACTAGATAGNACTNNGAGAGACGC")
       x
       length(x)  # same as width(mymasks)
       nchar(x)   # same as length(x)
       masks(x) <- mymasks
       x
       length(x)  # has not changed
       nchar(x)   # has changed
       alphabetFrequency(x)

       ## Removing the masks:
       masks(x) <- NULL
       x
       alphabetFrequency(x)

       ## Active/inactive masks:
       reduce(mymasks)
       active(mymasks)[2] <- FALSE
       mymasks
       reduce(mymasks)

       ## Other advanced operations:
       mymasks[[2]]
       length(mymasks[[2]])
       mymasks[[2]][-3]
       append(mymasks[-2], gaps(mymasks[2]))
       mymasks2 <- narrow(mymasks, start=8)
       mymasks2
       mymasks2[[2]]

