XStringSet-class         package:Biostrings         R Documentation

_B_S_t_r_i_n_g_S_e_t, _D_N_A_S_t_r_i_n_g_S_e_t, _R_N_A_S_t_r_i_n_g_S_e_t _a_n_d _A_A_S_t_r_i_n_g_S_e_t _o_b_j_e_c_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     The BStringSet class is a container for storing a set of 'BString'
     objects and for making their manipulation easy and efficient.

     Similarly, the DNAStringSet (or RNAStringSet, or AAStringSet)
     class is a container for storing a set of 'DNAString' (or
     'RNAString', or 'AAString') objects.

     All those containers derive directly (and with no additional
     slots) from the XStringSet virtual class.

_U_s_a_g_e:

       ## Constructors:
       BStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
       DNAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
       RNAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
       AAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)

       ## Accessor-like methods:
       ## S4 method for signature 'XStringSet':
       length(x)
       ## S4 method for signature 'character':
       width(x)
       ## S4 method for signature 'XStringSet':
       width(x)
       ## S4 method for signature 'XStringSet':
       names(x)
       ## S4 method for signature 'XStringSet':
       nchar(x, type="chars", allowNA=FALSE)

       ## Efficient subsequence extraction:
       ## S4 method for signature 'character':
       subseq(x, start=NA, end=NA, width=NA)
       ## S4 method for signature 'XStringSet':
       subseq(x, start=NA, end=NA, width=NA)

       ## ... and more (see below)

_A_r_g_u_m_e_n_t_s:

       x: Either a character vector (with no NAs), or an XString,
          XStringSet or XStringViews object. 

start,end,width: Either 'NA', a single integer, or an integer vector of
          the same length as 'x' specifying how 'x' should be
          "narrowed" (see '?narrow' for the details). 

use.names: 'TRUE' or 'FALSE'. Should names be preserved? 

type,allowNA: Ignored. 

_D_e_t_a_i_l_s:

     The 'BStringSet', 'DNAStringSet', 'RNAStringSet' and 'AAStringSet'
     functions are constructors that can be used to "naturally" turn
     'x' into an XStringSet object of the desired base type.

     They also allow the user to "narrow" the sequences contained in
     'x' via proper use of the 'start', 'end' and/or 'width' arguments.
     In this context, "narrowing" means dropping a prefix and/or a
     suffix of each sequence in 'x'. The "narrowing" capabilities of
     these constructors can be illustrated by the following property:
     if 'x' is a character vector (with no NAs), or an XStringSet (or
     XStringViews) object, then the following 3 transformations are
     equivalent:

      'BStringSet(x, start=mystart, end=myend, width=mywidth)'

      'subseq(BStringSet(x), start=mystart, end=myend, width=mywidth)'

      'BStringSet(subseq(x, start=mystart, end=myend, width=mywidth))'


     Note that, besides being more convenient, the first form is also
     more efficient on character vectors.

_A_c_c_e_s_s_o_r-_l_i_k_e _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is an XStringSet object.


      'length(x)': The number of sequences in 'x'.

      'width(x)': A vector of non-negative integers containing the
          number of letters for each element in 'x'. Note that
          'width(x)' is also defined for a character vector with no NAs
          and is equivalent to 'nchar(x, type="bytes")'.

      'names(x)': 'NULL' or a character vector of the same length as
          'x' containing a short user-provided description or comment
          for each element in 'x'. These are the only data in an
          XStringSet object that can safely be changed by the user. All
          the other data are immutable! As a general recommendation,
          the user should never try to modify an object by accessing
          its slots directly.

      'alphabet(x)': Return 'NULL', 'DNA_ALPHABET', 'RNA_ALPHABET' or
          'AA_ALPHABET' depending on whether 'x' is a BStringSet,
          DNAStringSet, RNAStringSet or AAStringSet object.

      'nchar(x)': The same as 'width(x)'.
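
     The accessors above can be tried on a small object. This is only
     a sketch: the sequences and the names 'seqA', 'seqB' and 'seqC'
     are made up for illustration.

```r
library(Biostrings)

# Hypothetical toy input: a named character vector of DNA letters.
dna <- DNAStringSet(c(seqA="ACGT", seqB="TTGACC", seqC="A"))

n <- length(dna)     # number of sequences in the set
w <- width(dna)      # number of letters in each sequence
nm <- names(dna)     # names carried over from the character vector
nc <- nchar(dna)     # documented above as the same as width(dna)
alph <- alphabet(dna)  # DNA_ALPHABET for a DNAStringSet
```

     Of these, only the names are meant to be changed by the user,
     e.g. with 'names(dna)[2] <- "other"'.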


_S_u_b_s_e_q_u_e_n_c_e _e_x_t_r_a_c_t_i_o_n _a_n_d _r_e_l_a_t_e_d _t_r_a_n_s_f_o_r_m_a_t_i_o_n_s:

     In the code snippets below, 'x' is a character vector (with no
     NAs), or an XStringSet (or XStringViews) object.


      'subseq(x, start=NA, end=NA, width=NA)': Applies 'subseq' on each
          element in 'x'. See '?subseq' for the details.

          Note that this is similar to what 'substr' does on a
          character vector. However, there are some notable
          differences: (1) the arguments are 'start' and 'stop' for
          'substr'; (2) the SEW (start/end/width) interface of
          'subseq' is richer (e.g. support for negative start or end
          values); and (3) 'subseq' checks that the specified
          start/end/width values are valid, i.e., unlike 'substr', it
          throws an error if they define "out of limits" subsequences
          or subsequences with a negative width.

      'narrow(x, start=NA, end=NA, width=NA, use.names=TRUE)': Same as
          'subseq'. The only differences are: (1) 'narrow' has a
          'use.names' argument; and (2) the two functions do not
          support the same types of objects (IRanges, XStringSet or
          XStringViews objects for 'narrow', XSequence or XStringSet
          objects for 'subseq'). On an XStringSet object, both work
          and do the same thing.

      'threebands(x, start=NA, end=NA, width=NA)': Like the method for
          IRanges objects, the 'threebands' methods for character
          vectors and XStringSet objects extend the capability of
          'narrow' by returning the 3 sets of subsequences (the left,
          middle and right subsequences) associated with the narrowing
          operation. See '?threebands' in the IRanges package for the
          details.

      'subseq(x, start=NA, end=NA, width=NA) <- value': A vectorized
          version of the 'subseq<-' method for XSequence objects. See
          '?`subseq<-`' for the details.
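
     As a rough illustration of the points above (negative end values,
     validity checking, and the replacement form), the following
     sketch uses two made-up sequences. The variable names are
     arbitrary.

```r
library(Biostrings)

# Hypothetical toy input.
dna <- DNAStringSet(c("ACGTACGT", "TTGACCAA"))

# A negative 'end' counts from the end of each sequence (-1 is the
# last letter), so this drops the first and the last letter.
mid <- subseq(dna, start=2, end=-2)

# On an XStringSet object, narrow() is expected to give the same result.
mid2 <- narrow(dna, start=2, end=-2)

# Unlike substr(), subseq() refuses "out of limits" coordinates.
clipped <- substr("ACGTACGT", 1, 20)   # silently truncated by substr()
failed <- inherits(try(subseq(dna, start=1, end=20), silent=TRUE),
                   "try-error")

# Replacement form: overwrite the first letter of every sequence.
masked <- dna
subseq(masked, start=1, end=1) <- "N"
```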


_S_u_b_s_e_t_t_i_n_g _a_n_d _a_p_p_e_n_d_i_n_g:

     In the code snippets below, 'x' and 'values' are XStringSet
     objects, and 'i' should be an index specifying the elements to
     extract.


      'x[i]': Return a new XStringSet object made of the selected
          elements.

      'x[[i]]': Extract the i-th 'XString' object from 'x'.

      'append(x, values, after=length(x))': Add sequences in 'values'
          to 'x'.
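
     A minimal sketch of the difference between '[' and '[[', and of
     'append', on made-up data:

```r
library(Biostrings)

# Hypothetical toy input.
x <- DNAStringSet(c(a="ACGT", b="TTGA", c="GGCC"))
values <- DNAStringSet(c(d="AAAA"))

sub <- x[c(1, 3)]          # still a DNAStringSet, now of length 2
first <- x[[1]]            # a single DNAString, not a set
both <- append(x, values)  # 'values' added after the last element of 'x'
```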


_O_r_d_e_r_i_n_g _a_n_d _r_e_l_a_t_e_d _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is an XStringSet object.


      'order(x)': Return a permutation which rearranges 'x' into
          ascending or descending order.

      'sort(x)': Sort 'x' into ascending order (equivalent to
          'x[order(x)]').

      'rank(x)': Rank 'x' in ascending order.
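
     The relationship between 'order' and 'sort' stated above can be
     checked on a small made-up example:

```r
library(Biostrings)

# Hypothetical toy input, deliberately out of order.
x <- DNAStringSet(c("TTGA", "ACGT", "ACGA"))

o <- order(x)       # permutation that puts 'x' in ascending order
sorted <- sort(x)   # expected to match x[order(x)]
```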


_D_u_p_l_i_c_a_t_e_d _a_n_d _u_n_i_q_u_e _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is an XStringSet object.


      'duplicated(x)': Return a logical vector whose elements denote
          duplicates in 'x'.

      'unique(x)': Return an XStringSet containing the unique values in
          'x'.
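
     For instance, with a made-up set in which one sequence appears
     twice:

```r
library(Biostrings)

# Hypothetical toy input: the first and third sequences are identical.
x <- DNAStringSet(c("ACGT", "TTGA", "ACGT"))

dup <- duplicated(x)   # TRUE only for the repeated occurrence
ux <- unique(x)        # one copy of each distinct sequence
```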


_S_e_t _o_p_e_r_a_t_i_o_n_s:

     In the code snippets below, 'x' and 'y' are XStringSet objects.


      'union(x, y)': Union of 'x' and 'y'.

      'intersect(x, y)': Intersection of 'x' and 'y'.

      'setdiff(x, y)': Asymmetric set difference of 'x' and 'y'.

      'setequal(x, y)': Set equality of 'x' to 'y'.
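
     A short sketch of the four operations on two made-up sets that
     share one sequence. The results are compared as character
     vectors to keep the checks simple.

```r
library(Biostrings)

# Hypothetical toy input: "TTGA" is the only shared sequence.
x <- DNAStringSet(c("ACGT", "TTGA"))
y <- DNAStringSet(c("TTGA", "GGCC"))

u <- union(x, y)        # every distinct sequence found in 'x' or 'y'
i <- intersect(x, y)    # sequences found in both
d <- setdiff(x, y)      # sequences of 'x' that are not in 'y'
same <- setequal(x, y)  # FALSE here: the two sets differ
```

     'setdiff' is asymmetric: 'setdiff(y, x)' would keep "GGCC"
     instead of "ACGT".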


_I_d_e_n_t_i_c_a_l _v_a_l_u_e _m_a_t_c_h_i_n_g:

     In the code snippets below, 'x' is a character vector, XString, or
     XStringSet object and 'table' is an XStringSet object.


      'x %in% table': Returns a logical vector indicating which
          elements in 'x' match identically with an element in 'table'.

      'match(x, table, nomatch = NA_integer_, incomparables = NULL)':
          Returns an integer vector containing the first positions of
          an identical match in 'table' for the elements in 'x'.
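
     As an illustration with made-up data, where one query sequence
     occurs twice in 'table' and the other does not occur at all:

```r
library(Biostrings)

# Hypothetical toy input.
x <- DNAStringSet(c("ACGT", "GGCC"))
table <- DNAStringSet(c("TTGA", "ACGT", "ACGT"))

found <- x %in% table   # is each element of 'x' present in 'table'?
pos <- match(x, table)  # position of the first identical match, or NA
```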


_O_t_h_e_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is an XStringSet object.


      'unlist(x)': Turns 'x' into an XString object by combining the
          sequences in 'x' together. Fast equivalent to 'do.call(c,
          as.list(x))'.

      'as.character(x, use.names)': Convert 'x' to a character vector
          of the same length as 'x'. 'use.names' controls whether or
          not 'names(x)' should be used to set the names of the
          returned vector (default is 'TRUE').

      'as.matrix(x, use.names)': Return a character matrix containing
          the "exploded" representation of the strings. This can only
          be used on an XStringSet object with equal-width strings.
          'use.names' controls whether or not 'names(x)' should be used
          to set the row names of the returned matrix (default is
          'TRUE').

      'toString(x)': Equivalent to 'toString(as.character(x))'.
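
     The conversions above can be sketched as follows. The input is
     made up, and its two sequences have equal width so that
     'as.matrix' applies.

```r
library(Biostrings)

# Hypothetical toy input: two named sequences of equal width.
x <- DNAStringSet(c(a="ACGT", b="TTGA"))

joined <- unlist(x)                     # one DNAString: all sequences combined
chr <- as.character(x, use.names=FALSE) # plain, unnamed character vector
m <- as.matrix(x)                       # one row per sequence, one column per letter
s <- toString(x)                        # comma-separated single string
```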


_A_u_t_h_o_r(_s):

     H. Pages

_S_e_e _A_l_s_o:

     BString-class, DNAString-class, RNAString-class, AAString-class,
     XStringViews-class, 'substr', 'subseq', 'narrow'

_E_x_a_m_p_l_e_s:

       ## ---------------------------------------------------------------------
       ## A. USING THE XStringSet CONSTRUCTORS ON A CHARACTER VECTOR
       ## ---------------------------------------------------------------------
       ## Note that there is no XStringSet() constructor, but an XStringSet
       ## family of constructors: BStringSet(), DNAStringSet(), RNAStringSet(),
       ## etc...
       x0 <- c("#CTC-NACCAGTAT", "#TTGA", "TACCTAGAG")
       width(x0)
       x1 <- BStringSet(x0)
       x1

       ## 3 equivalent ways to obtain the same BStringSet object:
       BStringSet(x0, start=4, end=-3)
       subseq(x1, start=4, end=-3)
       BStringSet(subseq(x0, start=4, end=-3))

       dna0 <- DNAStringSet(x0, start=4, end=-3)
       dna0
       names(dna0)
       names(dna0)[2] <- "seqB"
       dna0

       ## ---------------------------------------------------------------------
       ## B. USING THE XStringSet CONSTRUCTORS ON AN XStringSet OBJECT
       ## ---------------------------------------------------------------------
       library(drosophila2probe)
       probes <- DNAStringSet(drosophila2probe$sequence)
       probes

       RNAStringSet(probes, start=2, end=-5)  # does NOT copy the sequence data!

       ## ---------------------------------------------------------------------
       ## C. USING subseq() ON AN XStringSet OBJECT
       ## ---------------------------------------------------------------------
       subseq(probes, start=2, end=-5)

       subseq(probes, start=13, end=13) <- "N"
       probes

       ## Add/remove a prefix:
       subseq(probes, start=1, end=0) <- "--"
       probes
       subseq(probes, end=2) <- ""
       probes

       ## Do more complicated things:
       subseq(probes, start=4:7, end=7) <- c("YYYY", "YYY", "YY", "Y")
       subseq(probes, start=4, end=6) <- subseq(probes, start=-2:-5)
       probes

       ## ---------------------------------------------------------------------
       ## D. UNLISTING AN XStringSet OBJECT
       ## ---------------------------------------------------------------------
       library(drosophila2probe)
       probes <- DNAStringSet(drosophila2probe$sequence)
       unlist(probes)  

