XString-class           package:Biostrings           R Documentation

_B_S_t_r_i_n_g _o_b_j_e_c_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     The BString class is a general container for storing a big string
     (a long sequence of characters) and for making its manipulation
     easy and efficient.

     The DNAString, RNAString and AAString classes are similar
     containers but with the more biology-oriented purpose of storing a
     DNA sequence (DNAString), an RNA sequence (RNAString), or a
     sequence of amino acids (AAString).

     All those containers derive directly (and with no additional
     slots) from the XString virtual class. They are also said to be
     XString subtypes.

_D_e_t_a_i_l_s:

     The two main differences between an XString object and a standard
     character vector are: (1) the data stored in an XString object are
     not copied on object duplication and (2) an XString object can
     only store a single string (see the XStringSet container for an
     efficient way to store a big collection of strings in a single
     object).
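
     A minimal sketch of point (2), assuming the Biostrings package is
     installed and using BStringSet, the XStringSet subtype for
     arbitrary strings. The example strings are illustrative only. A
     BString holds one string, while a BStringSet holds several strings
     in a single object:

```r
suppressPackageStartupMessages(library(Biostrings))

# A BString object stores exactly one string; its length is its number of letters.
b <- BString("I am a BString object")
print(length(b))   # 21 letters

# A BStringSet object stores a collection of strings in a single object.
bs <- BStringSet(c("foo", "bar", "bazooka"))
print(length(bs))  # 3 strings
print(nchar(bs))   # letters per string: 3 3 7
```

     Point (1), the sharing of sequence data on duplication, is an
     internal property and is not directly visible from this snippet.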

     Unlike the DNAString, RNAString and AAString containers that
     accept only a predefined set of letters (the alphabet), a BString
     object can be used for storing any single string based on a
     single-byte character set.
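
     A short sketch of the difference, assuming Biostrings is installed.
     The same string is accepted by BString but rejected by DNAString
     because it contains characters outside the DNA alphabet; the error
     is caught here with tryCatch() rather than relying on its exact
     message, which may vary between Biostrings versions:

```r
suppressPackageStartupMessages(library(Biostrings))

s <- "I am a BString object"

# BString accepts any single-byte string, including spaces.
b <- BString(s)

# DNAString only accepts letters of the DNA alphabet, so the spaces
# (and letters such as 'I') make the conversion fail.
res <- tryCatch(DNAString(s), error = function(e) "rejected")
print(res)

# A string made only of DNA letters is accepted.
d <- DNAString("ACGTTGCA")
print(d)
```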

_C_o_n_s_t_r_u_c_t_o_r-_l_i_k_e _f_u_n_c_t_i_o_n_s _a_n_d _g_e_n_e_r_i_c_s:

     In the code snippet below, 'x' can be a single string (character
     vector of length 1) or an XString object.


      'BString(x, start=1, nchar=NA, check=TRUE)': Tries to convert 'x'
          into a BString object by reading 'nchar' letters starting at
          position 'start' in 'x'.
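
     A minimal sketch of the constructor, assuming Biostrings is
     installed. It reads 2 letters starting at position 3 of a
     character string, then converts an XString object of another
     subtype (a DNAString) into a BString:

```r
suppressPackageStartupMessages(library(Biostrings))

# Read 2 letters starting at position 3: "I am ..." gives "am".
b <- BString("I am a BString object", start = 3, nchar = 2)
print(as.character(b))

# 'x' may also be an XString object, here a DNAString converted to a BString.
b2 <- BString(DNAString("ACGT"))
print(class(b2))
```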


_A_c_c_e_s_s_o_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is an XString object.


      'alphabet(x)': 'NULL' for a 'BString' object. See the
          corresponding man pages when 'x' is a DNAString, RNAString
          or AAString object.

      'length(x)' or 'nchar(x)': Get the length of an XString object,
          i.e., its number of letters.
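
     A quick illustration of the accessors on a BString object,
     assuming Biostrings is installed:

```r
suppressPackageStartupMessages(library(Biostrings))

b <- BString("I am a BString object")
print(alphabet(b))  # NULL: a BString has no predefined alphabet
print(length(b))    # 21
print(nchar(b))     # 21, same as length(b)
```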


_C_o_e_r_c_i_o_n:

     In the code snippets below, 'x' is an XString object.


      'as.character(x)': Converts 'x' to a character string.

      'toString(x)': Equivalent to 'as.character(x)'.
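
     A minimal sketch, assuming Biostrings is installed, showing that
     both coercions return the same ordinary character string:

```r
suppressPackageStartupMessages(library(Biostrings))

b <- BString("I am a BString object")
s1 <- as.character(b)  # a plain character vector of length 1
s2 <- toString(b)      # same result as as.character(b)
print(s1)
print(identical(s1, s2))
```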


_S_u_b_s_e_q_u_e_n_c_e _e_x_t_r_a_c_t_i_o_n _a_n_d _s_u_b_s_e_t_t_i_n_g:

     In the code snippets below, 'x' is an XString object.


      'subseq(x, start=NA, end=NA, width=NA)': Extract the subsequence
          from 'x' specified by 'start', 'end' and 'width'. At least
          one of 'start', 'end' and 'width' must be 'NA' and the other
          ones must be single numeric values. If at least two of them
          are 'NA', then 'start=NA' is interpreted as 'start=1' and
          'end=NA' is interpreted as 'end=length(x)'. A negative value
          for 'start' or 'end' is interpreted relative to the end of
          'x', e.g. 'start=-1' is equivalent to 'start=length(x)'.
          Finally, if 'width' is not 'NA', then 'start' and 'end'
          cannot both be 'NA'.

          A note about performance: 'subseq' does NOT copy the sequence
          data, hence it's very efficient and is the recommended way to
          extract a subsequence (i.e. a set of consecutive letters)
          from an XString object. For example, extracting a 100Mb
          subsequence from Human chromosome 1 (250Mb) with 'subseq' is
          (almost) instantaneous and has (almost) no memory footprint
          (the cost in time and memory does not depend on the length of
          the original sequence or on the length of the subsequence to
          extract).

      'x[i]': Return a new XString object made of the selected letters
          (subscript 'i' must be an NA-free numeric vector specifying
          the positions of the letters to select). The returned object
          belongs to the same class (i.e. same XString subtype) as 'x'.

          Note that, unlike 'subseq', 'x[i]' does copy the sequence
          data and is therefore very inefficient for extracting a
          large number of letters (e.g. when 'i' contains millions of
          positions).
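
     A minimal sketch contrasting the two extraction methods, assuming
     Biostrings is installed. It uses only positive positions, so it
     does not depend on how a given Biostrings version treats negative
     'start' or 'end' values:

```r
suppressPackageStartupMessages(library(Biostrings))

b <- BString("I am a BString object")

# subseq(): at least one of start/end/width stays NA.
s1 <- subseq(b, start = 3, end = 4)    # "am"
s2 <- subseq(b, start = 3, width = 2)  # "am"
s3 <- subseq(b, end = 4, width = 2)    # "am"
s4 <- subseq(b, start = 8)             # end=NA taken as length(b): "BString object"

# x[i]: copies the selected letters; the result keeps the subtype of 'x'.
d  <- DNAString("ACGTACGT")
d2 <- d[c(1, 3, 5)]                    # "AGA", still a DNAString
d3 <- d[length(d):1]                   # "TGCATGCA" (reverse(d) is the better way)
print(d2)
print(d3)
```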


_E_q_u_a_l_i_t_y:

     In the code snippets below, 'e1' and 'e2' are XString objects.


      'e1 == e2': 'TRUE' if 'e1' is equal to 'e2'. 'FALSE' otherwise.

          Comparison between two XString objects of different subtypes
          (e.g. a BString object and a DNAString object) is not
          supported with one exception: a DNAString object and an
          RNAString object can be compared (see RNAString-class for
          more details about this).

          Comparison between a BString object and a character string is
          also supported (see examples below).

      'e1 != e2': Equivalent to '!(e1 == e2)'.
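
     A minimal sketch of the supported comparisons, assuming Biostrings
     is installed: a BString compared with a character string, and two
     objects of the same subtype (DNAString) compared with each other:

```r
suppressPackageStartupMessages(library(Biostrings))

b <- BString("I am a BString object")
eq  <- b == "I am a BString object"    # BString vs character string
neq <- b != "I am a DNAString object"  # same as !(b == "...")

# Two objects of the same XString subtype.
d1 <- DNAString("ACGT")
d2 <- DNAString("ACGA")
print(c(eq, neq, d1 == d2))
```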


_A_u_t_h_o_r(_s):

     H. Pages

_S_e_e _A_l_s_o:

     'letter', DNAString-class, RNAString-class, AAString-class,
     XStringSet-class, XStringViews-class, 'reverse'

_E_x_a_m_p_l_e_s:

       b <- BString("I am a BString object")
       b
       length(b)

       ## Subsequence extraction
       subseq(b)
       subseq(b, start=3)
       subseq(b, start=-3)
       subseq(b, end=-3)
       subseq(b, end=-3, width=5)

       ## Subsetting
       b2 <- b[length(b):1]       # better done with reverse(b)

       as.character(b2)

       b2 == b                    # FALSE
       b2 == as.character(b2)     # TRUE

       ## b[1:length(b)] is equal but not identical to b!
       b == b[1:length(b)]        # TRUE
       identical(b, b[1:length(b)]) # FALSE
       ## This is because subsetting an XString object with [ makes a copy
       ## of part or all of its sequence data. Hence, for the resulting
       ## object, the internal slot containing the memory address of the
       ## sequence data differs from the original. This is enough for
       ## identical() to see the 2 objects as different.

