BStringAlign-class        package:Biostrings        R Documentation

_T_h_e _B_S_t_r_i_n_g_A_l_i_g_n _c_l_a_s_s

_D_e_s_c_r_i_p_t_i_o_n:

     The 'BStringAlign' class is a container for storing an alignment
     between 2 'BString' (or derived) objects.

_D_e_t_a_i_l_s:

     Before defining an alignment, we introduce the notion of a
     "filled-with-gaps supersequence". A "filled-with-gaps
     supersequence" of a string s1 is a string S1 obtained by
     inserting zero or more gaps in s1. For example, L-A-ND is a
     "filled-with-gaps supersequence" of LAND. An alignment between 2
     strings s1 and s2 is made of 2 strings align1 and align2 that are
     "filled-with-gaps supersequences" of s1 and s2, respectively, and
     that have the same length. Note that this common length must be
     greater than or equal to the lengths of s1 and s2:
     nchar(align1) = nchar(align2) >= max(nchar(s1), nchar(s2)).
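
     The definition can be checked mechanically. The following is a
     minimal base R sketch, not part of Biostrings: the helper names
     'strip_gaps', 'is_gapped_superseq' and 'is_alignment' are
     hypothetical, and it assumes that the gap symbol is "-" and that
     the original strings contain no "-".

```r
# Remove every gap character ("-") from a gapped string.
strip_gaps <- function(s) gsub("-", "", s, fixed = TRUE)

# TRUE if 'gapped' is a filled-with-gaps supersequence of 's',
# i.e. removing the gaps from 'gapped' gives back 's'.
is_gapped_superseq <- function(gapped, s) identical(strip_gaps(gapped), s)

# TRUE if (a1, a2) is an alignment between s1 and s2: both are
# filled-with-gaps supersequences and they have the same length.
is_alignment <- function(a1, a2, s1, s2) {
  is_gapped_superseq(a1, s1) &&
    is_gapped_superseq(a2, s2) &&
    nchar(a1) == nchar(a2)
}

is_gapped_superseq("L-A-ND", "LAND")
is_alignment("L-A--ND", "LEAVES-", "LAND", "LEAVES")
```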

     For example, this is an alignment between LAND and LEAVES:


         L-A--ND
         LEAVES-

     An alignment can be seen as a compact representation of one set of
     basic operations that transforms s1 into s2. There are 3 kinds of
     basic operations: "insertions" (gaps in align1), "deletions" (gaps
     in align2) and "replacements" (positions where the 2 letters
     differ). The above alignment represents the following basic
     operations:


         insert E at pos 2
         insert V at pos 4
         insert E at pos 5
         replace by S at pos 6 (N is replaced by S)
         delete at pos 7 (D is deleted)

     Note that "insert X at pos i" means that all letters at a position
     >= i are moved 1 place to the right before X is actually inserted.
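
     The list of basic operations can be read off an alignment column
     by column. The following base R sketch is only an illustration:
     the function 'alignment_ops' is hypothetical (it is not a
     Biostrings function), and it assumes that "pos" is the column
     index in the alignment and that the gap symbol is "-".

```r
# List the basic operations encoded by an alignment (a1, a2).
# Positions are column indices in the alignment (an assumption).
alignment_ops <- function(a1, a2, gap = "-") {
  stopifnot(nchar(a1) == nchar(a2))
  c1 <- strsplit(a1, "", fixed = TRUE)[[1]]
  c2 <- strsplit(a2, "", fixed = TRUE)[[1]]
  ops <- character(0)
  for (i in seq_along(c1)) {
    if (c1[i] == gap && c2[i] != gap) {
      # Gap in align1: a letter of s2 is inserted.
      ops <- c(ops, sprintf("insert %s at pos %d", c2[i], i))
    } else if (c1[i] != gap && c2[i] == gap) {
      # Gap in align2: a letter of s1 is deleted.
      ops <- c(ops, sprintf("delete at pos %d (%s is deleted)", i, c1[i]))
    } else if (c1[i] != c2[i]) {
      # Two different letters: a replacement.
      ops <- c(ops, sprintf("replace by %s at pos %d (%s is replaced by %s)",
                            c2[i], i, c1[i], c2[i]))
    }
    # Identical letters (a match) require no operation.
  }
  ops
}

alignment_ops("L-A--ND", "LEAVES-")
```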

     There are many possible alignments between 2 given strings s1 and
     s2, and a common problem is to find the one (or ones) with the
     highest score, i.e. with the lowest total cost in terms of basic
     operations.
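
     To make the notion of score concrete, here is a base R sketch
     that scores a given alignment under a deliberately simple scheme.
     The function 'score_alignment' and its 'match' and 'mismatch'
     parameters are hypothetical: they stand in for a real
     substitution matrix and are not the scoring used by 'needwunsQS'.
     The 'gappen' argument is a cost per gap column, in the spirit of
     a gap penalty.

```r
# Score an alignment: add 'match' or 'mismatch' for each column with
# two letters, subtract 'gappen' for each column containing a gap.
# This toy scheme is an assumption, not the Biostrings scoring.
score_alignment <- function(a1, a2, match = 1L, mismatch = -1L,
                            gappen = 1L, gap = "-") {
  c1 <- strsplit(a1, "", fixed = TRUE)[[1]]
  c2 <- strsplit(a2, "", fixed = TRUE)[[1]]
  stopifnot(length(c1) == length(c2))
  is_gap      <- c1 == gap | c2 == gap
  is_match    <- !is_gap & c1 == c2
  is_mismatch <- !is_gap & c1 != c2
  sum(is_match) * match + sum(is_mismatch) * mismatch - sum(is_gap) * gappen
}

# Two candidate alignments between LAND and LEAVES.
score_alignment("L-A--ND", "LEAVES-")
score_alignment("LAND--",  "LEAVES")
```

     Under this toy scheme the first candidate scores higher than the
     second, so it would be preferred. A different substitution matrix
     or gap penalty can change the ranking.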

_A_c_c_e_s_o_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is a 'BStringAlign' object.

      'align1(x)' and 'align2(x)': The "filled-with-gaps
          supersequences" of the original strings to align. Note that
          'align1(x)' and 'align2(x)' are 'BString' (or derived)
          objects of the same class and of the same length.

      'score(x)': The score of the alignment (integer).

      'length(x)' or 'nchar(x)': The length of the alignment i.e. the
          common length of 'align1(x)' and 'align2(x)'.

      'alphabet(x)': Equivalent to 'alphabet(align1(x))' (or
          'alphabet(align2(x))').
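
     The relationship between these accessors can be illustrated with
     a toy container written with the 'methods' package. This is a
     sketch only: the class 'ToyAlign' and its slots are hypothetical,
     it stores plain character strings rather than 'BString' objects,
     and it does not reflect how 'BStringAlign' is implemented.

```r
library(methods)

# A toy alignment container (hypothetical, not the Biostrings class).
setClass("ToyAlign",
  representation(align1 = "character",
                 align2 = "character",
                 score  = "integer"),
  validity = function(object) {
    if (length(object@align1) != 1L || length(object@align2) != 1L)
      return("align1 and align2 must each be a single string")
    if (nchar(object@align1) != nchar(object@align2))
      return("align1 and align2 must have the same length")
    TRUE
  })

# The length of the alignment is the common length of its 2 strings.
setMethod("length", "ToyAlign", function(x) nchar(x@align1))

# The score is left as NA here: it is a placeholder, not a real result.
x <- new("ToyAlign", align1 = "L-A--ND", align2 = "LEAVES-",
         score = NA_integer_)
length(x)
```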

_A_u_t_h_o_r(_s):

     H. Pages

_S_e_e _A_l_s_o:

     'needwunsQS', BString-class, DNAString-class, RNAString-class,
     AAString-class

_E_x_a_m_p_l_e_s:

       s1 <- AAString("LAND")
       s2 <- AAString("LEAVES")
       ## With the needwunsQS function, the cost of an insertion or deletion
       ## is controlled by the gappen (gap penalty) arg, and the cost of a
       ## replacement is controlled by the "substitution scoring matrix"
       ## passed through the substmat arg.
       nw1 <- needwunsQS(s1, s2, substmat="BLOSUM50", gappen=1)
       nw1
       length(nw1)
       nw0 <- needwunsQS(s1, s2, substmat="BLOSUM50", gappen=0)
       nw0
       length(nw0)
       ## Low gap penalties tend to produce longer alignments!

