Globals Variables           package:GeneR           R Documentation

_G_l_o_b_a_l_s _v_a_r_i_a_b_l_e_s _o_n _G_e_n_e _s_e_q_u_e_n_c_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     There are two ways to store sequences in GeneR:

        *  In C-level buffers, which also store some global variables
           such as the working strand, the size of the original
           sequence, and so on.

           This is useful when, for example, we have to work on a
           subset of a whole chromosome (e.g. a gene). In this case it
           is worthwhile to load only the gene into R, while it remains
           easy to relate positions on the gene to positions on the
           chromosome.

        *  As a character string, the most natural way to store short
           sequences such as "ATGTCGTG". This applies to all the
           "strxxx" functions (strComp, strReadFasta, etc.).
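
     To make the trade-off concrete, here is a base-R sketch (no GeneR
     call involved) that keeps the dummy sequence from the Examples
     section as a plain character string and extracts positions 11 to
     20 with substr(). The extracted string carries no information
     about its origin, so any mapping back to the larger sequence must
     be tracked by hand, which is what the buffers take care of.

```r
## The 40-character dummy sequence from the Examples section,
## stored as a plain character string
s <- paste0(strrep("x", 10), "ATGTGTCGTA", strrep("x", 20))

## Extract positions 11 to 20 (1-based, inclusive)
sub <- substr(s, 11, 20)
print(sub)
## [1] "ATGTGTCGTA"

## 'sub' does not remember where it came from: position 3 of 'sub'
## is position 13 of 's' only because we track the offset ourselves
offset <- 11 - 1
pos_in_sub <- 3
pos_in_s <- pos_in_sub + offset
stopifnot(substr(sub, pos_in_sub, pos_in_sub) ==
          substr(s, pos_in_s, pos_in_s))
```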


_D_e_t_a_i_l_s:

     When GeneR loads a subset of a larger sequence stored in a bank
     file, it stores the following information in a C-level buffer (by
     default there are 100 buffers, and more can be allocated if
     necessary):

        *  the subsequence itself (i.e. the succession of A, T, G, C);

        *  the positions of the extremities of the subsequence in the
           master sequence;

        *  the size of the whole sequence in the bank file;

        *  the name of the sequence.

     For specific purposes such as renaming a sequence, all these
     variables can be viewed and carefully changed at any time (for the
     name, with functions 'getAccn' and 'setAccn').

     Several sequences can be stored simultaneously and called by their
     buffer number.
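
     The kind of record described above can be pictured in plain R.
     The sketch below is only a model: GeneR keeps these values in
     C-level buffers, and the helper new_buffer(), its field names
     (seq, begin, end, size, name) and the sequence name "toto" are
     hypothetical, chosen for illustration.

```r
## Hypothetical model of one buffer: a subsequence plus the information
## needed to relate it back to the master sequence in the bank file
new_buffer <- function(seq, begin, size, name) {
  list(seq   = seq,                     # the subsequence itself
       begin = begin,                   # master position of its first base
       end   = begin + nchar(seq) - 1,  # master position of its last base
       size  = size,                    # size of the whole master sequence
       name  = name)                    # name of the sequence
}

## Several buffers held at once, addressed by number
## (100 slots, mirroring the default number of buffers)
buffers <- vector("list", 100)
buffers[[1]] <- new_buffer("ATGTGTCGTA", begin = 11, size = 40, name = "toto")
buffers[[2]] <- new_buffer("ATG", begin = 11, size = 40, name = "toto")

buffers[[1]]$end
## [1] 20
```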

     The strand is another global variable, which can be set and viewed
     with functions 'getStrand' and 'setStrand'. It is used as an input
     parameter by many functions to analyze the complementary strand.
     It was designed to avoid explicitly computing the complement of
     the loaded strand and storing it in a buffer, which would lose the
     information linked to the master sequence.
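
     A minimal sketch of this idea in plain R, assuming only the
     forward subsequence is stored: the reverse strand is derived on
     demand rather than kept in a second buffer. The function
     strand_view() is hypothetical, not part of GeneR, and it only
     complements A, C, G and T (chartr() leaves other letters as they
     are).

```r
## Hypothetical helper: return the stored subsequence as read on the
## requested strand (0 = forward, 1 = reverse), 5' to 3'.
## The stored forward sequence itself is never modified.
strand_view <- function(seq, strand = 0) {
  if (strand == 0) return(seq)
  comp <- chartr("ACGT", "TGCA", seq)                 # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")  # then reverse the order
}

fwd <- "ATGTGTCGTA"   # positions 11..20 of the master sequence
strand_view(fwd, 0)
## [1] "ATGTGTCGTA"
strand_view(fwd, 1)
## [1] "TACGACACAT"
```

     Because 'fwd' is untouched, its link to positions 11..20 of the
     master sequence is preserved whichever strand is being analyzed.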

     We have defined 3 types of addresses on a subsequence extracted
     from a master sequence:

        *  Absolute addresses (noted A), i.e. addresses on the master
           sequence, counted from the 5' end of the input strand,
           referred to as forward;

        *  Real addresses (noted R), i.e. addresses on the master
           sequence, counted from the 5' end of the working strand;

        *  Relative addresses (noted T), i.e. addresses on the working
           subsequence, counted from the 5' end of the working strand.

     For example, if we read positions 11 to 20 of a gene of size 40:


       Strand 0  (Forward strand)
       1         11       20                  40  Absolute (A) 
       1         11       20                  40  Real (R) 
                 1        10                      Relative (T)
       xxxxxxxxxxATGTGTCGTAxxxxxxxxxxxxxxxxxxxx
                 10       1                       Relative (T)
       40        30       21                  1   Real (R)
       1         11       20                  40  Absolute (A)
       Strand 1  (Reverse strand)

     Obviously, when an entire sequence is stored, real and relative
     addresses will be the same.
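
     Reading the diagram, the three address types appear to be related
     as follows, writing b and e for the absolute positions of the
     first and last bases of the loaded subsequence (here b = 11,
     e = 20) and N for the size of the master sequence (here 40). These
     relations are inferred from the diagram, not taken from the GeneR
     sources.

```latex
\begin{aligned}
\text{strand } 0:&\quad R = A, & T &= A - b + 1\\
\text{strand } 1:&\quad R = N - A + 1, & T &= e - A + 1
\end{aligned}
```

     For instance, A = 11 on strand 1 gives R = 40 - 11 + 1 = 30 and
     T = 20 - 11 + 1 = 10, as in the diagram. When the entire sequence
     is loaded, b = 1 and e = N, so T = R on both strands.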

     All functions that use positions take and return absolute
     addresses; six functions convert between the R, A and T types
     ('RtoA', 'RtoT', 'AtoR', 'AtoT', 'TtoR', 'TtoA').
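
     The six conversions can be sketched in base R from the relations
     shown in the diagram. This is a model for illustration, not
     GeneR's code: the lowercase names (a_to_r(), a_to_t(), ...) are
     hypothetical, and begin, end, size and strand are passed
     explicitly here, whereas GeneR takes them from the buffer and from
     its global 'strand' variable.

```r
## A <-> R: depends only on the strand and the master-sequence size
a_to_r <- function(a, size, strand) if (strand == 0) a else size - a + 1
r_to_a <- function(r, size, strand) if (strand == 0) r else size - r + 1

## A <-> T: depends on the strand and the subsequence boundaries
a_to_t <- function(a, begin, end, strand)
  if (strand == 0) a - begin + 1 else end - a + 1
t_to_a <- function(t, begin, end, strand)
  if (strand == 0) t + begin - 1 else end - t + 1

## R <-> T: go through absolute addresses
r_to_t <- function(r, begin, end, size, strand)
  a_to_t(r_to_a(r, size, strand), begin, end, strand)
t_to_r <- function(t, begin, end, size, strand)
  a_to_r(t_to_a(t, begin, end, strand), size, strand)

## Values from the diagram: positions 11..20 of a sequence of size 40
a_to_r(c(11, 20), size = 40, strand = 1)
## [1] 30 21
a_to_t(c(11, 20), begin = 11, end = 20, strand = 1)
## [1] 10  1
t_to_r(c(1, 10), begin = 11, end = 20, size = 40, strand = 0)
## [1] 11 20
```

     Routing R <-> T through A keeps each formula in one place, so the
     four direct conversions are the only ones that need to know about
     the strand.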

     These conversions use the global variable 'strand' (see
     'setStrand' and 'getStrand').

_S_e_e _A_l_s_o:

     'AtoT', 'AtoR', 'RtoA', 'RtoT', 'TtoA', 'TtoR', 'setStrand',
     'getStrand', 'getParam', 'setParam', 'getAccn', 'setAccn'

_E_x_a_m_p_l_e_s:

     ## Make a dummy sequence of size 40, place it in a buffer
     ## and write it to a FASTA file
     s <- "xxxxxxxxxxATGTGTCGTAxxxxxxxxxxxxxxxxxxxx"
     placeString(s)
     writeFasta(file="toto.fa")

     ## Index the file, then load only positions 11 to 20
     indexFasta("toto.fa")
     readFasta("toto.fa",from=11,to=20)

     getParam()
     ## $begin 
     ## [1] 11
     ## $size
     ## [1] 40
     ## $strand
     ## [1] 0
     ## [...]

     ## With strand = 0 
     TtoA(c(1,10))
     ##[1] 10 19

     TtoR(c(1,10))
     ##[1] 10 19

