diseq              package:GeneticsBase              R Documentation

_E_s_t_i_m_a_t_e _o_r _C_o_m_p_u_t_e _C_o_n_f_i_d_e_n_c_e _I_n_t_e_r_v_a_l _f_o_r _t_h_e _S_i_n_g_l_e-_M_a_r_k_e_r _D_i_s_e_q_u_i_l_i_b_r_i_u_m

_D_e_s_c_r_i_p_t_i_o_n:

     Estimate single-marker disequilibrium, or compute a confidence
     interval for the estimate.

_U_s_a_g_e:

     diseq.ci(object, marker, R = 1000, conf = 0.95, correct = TRUE, na.rm =
     TRUE, ...)
     diseq.inner(object, marker, ...)

_A_r_g_u_m_e_n_t_s:

  object: geneSet object

  marker: marker names 

       R: Number of bootstrap iterations to use when computing the
          confidence interval. Defaults to 1000.

    conf: Confidence level to use when computing the confidence
          interval for D-hat.  Defaults to 0.95; should be in (0,1).

 correct: See details.

   na.rm: logical. Should missing values be removed?

     ...: optional additional parameters passed

_D_e_t_a_i_l_s:

     For a single-gene marker, 'diseq' computes the Hardy-Weinberg
     (dis)equilibrium statistic D, D', r (the correlation coefficient),
     and r^2 for each pair of allele values, as well as an overall
     summary value for each measure across all alleles.  'print.diseq'
     displays the contents of a 'diseq' object. 'diseq.ci' computes a
     bootstrap confidence interval for this estimate.

     For consistency, I have applied the standard definitions for D,
     D', and r from the Linkage Disequilibrium case, replacing all
     marker  probabilities with the appropriate allele probabilities.

     Thus, for each allele pair,

   _D is defined as the half of the raw difference in frequency between
        the observed number of heterozygotes and the expected number:

               D = 1/2 * ( p(ij) + p(ji) ) - p(i)*p(j)


   _D' rescales D to span the range [-1,1] 

                            D' = D / Dmax

        where, if D > 0:

              Dmax = min(p(i)p(j), p(j)p(i)) =  p(i)p(j)

        or if D < 0:

        Dmax = min( p(i) * (1 - p(j)), p(j) * (1 - p(i)) )


   _r is the correlation coefficient between two alleles, and can be
        computed by

            r = -D / sqrt( p(i)*(1-p(i)) * p(j)*(1-p(j)) )



     where

   - p(i) is the observed probability of allele 'i', 

   - p(j) is the observed probability of allele 'j', and 

   - p(ij) is the observed probability of the allele pair 'ij'. 
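
     As a rough illustration of the definitions above, the following
     base-R sketch computes D, D', and r by hand for one allele pair.
     The toy genotype vector and all variable names are hypothetical
     and are not part of the package; the D < 0 branch assumes the
     second term of Dmax is p(j) * (1 - p(i)).

```r
# Hypothetical toy data: unphased genotypes at one biallelic marker.
geno <- c("A/A", "A/B", "A/B", "B/B", "A/B",
          "A/A", "A/B", "B/B", "A/B", "A/B")

# Split each genotype into its two alleles (one row per individual).
alleles <- do.call(rbind, strsplit(geno, "/", fixed = TRUE))

# Allele probabilities p(i) and p(j), estimated from all 2n allele copies.
p.i <- mean(alleles == "A")
p.j <- mean(alleles == "B")

# p(ij) + p(ji): frequency of individuals carrying one "A" and one "B".
het <- mean(alleles[, 1] != alleles[, 2])

# D = 1/2 * ( p(ij) + p(ji) ) - p(i) * p(j)
D <- het / 2 - p.i * p.j

# Dmax depends on the sign of D (assumed symmetric form when D < 0).
Dmax <- if (D > 0) p.i * p.j else min(p.i * (1 - p.j), p.j * (1 - p.i))
Dprime <- D / Dmax

# r = -D / sqrt( p(i)*(1-p(i)) * p(j)*(1-p(j)) )
r <- -D / sqrt(p.i * (1 - p.i) * p.j * (1 - p.j))

# For this toy data: D = 0.05, D' = 0.2, r = -0.2.
print(c(D = D, Dprime = Dprime, r = r))
```

     Here six of the ten individuals are heterozygous and both
     alleles have probability 0.5, so D = 0.6/2 - 0.25 = 0.05.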

     When there are more than two alleles, the summary values for these
     statistics are obtained by computing a weighted average of the
     absolute value of each allele pair, where the weight is determined
     by the expected frequency. For example:


                   D.overall = sum |D(ij)| * p(ij)


     Bootstrapping is used to generate the confidence interval in
     order to avoid reliance on parametric assumptions (e.g. D'
     following a Chi-square distribution), which will not hold for
     alleles with low frequencies.
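
     The bootstrap idea can be sketched in base R as below.  This is
     a minimal percentile-interval sketch under stated assumptions,
     not the code used by 'diseq.ci': the helper 'd.hat' and the toy
     data are hypothetical, and the interval type that 'diseq.ci'
     actually computes is not stated here.  'R' and 'conf' mirror the
     arguments of the same names.

```r
# Hypothetical helper: D-hat for alleles "A" and "B", computed from
# unphased genotypes at a biallelic marker.
d.hat <- function(geno) {
  alleles <- do.call(rbind, strsplit(geno, "/", fixed = TRUE))
  het <- mean(alleles[, 1] != alleles[, 2])
  het / 2 - mean(alleles == "A") * mean(alleles == "B")
}

set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical toy data: 100 individuals.
geno <- rep(c("A/A", "A/B", "B/B"), times = c(20, 60, 20))

R    <- 1000  # number of bootstrap iterations
conf <- 0.95  # confidence level

# Resample individuals with replacement and recompute D-hat each time.
boot.d <- replicate(R, d.hat(sample(geno, replace = TRUE)))

# Percentile interval: cut (1 - conf) / 2 from each tail.
alpha <- (1 - conf) / 2
ci <- quantile(boot.d, probs = c(alpha, 1 - alpha))

print(d.hat(geno))  # point estimate on the original sample
print(ci)           # lower and upper bootstrap limits
```

     Resampling whole individuals, rather than single alleles, keeps
     each genotype intact, which is what D measures.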

     See the function 'HWE' from the "genetics" package for testing
     Hardy-Weinberg Equilibrium, D=0.

_A_u_t_h_o_r(_s):

     Gregory R. Warnes warnes@bst.rochester.edu and Nitin Jain
     nitin.jain@pfizer.com

