outliers             package:factDesign             R Documentation

_D_e_t_e_c_t _s_i_n_g_l_e _o_u_t_l_i_e_r_s _i_n _e_x_p_e_r_i_m_e_n_t_a_l _d_e_s_i_g_n_s _w_i_t_h _o_n_l_y _t_w_o _r_e_p_l_i_c_a_t_e_s _p_e_r _t_r_e_a_t_m_e_n_t _c_o_n_d_i_t_i_o_n.

_D_e_s_c_r_i_p_t_i_o_n:

     These functions detect pairs of observations with unexpectedly
     large differences compared to the rest of the data and determine
     whether one member of the pair is a single outlier using median
     absolute deviation criteria.

_U_s_a_g_e:

     outlierPair(x, INDEX, p = 0.05, na.rm = TRUE)
     madOutPair(x, whichPair, c = 4)

_A_r_g_u_m_e_n_t_s:

       x: A vector of observations. 

   INDEX: A list of factors, each the same length as x, used to
          indicate the replicate observations. 

       p: The significance level at which to perform the test. 

   na.rm: If TRUE, will remove missing values.

whichPair: A result of outlierPair, recording which pair has the largest
          difference between replicate observations. 

       c: The number of median absolute deviations to be used as a
          cutoff for determining single outliers.  

_D_e_t_a_i_l_s:

     This outlier detection method is useful for small factorial
     designs in which the usual residuals from a linear model would
     have a large number of linear dependencies compared to the actual
     number of residuals.  The function first calculates the n
     differences between the 2n replicate observations (call these pure
     residuals, p.r.), and then constructs an F-statistic:
     f=(largest squared p.r.)/((sum of remaining squared
     p.r.'s)/(n-1)).  A p-value (adjusted for taking the largest of the
     p.r.'s) is calculated as n*Pr(F(1,n-1)>f).  If f>=n-1, this
     p-value is exact; otherwise it is an upper bound.
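
     The calculation above can be sketched in a few lines of base R.
     This is only an illustration under stated assumptions, not the
     code used by outlierPair: the helper name pairFTest, the pair_id
     argument, and the toy data are hypothetical, and every pair is
     assumed to hold exactly two non-missing observations.

```r
# Hypothetical helper (not part of factDesign): F-type test on pure residuals.
pairFTest <- function(x, pair_id) {
  # One difference per replicate pair; assumes exactly two values per pair.
  d <- vapply(split(x, pair_id), function(v) v[1] - v[2], numeric(1))
  n <- length(d)
  sq <- d^2
  k <- which.max(sq)                      # pair with the largest squared p.r.
  f <- sq[[k]] / (sum(sq[-k]) / (n - 1))  # largest vs. mean of the remaining
  # n * Pr(F(1, n-1) > f); an upper bound when f < n - 1, so it may exceed 1.
  pval <- n * pf(f, 1, n - 1, lower.tail = FALSE)
  list(f = f, pval = pval, whichPair = names(d)[k])
}

# Toy data: 8 pairs, with one aberrant value in the third pair.
x <- c(10.1, 10.3, 9.8, 9.9, 10.0, 14.0, 10.2, 10.1,
       9.7, 9.9, 10.4, 10.2, 10.0, 10.1, 9.9, 9.8)
pair_id <- rep(1:8, each = 2)
res <- pairFTest(x, pair_id)
print(res)
```

     Here the third pair has by far the largest difference, so it is
     the pair that would be passed on to the single-outlier step.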

     Once pairs with significantly large differences are identified
     using outlierPair, madOutPair is applied.  If exactly one of the
     tagged replicates falls outside the range
     (med(x)-c*mad(x),med(x)+c*mad(x)), that observation is designated
     the single outlier.
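
     The range check can be sketched as follows.  This is a rough
     illustration rather than the madOutPair source: the helper name
     madSingleOutlier and the pair_idx argument are hypothetical, x is
     assumed to have no missing values, and mad() is assumed to be
     called with R's default scaling constant (1.4826), which may
     differ from what the package does.

```r
# Hypothetical helper (not part of factDesign): MAD-based single-outlier rule.
madSingleOutlier <- function(x, pair_idx, c = 4) {
  # Cutoffs at c median absolute deviations on either side of the median.
  lower <- median(x) - c * mad(x)
  upper <- median(x) + c * mad(x)
  outside <- x[pair_idx] < lower | x[pair_idx] > upper
  # Flag an observation only when exactly one member of the pair is outside.
  if (sum(outside) == 1) pair_idx[outside] else NA
}

# Toy data: observations 5 and 6 form the tagged pair.
x <- c(10.1, 10.3, 9.8, 9.9, 10.0, 14.0, 10.2, 10.1,
       9.7, 9.9, 10.4, 10.2, 10.0, 10.1, 9.9, 9.8)
flagged <- madSingleOutlier(x, pair_idx = c(5, 6))
unflagged <- madSingleOutlier(x, pair_idx = c(1, 2))
print(flagged)
print(unflagged)
```

     Requiring exactly one member to fall outside the range means a
     pair in which both replicates are extreme is not reduced to a
     single outlier.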

_V_a_l_u_e:

     For 'outlierPair':

    test: Returns TRUE if an outlier pair is detected at the specified
          level of significance p.

    pval: The actual value of n*Pr(F(1,n-1)>f).

whichPair: The index of the pair of observations with the largest
          difference.


     For 'madOutPair':

     The index of the single outlier observation, or "NA" if no single
     outliers are detected.

_A_u_t_h_o_r(_s):

     Denise Scholtens

_R_e_f_e_r_e_n_c_e_s:

     Scholtens et al.  Analyzing Factorial Designed Microarray
     Experiments.   Journal of Multivariate Analysis. 
     2004;90(1):19-43.

_S_e_e _A_l_s_o:

     'madOutPair'

_E_x_a_m_p_l_e_s:

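     library(factDesign)
     data(estrogen)

     # Test probe set 728_at for a replicate pair with an unusually large difference
     op1 <- outlierPair(exprs(estrogen)["728_at",],INDEX=pData(estrogen),p=.05)
     print(op1)
     # The third element of the result is whichPair
     madOutPair(exprs(estrogen)["728_at",],op1[[3]])

     # Repeat for probe set 33379_at
     op2 <- outlierPair(exprs(estrogen)["33379_at",],INDEX=pData(estrogen),p=.05)
     print(op2)
     madOutPair(exprs(estrogen)["33379_at",],op2[[3]])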

