dupcor                 package:limma                 R Documentation

_C_o_r_r_e_l_a_t_i_o_n _B_e_t_w_e_e_n _D_u_p_l_i_c_a_t_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Estimate the correlation between duplicate spots (regularly spaced
     replicate spots on the same array) or between technical replicates
     from a series of arrays.

_U_s_a_g_e:

     duplicateCorrelation(object, design=rep(1,ncol(as.matrix(object))),
                          ndups=2, spacing=1, block=NULL, trim=0.15,
                          weights=NULL)

_A_r_g_u_m_e_n_t_s:

  object: a numeric matrix of expression values, or any data object
          from which 'as.matrix' will extract a suitable matrix such as
          an 'MAList', 'marrayNorm' or 'exprSet' object. If 'object' is
          an 'MAList' object then the arguments 'design', 'ndups',
          'spacing' and 'weights' will be extracted from it if
          available and do not have to be specified as arguments.
          Specifying these arguments explicitly will over-rule any
          components found in the data object.

  design: the design matrix of the microarray experiment, with rows
          corresponding to arrays and columns to comparisons to be
          estimated. The number of rows must match the number of
          columns of 'object'. Defaults to the unit vector, meaning that
          the arrays are treated as replicates.

   ndups: a positive integer giving the number of times each gene is
          printed on an array. 'nrow(object)' must be divisible by
          'ndups'. Will be ignored if 'block' is specified.

 spacing: the spacing between the rows of 'object' corresponding to
          duplicate spots; 'spacing=1' means consecutive rows.

   block: vector or factor specifying a blocking variable

    trim: the fraction of observations to be trimmed from each end of
          'atanh(all.correlations)' when computing the trimmed mean.

 weights: an optional numeric matrix of the same dimension as 'object'
          containing weights for each spot. If smaller than 'object'
          then it will be filled out to the same size.
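
     As an illustration of how 'ndups' and 'spacing' relate rows of
     'object' to genes, the following base R sketch labels each row
     with a gene index. The helper 'duplicate_groups' is hypothetical
     (it is not part of limma), and it assumes a regular layout in
     which the number of rows is divisible by 'ndups * spacing'.

```r
# Hypothetical helper, not a limma function: returns, for each row of
# 'object', the index of the gene that row is assumed to belong to.
duplicate_groups <- function(nspots, ndups = 2, spacing = 1) {
  # Assumption: spots come in complete groups of ndups * spacing rows.
  stopifnot(nspots %% (ndups * spacing) == 0)
  idx <- seq_len(nspots) - 1L            # zero-based row index
  s <- idx %% spacing                    # position within a run of 'spacing' rows
  g <- idx %/% (spacing * ndups)         # which group of ndups * spacing rows
  s + spacing * g + 1                    # one-based gene index
}

# Consecutive duplicates: rows 1-2 are gene 1, rows 3-4 are gene 2, ...
duplicate_groups(6, ndups = 2, spacing = 1)

# Duplicates three rows apart: rows 1 and 4 are gene 1, rows 2 and 5 gene 2, ...
duplicate_groups(6, ndups = 2, spacing = 3)
```

     Rows sharing a label are treated as duplicate spots of the same
     gene under these assumptions.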

_D_e_t_a_i_l_s:

     When 'block=NULL', this function estimates the correlation between
     duplicate spots (regularly spaced within-array replicate spots).
     If 'block' is not null, this function estimates the correlation
     between repeated observations on the blocking variable. Typically
     the blocks are biological replicates and the repeated observations
     are technical replicates. In either case, the correlation is
     estimated by fitting a mixed linear model by REML individually for
     each gene. The function also returns a consensus correlation, a
     robust average of the individual correlations, which can be used
     as input for the functions 'lmFit' or 'gls.series'.

     At this time it is not possible to estimate correlations between
     duplicate spots and between technical replicates simultaneously.
     If 'block' is not null, then the function will set 'ndups=1'.
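
     A minimal sketch of the 'block' case follows. The data are
     simulated and the layout (three biological replicates, each
     hybridized to two arrays) is an assumption made for illustration
     only. The call to 'duplicateCorrelation' is guarded so that the
     code still runs when limma or statmod is not installed.

```r
# Hypothetical layout: 3 biological replicates, 2 technical replicates each.
block <- rep(1:3, each = 2)
design <- matrix(1, nrow = length(block), ncol = 1)

# Simulated expression values (made-up data): 200 genes x 6 arrays.
# Arrays in the same block share a random effect, so they are correlated.
set.seed(42)
ngenes <- 200
block_effect <- matrix(rnorm(ngenes * 3), ngenes, 3)[, block]
y <- block_effect + matrix(rnorm(ngenes * length(block)), ngenes, length(block))

# Only attempt the fit if the required packages are available.
if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  corfit <- limma::duplicateCorrelation(y, design = design, block = block)
  print(corfit$consensus.correlation)
}
```

     Here 'ndups' is left at its default because it is ignored when
     'block' is supplied.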

     For this function to return statistically useful results, there
     must be at least two more arrays than the number of coefficients
     to be estimated, i.e., two more than the column rank of 'design'.
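
     The condition above can be checked before calling the function.
     The sketch below uses only base R; the two-group design matrix is
     a made-up example, and the column rank is obtained from 'qr'.

```r
# Hypothetical design: 6 arrays, an intercept and one treatment effect.
design <- cbind(Intercept = 1, Treatment = c(0, 0, 0, 1, 1, 1))

n_arrays <- nrow(design)        # one row of 'design' per array
n_coef   <- qr(design)$rank     # column rank = number of estimable coefficients

# At least two more arrays than coefficients are needed.
enough_arrays <- n_arrays >= n_coef + 2
if (!enough_arrays) {
  warning("too few arrays for a useful correlation estimate")
}
```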

     The function may take a long time to execute as it fits a mixed
     linear model for each gene using an iterative algorithm. It is not
     uncommon for the function to return a small number of warning
     messages that correlation estimates cannot be computed for some
     individual genes. This is not a serious concern provided that
     there are only a few such warnings and the total number of genes
     is large. The consensus estimator computed by this function will
     not be materially affected by a small number of such genes.

_V_a_l_u_e:

     A list with components:

consensus.correlation: the average estimated inter-duplicate
          correlation. The average is the trimmed mean of the
          individual correlations on the atanh-transformed scale.

     cor: same as 'consensus.correlation', for compatibility with
          earlier versions of the software

atanh.correlations: numeric vector of length 'nrow(object)/ndups'
          giving the individual genewise atanh-transformed
          correlations.
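
     The relationship between the genewise values and the consensus
     can be sketched in base R as follows. The correlations are made-up
     numbers, and the use of 'mean' with 'trim' and 'na.rm' is an
     assumption about how the trimmed mean described above might be
     computed, not a statement of the function's internals.

```r
# Made-up genewise correlations, including two extreme values.
r <- c(0.62, 0.70, 0.55, 0.81, -0.10, 0.66, 0.73, 0.99, 0.58, 0.69)

# Work on the atanh (Fisher z) scale, as 'atanh.correlations' does.
z <- atanh(r)

# Trimmed mean on the transformed scale, then back-transform with tanh.
consensus <- tanh(mean(z, trim = 0.15, na.rm = TRUE))

# With 10 values and trim = 0.15, one value is dropped from each end,
# so the same result is obtained from the middle eight sorted values.
consensus_manual <- tanh(mean(sort(z)[2:9]))
```

     Because the extreme values are trimmed before averaging, a few
     unusual genes have little influence on the result.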

_A_u_t_h_o_r(_s):

     Gordon Smyth

_R_e_f_e_r_e_n_c_e_s:

     Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of
     within-array replicate spots for assessing differential expression
     in microarray experiments. _Bioinformatics_ 21(9), 2067-2075.
     <URL: http://www.statsci.org/smyth/pubs/dupcor.pdf>

_S_e_e _A_l_s_o:

     This function uses 'randomizedBlockFit' from the statmod package.

     An overview of linear model functions in limma is given by
     06.LinearModels.

_E_x_a_m_p_l_e_s:

     #  Also see lmFit examples

     ## Not run: 
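     # Estimate the correlation between duplicate spots
     corfit <- duplicateCorrelation(MA, design, ndups=2)
     # Back-transform the genewise values to the correlation scale
     all.correlations <- tanh(corfit$atanh.correlations)
     boxplot(all.correlations)
     # Pass the consensus correlation on to the linear model fit
     fit <- lmFit(MA, design, ndups=2, correlation=corfit$consensus.correlation)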
     ## End(Not run)

