fitmaanova              package:maanova              R Documentation

_F_i_t _A_N_O_V_A _m_o_d_e_l _f_o_r _M_i_c_r_o _A_r_r_a_y _e_x_p_e_r_i_m_e_n_t

_D_e_s_c_r_i_p_t_i_o_n:

     This is the function to fit the ANOVA model for Microarray
     experiment. Given the data and formula, this function fits the
     regression model for each gene and calculates the ANOVA estimates,
     variance components for random terms, fitted values, etc. For a
     mixed effect models, the output estimates will be BLUE and BLUP.

     All terms used in the formula should be corresponding to the
     factor names in designfile except "Spot" and "Label". "Spot"
     represents the spotting effect and "Label" represents the labeling
     effects. They are from the within slide technical replicates. If
     there is no replicated spots, These two terms cannot be fitted.
     Also these two terms cannot be fitted for one-dye system (e.g.,
     Affymetrix arrays). (Note that Dye effect should not be fitted in
     one-dye system). 

     A typical formula will be like "~Array+Dye+Sample", which means
     you want to fit Array, Dye and Sample effect in the ANOVA model.
     In this case, you need to have Array, Dye and Sample columns in
     your input design file. Make sure you have enough degree of
     freedom when making a model. Also you need to be careful about
     confounding problem.

     If you have multiple factors in your experiment, you can specify
     the main and interaction effect in the formula. At this time, only
     two-way interactions are allowed.

     When you have random or covariate effect they should be specified
     in the 'random' and 'covariate', and also in the formula.

     For most mixed effect models, Array should be treated as random
     factor. Sample should be treated as random if you have biological
     replicates. Note that the reference sample (0's in Sample) will
     always be treated as fixed even if you specify Sample as random.

     Note that the calculation could be very slow for mixed effect
     models. The computational time depends on the number of genes,
     number of arrays and the size of the random variables (dimension
     of Z matrix). 

     Array specific covariate should be included in the design matrix,
     and gene specific covariate  should be read by 'covM' in
     'read.madata()', and need to be specified in covariate term.

_U_s_a_g_e:

     fitmaanova(madata, formula, random= ~1, covariate = ~1, mamodel,
                inits20,method=c("REML","ML","MINQE-I","MINQE-UI", "noest"),
                verbose=TRUE, subCol=FALSE)

_A_r_g_u_m_e_n_t_s:

  madata: An object of class 'madata'.

 formula: The ANOVA model formula.

  random: The formula for random terms. ~1 means only the residual is
          random (fixed model). Note that all random terms should be in
          the ANOVA model formula.

covariate: The formula for covariates. ~1 means no covariates. The
          array specific covariates should be numeric values in the
          design matrix,  and the gene specific covariates should be
          read by 'covM' in  'read.madata'

 mamodel: Inside arguments to save the calculation time.

 inits20: The initial value for variance components. This should be a
          matrix with number of rows equals to the number of genes and
          number of columns equals to the number of random terms in the
          model. Good initial values will greatly speed up the
          calculation. If it is not given, it will be calculated based
          on the corresponding fixed model.

  method: The method used to solve the Mixed Model Equation. Available
          options includes: "ML" for maximum liklihood; "REML" for
          restricted maximum liklihood; "MINQE-I" and "MINQE-UI" are
          for minimum norm and "noest" for no estimate for variance
          component (use the initial value). Both "ML" and "REML" use
          method of scoring algorithm to solve MME iteratively. "noest"
          skips the iteration and will be significantly faster (but
          accurate). Default method is "REML". For details about
          fitting mixed effects models, read the "Fitting mixed Effects
          model" section. 

 verbose: A logical value to indicate whether to display some message
          for calculation progress.

  subCol: A logical value to indicate whether subtracting column mean
          from the raw data or not. Default is not subtracting column
          mean but for two color array it automatically subtracts the
          column mean.

_V_a_l_u_e:

     It returns 'anova' and 'anova.subcol'. Depending on 'subCol'
     option, one field may not contain any information. Still it needs
     two fields to calculate Fss test statistics. 'anova' and
     'anova.subcol' contains the same following fields. 

    yhat: Fitted intensity value which has the same dimension as the
          input intensity data

      S2: Variance components for the random terms. It is a matrix with
          number of rows equals to the number of genes and number of
          columns equals to the number of random terms. Note that for
          fixed effect model, S2 is a one column vector for error's
          variance.

       G: Gene effects. A vector with the same length as the number of
          genes.

reference: The estimates for reference sample. If there is no reference
          sample specified in the design, this field will be absent in
          the output object.

S2.level: A list of strings to indicate the order of the S2 field. Note
          that the last column of S2 is always the error's variance.
          S2.level is only for the non-error terms. For example, if
          there are three columns in S2 and S2.level is c("Strain",
          "Diet"), then the three columns of S2 correspond to the
          variances of Strain, Diet and error respectively for each
          gene. 

  Others: Estimates (or BLUE/BLUP for mixed effect model) for the terms
          in model. There will be XXX.level field for each term
          representing the order of the estimates (similar to
          S2.level).

    flag: A vector to indicate whether there is bad spot for this 
          gene. 0 means no bad spot and 1 means has bad spot. If there
          is no flag information in input data, this field will not be
          available.

   model: The model object used for this fitting.

_F_i_t_t_i_n_g _m_i_x_e_d _E_f_f_e_c_t_s _m_o_d_e_l:

     Fitting mixed effects models needs a lot of computation. A good
     starting value for the variances is very important. This function
     first treats all random factors as fixed and fits a fixed effects
     model. Then variances for random factors are calculated and used
     as the initial values for mixed effects model fitting. 

     There are several methods available for fitting the mixed effects
     model. "noest" does not really fit the mixed effects model. It
     takes the initial variance and solve mixed model equations to get
     the estimates (BLUE and BLUP). "MINQE-I" and "MINQE-UI" are based
     on minimum norm unbiased estimators. It is can be thought as a
     first iterate solution of "ML" and "REML", respectively. "ML" and
     "REML" are based on maximum likelihood and restricted maximum
     likelihood. Both of them need to be solved iteratively so they are
     very slow to compute. For "ML" and "REML", a MINQUE estimates is
     used as the starting value.  "Method of scoring" is used as the
     iteratively algorithm to solve ML and REML. "Method of scoring"
     algorithm is similar to New-Raphson method except that it uses the
     expected value of Hessian (second derivative matrix of the
     objective function) instead of Hessian itself. Method of scoring
     is more robust to poor starting values and the Hessian is easier
     to calculate than Newton-Raphson.

     For more mathematical details please read Searle et al.

_A_u_t_h_o_r(_s):

     Hao Wu

_R_e_f_e_r_e_n_c_e_s:

     Kerr and Churchill(2001), Statistical design and the analysis of
     gene expression microarrays, _Genetical Research_, *77*:123-128.

     Kerr, Martin and Churchill(2000), Analysis of variance for gene
     expression microarray data, _Journal of Computational Biology_,
     *7*:819-837.

     Searle, Casella and McCulloch, _Variance Components_, John Wiley
     and sons, Inc.

_S_e_e _A_l_s_o:

     'makeModel', 'matest'

_E_x_a_m_p_l_e_s:

     ###################################
     # fixed model fitting
     ###################################
     # load in abf1 data
     data(abf1)
     ## Not run: 

     # fit model with random effect
     fit.full.mix <- fitmaanova(abf1, formula = ~Strain+Sample, 
        random = ~Sample)

     # this is to explain the usage of including covariate variable.
     # .CEL file is not included in the package, thus use can not use this. 
     # array specific covariate : add it to the design matrix 
     beforeRma <- ReadAffy() # suppose there are 18 arrays.
     rmaData <- rma(beforeRma)
     datafile <- exprs(rmaData)
     design.table=data.frame(Array=row.names(pData(beforeRma)))
     Strain = rep(c('Aj', 'B6', 'B6xAJ'), each=6) 
     Sample = rep(c(1:9), each=2) 
     Cov1 = sample(1:100,18) # this is artificial example 
     designfile.cov1 = cbind(design.table, Strain, Sample,Cov1) 
     data.cov1=read.madata(datafile, designfile=designfile.cov1) 
     fit.cov1 = fitmaanova(data.cov1,formula = ~Strain+Sample+Cov1, covariate = ~ Cov1) 

     # gene specific covariate - make artificial 'covM' matrix 
     covm = matrix(rnorm(length(datafile)), nrow=nrow(datafile)) 
     designfile.cov2 = cbind(design.table, Strain, Sample) 
     data.cov2=read.madata(datafile, designfile=designfile.cov2, covM=covm) 
     fit.cov2 = fitmaanova(data.cov2,formula = ~Strain+Sample+covM, covariate = ~ covM) ## End(Not run)

