fitGG                  package:gaga                  R Documentation

_F_i_t _G_a_G_a _h_i_e_r_a_r_c_h_i_c_a_l _m_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     Fits GaGa or MiGaGa hierarchical models, either via a fully
     Bayesian approach or via maximum likelihood.

_U_s_a_g_e:

     fitGG(x, groups, patterns, equalcv = TRUE, nclust = 1,
           method = "quickEM", B, priorpar, parini, trace = TRUE)

_A_r_g_u_m_e_n_t_s:

       x: 'ExpressionSet', 'exprSet', data frame or matrix containing
          the gene expression measurements used to fit the model.

  groups: If 'x' is of type 'ExpressionSet' or 'exprSet', 'groups'
          should be the name of the column in 'pData(x)' with the
          groups that one wishes to compare. If 'x' is a matrix or a
          data frame, 'groups' should be a vector indicating the group
          to which each column of 'x' belongs.

patterns: Matrix indicating which groups are put together under each
          pattern, i.e. the hypotheses to consider for each gene.
          'colnames(patterns)' must match the group levels specified in
          'groups'. Defaults to two hypotheses: null hypothesis of all
          groups being equal and full alternative of all groups being
          different.

 equalcv: 'equalcv==TRUE' fits the model assuming a constant
          coefficient of variation (CV) across groups.
          'equalcv==FALSE' compares the CV as well as the mean
          expression levels between groups.

  nclust: Number of clusters in the MiGaGa model. 'nclust==1'
          corresponds to the GaGa model.

  method: 'method=='MH'' fits a fully Bayesian model via
          Metropolis-Hastings posterior sampling. 'method=='Gibbs''
          does the same using Gibbs sampling. 'method=='SA'' uses
          Simulated Annealing to find the posterior mode.
          'method=='EM'' finds maximum-likelihood estimates via the
          expectation-maximization algorithm, but this is currently
          only implemented for 'nclust>1'. 'method=='quickEM'' is a
          quicker implementation that only performs 2 optimization
          steps (see details).

       B: Number of iterations. For 'method=='MH'' and
          'method=='Gibbs'', 'B' is the number of MCMC iterations
          (defaults to 1000). For 'method=='SA'', 'B' is the number of
          iterations in the Simulated Annealing scheme (defaults to
          200). For 'method=='EM'', 'B' is the maximum number of
          iterations (defaults to 20). 

priorpar: List with prior parameter values. It must have components
          'a.alpha0', 'b.alpha0', 'a.nu', 'b.nu', 'a.balpha',
          'b.balpha', 'a.nualpha', 'b.nualpha', 'p.probclus' and
          'p.probpat'. If missing, they are set to non-informative
          values that are usually reasonable for RMA and GCRMA
          normalized data.

  parini: List with components 'a0', 'nu', 'balpha', 'nualpha',
          'probclus' and 'probpat' indicating the starting values for
          the hyper-parameters. If not specified, method-of-moments
          estimates are used.

   trace: If 'trace==TRUE', the progress of the model fitting routine
          is printed.
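
     The 'groups' and 'patterns' arguments for matrix input can be
     sketched in base R as follows. This is a minimal illustration,
     not output from the package: the matrix 'x', the group labels
     "A", "B", "C" and the choice of three patterns are hypothetical.
     It also assumes, by analogy with the two-group matrix in the
     Examples section, that each row of 'patterns' is one hypothesis
     and that groups sharing a value within a row are put together.

```r
# Hypothetical expression matrix: 5 genes (rows) by 6 samples (columns).
set.seed(1)
x <- matrix(rgamma(30, shape = 10, rate = 1), nrow = 5, ncol = 6)

# One group label per column of x.
groups <- rep(c("A", "B", "C"), each = 2)
stopifnot(length(groups) == ncol(x))

# One row per pattern, one column per group. Assumption: groups that
# share a value within a row are treated as equal under that pattern.
patterns <- matrix(c(0, 0, 0,   # all groups equal (null hypothesis)
                     0, 0, 1,   # A and B equal, C different
                     0, 1, 2),  # all groups different (full alternative)
                   ncol = 3, byrow = TRUE)

# Column names must match the group levels given in 'groups'.
colnames(patterns) <- unique(groups)
patterns
```

     Omitting the middle row leaves the two default hypotheses
     described under 'patterns'.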

_D_e_t_a_i_l_s:

     Approximations are used to sample faster from the posterior
     distribution of the gamma shape parameters and to compute the
     normalization constants (needed to evaluate the likelihood). These
     approximations are implemented in 'rcgamma' and 'mcgamma'.

     The cooling scheme in 'method=='SA'' uses a temperature equal to
     '1/log(1+i)', where 'i' is the iteration number.
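
     To get a feel for how slowly this schedule cools, the
     temperature can be tabulated directly in base R. The helper name
     'sa_temperature' is hypothetical and is not part of the package;
     the sketch only evaluates the formula above over the default 200
     iterations.

```r
# Temperature at iteration i under the cooling scheme 1/log(1+i).
# 'sa_temperature' is a hypothetical helper, not a gaga function.
sa_temperature <- function(i) 1 / log(1 + i)

B <- 200                            # default number of SA iterations
temp <- sa_temperature(seq_len(B))  # temperature at i = 1, ..., B

# Temperature at a few selected iterations.
round(temp[c(1, 10, 100, 200)], 3)
```

     The temperature decreases monotonically but only
     logarithmically, so most of the cooling happens in the first few
     iterations.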

     'method=='quickEM'' is a faster EM implementation that performs
     only 2 optimization steps. It usually delivers hyper-parameter
     estimates very similar to those obtained via the slower
     'method=='EM''. Additionally, inference under the GaGa model has
     been observed to be robust to moderate changes in the
     hyper-parameter estimates in most datasets.

_V_a_l_u_e:

     An object of class 'gagafit', with components:

 parest : Hyper-parameter estimates. Only returned if
          'method=='EBayes''; for 'method=='Bayes'' one must call the
          function 'parest' after 'fitGG'.

   mcmc : Object of class 'mcmc' with posterior draws for
          hyper-parameters. Only returned if 'method=='Bayes''.

   lhood: For 'method=='Bayes'' it is the log-likelihood evaluated at
          each MCMC iteration. For 'method=='EBayes'' it is the
          log-likelihood evaluated at the maximum.

  nclust: Same as input argument.

patterns: Same as input argument, converted to object of class
          'gagahyp'.

_A_u_t_h_o_r(_s):

     David Rossell

_R_e_f_e_r_e_n_c_e_s:

     Rossell D. GaGa: a simple and flexible hierarchical model for
     microarray data analysis.
     <URL: http://rosselldavid.googlepages.com>.

_S_e_e _A_l_s_o:

     'parest' to estimate hyper-parameters and compute posterior
     probabilities after a GaGa or MiGaGa fit. 'findgenes' to find
     differentially expressed genes. 'classpred' to predict the group
     that a new sample belongs to.

_E_x_a_m_p_l_e_s:

     library(gaga)
     set.seed(10)

     #Simulate 100 genes for 2 groups with 6 samples each
     n <- 100; m <- c(6,6)
     a0 <- 25.5; nu <- 0.109
     balpha <- 1.183; nualpha <- 1683
     probpat <- c(.95,.05)
     xsim <- simGG(n,m,p.de=probpat[2],a0,nu,balpha,nualpha,equalcv=TRUE)
     x <- exprs(xsim)

     #Frequentist fit: EM algorithm to obtain MLE
     #Samples 6 and 12 are left out of the fit
     groups <- pData(xsim)$group[c(-6,-12)]
     #Row 1: both groups equal. Row 2: groups differ
     patterns <- matrix(c(0,0,0,1),2,2)
     colnames(patterns) <- c('group 1','group 2')
     gg1 <- fitGG(x[,c(-6,-12)],groups,patterns=patterns,method='EM',trace=FALSE)
     #Estimate hyper-parameters and compute posterior probabilities
     gg1 <- parest(gg1,x=x[,c(-6,-12)],groups)
     gg1

