em                  package:mclust                  R Documentation

_E_M _a_l_g_o_r_i_t_h_m _s_t_a_r_t_i_n_g _w_i_t_h _E-_s_t_e_p _f_o_r _p_a_r_a_m_e_t_e_r_i_z_e_d _M_V_N _m_i_x_t_u_r_e _m_o_d_e_l_s.

_D_e_s_c_r_i_p_t_i_o_n:

     Implements the EM algorithm for parameterized MVN mixture models,
     starting with the expectation step.
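
     As a rough illustration of what an expectation step computes, the
     following base-R sketch performs one E-step for a univariate
     equal-variance ("E") mixture. It is not mclust's internal code;
     the data and parameter values are made up, and no noise term is
     included.

```r
# One E-step for a univariate, equal-variance ("E") mixture, base R only.
# All values below are hypothetical and chosen only for illustration.
x       <- c(-2.1, -1.9, 0.1, 2.0, 2.2)   # observations
mu      <- c(-2, 2)                       # component means
sigmasq <- 0.25                           # common variance
pro     <- c(0.5, 0.5)                    # mixing proportions

# Weighted component densities: an n x G matrix, one column per component.
dens <- sapply(seq_along(mu), function(k)
  pro[k] * dnorm(x, mean = mu[k], sd = sqrt(sigmasq)))

z      <- dens / rowSums(dens)      # conditional membership probabilities
loglik <- sum(log(rowSums(dens)))   # loglikelihood of the mixture
```

     Each row of 'z' sums to 1, which matches the description of the
     'z' component returned by 'em'.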

_U_s_a_g_e:

     em(modelName, data, mu, ...)

_A_r_g_u_m_e_n_t_s:

modelName: A character string indicating the model: 

           "E": equal variance  (one-dimensional) 
           "V": variable variance (one-dimensional) 

           "EII": spherical, equal volume 
           "VII": spherical, unequal volume 
           "EEI": diagonal, equal volume and shape
           "VEI": diagonal, varying volume, equal shape
           "EVI": diagonal, equal volume, varying shape
           "VVI": diagonal, varying volume and shape
           "EEE": ellipsoidal, equal volume, shape, and orientation
           "EEV": ellipsoidal, equal volume and equal shape
           "VEV": ellipsoidal, equal shape 
           "VVV": ellipsoidal, varying volume, shape, and orientation 

    data: A numeric vector, matrix, or data frame of observations.
          Categorical variables are not allowed. If a matrix or data
          frame, rows correspond to observations and columns correspond
          to variables.  

      mu: The mean for each component. If there is more than one
          component, 'mu' is a matrix whose columns are the means of
          the components. 

     ...: Arguments for model-specific em functions. Specifically:

             *  An argument describing the variance (depends on the
                model):

             _s_i_g_m_a_s_q for the one-dimensional models ("E", "V") and
                  spherical models ("EII", "VII"). This is either a
                  vector whose _k_th component is the variance for the
                  _k_th component in the mixture model ("V" and "VII"),
                  or a scalar giving the common variance for all
                  components in the mixture model ("E" and "EII").

             _d_e_c_o_m_p for the diagonal models ("EEI", "VEI", "EVI",
                  "VVI") and some ellipsoidal models ("EEV", "VEV").
                  For a description, see 'cdens'.

             _S_i_g_m_a for the equal variance model "EEE". A _d_ by _d_
                  matrix giving the common covariance for all
                  components of the mixture model.

             _s_i_g_m_a for the unconstrained variance model "VVV". A _d_ by
                  _d_ by _G_ matrix array whose '[,,k]'th entry is the
                  covariance matrix for the _k_th component of the
                  mixture model.

                  The form of the variance specification is the same as
                  for the output for the 'em', 'me', or 'mstep' methods
                  for the specified mixture model. 


             *  'pro': Mixing proportions for the components of the
                mixture. There should be one more mixing proportion than
                the number of MVN components if the mixture model
                includes a Poisson noise term.

             *  'eps': A scalar tolerance for deciding when to
                terminate computations due to computational singularity
                in covariances. Smaller values of 'eps' allow
                computations to proceed nearer to singularity. The
                default is '.Mclust$eps'.

                For those models with iterative M-step ("VEI", "VEV"),
                two values can be entered for 'eps', in which case the
                second value is used for determining singularity in the
                M-step. 

             *  'tol': A scalar tolerance for relative convergence of
                the loglikelihood.  The default is '.Mclust$tol'.

                For those models with iterative M-step ("VEI", "VEV"),
                two values can be entered for 'tol', in which case the
                second value governs parameter convergence in the
                M-step. 

             *  'itmax': An integer limit on the number of EM
                iterations.  The default is '.Mclust$itmax'.

                For those models with iterative M-step ("VEI", "VEV"),
                two values can be entered for 'itmax', in which case
                the second value is an upper limit on the number of
                iterations in the M-step. 

             *  'equalPro': Logical variable indicating whether or not
                the mixing proportions are equal in the model. The
                default is '.Mclust$equalPro'.

             *  'warnSingular': A logical value indicating whether or
                not a warning should be issued whenever a singularity
                is encountered. The default is '.Mclust$warnSingular'.

             *  'Vinv': An estimate of the reciprocal hypervolume of
                the data region. The default is determined by applying the
                function 'hypvol' to the data. Used only when 'pro'
                includes an additional mixing proportion for a noise
                component.
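
     The shapes described above can be sketched in base R as follows.
     The dimensions 'd' and 'G' and all numeric values are
     hypothetical; the sketch only shows the expected form of each
     argument and does not call any mclust function. Equal proportions
     are used in the noise case so that nothing is assumed about where
     the noise proportion sits in the vector.

```r
d <- 4   # number of variables (hypothetical)
G <- 3   # number of mixture components (hypothetical)

mu <- matrix(0, nrow = d, ncol = G)         # column k = mean of component k

sigmasqCommon  <- 0.5                       # "E", "EII": one scalar
sigmasqVarying <- c(0.5, 1.0, 1.5)          # "V", "VII": one value per component

Sigma <- diag(d)                            # "EEE": common d x d covariance
sigma <- array(diag(d), dim = c(d, d, G))   # "VVV": sigma[,,k] for component k

pro      <- rep(1 / G, G)                   # no noise term: G proportions
proNoise <- rep(1 / (G + 1), G + 1)         # with noise term: G + 1 proportions
```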

_D_e_t_a_i_l_s:

     This function can be used with an indirect or list call using
     'do.call', allowing the output of e.g. 'mstep' to be passed
     without the need to specify individual parameters as arguments.
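
     The mechanism behind such a list call can be sketched with a
     stand-in function. Here 'toyEm' and the list 'est' are
     hypothetical and are not part of mclust; the point is only that
     'do.call' matches list names to argument names, and that names
     not matching a formal argument are collected by '...'.

```r
# Hypothetical stand-in for a fitting function; not part of mclust.
toyEm <- function(modelName, data, mu, ...) {
  extras <- list(...)   # anything not matched by a named argument
  list(modelName = modelName, n = NROW(data), extraNames = names(extras))
}

# Hypothetical list of estimates, shaped like the output of an earlier step.
est <- list(modelName = "E", mu = c(-2, 2), sigmasq = 0.25, pro = c(0.5, 0.5))

# Prepend the data, then let do.call match list names to argument names.
res <- do.call(toyEm, c(list(data = c(-2.1, 0.1, 2.2)), est))
```

     In this sketch 'sigmasq' and 'pro' arrive through '...', in the
     same way that model-specific arguments are passed on by 'em'.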

_V_a_l_u_e:

     A list including the following components: 

       z: A matrix whose '[i,k]'th entry is the conditional probability
          of the _i_th observation belonging to the _k_th component of
          the mixture.    

  loglik: The loglikelihood for the data in the mixture model.

      mu: A matrix whose _k_th column is the mean of the _k_th component
          of the mixture model. 

   sigma: For multidimensional models, a three-dimensional array in
          which the '[,,k]'th entry gives the covariance for the
          _k_th group in the mixture model. For one-dimensional
          models, either a scalar giving a common variance for the
          groups or a vector whose entries are the variances for each
          group in the mixture model.

     pro: A vector whose _k_th component is the mixing proportion for
          the _k_th component of the mixture model. 

modelName: A character string identifying the model (same as the input
          argument). 

Attributes:

             *  '"info"': Information on the iteration.

             *  '"warn"': An appropriate warning if problems are
                encountered in the computations.
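
     As a small illustration of how the 'z' component might be used,
     the sketch below turns a matrix of conditional probabilities into
     hard class labels and a per-observation uncertainty. The matrix
     is hypothetical rather than the output of an actual 'em' call,
     and only base R is used.

```r
# Hypothetical z for 4 observations and 2 components (each row sums to 1).
z <- matrix(c(0.95, 0.05,
              0.60, 0.40,
              0.10, 0.90,
              0.02, 0.98), ncol = 2, byrow = TRUE)

classification <- max.col(z, ties.method = "first")   # most probable component
uncertainty    <- 1 - apply(z, 1, max)                # 1 - largest probability
```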

_R_e_f_e_r_e_n_c_e_s:

     C. Fraley and A. E. Raftery (2002a). Model-based clustering,
     discriminant analysis, and density estimation. _Journal of the
     American Statistical Association 97:611-631_.  See <URL:
     http://www.stat.washington.edu/mclust>. 

     C. Fraley and A. E. Raftery (2002b). MCLUST: Software for
     model-based clustering, density estimation and discriminant
     analysis.  Technical Report, Department of Statistics, University
     of Washington.  See <URL: http://www.stat.washington.edu/mclust>.

_S_e_e _A_l_s_o:

     'emE', ..., 'emVVV', 'estep', 'me', 'mstep', 'mclustOptions',
     'do.call'

_E_x_a_m_p_l_e_s:

     data(iris)
     irisMatrix <- as.matrix(iris[,1:4])
     irisClass <- iris[,5]
      
     msEst <- mstep(modelName = "EEE", data = irisMatrix, 
                    z = unmap(irisClass))
     names(msEst)

     em(modelName = msEst$modelName, data = irisMatrix,
        mu = msEst$mu, Sigma = msEst$Sigma, pro = msEst$pro)
     ## Not run: 
     do.call("em", c(list(data = irisMatrix), msEst))   ## alternative call
     ## End(Not run)

