EMclustN               package:mclust               R Documentation

_B_I_C _f_o_r _M_o_d_e_l-_B_a_s_e_d _C_l_u_s_t_e_r_i_n_g _w_i_t_h _P_o_i_s_s_o_n _N_o_i_s_e

_D_e_s_c_r_i_p_t_i_o_n:

     BIC for EM initialized by hierarchical clustering for
     parameterized Gaussian mixture models with Poisson noise.

_U_s_a_g_e:

     EMclustN(data, G, emModelNames, noise, hcPairs, eps, tol, itmax, 
              equalPro, warnSingular=FALSE, Vinv, ...)

_A_r_g_u_m_e_n_t_s:

    data: A numeric vector, matrix, or data frame of observations.
          Categorical variables are not allowed. If a matrix or data
          frame, rows correspond to observations and columns correspond
          to variables.  

       G: An integer vector specifying the numbers of MVN (Gaussian)
          mixture components (clusters) for which the BIC is to be
          calculated. The default is '0:9' where '0' indicates only a
          noise component.  

emModelNames: A vector of character strings indicating the models to be
          fitted  in the EM phase of clustering. Possible models: 

           "E": spherical, equal variance (one-dimensional)
           "V": spherical, variable variance (one-dimensional)
           "EII": spherical, equal volume
           "VII": spherical, unequal volume
           "EEI": diagonal, equal volume, equal shape
           "VEI": diagonal, varying volume, equal shape
           "EVI": diagonal, equal volume, varying shape
           "VVI": diagonal, varying volume, varying shape
           "EEE": ellipsoidal, equal volume, shape, and orientation
           "EEV": ellipsoidal, equal volume and equal shape
           "VEV": ellipsoidal, equal shape
           "VVV": ellipsoidal, varying volume, shape, and orientation

           The default is '.Mclust$emModelNames'.

   noise: A logical or numeric vector indicating which observations
          are initially estimated to be noise in the data. If there
          is no noise, 'EMclust' should be used rather than
          'EMclustN'.

 hcPairs: A matrix of merge pairs for hierarchical clustering such as
          produced by function 'hc'. The default is to compute a
          hierarchical clustering tree by applying function 'hc' with
          'modelName = .Mclust$hcModelName[1]' to univariate data and
          'modelName = .Mclust$hcModelName[2]' to multivariate data or
          a subset as indicated by the 'subset' argument. The
          hierarchical clustering results are used as starting values
          for EM.   

     eps: A scalar tolerance for deciding when to terminate
          computations due to computational singularity in covariances.
          Smaller values of 'eps' allow computations to proceed nearer
          to singularity. The default is '.Mclust$eps'.

     tol: A scalar tolerance for relative convergence of the
          loglikelihood. The default is '.Mclust$tol'.

   itmax: An integer limit on the number of EM iterations. The default
          is '.Mclust$itmax'.

equalPro: Logical variable indicating whether or not the mixing
          proportions are equal in the model. The default is
          '.Mclust$equalPro'.

    Vinv: An estimate of the reciprocal hypervolume of the data region.
          The default is determined by applying function  'hypvol' to
          the data. 

warnSingular: A logical value indicating whether or not a warning
          should be issued whenever a singularity is encountered. The
          default is 'warnSingular=FALSE'. 

    ... : Provided so that lists with elements other than the
          arguments can be passed in indirect or list calls with
          'do.call'.
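
     A minimal base R sketch of the 'noise' and 'Vinv' arguments,
     under stated assumptions. It assumes that a numeric 'noise'
     vector lists the row indices of the observations initially
     treated as noise, and it approximates 'Vinv' by the reciprocal
     volume of the axis-aligned bounding box of the data. The helper
     names 'asNoiseLogical' and 'boxVinv' are hypothetical and not part
     of the package, and the bounding box is not claimed to reproduce
     what 'hypvol' computes.

```r
# Assumption: a numeric 'noise' vector holds the row indices of the
# observations initially treated as noise.
asNoiseLogical <- function(noise, n) {
  if (is.logical(noise)) {
    stopifnot(length(noise) == n)
    return(noise)
  }
  seq_len(n) %in% noise
}

# Reciprocal volume of the axis-aligned bounding box of the data.
boxVinv <- function(data) {
  data <- as.matrix(data)
  ranges <- apply(data, 2, range)   # 2 x d matrix: row 1 = min, row 2 = max
  1 / prod(ranges[2, ] - ranges[1, ])
}

x <- as.matrix(iris[, 1:4])
noiseLgl <- asNoiseLogical(c(1, 5, 150), nrow(x))
sum(noiseLgl)   # number of observations flagged as noise
boxVinv(x)      # rough stand-in for the default 'Vinv'
```

     Either form of 'noise' ends up as one flag per observation, and
     a value such as 'boxVinv(x)' could be passed as 'Vinv' when the
     default estimate is not wanted.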

_V_a_l_u_e:

     Bayesian Information Criterion for the specified mixture models
     numbers of clusters. Auxiliary information returned as attributes.
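
     The sign convention matters when comparing the returned values.
     As a sketch, assuming the convention BIC = 2 * loglik - npar *
     log(n), under which larger values indicate stronger evidence for
     a model, the criterion for a single univariate Gaussian can be
     computed in base R as follows. The function name 'bicGaussian1'
     is hypothetical and not part of the package.

```r
# BIC of one univariate Gaussian fitted by maximum likelihood,
# assuming the convention 2 * loglik - npar * log(n).
bicGaussian1 <- function(x) {
  n <- length(x)
  mu <- mean(x)
  sigma2 <- mean((x - mu)^2)   # maximum likelihood variance (divisor n)
  loglik <- sum(dnorm(x, mean = mu, sd = sqrt(sigma2), log = TRUE))
  npar <- 2                    # one mean and one variance
  2 * loglik - npar * log(n)
}

bicGaussian1(iris$Sepal.Length)
```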

_R_e_f_e_r_e_n_c_e_s:

     C. Fraley and A. E. Raftery (2002a). Model-based clustering,
     discriminant analysis, and density estimation. _Journal of the
     American Statistical Association 97:611-631_.  See <URL:
     http://www.stat.washington.edu/mclust>.

     C. Fraley and A. E. Raftery (2002b). MCLUST: Software for
     model-based clustering, density estimation and discriminant
     analysis.  Technical Report, Department of Statistics, University
     of Washington.  See <URL: http://www.stat.washington.edu/mclust>.

_S_e_e _A_l_s_o:

     'summary.EMclustN',  'EMclust',  'hc', 'me', 'mclustOptions'

_E_x_a_m_p_l_e_s:

     data(iris)
     irisMatrix <- as.matrix(iris[,1:4])
     irisClass <- iris[,5]

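     # per-variable ranges of the iris measurements (2 x 4 matrix)
     b <- apply(irisMatrix, 2, range)
     n <- 450
     set.seed(0)
     # uniform noise over a box slightly larger than the data range
     poissonNoise <- apply(b, 2, function(x, n)
                           runif(n, min = x[1]-0.1, max = x[2]+0.1), n = n)
     set.seed(0)
     # random initial guess: each observation is flagged as noise
     # with probability 3/4
     noiseInit <- sample(c(TRUE,FALSE), size = nrow(irisMatrix) + n,
                         replace = TRUE, prob = c(3,1))
     Bic <- EMclustN(data = rbind(irisMatrix, poissonNoise), noise = noiseInit)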
     Bic
     plot(Bic)
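
     The '...' argument exists so that a call can be assembled as a
     list and made with 'do.call'. A sketch of such an indirect call,
     assuming an mclust version that still provides 'EMclustN' is
     attached (the call is skipped otherwise). The names 'emArgs' and
     'nNoise', and the choice 'G = 1:4', are illustrative only.

```r
set.seed(0)
irisMatrix <- as.matrix(iris[, 1:4])
b <- apply(irisMatrix, 2, range)
nNoise <- 450

# uniform noise over a box slightly larger than the data range
poissonNoise <- apply(b, 2, function(r)
  runif(nNoise, min = r[1] - 0.1, max = r[2] + 0.1))

# collect the arguments in a named list
emArgs <- list(
  data  = rbind(irisMatrix, poissonNoise),
  noise = sample(c(TRUE, FALSE), size = nrow(irisMatrix) + nNoise,
                 replace = TRUE, prob = c(3, 1)),
  G     = 1:4
)

# indirect call; runs only if 'EMclustN' is available
if (exists("EMclustN", mode = "function")) {
  Bic <- do.call("EMclustN", emArgs)
  print(Bic)
}
```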

