bpca           package:pcaMethods           R Documentation(latin1)

_B_a_y_e_s_i_a_n _P_C_A _M_i_s_s_i_n_g _V_a_l_u_e _E_s_t_i_m_a_t_o_r

_D_e_s_c_r_i_p_t_i_o_n:

     Implements a Bayesian PCA missing value estimator. The script is a
     port of the Matlab version provided by Shigeyuki OBA. See also
     <URL: http://hawaii.aist-nara.ac.jp/%7Eshige-o/tools/>.
      BPCA combines an EM approach for PCA with a Bayesian model. In
     standard PCA, data far from the training set but close to the
     principal subspace may have the same (small) reconstruction error
     as data close to the training set. BPCA defines a likelihood
     function such that the likelihood for data far from the training
     set is much lower, even if they are close to the principal
     subspace.
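     The contrast between reconstruction error and likelihood can be
     sketched in base R. This is an illustration under stated
     assumptions, not the BPCA model itself: it uses the simpler
     probabilistic PCA covariance C = W W' + sigma2 * I on hypothetical
     two-dimensional toy data, and the helpers reconError and logDensity
     are names chosen for this sketch. Two points on the first principal
     axis both have near-zero reconstruction error, but the point far
     from the data receives a much lower log-density.

```r
## Illustration only: probabilistic PCA (not BPCA) on hypothetical toy data.
set.seed(1)
n <- 200
z <- rnorm(n)                                    # one latent factor
X <- cbind(z, 0.5 * z) + matrix(rnorm(2 * n, sd = 0.1), n, 2)
mu <- colMeans(X)
pc <- prcomp(X, center = TRUE)
v <- pc$rotation[, 1]                            # first principal axis (unit length)

## Squared reconstruction error when keeping only the first axis
reconError <- function(x) {
  d <- x - mu
  sum((d - sum(d * v) * v)^2)
}

## Gaussian log-density under a one-component PPCA covariance
sigma2 <- pc$sdev[2]^2                           # variance left in the discarded axis
W <- v * sqrt(pc$sdev[1]^2 - sigma2)
C <- tcrossprod(W) + sigma2 * diag(2)
logDensity <- function(x) {
  d <- x - mu
  -0.5 * (length(d) * log(2 * pi) +
          as.numeric(determinant(C)$modulus) +  # log det(C)
          sum(d * solve(C, d)))                  # Mahalanobis term
}

near <- mu + 1 * v     # on the principal axis, close to the data
far  <- mu + 20 * v    # on the principal axis, far from the data
c(reconError(near), reconError(far))   # both essentially zero
c(logDensity(near), logDensity(far))   # far point has a much lower value
```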

     Scores and loadings obtained with Bayesian PCA differ slightly
     from those obtained with conventional PCA, because BPCA was
     developed specifically for missing value estimation. The algorithm
     does not enforce orthogonality between factor loadings, so the
     loadings are not necessarily orthogonal. The BPCA authors found
     that including an orthogonality criterion made the predictions
     worse.
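     One way to see this difference is to measure how far a set of
     loadings is from being orthogonal. The helper orthogonalityGap
     below is hypothetical (it is not part of pcaMethods); it returns
     the largest absolute cosine between distinct loading columns,
     which is 0 for conventional PCA loadings. The non-orthogonal basis
     Q is built by hand to span the same subspace as the PCA loadings,
     as a stand-in for loadings such as result@loadings from a BPCA fit.

```r
## Hypothetical helper: largest absolute cosine between distinct loading
## vectors (columns). A value of 0 means the loadings are mutually orthogonal.
orthogonalityGap <- function(loadings) {
  L <- sweep(loadings, 2, sqrt(colSums(loadings^2)), "/")  # unit-length columns
  G <- crossprod(L)                                         # cosines between columns
  max(abs(G[upper.tri(G)]))
}

set.seed(2)
X <- matrix(rnorm(50 * 4), 50, 4)
P <- prcomp(X)$rotation[, 1:2]          # conventional PCA loadings
Q <- P %*% matrix(c(1, 0.5, 0, 1), 2)   # same subspace, non-orthogonal basis

orthogonalityGap(P)   # essentially 0
orthogonalityGap(Q)   # 0.5 / sqrt(1.25), about 0.447
```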
      The authors also state that the difference between real and
     predicted eigenvalues grows as the number of observations
     decreases, reflecting the lack of information available to
     determine the true factor loadings accurately from limited and
     noisy data. As a result, the weights of the factors used to
     predict missing values differ from those of conventional PCA, but
     the missing value estimation is improved.
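     The underlying sampling effect can be illustrated with a small
     simulation, assuming data drawn from a standard normal
     distribution so that every true covariance eigenvalue equals 1.
     This shows only the general behaviour of sample eigenvalues, not
     the BPCA estimates; the helper eigenError is hypothetical.

```r
## Mean absolute deviation of sample covariance eigenvalues from the true
## value 1 (the simulated data has identity covariance).
eigenError <- function(nObs, p = 5) {
  X <- matrix(rnorm(nObs * p), nObs, p)
  ev <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values
  mean(abs(ev - 1))
}

set.seed(3)
small <- eigenError(10)     # few observations
large <- eigenError(2000)   # many observations
c(small = small, large = large)
```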

     BPCA works iteratively, and each step requires several matrix
     inversions, so its complexity grows as O(n^3). The size n of the
     matrices to invert is the number of components used for
     re-estimation.
      Finding the optimal number of components for estimation is not a
     trivial task; the best choice depends on the internal structure of
     the data. A method called 'kEstimate' is provided to estimate the
     optimal number of components via cross validation. In general, a
     few components are sufficient for reasonable estimation accuracy.
     See also the package documentation for further discussion of the
     kinds of data for which PCA-based missing value estimation makes
     sense.
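     The idea behind choosing the number of components by cross
     validation can be sketched as follows: hide some observed entries,
     re-estimate them with k components, and compare the error across
     k. This is a hypothetical illustration, not the 'kEstimate'
     implementation; it uses iterative rank-k SVD imputation as a
     stand-in for BPCA, and imputeRankK and cvError are names chosen
     for this sketch.

```r
## Stand-in imputer (not BPCA): refill NAs from a rank-k SVD fit.
imputeRankK <- function(X, k, steps = 50) {
  miss <- is.na(X)
  Y <- X
  Y[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]   # column-mean start
  for (i in seq_len(steps)) {
    s <- svd(Y, nu = k, nv = k)
    Y[miss] <- (s$u %*% diag(s$d[seq_len(k)], nrow = k) %*% t(s$v))[miss]
  }
  Y
}

## Hold out observed entries, re-estimate them, return the RMSE.
cvError <- function(X, k, nHide = 20) {
  hide <- sample(which(!is.na(X)), nHide)
  Xcv <- X
  Xcv[hide] <- NA
  est <- imputeRankK(Xcv, k)
  sqrt(mean((est[hide] - X[hide])^2))
}

set.seed(4)
U <- matrix(rnorm(40 * 2), 40, 2)
V <- matrix(rnorm(6 * 2), 6, 2)
X <- U %*% t(V) + matrix(rnorm(40 * 6, sd = 0.1), 40, 6)   # rank 2 + noise
X[sample(length(X), 12)] <- NA                             # some missing values

errs <- sapply(1:4, function(k) cvError(X, k))
errs   # RMSE for k = 1..4; note the drop from k = 1 to k = 2
```

     The k after which the error stops improving is a reasonable
     choice; more components mainly fit noise.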

     Requires 'MASS'.

     It is not recommended to use this function directly; use the pca()
     wrapper function instead.

_U_s_a_g_e:

             bpca(Matrix, nPcs = 2, completeObs = TRUE, maxSteps = 100, 
             verbose = interactive(), ...)

_A_r_g_u_m_e_n_t_s:

  Matrix: 'matrix' - Data containing the variables in columns and
          observations in rows. The data may contain missing values,
          denoted as 'NA'.

    nPcs: 'numeric' - Number of components used for re-estimation.
          Choosing too few components may decrease the estimation
          precision.

completeObs: 'boolean' - Return the complete observations if TRUE.
          This is the input data with NA values replaced by the
          estimated values.

maxSteps: 'numeric' - Maximum number of estimation steps. Default is
          100. 

 verbose: 'boolean' - BPCA prints the number of steps and the increase
          in precision if set to TRUE. Default is interactive().

     ...: Reserved for future use. Currently no further parameters are
          used.
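     The relationship between the estimates and the complete
     observations returned with completeObs = TRUE can be sketched as
     follows, assuming a matrix of estimates with the same dimensions as
     the input. Observed entries are kept unchanged and only the NA
     positions are filled; the helper completeObservations and the
     estimate values are hypothetical.

```r
## Keep observed entries, fill only NA positions from the estimates.
completeObservations <- function(X, estimates) {
  stopifnot(identical(dim(X), dim(estimates)))
  out <- X
  out[is.na(X)] <- estimates[is.na(X)]
  out
}

X   <- matrix(c(1, NA, 3, 4), 2, 2)
est <- matrix(c(1.1, 2.2, 2.9, 4.2), 2, 2)   # hypothetical estimates
completeObservations(X, est)                 # only the NA becomes 2.2
```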

_D_e_t_a_i_l_s:

     Details about the probabilistic model underlying BPCA are found in
     Oba et al. (2003). The algorithm uses an expectation maximization
     approach together with a Bayesian model to approximate the
     principal axes (eigenvectors of the covariance matrix in PCA). The
     estimation is done iteratively; the algorithm terminates when
     either the maximum number of iterations is reached or the
     estimated increase in precision falls below 1e-4.
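     The stopping rule can be sketched in base R. The update step here
     is iterative rank-k SVD imputation (in the spirit of svdImpute),
     swapped in for the BPCA update, which this sketch does not
     implement; only the loop structure (stop at maxSteps or when the
     change falls below 1e-4) mirrors the description above. The
     function iterativeImpute is hypothetical, and the largest change in
     the imputed values is used as a simple proxy for the increase in
     precision.

```r
## Stopping-rule sketch with a stand-in (non-BPCA) update step.
iterativeImpute <- function(X, k = 1, maxSteps = 100, tol = 1e-4) {
  miss <- is.na(X)
  Y <- X
  Y[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]  # initial guess: column means
  for (step in seq_len(maxSteps)) {
    s <- svd(Y, nu = k, nv = k)
    fit <- s$u %*% diag(s$d[seq_len(k)], nrow = k) %*% t(s$v)
    change <- max(abs(fit[miss] - Y[miss]))           # how much the estimates moved
    Y[miss] <- fit[miss]                              # refill the missing entries
    if (change < tol) break                           # converged before maxSteps
  }
  list(completeObs = Y, steps = step, converged = change < tol)
}

truth <- outer(1:8, c(1, 2, 3, 4))   # exact rank-1 matrix
X <- truth
X[c(3, 12, 30)] <- NA
res <- iterativeImpute(X, k = 1)
res$steps                            # iterations actually used
max(abs(res$completeObs - truth))    # remaining estimation error
```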

     *Complexity:* The relatively high complexity of the method is a
     result of several matrix inversions required in each step.
     Assuming that the maximum number of iteration steps is needed, the
     approximate complexity is given by the term

                     maxSteps * row_miss * O(n^3)

     where row_miss is the number of rows containing missing values and
     O(n^3) is the complexity of inverting an n x n matrix, n being the
     number of components used for re-estimation.
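     The formula can be turned into a rough operation count. The helper
     bpcaCostEstimate is hypothetical and ignores constant factors; it
     only shows how row_miss is obtained from a matrix with NA values
     and how the cost scales with the number of components.

```r
## Rough cost following maxSteps * row_miss * n^3 (constants ignored).
bpcaCostEstimate <- function(X, nPcs, maxSteps = 100) {
  rowMiss <- sum(rowSums(is.na(X)) > 0)   # rows containing at least one NA
  maxSteps * rowMiss * nPcs^3
}

X <- matrix(1:20, 5, 4)
X[c(1, 6, 9)] <- NA              # NAs fall in rows 1, 1 and 4, so row_miss = 2
bpcaCostEstimate(X, nPcs = 2)    # 100 * 2 * 8  = 1600
bpcaCostEstimate(X, nPcs = 4)    # 100 * 2 * 64 = 12800
```

     Doubling nPcs multiplies the estimate by 8, reflecting the cubic
     cost of the matrix inversions.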

_V_a_l_u_e:

  pcaRes: Standard PCA result object used by all PCA-based methods of
          this package. Contains scores, loadings, data mean and more.
          See 'pcaRes' for details.

_A_u_t_h_o_r(_s):

     Wolfram Stacklies 
       Max Planck Institut fuer Molekulare Pflanzenphysiologie,
     Potsdam, Germany 
      wolfram.stacklies@gmail.com 

_R_e_f_e_r_e_n_c_e_s:

     Shigeyuki Oba, Masa-aki Sato, Ichiro Takemasa, Morito Monden,
     Ken-ichi Matsubara and Shin Ishii. A Bayesian missing value
     estimation method for gene expression profile data.
     _Bioinformatics, 19(16):2088-2096, Nov 2003_.

_S_e_e _A_l_s_o:

     'ppca, svdImpute, prcomp, nipalsPca, pca, pcaRes, kEstimate'.

_E_x_a_m_p_l_e_s:

     ## Load a sample metabolite dataset with 5% missing values (metaboliteData)
     library(pcaMethods)
     data(metaboliteData)

     ## Perform Bayesian PCA with 2 components
     result <- pca(metaboliteData, method="bpca", nPcs=2, center=FALSE)

     ## Get the estimated principal axes (loadings)
     loadings <- result@loadings

     ## Get the estimated scores
     scores <- result@scores

     ## Get the estimated complete observations
     cObs <- result@completeObs

     ## Now make a scores and loadings plot
     slplot(result)

