Q2            package:pcaMethods            R Documentation(latin1)

_P_e_r_f_o_r_m _i_n_t_e_r_n_a_l _c_r_o_s_s-_v_a_l_i_d_a_t_i_o_n _f_o_r _P_C_A

_D_e_s_c_r_i_p_t_i_o_n:

     Internal cross-validation can be used for estimating the level of
     structure in a data set and to optimise the choice of number of
     principal components.

_U_s_a_g_e:

     Q2(object, originalData, nPcs=object@nPcs, fold=5, nruncv=10,
     segments=NULL, verbose=interactive(), ...)

_A_r_g_u_m_e_n_t_s:

  object: A 'pcaRes' object (the result of a previous PCA analysis).

originalData: The matrix used to obtain the pcaRes object.

    nPcs: The number of principal components to estimate Q2 for.

    fold: The number of groups to divide the data into.

  nruncv: The number of times to repeat the whole cross-validation.

segments: 'list' A predefined list where each element is the set of
          indices to leave out. Note that if this is provided, Q2
          becomes deterministic (if the PCA is deterministic of
          course).

 verbose: 'boolean' If TRUE, Q2 outputs a primitive progress bar.

     ...: Further arguments passed to the pca() function called within
          Q2

_D_e_t_a_i_l_s:

     This method calculates Q^2 for a PCA model. Q^2 is the predictive
     counterpart of R^2 and can be interpreted as the fraction of
     variance in a left-out data chunk that can be estimated by the
     PCA model. A poor (low) Q^2 means that the PCA model only
     describes noise and that the model is unrelated to the true data
     structure. The definition of Q^2 is:


   Q^2 = 1 - sum_{i=1}^k sum_{j=1}^n (x_{ji} - hat{x}_{ji})^2 / sum_{i=1}^k sum_{j=1}^n x_{ji}^2


     for the matrix x, which has n rows and k columns. For a given
     number of PCs, x is estimated as hat{x} = TP' (T are the scores
     and P are the loadings). Although this formula defines
     leave-one-out cross-validation, that is not what is performed
     when fold is smaller than the number of rows and/or columns.
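
     The formula can be illustrated with a minimal base-R sketch. It
     is not the code used inside Q2: the helper names 'q2_score' and
     'reconstruct' are hypothetical, and hat{x} is obtained from a
     plain SVD of the centred data. Because the model is fitted and
     evaluated on the same data here, the numbers are R^2-like fit
     values; a true Q^2 would evaluate the same expression on
     left-out elements only.

```r
# Q^2 as defined above: 1 - residual sum of squares / total sum of squares.
# 'q2_score' is a hypothetical helper, not a pcaMethods function.
q2_score <- function(x, xhat) {
  x <- as.matrix(x)
  1 - sum((x - xhat)^2) / sum(x^2)
}

# Centre the four numeric iris columns and decompose them with a plain SVD.
x <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(x)

# Rank-nPcs estimate hat{x} = T P' (T = scores, P = loadings).
reconstruct <- function(nPcs) {
  keep <- seq_len(nPcs)
  scores <- s$u[, keep, drop = FALSE] %*% diag(s$d[keep], nPcs)
  loadings <- s$v[, keep, drop = FALSE]
  scores %*% t(loadings)
}

# One value per number of PCs; these are in-sample values, not cross-validated.
fit <- sapply(1:4, function(k) q2_score(x, reconstruct(k)))
print(fit)
```

     With all four components the reconstruction is exact, so the last
     value equals 1 up to rounding error.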

     Diagonal stripes of elements in the matrix are deleted and then
     re-estimated. You can also choose your own segmentation, but make
     sure that no complete row or column is lost.
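
     A possible way to build a diagonal segmentation by hand, and to
     check that no complete row or column is lost, is sketched below.
     The helper 'diagonal_segments' is hypothetical and does not
     reproduce the segmentation that Q2 generates internally. Whether
     the 'segments' argument expects linear matrix indices of this
     kind is an assumption that should be checked against the package
     source before such a list is passed to Q2.

```r
# Hypothetical helper (not part of pcaMethods): assign every element of an
# nr x nc matrix to one of 'fold' groups along wrapped diagonals, and return
# a list with the linear indices belonging to each group.
diagonal_segments <- function(nr, nc, fold) {
  fold_id <- outer(seq_len(nr), seq_len(nc), "+") %% fold + 1
  split(seq_len(nr * nc), as.vector(fold_id))
}

x <- as.matrix(iris[, 1:4])
segs <- diagonal_segments(nrow(x), ncol(x), fold = 5)

# For each segment, delete its elements and test whether any row or column
# has lost all of its values.
loses_line <- sapply(segs, function(idx) {
  left <- x
  left[idx] <- NA
  any(rowSums(!is.na(left)) == 0) || any(colSums(!is.na(left)) == 0)
})
print(loses_line)
```

     If any entry of 'loses_line' is TRUE, the corresponding segment
     removes a whole row or column and should be redefined.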

_V_a_l_u_e:

     A matrix with Q^2 estimates.

_A_u_t_h_o_r(_s):

     Wolfram Stacklies, Henning Redestig

_R_e_f_e_r_e_n_c_e_s:

     Wold, H. (1966) Estimation of principal components and related
     models by iterative least squares. In Multivariate Analysis (Ed.,
     P.R. Krishnaiah), Academic Press, NY, 391-420.

_S_e_e _A_l_s_o:

     'pca'

_E_x_a_m_p_l_e_s:

     data(iris)
     pcIr <- pca(iris[,1:4], nPcs=2, method="ppca")
     # Q2 estimates are only available for the first two PCs
     q2 <- Q2(pcIr, iris[,1:4], nruncv=2)
     # Typically Q2 increases only very slowly after the optimal number of PCs
     boxplot(q2~row(q2), xlab="Number of PCs", ylab=expression(Q^2))

