pdmGenes              package:pdmclass              R Documentation

_A _F_u_n_c_t_i_o_n _t_o _o_u_t_p_u_t _t_h_e _T_o_p _R_a_n_k_e_d _G_e_n_e_s _f_r_o_m _a _P_e_n_a_l_i_z_e_d
_D_i_s_c_r_i_m_i_n_a_n_t _C_l_a_s_s_i_f_i_e_r

_D_e_s_c_r_i_p_t_i_o_n:

     After fitting a classifier, it is often desirable to output the
     most "interesting" genes for further validation. This function
     outputs the top 'list.length' genes that discriminate between
     each class and the baseline class, along with a bootstrap
     estimate of the stability of the observed rankings (see Details
     and Value for more information).

_U_s_a_g_e:

     pdmGenes(formula = formula(data), method = c("pls", "pcr", "ridge"),
     data = sys.frame(sys.parent()), weights, theta, dimension = J - 1,
     eps = .Machine$double.eps, genelist = NULL, list.length = NULL, B = 100, ...)

_A_r_g_u_m_e_n_t_s:

 formula: A symbolic description of the model to be fit. Details given
          below. 

  method: One of "pls", "pcr", "ridge", corresponding to partial least
          squares, principal components regression and ridge
          regression.

    data: An optional data.frame that contains the variables in the
          model. If not found in 'data', the variables are taken from
          'environment(formula)', typically the environment from which
          'pdmGenes' is called. Note that unlike most microarray
          analyses, in this case rows are samples and columns are
          genes.

 weights: An optional vector of sample weights. Defaults to 1. 

   theta: An optional matrix of class scores, typically with fewer
          than J - 1 columns.

dimension: The dimension of the solution, no greater than J - 1, where
          J is the number of classes. Defaults to J - 1. 

     eps: A threshold for excluding small discriminant variables.
          Defaults to '.Machine$double.eps'.

genelist: A vector of gene names, one per gene. 

list.length: The number of 'top' genes to output. 

       B: The number of bootstrap samples to use for estimating
          stability. Defaults to 100. Larger values may take an
          inordinate amount of time.

     ...: Additional parameters to pass to 'method'. 

_D_e_t_a_i_l_s:

     The formula interface follows the usual R convention, namely
     Y ~ X, where Y is a numeric vector of class assignments (or a
     factor, as in the example below) and X is a matrix or data.frame
     containing the gene expression values. Note that unlike most
     microarray analyses, in this instance the rows of X are samples
     and the columns are genes, so most calls will require something
     similar to Y ~ t(X).
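
     As a minimal sketch of the orientation requirement, using only
     base R and simulated values, a genes-by-samples matrix can be
     transposed before it enters the formula. The object names 'expr'
     and 'cls' are hypothetical and are not part of the package.

```r
## Simulated expression matrix in the usual microarray layout:
## rows are genes, columns are samples (all names are hypothetical).
set.seed(1)
expr <- matrix(rnorm(5 * 6), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("samp", 1:6)))
cls <- rep(1:3, each = 2)   # one class label per sample

## Transpose so that rows are samples and columns are genes.
x <- t(expr)
dim(x)

## The number of rows of x must match the number of class labels.
stopifnot(nrow(x) == length(cls), ncol(x) == nrow(expr))

## The model formula is then written with the transposed matrix.
f <- cls ~ x
```

     The gene names for 'genelist' would in this case come from
     'rownames(expr)', which equal 'colnames(x)' after transposing.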

     The dimension of the solution is typically J - 1, where J is the
     number of classes. The model fit uses 'contr.treatment' contrasts,
     which means that all of the coefficients in the model are
     comparing the given class to a baseline class. Therefore, the
     genes listed are those that discriminate between a given class and
     the baseline. For instance, if there are three classes
     (characterized by a numeric vector of 1s, 2s, and 3s), then there
     will be two sets of 'top genes'. The first set will be those genes
     that discriminate between class 2 and class 1, whereas the second
     set will be the genes that discriminate between class 3 and class
     1. The 'Y' vector will therefore need to be constructed to give
     the comparisons of interest.
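
     The baseline coding described above can be inspected in base R.
     The sketch below only illustrates 'contr.treatment' for three
     classes and one way to change the baseline of a factor with
     'relevel'; whether 'pdmGenes' honours a releveled factor in the
     same way is an assumption that should be checked against the
     fitted output.

```r
## Class labels for three classes; class "1" is the default baseline.
y <- factor(c(1, 1, 2, 2, 3, 3))

## Treatment contrasts: one column per non-baseline class (J - 1 = 2).
## The baseline row is all zeros.
ct <- contr.treatment(levels(y))
ct

## Make class "3" the baseline, so that the comparisons become
## 1 versus 3 and 2 versus 3.
y3 <- relevel(y, ref = "3")
ct3 <- contr.treatment(levels(y3))
ct3
```

     With a numeric 'Y', the same effect is obtained by recoding the
     labels so that the desired baseline class receives the lowest
     value.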

_V_a_l_u_e:

     A list containing a 'data.frame' for each comparison. The first
     column of each 'data.frame' contains the gene names, and the
     second column contains the frequency with which the gene was
     observed in the bootstrapped samples.
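
     A bootstrap frequency of this kind can be illustrated with a
     generic base R sketch. This is not the code used by 'pdmGenes':
     the ranking statistic (absolute difference in class means), the
     within-class resampling, the simulated data, and the helper
     'top_genes' are all assumptions made for illustration only.

```r
set.seed(42)
n_genes <- 50; n_per_class <- 10
## Rows are samples, columns are genes (simulated data).
x <- matrix(rnorm(2 * n_per_class * n_genes), ncol = n_genes,
            dimnames = list(NULL, paste0("g", seq_len(n_genes))))
y <- rep(1:2, each = n_per_class)
x[y == 2, 1:3] <- x[y == 2, 1:3] + 3   # genes g1-g3 differ by class

## Hypothetical helper: names of the k genes with the largest
## absolute difference in class means.
top_genes <- function(x, y, k) {
  d <- abs(colMeans(x[y == 2, , drop = FALSE]) -
           colMeans(x[y == 1, , drop = FALSE]))
  names(sort(d, decreasing = TRUE))[seq_len(k)]
}

k <- 5; B <- 100
observed <- top_genes(x, y, k)

## Resample samples with replacement within each class, re-rank, and
## count how often each originally selected gene is in the top k again.
hits <- setNames(numeric(k), observed)
for (b in seq_len(B)) {
  idx <- unlist(lapply(split(seq_along(y), y),
                       function(i) sample(i, replace = TRUE)))
  boot <- top_genes(x[idx, , drop = FALSE], y[idx], k)
  hits[observed] <- hits[observed] + (observed %in% boot)
}
res <- data.frame(gene = observed, frequency = hits / B, row.names = NULL)
res
```

     Genes whose frequency is close to 1 are selected in almost every
     resample, whereas low frequencies suggest a ranking that depends
     heavily on the particular samples observed.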

_A_u_t_h_o_r(_s):

     James W. MacDonald and Debashis Ghosh. Partial least squares and
     principal components regression based on code written by Mike
     Denham and contributed to StatLib. Model fit based on code from
     the 'mda' package written by Trevor Hastie and Robert Tibshirani
     and ported to R by Kurt Hornik, Brian D. Ripley, and Friedrich
     Leisch.

_R_e_f_e_r_e_n_c_e_s:

     http://www.sph.umich.edu/~ghoshd/COMPBIO/POPTSCORE

_E_x_a_m_p_l_e_s:

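     library(pdmclass)
     library(fibroEset)
     data(fibroEset)
     ## Class labels: second column of the phenotype data, as a factor
     y <- as.factor(pData(fibroEset)[, 2])
     ## Transpose so that rows are samples and columns are genes
     x <- t(exprs(fibroEset))
     genes <- featureNames(fibroEset)
     ## Top 25 genes per comparison; B = 10 keeps the example fast
     pdmGenes(y ~ x, genelist = genes, list.length = 25, B = 10)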

