dksTrain               package:dualKS               R Documentation

_P_e_r_f_o_r_m _D_u_a_l _K_S _D_i_s_c_r_i_m_i_n_a_n_t _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function performs dual KS discriminant analysis on a training
     set of gene expression data (in the form of an ExpressionSet) and
     a vector of classes describing which of two or more classes each
     column of data corresponds to.  Genes are ranked by the degree to
     which they are upregulated, downregulated, or both in each class.
     Discriminant gene signatures are then extracted using
     dksSelectGenes and applied to new samples with dksClassify.

_U_s_a_g_e:

             dksTrain(eset, class, type = "up", verbose=FALSE, weights=FALSE, logweights=TRUE, method='kort')

_A_r_g_u_m_e_n_t_s:

    eset: Gene expression data in the form of an 'ExpressionSet' or
          'matrix'.

   class: A factor with two or more levels indicating which class each
          sample in the expression set belongs to, OR an integer
          indicating which column of pData(eset) contains this
          information.

    type: One of "up", "down", or "both", indicating whether you want
          to analyze and classify based on upregulated genes,
          downregulated genes, or both.  Note that classification of
          samples based on downregulated genes from single color
          experiments should not be expected to work well due to the
          noise at low expression levels.  Therefore, "down" or "both"
          should only be used for two color experiments or one color
          data that has been converted to ratios based on some
          reference sample(s).

 verbose: Set to TRUE if you want more evidence of progress while data
          is being processed.  Set to FALSE if you want your CPU
          cycles to be used on analysis and not printing messages.

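 weights: Determines whether and how genes are weighted when building
          the signatures: FALSE (no weighting), TRUE (weights derived
          from the relative rank of mean expression), or a
          user-supplied weight matrix.  See Details.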

logweights: Should the weights be log10 transformed prior to applying?

  method: Two methods are supported.  The 'kort' method returns the
          maximum of the running sum.  The 'yang' method returns the
          sum of the maximum and the minimum of the running sum,
          thereby penalizing genes that are highly enriched in a subset
          of samples of a given class, but highly downregulated in
          another subset of that same class.

_D_e_t_a_i_l_s:

     This function calculates the Kolmogorov-Smirnov rank sum statistic
     for each gene and each level of 'class'.  The highest scoring
     genes can then be extracted for use in classification.
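
     The two scoring rules named under 'method' can be illustrated with
     a minimal base R sketch.  It is not the dualKS implementation: the
     helper 'ks_running_sum', the equal step sizes, and the toy data
     are all assumptions made for illustration only.

```r
# Hypothetical helper (not part of dualKS): an unweighted KS-style
# running sum for one gene and one class.
ks_running_sum <- function(expr, in_class) {
  ord   <- order(expr, decreasing = TRUE)   # highest expression first
  hits  <- in_class[ord]                    # TRUE where the sample is in the class
  n_in  <- sum(hits)
  n_out <- length(hits) - n_in
  # Step up for in-class samples, down for all others (assumed step sizes).
  cumsum(ifelse(hits, 1 / n_in, -1 / n_out))
}

expr <- c(9.1, 8.7, 8.2, 5.0, 4.4, 3.9)     # made-up expression values

# Case 1: the class occupies the top of the ranking.
clean <- ks_running_sum(expr, c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))
kort_clean <- max(clean)                    # 'kort'-style score
yang_clean <- max(clean) + min(clean)       # 'yang'-style score

# Case 2: the class is split between the top and the bottom.
split <- ks_running_sum(expr, c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE))
kort_split <- max(split)                    # 0.5
yang_split <- max(split) + min(split)       # 0: the split class is penalized
```

     In the split case the maximum-only score stays positive, while
     adding the minimum cancels it, which is the penalty described for
     the 'yang' method.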

     If 'weights = FALSE', signatures are defined based on the ranks of
     members of each class when samples are sorted on each gene.  Genes
     for which a given class ranks highest when samples are sorted by
     that gene will be included in the classifier, with no regard to
     the absolute expression level of those genes.  This is the
     classic KS statistic.

     Very discriminant genes identified in this way may or may not be
     the highest expressed genes.  As a result, signatures identified
     in this way have arbitrary "baseline" values.  This may lead to
     misclassification when comparing two signatures (using, for
     example, 'dksClassify').  Therefore, one may wish to weight genes
     based on absolute expression level, or some other metric.

     Setting 'weights = TRUE' causes the genes to be weighted according
     to the log (base 10) of the relative rank of the mean expression
     of each gene in each class.  Alternatively, you may provide your
     own weight matrix as the argument to 'weights'.  This matrix must
     have one column for each possible value of 'class', and one row
     for each gene in 'eset'.  Note that for 'type = "down"' or the
     down component of 'type = "both"', the weight matrix will be
     inverted as '1 - matrix', so the weights should lie between 0 and
     1 for each class.  NAs are handled "gracefully" by discarding any
     gene for which any column of the corresponding row of 'weights'
     is NA.  Our experience has been that weights that are a linear
     function of some feature of the gene expression (like mean) can
     be too subtle.  The effect of the weights can be increased by
     setting 'logweights = TRUE' (which is the default).
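
     As an illustration of the shape required for a user-supplied
     weight matrix, the following base R sketch builds one from the
     within-class rank of mean expression.  The toy data, the scaling
     of ranks into (0, 1], and the ordering of columns by the levels of
     the class factor are assumptions, not documented behavior of
     dualKS.

```r
# Toy expression matrix: 5 genes (rows) x 6 samples (columns); values are made up.
set.seed(1)
expr <- matrix(runif(30, min = 2, max = 14), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:6)))
cls <- factor(c("A", "A", "A", "B", "B", "B"))

# Mean expression of each gene within each class (genes x classes).
class_means <- sapply(levels(cls), function(k) {
  rowMeans(expr[, cls == k, drop = FALSE])
})

# Relative rank of each gene's mean within a class, scaled into (0, 1],
# so that '1 - w' also stays within the 0 to 1 range.
w <- apply(class_means, 2, function(m) rank(m) / length(m))

# One row per gene, one column per class, no missing values.
stopifnot(nrow(w) == nrow(expr), ncol(w) == nlevels(cls), !anyNA(w))

# Assumed usage with an ExpressionSet whose classes are in pData column 1:
# tr <- dksTrain(eset, 1, weights = w)
```

     Such a matrix would be passed as the 'weights' argument in place
     of TRUE or FALSE.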

_V_a_l_u_e:

     An object of class 'DKSGeneScores'.

_A_u_t_h_o_r(_s):

     Eric J. Kort, Yarong Yang

_S_e_e _A_l_s_o:

     'dksSelectGenes', 'dksClassify', 'DKSGeneScores', 'DKSPredicted',
     'DKSClassifier'

_E_x_a_m_p_l_e_s:

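             library(dualKS)
             data("dks")                       # example data; provides 'eset'
             tr <- dksTrain(eset, 1, "up")     # classes are in column 1 of pData(eset)
             cl <- dksSelectGenes(tr, 100)     # extract signatures (n = 100)
             pr <- dksClassify(eset, cl)       # classify the training samples
             summary(pr, pData(eset)[,1])      # compare with the known classes
             show(pr)
             plot(pr, actual=pData(eset)[,1])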

