varSel.highest.var        package:MCRestimate        R Documentation

_V_a_r_i_a_b_l_e _s_e_l_e_c_t_i_o_n _a_n_d _c_l_u_s_t_e_r _f_u_n_c_t_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     Functions for variable selection and clustering. They are mainly
     used by the function 'MCRestimate'.

_U_s_a_g_e:

     identity(sample.gene.matrix,classfactor,...)

     varSel.highest.t.stat(sample.gene.matrix,classfactor,theParameter=NULL,var.numbers=500,...)

     varSel.highest.var(sample.gene.matrix,classfactor,theParameter=NULL,var.numbers=2000,...)

     varSel.AUC(sample.gene.matrix,classfactor,theParameter=NULL,var.numbers=200,...)

     cluster.kmeans.mean(sample.gene.matrix,classfactor,theParameter=NULL,number.clusters=500,...)

     varSel.removeManyNA(sample.gene.matrix,classfactor,theParameter=NULL,NAthreshold=0.25,...)

     varSel.impute.NA(sample.gene.matrix,classfactor,theParameter=NULL,...)

_A_r_g_u_m_e_n_t_s:

sample.gene.matrix: a matrix in which the rows correspond to genes and
          the columns correspond to samples

classfactor: a factor containing the values that should be predicted

theParameter: parameter whose meaning depends on the function. For
          'cluster.kmeans.mean', either NULL or the output of the
          function 'kmeans'. If it is NULL, 'kmeans' is used to form
          clusters of the genes; otherwise the already existing
          clusters are reused. In both cases the metagene intensities
          are calculated afterwards. For the other functions, either
          NULL or a logical vector which indicates for every gene
          whether it should be left out of further analysis

number.clusters: parameter which specifies the number of clusters

var.numbers: the number of variables to select, for the methods that
          need such an argument

NAthreshold: numeric threshold (0.25 by default); a variable is removed
          if its share of NA values is higher than this threshold

     ...: Further parameters
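
     The 'NAthreshold' rule can be illustrated in base R. This is a
     minimal sketch and not the code of 'varSel.removeManyNA': the
     helper name 'na.fraction.filter' is hypothetical, and encoding
     TRUE as "keep the gene" is an assumption made for this example.

```r
# Hypothetical helper (not part of MCRestimate): flag the genes whose
# share of missing values does not exceed the threshold.
# Rows are genes, columns are samples.
na.fraction.filter <- function(sample.gene.matrix, NAthreshold = 0.25) {
  # Share of NA values per gene (row)
  na.fraction <- rowMeans(is.na(sample.gene.matrix))
  # Assumption: TRUE means "keep"; a gene is dropped only if its NA share
  # is strictly higher than the threshold
  na.fraction <= NAthreshold
}

m <- matrix(c( 1, NA, 3, 4,    # 25% NA: kept
              NA, NA, 3, 4,    # 50% NA: dropped
               1,  2, 3, 4),   #  0% NA: kept
            nrow = 3, byrow = TRUE)
keep <- na.fraction.filter(m)
m.reduced <- m[keep, , drop = FALSE]
```

     Storing the logical vector makes it possible to apply the same
     gene selection to another matrix with the same rows.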

_D_e_t_a_i_l_s:

     'cluster.kmeans.mean' performs a kmeans clustering with the number
     of clusters specified by 'number.clusters' and takes the mean of
     each cluster. 'varSel.highest.var' selects a number (specified by
     'var.numbers') of variables with the highest variance.
     'varSel.AUC' chooses the most discriminating variables according
     to the AUC criterion (the library 'ROC' is required).
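
     The idea behind a selection by highest variance can be sketched
     in base R. The helper 'top.variance' below is hypothetical and is
     not the implementation of 'varSel.highest.var'; in particular,
     returning TRUE for the selected genes is an assumption of this
     example.

```r
# Hypothetical helper (not part of MCRestimate): mark the var.numbers
# genes (rows) with the highest variance across samples (columns).
top.variance <- function(sample.gene.matrix, var.numbers = 2000) {
  # Variance of every gene, ignoring missing values
  gene.var <- apply(sample.gene.matrix, 1, var, na.rm = TRUE)
  # Never ask for more genes than the matrix has
  n.keep <- min(var.numbers, nrow(sample.gene.matrix))
  # Assumption: TRUE means "selected"
  keep <- rep(FALSE, nrow(sample.gene.matrix))
  keep[order(gene.var, decreasing = TRUE)[seq_len(n.keep)]] <- TRUE
  keep
}

m <- rbind(c( 1,  1,  1,  1),   # constant gene
           c( 0, 10,  0, 10),   # highly variable gene
           c( 2,  3,  2,  3),   # slightly variable gene
           c(-5,  5, -5,  5))   # highly variable gene
keep <- top.variance(m, var.numbers = 2)
m.reduced <- m[keep, , drop = FALSE]
```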

_V_a_l_u_e:

     Every function returns a list with two components:

  matrix: the result matrix of the variable reduction or the clustering

parameter: the parameter needed to reproduce the algorithm: for a gene
          reduction, a vector which indicates for every gene whether
          it will be left out of further analysis; for the clustering
          algorithm, the output of the function 'kmeans'.
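
     How the two components fit together for the clustering case can
     be sketched with base R functions only. The function
     'cluster.means.sketch' is hypothetical and only mirrors the
     behaviour described on this page; it is not the source of
     'cluster.kmeans.mean'.

```r
# Hypothetical sketch (not the MCRestimate source): cluster the genes,
# average each cluster into a "metagene", and return both the reduced
# matrix and the kmeans result so that the clusters can be reused.
cluster.means.sketch <- function(sample.gene.matrix, theParameter = NULL,
                                 number.clusters = 3) {
  # Form new clusters only if no earlier kmeans result is supplied
  if (is.null(theParameter)) {
    theParameter <- kmeans(sample.gene.matrix, centers = number.clusters)
  }
  # Sum the genes of each cluster per sample, then divide by cluster size
  metagenes <- rowsum(sample.gene.matrix, theParameter$cluster) /
    as.vector(table(theParameter$cluster))
  list(matrix = metagenes, parameter = theParameter)
}

# Six genes (rows) measured in two samples (columns)
train <- rbind(c( 0.0,  0.1), c( 0.2,  0.0),
               c(10.0, 10.1), c(10.2, 10.0),
               c(20.0, 20.1), c(20.2, 20.0))
fit <- cluster.means.sketch(train, number.clusters = 3)

# Reuse the stored clusters on other samples of the same six genes
other <- cbind(train, train[, 1] + 1)
again <- cluster.means.sketch(other, theParameter = fit$parameter)
```

     Passing 'fit$parameter' back in skips the call to 'kmeans', so the
     second matrix is built from exactly the same gene clusters as the
     first.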

_A_u_t_h_o_r(_s):

     Markus Ruschhaupt <URL: mailto:m.ruschhaupt@dkfz.de>

_S_e_e _A_l_s_o:

     'MCRestimate'

_E_x_a_m_p_l_e_s:

     library(MCRestimate)
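     # 30 genes (rows) measured in 2 samples (columns)
     m <- matrix(c(rnorm(10,2,0.5),rnorm(10,4,0.5),rnorm(10,7,0.5),
                   rnorm(10,2,0.5),rnorm(10,4,0.5),rnorm(10,2,0.5)),ncol=2)
     # cluster the genes into 3 groups and take the mean of each cluster
     cluster.kmeans.mean(m ,number.clusters=3)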

