eddObsolete               package:edd               R Documentation

_e_x_p_r_e_s_s_i_o_n _d_e_n_s_i_t_y _d_i_a_g_n_o_s_t_i_c_s:

_D_e_s_c_r_i_p_t_i_o_n:

     Classify the cohort distribution of each gene's expression values
     by comparison with a set of reference distributions.

_U_s_a_g_e:

     eddObsolete(eset, 
        ref=c("multiCand", "uniCand", "test", "nnet")[1], 
        k=10, l=6, nnsize=6, nniter=200)

_A_r_g_u_m_e_n_t_s:

    eset: instance of Biobase class exprSet 

     ref: one of 'multiCand', 'uniCand', 'test' or 'nnet'. See Details.

       k: k setting for knn: number of nearest neighbors to poll 

       l: l setting for knn: minimum number of agreeing votes needed
          for a definite decision 

  nnsize: size parameter for nnet 

  nniter: iter setting for nnet 

_D_e_t_a_i_l_s:

     Four options are available for classifying expression densities.
     Data on each gene are shifted and scaled to have median 0 and
     MAD 1. They are then compared to the shapes of reference
     distributions (standard Gaussian, chisq(1), lognorm(0,1), t(3),
     .75 N(0,1) + .25 N(4,1), .25 N(0,1) + .75 N(4,1), Beta(2,8),
     Beta(8,2), U(0,1)), each of which has also been transformed to
     have median 0 and MAD 1. Classification proceeds by one of four
     methods, selected by the 'ref' argument.  Suppose there are S
     samples in the exprSet.
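
     The shift-and-scale step can be sketched as follows. This is an
     illustration only: 'scale_med_mad' is a hypothetical helper name,
     and it assumes R's default mad() (constant 1.4826), which may
     differ from what the package does internally.

```r
# Center a numeric vector at its median and scale it by its MAD.
# Assumption: the default mad() constant (1.4826) is used.
scale_med_mad <- function(x) {
  (x - median(x)) / mad(x)
}

set.seed(1)
x <- rexp(50, rate = 4)   # one simulated "gene" measured on S = 50 samples
z <- scale_med_mad(x)

median(z)  # 0 up to floating point error
mad(z)     # 1 up to floating point error
```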

     multiCand - 100 samples of size S are drawn from each reference
     distribution and then scaled to median 0, MAD 1.  The knn(k,l)
     procedure is used to classify the genes based on proximity to
     representatives in this set.
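
     A minimal sketch of this idea, using knn() from the 'class'
     package. It is not the package's implementation: only two
     reference families are used, the helper names are hypothetical,
     and sorting each sample so that rows are compared as order
     statistics is an assumption.

```r
library(class)  # provides knn()

set.seed(1)
S <- 50  # number of samples per gene
scale_med_mad <- function(x) (x - median(x)) / mad(x)

# Two reference families only, to keep the sketch short.
draw <- list(n01 = function(n) rnorm(n), u01 = function(n) runif(n))

# 100 scaled samples of size S per reference distribution, one per row.
# Sorting each sample is an assumption made for this sketch.
train <- do.call(rbind, unname(lapply(draw, function(f) {
  t(replicate(100, sort(scale_med_mad(f(S)))))
})))
labels <- factor(rep(names(draw), each = 100))

# One hypothetical "gene" whose values come from a uniform distribution.
gene <- matrix(sort(scale_med_mad(runif(S))), nrow = 1)

# Poll k = 10 neighbours; at least l = 2 must agree, otherwise knn()
# returns NA ("doubt").
pred <- knn(train, gene, cl = labels, k = 10, l = 2)
pred
```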

     uniCand - one representative of size S is created from each
     reference distribution, using the theoretical quantiles. knn(1,0)
     is used to classify genes based on proximity to these
     representatives.
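
     The following sketch builds one representative per reference
     distribution from theoretical quantiles and assigns a gene to the
     nearest one. The use of ppoints() for the probability grid,
     Euclidean distance, and the names 'reps' and 'nearest' are
     assumptions made for illustration; only three references are
     shown.

```r
S <- 50
scale_med_mad <- function(x) (x - median(x)) / mad(x)
p <- ppoints(S)  # S evenly spaced probabilities in (0, 1)

# Scaled theoretical quantiles of three of the reference distributions.
reps <- rbind(
  n01 = scale_med_mad(qnorm(p)),
  t3  = scale_med_mad(qt(p, df = 3)),
  u01 = scale_med_mad(qunif(p))
)

# Nearest representative to a sorted, scaled gene. With a single
# representative per class, this is what 1-nearest-neighbour
# classification amounts to.
nearest <- function(gene) {
  g <- sort(scale_med_mad(gene))
  d <- apply(reps, 1, function(r) sqrt(sum((r - g)^2)))
  names(which.min(d))
}

set.seed(2)
nearest(runif(S))
```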

     test - classification of each gene is based on the maximum p-value
     of Kolmogorov-Smirnov tests against each reference distribution.
     If no p-value exceeds .1, 'doubt' is declared.
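
     A sketch of the maximum p-value rule with ks.test() from the
     'stats' package. The helpers 'scaled_cdf' and 'classify_ks' are
     hypothetical. The theoretical MAD of each reference is found
     numerically with uniroot() and multiplied by 1.4826 to match R's
     mad(); whether the package rescales references this way is an
     assumption.

```r
# CDF of (X - m) / s, where m is the theoretical median of X and
# s = 1.4826 * (theoretical median absolute deviation of X).
scaled_cdf <- function(cdf, qf) {
  m <- qf(0.5)
  d <- uniroot(function(d) cdf(m + d) - cdf(m - d) - 0.5,
               interval = c(1e-8, 100))$root
  s <- 1.4826 * d
  function(z) cdf(m + s * z)
}

# Three of the reference distributions, for brevity.
refs <- list(
  n01 = scaled_cdf(pnorm, qnorm),
  t3  = scaled_cdf(function(x) pt(x, 3), function(p) qt(p, 3)),
  u01 = scaled_cdf(punif, qunif)
)

# Scale the gene, test it against every reference, keep the best match.
classify_ks <- function(gene, cutoff = 0.1) {
  z <- (gene - median(gene)) / mad(gene)
  pvals <- sapply(refs, function(cdf_fn) ks.test(z, cdf_fn)$p.value)
  if (max(pvals) <= cutoff) "doubt" else names(which.max(pvals))
}

set.seed(3)
classify_ks(rnorm(50))
```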

     nnet - 100 samples of size S are drawn from each reference
     distribution and then scaled to median 0, MAD 1.  A neural net is
     fit to this dataset and the associated labels.  The net is then
     applied to the scaled gene expression data and the predictions are
     used for classification.
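
     A sketch of the same idea with the 'nnet' package. The training
     set construction, the sorting of each sample, the softmax output
     layer, and the mapping of 'nnsize' and 'nniter' to nnet's 'size'
     and 'maxit' arguments are assumptions made for illustration; only
     two reference families are used.

```r
library(nnet)  # provides nnet() and class.ind()

set.seed(4)
S <- 20
scale_med_mad <- function(x) (x - median(x)) / mad(x)
draw <- list(n01 = rnorm, u01 = runif)

# 100 scaled, sorted samples of size S per reference distribution.
train <- do.call(rbind, unname(lapply(draw, function(f) {
  t(replicate(100, sort(scale_med_mad(f(S)))))
})))
labels <- factor(rep(names(draw), each = 100))

# Fit a single-hidden-layer network to the labelled reference samples.
fit <- nnet(train, class.ind(labels), size = 6, maxit = 200,
            softmax = TRUE, trace = FALSE)

# Apply the fitted net to one scaled, sorted hypothetical gene and
# report the class with the largest predicted probability.
gene <- matrix(sort(scale_med_mad(runif(S))), nrow = 1)
pred <- levels(labels)[max.col(predict(fit, gene))]
pred
```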

_V_a_l_u_e:

     A vector of classifications, one per gene, with NA for genes that
     could not be classified.

_A_u_t_h_o_r(_s):

     VJ Carey

_E_x_a_m_p_l_e_s:

     require(Biobase)
     data(sample.exprSet.1)
     print(summary(eddObsolete(sample.exprSet.1,k=10,l=2)))

     # Test problem: 6 distributions x 20 genes each x 50 samples
     set.seed(1234)
     test <- matrix(NA, nrow=120, ncol=50)
     # each block of 20 rows (genes) is filled from one distribution
     test[1:20,] <- rnorm(1000)
     test[21:40,] <- rt(1000,3)
     test[41:60,] <- rexp(1000,4)
     test[61:80,] <- rmixnorm(1000,.750,0,1,4,1)
     test[81:100,] <- runif(1000)
     test[101:120,] <- rlnorm(1000)
     # true generating distribution of each gene, in row order
     labs <- c(rep("n01",20),rep("t3",20),
               rep("exp",20),rep("mix1",20),rep("u01",20),rep("ln01",20))

     phenoData <- new("phenoData", pData = data.frame(1:50), varLabels = list("Col1"))
     TT <- new("exprSet", exprs=test, phenoData = phenoData)

     multrun <- eddObsolete( TT, k=10, l=2 )
     print(table(given=labs, multiCand=multrun))
     netrun <- eddObsolete( TT, ref="nnet" )
     print(table(given=labs, netout=netrun))
     newrun <- edd( TT, meth="nnet", size=10, decay=.2 )
     print( table( given=labs, newout=newrun ) )
     newrun <- edd( TT, meth="test" )
     print( table( given=labs, newout=newrun ) )

