iterateBMAglm.train.predict.test   package:iterativeBMA   R Documentation

_I_t_e_r_a_t_i_v_e _B_a_y_e_s_i_a_n _M_o_d_e_l _A_v_e_r_a_g_i_n_g: _t_r_a_i_n_i_n_g, _p_r_e_d_i_c_t_i_o_n _a_n_d _t_e_s_t_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Classification and variable selection on microarray data. This is
     a multivariate technique to select a small number of relevant
     variables (typically genes) to classify microarray samples.  This
     function performs the training, prediction and testing steps.  The
     data are assumed to consist of two classes, and the classes of the
     test data are assumed to be known. Logistic regression is used for
     classification.

_U_s_a_g_e:

     iterateBMAglm.train.predict.test (train.expr.set, test.expr.set, train.class, test.class, p=100, nbest=10, maxNvar=30, maxIter=20000, thresProbne0=1)

_A_r_g_u_m_e_n_t_s:

train.expr.set: an 'ExpressionSet' object. We assume the rows in the
          expression data represent variables (genes),  while the
          columns  represent  samples or experiments. This training
          data is used to select relevant genes (variables) for
          classification.

test.expr.set: an 'ExpressionSet' object. We assume the rows in the
          expression data represent variables (genes),  while the
          columns represent samples or experiments. The variables
          selected using the training data are used to classify the
          samples in this test data.

train.class: class vector for the observations (samples or
          experiments) in the training data.  Class numbers are assumed
          to start from 0, and the length of this class vector should
          be equal to the number of samples (columns) in
          'train.expr.set'. Since we assume 2-class data, the class
          vector is expected to consist of zeros and ones.

test.class: class vector for the observations (samples or experiments)
          in the test data.  Class numbers are assumed to start from 0,
          and the length of this class vector should be equal to the
          number of samples (columns) in 'test.expr.set'. Since we
          assume 2-class data, the class vector is expected to consist
          of zeros and ones.

       p: a number indicating the maximum number of top univariate
          genes used in the iterative BMA algorithm.  This number is
          assumed to be less than the total number of genes in the
          training data. A larger p usually requires longer
          computational time as more iterations of the BMA algorithm
          are potentially applied. The default is 100.

   nbest: a number specifying the number of models of each size 
          returned to 'bic.glm' in the 'BMA' package.  The default is
          10.

 maxNvar: a number indicating the maximum number of variables used in
          each iteration of 'bic.glm' from the 'BMA' package. The
          default is 30.

 maxIter: a number indicating the maximum number of iterations of
          'bic.glm'. The default is 20000.

thresProbne0: a number specifying the threshold (in percent) for the
          posterior probability that the coefficient of each variable
          (gene) is non-zero.  Variables (genes) whose posterior
          probability is below this threshold are dropped in the
          iterative application of 'bic.glm'.  The default is 1 percent.
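
     The thresholding step described for 'thresProbne0' can be
     illustrated with a minimal base R sketch. The gene names and
     probabilities below are made up for illustration; in practice
     the values would come from a 'bic.glm' fit, and this sketch does
     not reproduce the package's internal code.

```r
# Hypothetical posterior probabilities (in percent) that each gene's
# coefficient is non-zero; these stand in for values from a BMA fit.
probne0 <- c(geneA = 100, geneB = 42.5, geneC = 0.3, geneD = 7.1, geneE = 0)
thresProbne0 <- 1  # threshold in percent, matching the documented default

# Genes below the threshold are dropped; the others are kept.
keep <- probne0 >= thresProbne0
kept.genes <- names(probne0)[keep]
dropped.genes <- names(probne0)[!keep]
print(kept.genes)
print(dropped.genes)
```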

_D_e_t_a_i_l_s:

     This function consists of the training phase, prediction phase,
     and the testing phase.  The training phase consists of first
     ordering all the variables (genes) by a univariate measure called
     between-groups to within-groups sums-of-squares (BSS/WSS) ratio,
     and then iteratively applying the 'bic.glm' algorithm from the
     'BMA' package.  The prediction phase uses the variables (genes)
     selected in the training phase to classify the samples in the test
     set.  The testing phase assumes that the class labels of the
     samples in the test set are known, and computes the number of 
     classification errors and the Brier Score.
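
     The univariate ranking step can be sketched in base R as follows.
     This is an illustration under stated assumptions, not the
     package's implementation: the helper name 'bss.wss.ratio' and the
     toy data are hypothetical, and the ratio is computed as the
     between-class sum of squares of the class means divided by the
     within-class sum of squares, gene by gene.

```r
# x: genes-by-samples numeric matrix; cls: 0/1 class label per sample (column).
# Hypothetical helper, written for illustration only.
bss.wss.ratio <- function(x, cls) {
  stopifnot(ncol(x) == length(cls), all(cls %in% c(0, 1)))
  overall <- rowMeans(x)          # per-gene mean over all samples
  bss <- numeric(nrow(x))
  wss <- numeric(nrow(x))
  for (k in unique(cls)) {
    xk <- x[, cls == k, drop = FALSE]
    mk <- rowMeans(xk)            # per-gene mean within class k
    bss <- bss + ncol(xk) * (mk - overall)^2
    wss <- wss + rowSums((xk - mk)^2)
  }
  bss / wss
}

# Toy data (made up): gene1 separates the two classes, gene2 does not.
x <- rbind(gene1 = c(1.0, 1.2, 0.8, 3.0, 3.1, 2.9),
           gene2 = c(2.0, 1.0, 3.0, 2.1, 0.9, 3.0))
cls <- c(0, 0, 0, 1, 1, 1)

ratio <- bss.wss.ratio(x, cls)
# Genes ordered from most to least discriminating.
ranked <- names(sort(ratio, decreasing = TRUE))
print(ranked)
```

     Under this ordering, the top 'p' genes would be the candidates
     passed to the iterative 'bic.glm' step.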

_V_a_l_u_e:

     A list consisting of 4 elements is returned:

num.genes: The number of relevant genes selected using the training
          data.

num.model: The number of models selected using the training data.

 num.err: The number of classification errors produced when the
          predicted class labels of the test samples are compared to
          the known class labels.

brierScore: The Brier Score computed using the predicted and known
          class labels of the test samples.  The Brier Score represents
          a probabilistic number of errors. A small Brier Score implies
          high prediction accuracy.
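
     To make the two test metrics concrete, here is a small base R
     sketch. The predicted probabilities and labels are invented, the
     0.5 cut-off for assigning classes is an assumption, and the Brier
     Score is taken as the unnormalised sum of squared differences
     between the known labels and the predicted probabilities; the
     package's own computation may be scaled differently.

```r
# Hypothetical predicted probabilities of class 1 for five test samples,
# together with their known 0/1 labels.
pred.prob <- c(0.9, 0.2, 0.6, 0.4, 0.1)
test.class <- c(1, 0, 0, 1, 0)

# Hard class calls at a 0.5 cut-off (an assumption), then count mismatches.
pred.class <- as.numeric(pred.prob > 0.5)
num.err <- sum(pred.class != test.class)

# Brier Score as a sum of squared differences (assumed unnormalised form).
brier <- sum((test.class - pred.prob)^2)
print(c(num.err = num.err, brierScore = brier))
```

     A confident wrong call (for example 0.9 for a class 0 sample)
     adds more to the score than a borderline one, which is why the
     score reads as a probabilistic count of errors.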

_N_o_t_e:

     The 'BMA' and 'Biobase' packages are required.

_R_e_f_e_r_e_n_c_e_s:

     Raftery, A.E. (1995).  Bayesian model selection in social research
     (with Discussion). Sociological Methodology 1995 (Peter V.
     Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.

     Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005)  Bayesian
     Model Averaging: Development of an improved multi-class, gene
     selection and classification tool for microarray data. 
     Bioinformatics 21: 2394-2402.

_S_e_e _A_l_s_o:

     'iterateBMAglm.train',   'iterateBMAglm.train.predict'

_E_x_a_m_p_l_e_s:

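     library (Biobase)
     library (BMA)
     library (iterativeBMA)

     # Load the example training and test sets with their 0/1 class labels
     data (trainData)
     data (trainClass)
     data (testData)
     data (testClass)

     # Train on trainData, classify testData and score against testClass
     ret <- iterateBMAglm.train.predict.test (train.expr.set=trainData, test.expr.set=testData, train.class=trainClass, test.class=testClass, p=100)

     # Inspect the test-set metrics described under Value
     ret$num.err
     ret$brierScore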

