retrain             package:gene2pathway             R Documentation

_R_e_t_r_a_i_n _c_l_a_s_s_i_f_i_c_a_t_i_o_n _m_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     Retrains the hierarchical classification model. This way new
     information from InterPro and KEGG databases can be incorporated
     to give better predictions. Retraining should be done on a regular
     basis.

_U_s_a_g_e:

     retrain(minnmap=30, level1Only="Metabolism",
             level2Only="Genetic Information Processing", organism="hsa",
             gene2Domains=NULL, KEGG.package=FALSE,
             remove.duplicates=FALSE, use.bagging=TRUE, nbag=11)

_A_r_g_u_m_e_n_t_s:

 minnmap: prune hierarchy branches to which fewer than 'minnmap' genes
          are mapped. Default: 30

level1Only: for these hierarchy branches only the first level is used 

level2Only: for these hierarchy branches only the first and the second
          levels are used 

organism: KEGG letter code describing an organism.  Please refer to
          <URL:http://www.genome.jp/kegg-bin/create_kegg_menu> for a
          complete list of organisms (and their letter codes) supported
          by KEGG.

gene2Domains: By default, associations between genes and InterPro
          domains are retrieved via biomaRt from Ensembl.
          Alternatively, users can provide their own mapping of genes
          to InterPro domains in the form of a list here (see Details).

KEGG.package: Instead of retrieving information directly from KEGG, one
          can use the KEGG.db package, which is significantly faster.
          However, the KEGG.db package only supports a fraction of
          organisms so far. Please refer to the manual pages of the
          KEGG.db package for further information. Default: FALSE
          (do not use the KEGG.db package)

remove.duplicates: remove genes having the same InterPro domains prior
          to training. Default: FALSE (do not remove them)

use.bagging: whether to bag the model, i.e. to train 'nbag' models and
          average over them. Default: TRUE

    nbag: number of models to average over when bagging. Default: 11
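
     The effect of 'minnmap' can be pictured with a small sketch. This
     is not the package's internal code: the toy mapping
     'branch2genes' and the gene labels are hypothetical, and the
     sketch only assumes that a branch is dropped when fewer than
     'minnmap' genes map to it.

```r
# Hypothetical toy mapping: hierarchy branch -> genes mapped to it.
# Gene labels and branch sizes are made up for illustration only.
branch2genes <- list(
  "Metabolism"                     = paste0("g", 1:40),
  "Genetic Information Processing" = paste0("g", 41:75),
  "Cellular Processes"             = paste0("g", 76:85)
)

minnmap <- 30

# Count the genes per branch and keep only branches with at least
# 'minnmap' mapped genes.
n_genes <- lengths(branch2genes)
pruned  <- branch2genes[n_genes >= minnmap]

names(pruned)
```

     In this toy data the third branch has only 10 mapped genes, so it
     is the one removed at the default threshold of 30.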

_D_e_t_a_i_l_s:

     A hierarchical classification model based on SVMs and a ranking
     perceptron algorithm is trained. This model is usually
     additionally bagged to improve prediction quality. The method
     produces a "classificationModel_[organism].rda" (e.g.
     "classificationModel_hsa.rda") file, which should be stored in the
     package data directory. Once a new model has been trained, the
     complete package should be reloaded.
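
     As a rough illustration of what bagging means here, the sketch
     below trains 'nbag' models on bootstrap resamples and averages
     their predictions. It is a generic example under stated
     assumptions, not the package's implementation: a logistic
     regression on synthetic data stands in for the SVM-based
     hierarchical model, and all variable names are hypothetical.

```r
set.seed(1)

# Synthetic training data (illustration only).
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * x))
train <- data.frame(x = x, y = y)

nbag <- 11

# Fit one model per bootstrap resample of the training data.
# NOTE: glm() is a stand-in for the real classifier.
models <- lapply(seq_len(nbag), function(i) {
  idx <- sample(n, replace = TRUE)
  glm(y ~ x, family = binomial, data = train[idx, ])
})

# Predict with every model and average over the 'nbag' models.
newdata <- data.frame(x = c(-2, 0, 2))
preds   <- sapply(models, predict, newdata = newdata, type = "response")
bagged  <- rowMeans(preds)

bagged
```

     An odd value such as the default 'nbag=11' avoids ties if the
     individual models were combined by majority vote instead of
     averaging; whether the package relies on this is not stated here.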

     The current version of the KEGG hierarchy is always retrieved
     directly from KEGG via FTP. By default associations between genes
     and InterPro domains are retrieved automatically via biomaRt from
     Ensembl. Please refer to <URL:http://www.ebi.ac.uk/ensembl/> for a
     list of organisms supported by Ensembl. As an alternative to
     Ensembl and biomaRt, users can provide their own mapping of genes
     to InterPro domains in the form of a list. In particular, this
     makes it possible to use organisms that are supported by KEGG but
     not yet by Ensembl. The list has the form genes -> InterPro
     domains, and each list entry is named by the Entrez gene ID of
     the corresponding gene, because KEGG uses Entrez gene IDs for the
     mapping genes -> KEGG pathways.
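
     A user-supplied mapping of this form could be built as sketched
     below. The annotation table, its column names, and all gene and
     domain identifiers are hypothetical placeholders, and the sketch
     assumes that each list entry is a character vector of InterPro
     accessions.

```r
# Hypothetical long-format annotation: one row per (gene, domain) pair.
# Entrez gene IDs and InterPro accessions are placeholders.
annot <- data.frame(
  entrez   = c("1001", "1001", "1002", "1003", "1003"),
  interpro = c("IPR000001", "IPR000002", "IPR000002",
               "IPR000003", "IPR000003"),
  stringsAsFactors = FALSE
)

# Drop repeated (gene, domain) rows, then group domains by gene.
# split() returns a list whose names are the Entrez gene IDs.
annot        <- unique(annot)
gene2Domains <- split(annot$interpro, annot$entrez)

str(gene2Domains)

# Passing the mapping on (not run here: needs the gene2pathway
# package and access to KEGG):
# retrain(gene2Domains = gene2Domains)
```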

_V_a_l_u_e:

     The model structure. See 'classificationModel' for details.

_A_u_t_h_o_r(_s):

     Holger Froehlich

_S_e_e _A_l_s_o:

     'gene2pathway', 'classificationModel'

_E_x_a_m_p_l_e_s:

     ## Not run: 
             # retrain the classification model for Drosophila using
             # information from the KEGG.db package
             retrain(KEGG.package=TRUE, organism="dme")
     ## End(Not run)

