gene2pathway          package:gene2pathway          R Documentation

_P_a_t_h_w_a_y _m_e_m_b_e_r_s_h_i_p _p_r_e_d_i_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Predicts which branches of the KEGG hierarchy a gene belongs to,
     based on the InterPro domains it contains.

_U_s_a_g_e:

     gene2pathway(geneIDs=NULL, flyBase=FALSE, gene2Domains=NULL, organism="hsa", useKEGG=TRUE, KEGG.package=FALSE)

_A_r_g_u_m_e_n_t_s:

 geneIDs: a character vector of Entrez gene IDs or FlyBase identifiers
          (not required if the argument gene2Domains is provided)

 flyBase: logical; are the geneIDs FlyBase identifiers? Default: FALSE

gene2Domains: By default, associations between genes and InterPro
          domains are retrieved from Ensembl via biomaRt.
          Alternatively, users can provide their own mapping of genes
          to InterPro domains as a list here (see Details).

organism: KEGG organism code (default "hsa", Homo sapiens). Please
          refer to <URL:http://www.genome.jp/kegg-bin/create_kegg_menu>
          for the complete list of organisms supported by KEGG and
          their codes.

 useKEGG: logical; should pathway information from KEGG be used
          instead of a prediction whenever it is available?

KEGG.package: Only relevant if useKEGG=TRUE. Instead of retrieving
          information directly from KEGG, the KEGG.db package can be
          used, which is significantly faster. However, KEGG.db
          supports only a subset of organisms so far; please refer to
          its manual pages for further information. Default: FALSE,
          i.e. KEGG is queried directly.

_D_e_t_a_i_l_s:

     The prediction uses a hierarchical classification model based on
     support vector machines (SVMs) and a ranking perceptron. The model
     is usually bagged as well to improve prediction quality. It is
     stored in the package's data directory, and retraining it from
     time to time is recommended (see 'retrain').

     The current version of the KEGG hierarchy is always retrieved
     directly from KEGG via FTP. By default, associations between genes
     and InterPro domains are retrieved automatically from Ensembl via
     biomaRt. Please refer to <URL:http://www.ebi.ac.uk/ensembl/> for a
     list of organisms supported by Ensembl.

     As an alternative to Ensembl and biomaRt, users can provide their
     own mapping of genes to InterPro domains as a list. This makes it
     possible to use organisms that are supported by KEGG but not (yet)
     by Ensembl. Each list element contains the InterPro domain IDs of
     one gene and is named by that gene's identifier. If useKEGG=TRUE,
     the names must be Entrez gene IDs or FlyBase identifiers;
     otherwise, arbitrary identifiers are allowed.
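
     One way to build a gene2Domains list of the form described above
     is from a two-column table of gene/domain pairs, for example one
     exported from a custom annotation. The sketch below assumes such a
     table. The gene names and InterPro IDs are placeholders, not real
     annotations. The helper is_valid_gene2Domains() is hypothetical (it
     is not part of the package) and checks only the list shape.

```r
# Placeholder gene/domain pairs; these are NOT real annotations.
pairs <- data.frame(
  gene   = c("geneA", "geneA", "geneB", "geneC", "geneC"),
  domain = c("IPR000001", "IPR000002", "IPR000001", "IPR000003", "IPR000003"),
  stringsAsFactors = FALSE
)

# One list element per gene, named by the gene identifier and holding
# that gene's de-duplicated InterPro domain IDs.
gene2Domains <- lapply(split(pairs$domain, pairs$gene), unique)

# Hypothetical helper: checks only the shape described in Details
# (a named list of character vectors), not whether the IDs exist.
is_valid_gene2Domains <- function(x) {
  is.list(x) &&
    !is.null(names(x)) &&
    all(nzchar(names(x))) &&
    all(vapply(x, is.character, logical(1)))
}

stopifnot(is_valid_gene2Domains(gene2Domains))

# Identifiers such as "geneA" are only allowed with useKEGG = FALSE:
# gene2pathway(gene2Domains = gene2Domains, useKEGG = FALSE)
```

     Using split() groups all domains of a gene in a single pass. The
     subsequent unique() drops duplicate gene/domain rows, which are
     common in merged annotation tables.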

_V_a_l_u_e:

gene2Path: mapping of gene IDs to corresponding KEGG pathway IDs

gene2Pathname: mapping of gene IDs to corresponding KEGG pathway names

  byKEGG: logical; indicates for each gene whether the mapping
          information was obtained directly from KEGG (TRUE) or was
          predicted (FALSE)

  scores: confidence scores for the prediction (0 if no prediction was
          performed); see Note for details

_N_o_t_e:

     By default a bagged model is used for prediction, i.e. each of the
     individual sub-models casts a vote for a specific output. The
     final output is determined by majority vote, separately for each
     hierarchy branch, and the fraction of sub-models voting for a
     branch may be interpreted as that branch's probability. Ideally,
     every branch probability is close to 1 if the gene maps to that
     part of the KEGG hierarchy and close to 0 otherwise. Confidence is
     therefore measured by two quantities: the average over all branch
     probabilities > 0.5, and one minus the average over all branch
     probabilities < 0.5. The reported reliability score is the
     average of these two quantities.
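
     The computation just described can be sketched in base R. The
     sketch assumes that the sub-model votes for one gene form a 0/1
     matrix with one row per sub-model and one column per hierarchy
     branch. This layout and the function names are illustrative
     assumptions, not the package's internal representation. A
     probability of exactly 0.5 falls into neither average. If one of
     the two averages is undefined because no probabilities lie on
     that side, the sketch uses only the other. That choice is also an
     assumption.

```r
# Assumed layout: rows = bagged sub-models, columns = hierarchy branches,
# entry 1 if the sub-model votes for the branch, 0 otherwise.
branch_probabilities <- function(votes) {
  colMeans(votes)  # fraction of sub-models voting for each branch
}

# Average of mean(p > 0.5) and 1 - mean(p < 0.5); if one side is empty,
# only the other side is used (an assumption of this sketch).
reliability_score <- function(p) {
  high <- p[p > 0.5]
  low  <- p[p < 0.5]
  parts <- c(if (length(high)) mean(high), if (length(low)) 1 - mean(low))
  if (length(parts) == 0) return(NA_real_)
  mean(parts)
}

votes <- rbind(
  c(1, 1, 0, 0),
  c(1, 1, 0, 1),
  c(1, 0, 0, 0),
  c(1, 1, 0, 0),
  c(1, 1, 1, 0)
)
colnames(votes) <- c("branchA", "branchB", "branchC", "branchD")

p <- branch_probabilities(votes)  # 1.0, 0.8, 0.2, 0.2
predicted <- p > 0.5              # majority vote per branch
score <- reliability_score(p)     # (0.9 + 0.8) / 2 = 0.85
```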

     If the user retrains a model WITHOUT bagging, the reliability
     score is simply the margin between the scores of the highest- and
     the second-highest-ranked solutions. For good confidence, this
     margin should be larger than 2.
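
     The margin for a non-bagged model can be sketched as follows. The
     sketch assumes the ranking perceptron's scores for the candidate
     solutions of one gene are available as a numeric vector. The
     candidate names, score values and the function ranking_margin()
     are hypothetical.

```r
# Margin between the highest- and second-highest-ranked solutions.
ranking_margin <- function(scores) {
  if (length(scores) < 2) stop("at least two ranked solutions are needed")
  top2 <- sort(scores, decreasing = TRUE)[1:2]
  unname(top2[1] - top2[2])
}

# Hypothetical perceptron scores for three candidate solutions.
candidate_scores <- c(solution1 = 3.25, solution2 = 7.5, solution3 = 5)

margin <- ranking_margin(candidate_scores)  # 7.5 - 5 = 2.5
confident <- margin > 2                     # threshold from the Note
```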

_A_u_t_h_o_r(_s):

     Holger Froehlich

_S_e_e _A_l_s_o:

     'retrain', 'classificationModel'

_E_x_a_m_p_l_e_s:

     ## Not run: 
      gene2pathway("10327") 
     ## End(Not run)

