GOHyperG               package:GOstats               R Documentation

(_D_E_P_R_E_C_A_T_E_D) _H_y_p_e_r_g_e_o_m_e_t_r_i_c _T_e_s_t_s _f_o_r _G_O

_D_e_s_c_r_i_p_t_i_o_n:

     Use 'hyperGTest' instead.

     Given a set of unique Entrez Gene Identifiers, a microarray
     annotation data package name, and the GO category of interest,
     this function will compute Hypergeometric p-values for
     overrepresentation of each GO term in the specified category among
     the GO annotations for the interesting genes (as indicated by the
     Entrez Gene ids).

_U_s_a_g_e:

     GOHyperG(x, lib, what="MF", universe=NULL)

_A_r_g_u_m_e_n_t_s:

       x: A character vector of unique Entrez Gene identifiers. 

     lib: The name of the annotation data package for the chip that was
          used or '"YEAST"', see details for more information.

    what: One of "MF", "BP", or "CC" indicating which of the GO
          categories to use for the computation.  In 'GOKEGGHyperG',
          'what' can also be "KEGG".

universe: A character vector of unique Entrez Gene identifiers or
          'NULL'.  This is the population (the urn) of the
          Hypergeometric test.  When 'NULL' (default), the population
          is all Entrez Gene ids in the annotation package that have a
          GO term annotation in the specified GO category (see
          details).

_D_e_t_a_i_l_s:

     The Entrez Gene ids given in 'x' define the selected set of genes.
     The universe of Entrez Gene ids is determined by the chip
     annotation data package ('lib') or specified by the 'universe'
     argument, which must be a subset of the Entrez Gene ids represented
     on the chip.  Both the selected genes and the universe are reduced
     by removing Entrez Gene ids that do not have any annotations in
     the specified GO category.
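
     The reduction described above can be sketched in base R.  This is
     only an illustration on made-up data, not the code 'GOHyperG'
     runs; the names 'go2genes', 'universe_used' and 'selected_used'
     are hypothetical.

```r
## Toy GO-term-to-gene mapping for one GO category (made-up data).
go2genes <- list(
  "GO:0000001" = c("10", "20", "30"),
  "GO:0000002" = c("20", "40")
)

## Every id with at least one annotation in the category.
annotated <- unique(unlist(go2genes))

selected <- c("10", "20", "99")                    # "99" is unannotated
universe <- c("10", "20", "30", "40", "50", "99")  # "50" and "99" are unannotated

## Drop ids that have no annotation in the category.
universe_used <- intersect(universe, annotated)
selected_used <- intersect(selected, universe_used)

length(universe_used)  # plays the role of 'numLL'
length(selected_used)  # plays the role of 'numInt'
```

     In this sketch the two lengths correspond to the 'numLL' and
     'numInt' components described under Value.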

     For each GO term in the specified category that has at least one
     annotation in the selected gene set ('x'), we determine how many
     of its Entrez Gene annotations are in the universe set and how
     many are in the selected set.  With these counts we perform a
     Hypergeometric test using 'phyper'.  This is equivalent to using
     Fisher's exact test.
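
     As a minimal sketch of the test for a single GO term, assuming
     the usual upper-tail formulation for overrepresentation, the
     following compares 'phyper' with a one-sided 'fisher.test'.  The
     counts and the variable names ('N', 'n', 'K', 'k') are
     illustrative and do not come from 'GOHyperG' itself.

```r
## Illustrative counts for a single GO term (made-up numbers).
N <- 1000  # genes in the reduced universe
n <- 50    # genes in the reduced selected set
K <- 40    # universe genes annotated at the term
k <- 8     # selected genes annotated at the term

## Upper-tail Hypergeometric probability P(X >= k):
## K "white" balls, N - K "black" balls, n balls drawn.
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

## The same counts as a 2 x 2 table for Fisher's exact test.
tab <- matrix(c(k, K - k, n - k, N - K - (n - k)), nrow = 2,
              dimnames = list(selected = c("yes", "no"),
                              term     = c("yes", "no")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

## Check that the two p-values agree up to numerical tolerance.
all.equal(p_hyper, p_fisher)
```

     Note the 'k - 1' in the call to 'phyper': with 'lower.tail =
     FALSE' the function returns P(X > q), so subtracting one gives
     the probability of observing at least 'k' annotated genes.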

     It is important that the correct chip annotation data package be
     identified as it determines the GO term to Entrez Gene id mapping
     as well as the universe of Entrez Gene ids in the case that the
     'universe' argument is omitted.

     For S. cerevisiae, if the 'lib' argument is set to '"YEAST"', then
     comparisons and statistics are computed using common names and are
     with respect to all genes annotated in the S. cerevisiae genome,
     not with respect to any microarray chip.  This will *not* be the
     right thing to do if you are working with a yeast microarray.

_V_a_l_u_e:

     The returned value is a list with components: 

 pvalues: The ordered p-values.

goCounts: The vector of counts of Entrez Gene ids from the universe at
          each node.

intCounts: The vector of counts of the supplied Entrez Gene ids
          annotated at each GO term.

   numLL: The number of unique Entrez Gene ids in the universe that are
          mapped to some term in the specified GO category.

  numInt: The number of unique Entrez Gene ids in the selected gene
          set, 'x', that are mapped to some term in the specified GO
          category.

    chip: A string identifying the chip annotation data package used.

  intLLs: The input vector 'x'.

 go2Affy: A list with one element for each GO term tested, containing
          the Affymetrix identifiers associated with that node, for the
          whole chip (not just the interesting genes).  This is the
          same as extracting the tested GO ids from the annotation
          package's GO2ALLPROBES environment.

_N_o_t_e:

     Typically, one has a set of interesting genes/probes obtained from
     a microarray experiment and is interested in determining whether
     there is an overrepresentation of these genes at particular GO
     terms. 'GOHyperG' carries out simple Hypergeometric tests to
     assess the overrepresentation of GO terms.

     Two substantial issues arise.  First, it is not clear how to do
     any form of p-value correction.  The tests are not independent and
     the underlying structure of the GO graph presents certain problems
     that need to be addressed.  The second substantial issue is that
     not all probes on a microarray map to a unique Entrez Gene
     identifier.  In 'GOHyperG' every attempt to appropriately correct
     for non-uniqueness of mappings has been made.
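
     To illustrate the non-uniqueness issue on the input side, here is
     a small sketch of collapsing interesting probes to unique Entrez
     Gene ids before testing.  The probe names and the 'probe2entrez'
     mapping are made up, and this is not necessarily how 'GOHyperG'
     handles the problem internally.

```r
## Made-up probe-to-Entrez mapping: two probes hit gene "10",
## and one probe has no Entrez Gene id at all.
probe2entrez <- c("p1_at" = "10", "p2_at" = "10",
                  "p3_at" = "20", "p4_at" = NA)

interesting_probes <- c("p1_at", "p2_at", "p3_at", "p4_at")

## Look up the ids, drop unmapped probes, and keep each gene once.
ids <- probe2entrez[interesting_probes]
x <- unique(unname(ids[!is.na(ids)]))

length(interesting_probes)  # number of probes
length(x)                   # number of distinct genes
```

     Counting genes rather than probes keeps a gene that is
     represented by several probes from being counted more than once
     at a GO term.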

_A_u_t_h_o_r(_s):

     R. Gentleman

_S_e_e _A_l_s_o:

     'hyperGTest', 'geneKeggHyperGeoTest', 'geneCategoryHyperGeoTest',
     'phyper'

_E_x_a_m_p_l_e_s:

     ## Not run: 
     library(hgu95av2)
     library(GO)
     w1 <- as.list(hgu95av2ENTREZID)
     w2 <- unique(unlist(w1))
     w2 <- w2[!is.na(w2)]  # drop probes without an Entrez Gene id
     set.seed(123)
     ## pick 25 interesting genes
     myLL <- sample(w2, 25)
     xx <- GOHyperG(myLL, lib="hgu95av2", what="CC")
     xx$numLL
     xx$numInt
     sum(xx$pvalues < 0.01)
     ## End(Not run)

