shortestPath             package:GOstats             R Documentation

_S_h_o_r_t_e_s_t _P_a_t_h _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     The shortest path analysis was proposed by Zhou et al. The basic
     computation is to find the shortest path in a supplied graph
     between two Entrez Gene IDs. Zhou et al. claim that other genes
     annotated along that path are likely to have the same GO
     annotation as the two end points.

_U_s_a_g_e:

     shortestPath(g, GOnode, mapfun=NULL, chip=NULL)

_A_r_g_u_m_e_n_t_s:

       g: An instance of the 'graph' class. 

  GOnode: A length one character vector specifying the GO node of
          interest. 

  mapfun: A function taking a character vector of GO IDs as its only
          argument and returning a list of character vectors of Entrez
          Gene IDs annotated at each corresponding GO ID.  The function
          should behave similarly to 'mget(x, go2egmap,
          ifnotfound=NA)', that is, 'NA' should be returned if a
          specified GO ID has no Entrez ID mappings.  See Details for
          the interaction of 'mapfun' and 'chip'.

    chip: The name of a DB-based annotation data package (the name will
          end in ".db").  This package will be used to generate the
          GO ID to Entrez ID mapping when 'mapfun' is not supplied.

_D_e_t_a_i_l_s:

     The algorithm implemented here is quite simple. All Entrez Gene
     identifiers that are annotated at the GO node of interest are
     obtained. Those that are found as nodes in the graph are retained
     and used for the computation. For every pair of retained nodes,
     the shortest path between them is computed using 'sp.between'
     from the RBGL package.
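
     The steps above can be sketched in base R on a toy graph. This
     is only an illustration, not the package's implementation: the
     adjacency list 'adj', the ID vector 'annotated', and the helper
     'bfsPath' are all hypothetical, and a plain breadth-first search
     stands in for 'sp.between' on an unweighted, undirected graph.

```r
## Toy undirected graph as an adjacency list (hypothetical data).
adj <- list(
  "1" = c("2"),
  "2" = c("1", "3"),
  "3" = c("2", "4"),
  "4" = c("3"),
  "5" = character(0)
)

## Hypothetical Entrez Gene IDs annotated at the GO node of interest.
annotated <- c("1", "4", "9")

## Breadth-first search (a stand-in for sp.between): returns the
## shortest path from 'from' to 'to', or NULL if no path exists.
bfsPath <- function(adj, from, to) {
  prev <- setNames(rep(NA_character_, length(adj)), names(adj))
  visited <- from
  queue <- from
  while (length(queue) > 0) {
    cur <- queue[1]
    queue <- queue[-1]
    if (cur == to) {
      ## Walk the predecessor links back to the start node.
      path <- to
      while (path[1] != from) path <- c(prev[[path[1]]], path)
      return(path)
    }
    for (nb in setdiff(adj[[cur]], visited)) {
      visited <- c(visited, nb)
      prev[[nb]] <- cur
      queue <- c(queue, nb)
    }
  }
  NULL
}

## Keep the annotated IDs that are nodes in the graph; set the rest aside.
nodesUsed    <- intersect(annotated, names(adj))
nodesNotUsed <- setdiff(annotated, names(adj))

## Compute the shortest path for every pair of retained nodes.
pairs <- combn(nodesUsed, 2, simplify = FALSE)
paths <- lapply(pairs, function(p) bfsPath(adj, p[1], p[2]))
names(paths) <- vapply(pairs, paste, character(1), collapse = ":")
```

     Here "9" is annotated but absent from the graph, so it is set
     aside, and the only remaining pair is "1" and "4".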

     The graph is presumed to be undirected. This restriction could
     probably be lifted if there were a reason to do so; a patch
     would be gratefully accepted.

     The mapping of GO node to Entrez ID is achieved in one of three
     ways:

        1.  If 'mapfun' is provided, it will be used to perform the
           needed lookups.  In this case, 'chip' will be ignored.

        2.  If 'chip' is provided and 'mapfun=NULL', then the needed
           lookups will be done based on the GO to Entrez mappings
           encapsulated in the specified annotation data package.  This
           is the recommended usage.

        3.  If 'mapfun' and 'chip' are 'NULL' or missing, then the
           function will attempt to load the GO package (the
           environment-based package, distinct from GO.db).  This
           package contains a legacy environment mapping GO IDs to
           Entrez IDs.  If the GO package is not available, an error
           will be raised. Omitting both 'mapfun' and 'chip' is not
           recommended as it is not compatible with the DB-based
           annotation data packages.
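
     As an illustration of the first option, a 'mapfun' with the
     documented behaviour can be built from an ordinary environment.
     This is a minimal sketch: the environment 'go2egmap' and the
     GO-to-Entrez assignments below are made up for the example and do
     not reflect real annotation.

```r
## Hypothetical GO-to-Entrez environment; the mappings are invented.
go2egmap <- new.env()
assign("GO:0000001", c("10", "20"), envir = go2egmap)
assign("GO:0000002", "30", envir = go2egmap)

## Takes a character vector of GO IDs and returns a named list of
## Entrez Gene IDs, with NA for any GO ID that has no mapping.
mapfun <- function(x) mget(x, envir = go2egmap, ifnotfound = NA)

res <- mapfun(c("GO:0000001", "GO:9999999"))
```

     Such a function could then be supplied as 'shortestPath(g,
     GOnode, mapfun=mapfun)', in which case 'chip' is ignored.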

_V_a_l_u_e:

     The return value is a list with the following components:

shortestpaths : A list of the output from 'sp.between'. The names are
          the names of the nodes used as the two endpoints.

nodesUsed : A vector of the Entrez Gene IDs that were both found at the
          GO term of interest and were nodes in the supplied graph,
          'g'. These were used to compute the shortest paths.

nodesNotUsed: A vector of Entrez Gene IDs that were annotated at the GO
          term, but were not found in the graph 'g'.

_A_u_t_h_o_r(_s):

     R. Gentleman

_R_e_f_e_r_e_n_c_e_s:

     X. Zhou, M-C J. Kao and W. H. Wong (2002). Transitive functional
     annotation by shortest-path analysis of gene expression data.
     PNAS.

_S_e_e _A_l_s_o:

     'sp.between'

_E_x_a_m_p_l_e_s:

     library("GOstats")
     library("hgu95av2.db")
     library("RBGL")

     set.seed(321)
     uniqun <- function(x) unique(unlist(x))

     goid <- "GO:0005778"
     egIds <- uniqun(mget(uniqun(hgu95av2GO2PROBE[[goid]]),
                                 hgu95av2ENTREZID))

     v1 <- randomGraph(egIds, 1:10, .3, weights=FALSE)
     ## Since v1 is random, it might be disconnected and we need a
     ## connected graph to guarantee the existence of a path.
     c1 <- connComp(v1)
     largestComp <- c1[[which.max(sapply(c1, length))]]
     v2 <- subGraph(largestComp, v1)

     a1 <- shortestPath(v2, goid, chip="hgu95av2.db")

