runSigPathway           package:sigPathway           R Documentation

_P_e_r_f_o_r_m _p_a_t_h_w_a_y _a_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

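     Performs pathway analysis on an expression data set: selects the
     gene sets to analyze, calculates the NTk and NEk statistics, ranks
     the top pathways, and summarizes the statistics of the probe sets
     in those pathways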

_U_s_a_g_e:

     runSigPathway(G, minNPS = 20, maxNPS = 500,
                   tab, phenotype, nsim = 1000,
                   weightType = c("constant", "variable"), ngroups = 2,
                   npath = 25, verbose = FALSE, allpathways = FALSE,
                   annotpkg = NULL, alwaysUseRandomPerm = FALSE)

_A_r_g_u_m_e_n_t_s:

       G: a list containing the source, title, and probe sets
          associated with each curated pathway

  minNPS: an integer specifying the minimum number of probe sets in
          'tab' that should be in a gene set

  maxNPS: an integer specifying the maximum number of probe sets in
          'tab' that should be in a gene set

     tab: a numeric matrix of expression values, with the rows and
          columns representing probe sets and sample arrays,
          respectively

phenotype: a numeric (or character if 'ngroups' >= 2) vector indicating
          the phenotype

    nsim: an integer indicating the number of permutations to use

weightType: a character string specifying the type of weight to use
          when calculating NEk statistics

 ngroups: an integer indicating the number of groups in the matrix

   npath: an integer indicating the number of top gene sets to consider
          from each statistic when ranking the top pathways

 verbose: a boolean to indicate whether to print debugging messages to
          the R console

allpathways: a boolean to indicate whether to include the top 'npath'
          pathways from each statistic or just consider the top 'npath'
          pathways (sorted by the sum of ranks of both statistics) when
          generating the summary table

annotpkg: a character vector specifying the name of the Bioconductor
          annotation package to use to fetch accession numbers, Entrez
          Gene IDs, gene names, and gene symbols

alwaysUseRandomPerm: a boolean to indicate whether the algorithm can
          use complete permutations for cases where 'nsim' is greater
          than the total number of unique permutations possible with
          the 'phenotype' vector
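
     The interaction between 'nsim' and the number of unique
     permutations can be pictured with a small base R sketch.  The
     phenotype vector below is hypothetical, and counting the distinct
     two-group label assignments with 'choose' is an assumption made
     for illustration; the package may count permutations differently.

```r
## Hypothetical two-group phenotype: 4 samples in group 0, 5 in group 1
phenotype <- c(0, 0, 0, 0, 1, 1, 1, 1, 1)
nsim <- 1000

## Number of distinct ways to assign the group labels to the samples
## (assumption: this is the quantity 'nsim' is compared against)
n.unique <- choose(length(phenotype), sum(phenotype == 0))

## TRUE when 'nsim' exceeds the number of unique permutations
exceeds <- nsim > n.unique

cat("unique permutations:", n.unique, "\n")
cat("nsim exceeds unique permutations:", exceeds, "\n")
```

     With only nine arrays the default 'nsim' of 1000 is larger than
     the number of distinct label assignments, which is the situation
     that 'alwaysUseRandomPerm' addresses.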

_D_e_t_a_i_l_s:

     'runSigPathway' is a wrapper function that

     (1) Selects the gene sets to analyze using 'selectGeneSets'

     (2) Calculates NTk and NEk statistics using 'calculate.NTk' and
     'calculate.NEk'

     (3) Ranks the top 'npath' pathways from each statistic using
     'rankPathways'

     (4) Summarizes the means, standard deviations, and individual
     statistics of each probe set in each of the above pathways using
     'getPathwayStatistics'
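
     The size filter behind step (1) can be sketched in base R without
     calling the package.  This is not the code of 'selectGeneSets':
     the toy data, the element names 'src', 'title', and 'probes', and
     the inclusive bounds are all assumptions made for illustration.

```r
## Hypothetical expression matrix: 6 probe sets (rows) x 4 arrays (columns)
tab <- matrix(rnorm(24), nrow = 6,
              dimnames = list(paste0("ps", 1:6), paste0("array", 1:4)))

## Hypothetical pathway list; the element names are assumed, not confirmed
G <- list(
  list(src = "demo", title = "Pathway A",
       probes = c("ps1", "ps2", "ps3", "ps9")),
  list(src = "demo", title = "Pathway B", probes = c("ps4", "ps8")),
  list(src = "demo", title = "Pathway C", probes = paste0("ps", 1:6))
)

minNPS <- 2
maxNPS <- 5

## Count the probe sets of each pathway that are present in 'tab'
nps <- vapply(G, function(g) sum(unique(g$probes) %in% rownames(tab)),
              integer(1))

## Keep pathways whose count lies within [minNPS, maxNPS] (assumed inclusive)
keep <- which(nps >= minNPS & nps <= maxNPS)

print(data.frame(title = vapply(G, function(g) g$title, character(1)),
                 nps = nps, kept = seq_along(G) %in% keep))
```

     Only the probe sets that actually appear in the rows of 'tab' are
     counted, so a pathway can be dropped either for being too small or
     too large after matching.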

_V_a_l_u_e:

     A list containing 

  gsList: a list containing three vectors from the output of the
          'selectGeneSets' function

list.NTk: a list from the output of 'calculate.NTk'

list.NEk: a list from the output of 'calculate.NEk'

df.pathways: a data frame from 'rankPathways' which contains the top
          pathways' indices in 'G', gene set category, pathway title,
          set size, NTk statistics, NEk statistics, the corresponding
          q-values, and the ranks.  

list.gPS: a list from 'getPathwayStatistics' containing
          'nrow(df.pathways)' data frames corresponding to the pathways
          listed in 'df.pathways'.  Each data frame contains, for every
          probe set in the pathway, the name, mean, standard deviation,
          test statistic (e.g., t-test), and the corresponding
          unadjusted p-value.  If 'ngroups' = 1, the Pearson
          correlation coefficient is also returned.  If a valid
          'annotpkg' is specified, the probes' accession numbers,
          Entrez Gene IDs, gene names, and gene symbols are also
          returned.

parameters: a list of parameters (e.g., 'nsim') used in the analysis
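
     The two ways of choosing pathways for 'df.pathways' (controlled by
     'allpathways') can be illustrated with toy numbers.  The
     statistics below are hypothetical, and ranking by decreasing
     absolute value is an assumption of this sketch rather than a
     description of 'rankPathways'.

```r
## Hypothetical statistics for six pathways (not output from the package)
NTk <- c(A = 3.1, B = -0.4, C = 2.9, D = -2.2, E = 0.9, F = 1.5)
NEk <- c(A = 1.4, B = 2.5, C = 2.8, D = -0.2, E = -3.0, F = 1.0)
npath <- 2

## Rank 1 = largest absolute statistic (an assumption for this sketch)
rank.NTk <- rank(-abs(NTk))
rank.NEk <- rank(-abs(NEk))

## Analogue of allpathways = FALSE: top 'npath' by the sum of both ranks
rank.sum <- rank.NTk + rank.NEk
top.by.sum <- names(sort(rank.sum))[seq_len(npath)]

## Analogue of allpathways = TRUE: union of the top 'npath' per statistic
top.each <- union(names(sort(rank.NTk))[seq_len(npath)],
                  names(sort(rank.NEk))[seq_len(npath)])

print(top.by.sum)
print(top.each)
```

     The union can contain more than 'npath' pathways, because a
     pathway that ranks highly on one statistic need not rank highly on
     the other.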

_A_u_t_h_o_r(_s):

     Lu Tian, Peter Park, and Weil Lai

_R_e_f_e_r_e_n_c_e_s:

     Tian L., Greenberg S.A., Kong S.W., Altschuler J., Kohane I.S.,
     Park P.J. (2005)  Discovering statistically significant pathways
     in expression profiling studies.  _Proceedings of the National
     Academy of Sciences of the USA_, *102*, 13544-9.

     <URL: http://www.pnas.org/cgi/doi/10.1073/pnas.0506577102>

_E_x_a_m_p_l_e_s:

     ## Load the filtered expression data
     data(MuscleExample)

     ## Set the analysis parameters and run the analysis with one wrapper function

     nsim <- 1000
     ngroups <- 2
     verbose <- TRUE
     weightType <- "constant"
     npath <- 25
     allpathways <- FALSE
     annotpkg <- "hgu133a.db"

     res.muscle <- runSigPathway(G, 20, 500, tab, phenotype, nsim,
                                 weightType, ngroups, npath, verbose,
                                 allpathways, annotpkg)

     ## Summarize results
     print(res.muscle$df.pathways)

     ## Get more information about the probe sets' means and other statistics
     ## for the top pathway in res.muscle$df.pathways
     print(res.muscle$list.gPS[[1]])

     ## Write table of top-ranked pathways and their associated probe sets to
     ## HTML files
     writeSigPathway(res.muscle, tempdir(), "sigPathway_rSP",
                     "TopPathwaysTable.html")

