spia                  package:SPIA                  R Documentation

_S_i_g_n_a_l_i_n_g _P_a_t_h_w_a_y _I_m_p_a_c_t _A_n_a_l_y_s_i_s (_S_P_I_A) _b_a_s_e_d _o_n _o_v_e_r-_r_e_p_r_e_s_e_n_t_a_t_i_o_n _a_n_d _s_i_g_n_a_l_i_n_g _p_e_r_t_u_r_b_a_t_i_o_n_s _a_c_c_u_m_u_l_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     This function implements the SPIA algorithm to analyse KEGG
     signaling pathways.

_U_s_a_g_e:

     spia(de=NULL,all=NULL,organism="hsa",nB=2000,plots=FALSE,verbose=TRUE,beta=NULL)

_A_r_g_u_m_e_n_t_s:

      de: A named vector containing log2 fold-changes of the
          differentially expressed genes. The names of this numeric
          vector are Entrez gene IDs.

     all: A vector with the Entrez IDs in the reference set. If the
          data were obtained from a microarray experiment, this set
          contains all genes present on the specific array used for
          the experiment. This vector must include every name of the
          'de' vector.

organism: A three-letter character string designating the organism
          (e.g., "hsa" for human). See a full list at
          ftp://ftp.genome.jp/pub/kegg/xml/organisms .

      nB: Number of bootstrap iterations used to compute the 'pPERT'
          value. Should be larger than 100. A recommended value is
          2000.

   plots: If set to TRUE, the function plots the gene perturbation
          accumulation vs. log2 fold-change for every gene on each
          pathway. The null distribution of the total net accumulations,
          from which 'pPERT' is computed, is plotted as well. The
          figures are written to the file SPIAPerturbationPlots.pdf in
          the current directory.

 verbose: If set to TRUE, displays the number of pathways already
          analyzed.

    beta: Weights to be assigned to each type of gene/protein
          relation. It should be a named numeric vector of length 23,
          whose names must be:
          'c("activation","compound","binding/association","expression","inhibition","activation_phosphorylation","phosphorylation",
          "indirect","inhibition_phosphorylation","dephosphorylation_inhibition","dissociation","dephosphorylation","activation_dephosphorylation",
          "state","activation_indirect","inhibition_ubiquination","ubiquination","expression_indirect","indirect_inhibition","repression",
          "binding/association_phosphorylation","dissociation_phosphorylation","indirect_phosphorylation")'

          If set to NULL, beta defaults to:
          c(1,0,0,1,-1,1,0,0,-1,-1,0,0,1,0,1,-1,0,1,-1,-1,0,0,0).
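
     As a minimal sketch of how a custom 'beta' vector could be built,
     the block below pairs the 23 relation names listed above with the
     documented default weights and then changes one weight. The names
     'rel', 'beta_default' and 'my_beta', and the choice of which weight
     to change, are hypothetical and only serve as illustration.

```r
# Relation types, in the order listed in the 'beta' argument description.
rel <- c("activation", "compound", "binding/association", "expression",
         "inhibition", "activation_phosphorylation", "phosphorylation",
         "indirect", "inhibition_phosphorylation",
         "dephosphorylation_inhibition", "dissociation", "dephosphorylation",
         "activation_dephosphorylation", "state", "activation_indirect",
         "inhibition_ubiquination", "ubiquination", "expression_indirect",
         "indirect_inhibition", "repression",
         "binding/association_phosphorylation",
         "dissociation_phosphorylation", "indirect_phosphorylation")

# Default weights documented for beta=NULL.
beta_default <- c(1, 0, 0, 1, -1, 1, 0, 0, -1, -1, 0, 0, 1, 0, 1, -1, 0, 1,
                  -1, -1, 0, 0, 0)
names(beta_default) <- rel

# Hypothetical customization: give "binding/association" a weight of 1.
my_beta <- beta_default
my_beta["binding/association"] <- 1

# Sanity checks matching the documented requirements (length 23, named).
stopifnot(length(my_beta) == 23, !anyDuplicated(names(my_beta)))
```

     Such a vector could then be supplied through the 'beta' argument,
     e.g. spia(..., beta=my_beta).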

_D_e_t_a_i_l_s:

     See the documents cited under References for details of the
     algorithm.
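
     The requirements on 'de' and 'all' stated under Arguments can be
     checked before calling 'spia'. The following is a minimal sketch
     with made-up fold-change values. The helper 'check_spia_input' and
     the variable 'all_genes' are hypothetical and not part of SPIA, and
     the uniqueness check is an assumption rather than a documented
     requirement.

```r
# Toy inputs for illustration only; the fold-change values are made up.
de <- c("1050" = 1.8, "2099" = -2.3, "7157" = 0.9)
all_genes <- c("1050", "2099", "7157", "672", "675")

# Hypothetical helper (not part of SPIA) that checks the documented inputs.
check_spia_input <- function(de, all_genes) {
  # 'de' must be a named numeric vector of log2 fold-changes.
  stopifnot(is.numeric(de), !is.null(names(de)))
  # Assumption: each gene ID appears at most once in 'de'.
  stopifnot(!anyDuplicated(names(de)))
  # Every DE gene must also be present in the reference set.
  stopifnot(all(names(de) %in% as.character(all_genes)))
  invisible(TRUE)
}

check_spia_input(de, all_genes)
```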

_V_a_l_u_e:

     A data frame containing the ranked pathways and various
     statistics: 'pSize' is the number of genes on the pathway; 'NDE'
     is the number of DE genes per pathway; 'tA' is the observed total
     perturbation accumulation in the pathway; 'pNDE' is the
     probability of observing at least 'NDE' genes on the pathway
     using a hypergeometric model; 'pPERT' is the probability of
     observing a total accumulation more extreme than 'tA' by chance
     alone; 'pG' is the p-value obtained by combining 'pNDE' and
     'pPERT'; 'pGFdr' and 'pGFWER' are the False Discovery Rate and
     Bonferroni adjusted global p-values, respectively; and 'Status'
     gives the direction in which the pathway is perturbed (activated
     or inhibited).
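
     To make these columns concrete, the sketch below reproduces the
     over-representation and adjustment steps in base R on made-up
     numbers. It is an illustration under stated assumptions and not
     the package's internal code: the counts and the 'pPERT' values are
     hypothetical, and the combination rule k - k*log(k), with k =
     pNDE*pPERT, is assumed to be the one described in the cited
     Bioinformatics paper.

```r
# Hypothetical toy counts for a single pathway.
n_all <- 20000  # genes in the reference set ('all')
n_de  <- 1000   # differentially expressed genes ('de')
pSize <- 80     # reference genes on the pathway
NDE   <- 12     # DE genes on the pathway

# P(X >= NDE) under a hypergeometric model.
pNDE <- phyper(NDE - 1, pSize, n_all - pSize, n_de, lower.tail = FALSE)

# Assumed combination rule: pG = k - k*log(k), with k = pNDE * pPERT.
combine_p <- function(p1, p2) {
  k <- p1 * p2
  ifelse(k == 0, 0, k - k * log(k))
}

# Three hypothetical pathways; the first reuses pNDE from above.
pNDE_all  <- c(pNDE, 0.20, 0.60)
pPERT_all <- c(0.01, 0.50, 0.90)
pG <- combine_p(pNDE_all, pPERT_all)

# Multiple-testing adjustments corresponding to 'pGFdr' and 'pGFWER'.
pGFdr  <- p.adjust(pG, method = "fdr")
pGFWER <- p.adjust(pG, method = "bonferroni")

data.frame(pNDE = pNDE_all, pPERT = pPERT_all, pG, pGFdr, pGFWER)
```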

_A_u_t_h_o_r(_s):

     Adi Laurentiu Tarca <atarca@med.wayne.edu>, Purvesh Khatri, Sorin
     Draghici

_R_e_f_e_r_e_n_c_e_s:

     Adi L. Tarca, Sorin Draghici, Purvesh Khatri, et al. A Signaling
     Pathway Impact Analysis for Microarray Experiments.
     Bioinformatics, 2009, 25(1):75-82.

     Purvesh Khatri, Sorin Draghici, Adi L. Tarca, Sonia S. Hassan,
     Roberto Romero. A system biology approach for the steady-state
     analysis of gene signaling networks. Progress in Pattern
     Recognition, Image Analysis and Applications, Lecture Notes in
     Computer Science. 4756:32-41, November 2007. 

     Draghici, S., Khatri, P., Tarca, A.L., Amin, K., Done, A.,
     Voichita, C., Georgescu, C., Romero, R.: A systems biology
     approach for pathway level analysis. Genome Research, 17, 2007. 

_S_e_e _A_l_s_o:

     'plotP'

_E_x_a_m_p_l_e_s:

     # Example using a colorectal cancer dataset obtained with Affymetrix
     # GeneChip technology (GEO GSE4107). Suppose that proper preprocessing
     # was performed and a two-group moderated t-test was applied. The
     # topTable result from the limma package for this data set is called "top".
     # The following lines annotate each probeset with an Entrez gene ID, keep
     # the most significant probeset for each gene ID, and retain those with
     # FDR<0.05 as differentially expressed.
     # You can run these lines if the hgu133plus2.db package is available.

     #library(hgu133plus2.db)
     #data(colorectalcancer)
     #x <- hgu133plus2ENTREZID
     #top$ENTREZ <- unlist(as.list(x[top$ID]))
     #top <- top[!is.na(top$ENTREZ),]
     #top <- top[!duplicated(top$ENTREZ),]
     #tg1 <- top[top$adj.P.Val<0.05,]
     #DE_Colorectal <- tg1$logFC
     #names(DE_Colorectal) <- as.vector(tg1$ENTREZ)
     #ALL_Colorectal <- top$ENTREZ

     data(colorectalcancer)

     # Pathway analysis using SPIA; use nB=2000 or higher for more accurate results.
     res <- spia(de=DE_Colorectal, all=ALL_Colorectal, organism="hsa", beta=NULL, nB=200, plots=FALSE, verbose=TRUE)
     res
     # Create the evidence plot
     plotP(res)

