ebam                package:siggenes                R Documentation

_E_m_p_i_r_i_c_a_l _B_a_y_e_s _A_n_a_l_y_s_i_s _o_f _M_i_c_r_o_a_r_r_a_y_s

_D_e_s_c_r_i_p_t_i_o_n:

     Performs an Empirical Bayes Analysis of Microarrays (EBAM). It is
     possible to perform one and two class analyses using either a
     modified t-statistic or a (standardized)  Wilcoxon rank statistic,
     and a multiclass analysis using a modified F-statistic.  Moreover,
     this function provides a EBAM procedure for categorical data such
     as SNP data and the possibility to employ an user-written score
     function.

_U_s_a_g_e:

       ebam(x, cl, method = z.ebam, delta = 0.9, which.a0 = NULL, 
            p0 = NA, p0.estimation = c("splines", "interval", "adhoc"), 
            lambda = NULL, ncs.value = "max", use.weights = FALSE, 
            gene.names = dimnames(x)[[1]], ...)

_A_r_g_u_m_e_n_t_s:

       x: either a matrix, a data frame or an ExpressionSet object, or
          the output of 'find.a0',  i.e. an object of class 'FindA0'.
          Can also be a list (if 'method = chisq.ebam' or  'method =
          trend.ebam'). For the latter case, see 'chisq.ebam'.  If 'x'
          is not a FindA0 object, then each row of 'x' (or 'exprs(x)',
          respectively) must correspond to a variable (e.g., a gene or
          a SNP), and each column to a sample.

      cl: a specification of the class labels of the samples. Ignored
          if 'x' is a 'FindA0' object. Needs not to be specified if 'x'
          is a list.

          Typically, 'cl' is specified by a vector of length 'ncol(x)'.
            In the two class paired case, 'cl' can also  be a matrix
          with 'ncol(x)' rows and 2 columns. If 'x' is an ExpressionSet
          object, 'cl' can also be a character string naming the column
           of 'pData(x)' that contains the class labels of the samples.

          In the one-class case, 'cl' should be a vector of 1's. 

          In the two class unpaired case, 'cl' should be a vector
          containing 0's (specifying the samples of, e.g., the control
          group) and 1's (specifying, e.g., the case group). 

          In the two class paired case, 'cl' can be either a numeric
          vector or a numeric matrix.  If it is a vector, then 'cl' has
          to consist of the integers between -1 and  -n/2 (e.g., before
          treatment group) and between 1 and n/2 (e.g., after treatment
          group), where n is the length of 'cl' and k is paired with
          -k, k=1,...,n/2. If 'cl' is a matrix, one column should
          contain -1's and 1's specifying, e.g., the before and the
          after treatment samples, respectively, and the other column
          should contain integer between 1 and n/2 specifying the n/2
          pairs of observations.

          In the multiclass case and if 'method = chisq.ebam' or
          'method = trend.ebam',  'cl' should be a vector containing
          integers between 1 and g, where g is  the number of groups.
          In the two latter cases, 'cl' needs not to be specified, if
          'x' is a list. For details, see 'chisq.ebam'.

          For examples of how 'cl' can be specified, see the manual of
          'siggenes'.

  method: a character string or name specifying the method or function
          that should be used in the computation of the expression
          score z. 

          If 'method = z.ebam', a modified t- or F-statistic,
          respectively, will be computed as proposed by Efron et al.
          (2001).

          If 'method = wilc.ebam', a (standardized) Wilcoxon sum /
          signed rank statistic will be used as expression score. 

          For an analysis of categorical data such as SNP data,
          'method' can be set to 'chisq.ebam'. In this case, Pearson's
          Chi-squared statistic is computed for each row. 

          If the variables are ordinal and a trend test should be
          applied  (e.g., in the two-class case, the Cochran-Armitage
          trend test), 'method = trend.ebam' can be employed.

          It is also possible to employ an user-written function for
          computing an user-specified expression score. For details,
          see the vignette of 'siggenes'.

   delta: a numeric vector consisting of probabilities for which the
          number of differentially expressed genes and the FDR should
          be computed, where a gene is called differentially expressed
          if its posterior probability is larger than Delta.

which.a0: an integer between 1 and the length of 'quan.a0' of
          'find.a0'. If 'NULL', the suggested choice of 'find.a0' is
          used. Ignored if 'x' is a matrix, data frame or ExpressionSet
          object.

      p0: a numeric value specifying the prior probability p0 that a
          gene is not differentially expressed. If 'NA', 'p0' will be
          estimated automatically.

p0.estimation: either '"splines"' (default), '"interval"', or
          '"adhoc"'.  If '"splines"', the spline based method of Storey
          and Tibshirani (2003) is used to estimate p0. If '"adhoc"'
          ('"interval"'), the adhoc (interval based) method  proposed
          by Efron et al. (2001) is used to estimate p0.

  lambda: a numeric vector or value specifying the lambda values used
          in the estimation of p0. If 'NULL', 'lambda' is set to
          'seq(0, 0.95, 0.05)' if 'p0.estimation = "splines"', and to
          '0.5' if 'p0.estimation = "interval"'. Ignored if
          'p0.estimation = "adhoc"'. For details, see 'pi0.est'.

ncs.value: a character string. Only used if 'p0.estimation = "splines"'
          and 'lambda' is a vector. Either '"max"' or '"paper"'. For
          details, see 'pi0.est'.

use.weights: should weights be used in the spline based estimation of
          p0? If 'TRUE', 1 - 'lambda' is used as weights. For details,
          see 'pi0.est'.

gene.names: a vector of length 'nrow(x)' specifying the names of the
          variables. By default, the row names of the matrix / data
          frame comprised by 'x' are used.

     ...: further arguments of the specific EBAM methods. If 'method =
          z.ebam', see  'z.ebam'. If 'method = wilc.ebam', see
          'wilc.ebam'. If 'method = chisq.ebam', see 'chisq.ebam'.

_V_a_l_u_e:

     An object of class EBAM.

_A_u_t_h_o_r(_s):

     Holger Schwender, holger.schw@gmx.de

_R_e_f_e_r_e_n_c_e_s:

     Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001).
     Empirical Bayes Analysis of a Microarray Experiment. _JASA_, 96,
     1151-1160.

     Schwender, H., Krause, A., and Ickstadt, K. (2006). Identifying
     Interesting Genes with siggenes. _RNews_, 6(5), 45-50.

     Storey, J.D. and Tibshirani, R. (2003). Statistical Significance
     for Genome-Wide Studies. _Proceedings of the National Academy of
     Sciences_, 100, 9440-9445.

_S_e_e _A_l_s_o:

     'EBAM-class', 'find.a0', 'z.ebam', 'wilc.ebam', 'chisq.ebam'

_E_x_a_m_p_l_e_s:

     ## Not run: 
       # Load the data of Golub et al. (1999) contained in the package multtest.
       data(golub)
       
       # golub.cl contains the class labels.
       golub.cl
       
       # Perform an EBAM analysis for the two class unpaired case assuming
       # unequal variances. Specify the fudge factor a0 by the suggested
       # choice of find.a0
       find.out <- find.a0(golub, golub.cl, rand = 123)
       ebam.out <- ebam(find.out)
       ebam.out
         
       # Since a0 = 0 leads to the largest number of genes (i.e. the suggested
       # choice of a0), the following leads to the same results as the above
       # analysis (but only if the random number generator, i.e. rand, is set
       # to the same number).
       ebam.out2 <- ebam(golub, golub.cl, a0 = 0, fast = TRUE, rand = 123)
       ebam.out2

       # If fast is set to TRUE in ebam, a crude estimate of the number of
       # falsely called genes is used (see the help file for z.ebam). This
       # estimate is always employed in find.a0. 
       # The exact number is used in ebam when performing
       ebam.out3 <- ebam(golub, golub.cl, a0 = 0, rand = 123)
       ebam.out3  

       # Since this is the recommended way, we use ebam.out3 at the end of
       # the Examples section for further analyses.


       # Perform an EBAM analysis for the two class unpaired case assuming
       # equal group variances. Set a0 = 0, and use B = 50 permutations
       # of the class labels.
       ebam.out4 <- ebam(golub, golub.cl, a0 = 0, var.equal = TRUE, B = 50,
          rand = 123)
       ebam.out4
         
       # Perform an EBAM analysis for the two class unpaired cased assuming
       # unequal group variances. Use the median (i.e. the 50% quantile)
       # of the standard deviations of the genes as fudge factor a0. And
       # obtain the number of genes and the FDR if a gene is called 
       # differentially when its posterior probability is larger than
       # 0.95.
       ebam.out5 <- ebam(golub, golub.cl, quan.a0 = 0.5, delta = 0.95,
          rand = 123)
       ebam.out5
         
       # For the third analysis, obtain the number of differentially
       # expressed genes and the FDR if a gene is called differentially
       # expressed if its posterior probability is larger than 0.8, 0.85,
       # 0.9, 0.95.
       print(ebam.out3, c(0.8, 0.85, 0.9, 0.95))
         
       # Generate a plot of the posterior probabilities for delta = 0.9.
       plot(ebam.out3, 0.9)
         
       # Obtain the list of genes called differentially expressed if their
       # posterior probability is larger than 0.99, and gene-specific 
       # statistics for these variables such as their z-value and their
       # local FDR.
       summary(ebam.out3, 0.99)
     ## End(Not run)

