fdr1d                 package:OCplus                 R Documentation

_C_o_m_p_u_t_e _c_l_a_s_s_i_c_a_l _l_o_c_a_l _f_a_l_s_e _d_i_s_c_o_v_e_r_y _r_a_t_e

_D_e_s_c_r_i_p_t_i_o_n:

     Calculates the classical local false discovery rate for multiple 
     parallel t-statistics.

_U_s_a_g_e:

     fdr1d(xdat, grp, test, p0, nperm = 100, nr = 50, seed = NULL, null = NULL, 
           zlim = 1, sv2 = 0.01, err = 1e-04, verb = TRUE, ...)

_A_r_g_u_m_e_n_t_s:

    xdat: the matrix of expression values, with genes as rows and
          samples as columns

     grp: a grouping variable giving the class membership of each
          sample, i.e. each column in 'xdat'

    test: a function that takes 'xdat' and 'grp' as the first two
          arguments and returns the test statistic; by default,
          two-sample t-statistics are calculated.

      p0: if supplied, an estimate for the proportion of
          non-differentially expressed genes; if not supplied, the
          routine will estimate it, see Details.

   nperm: number of permutations for establishing the null distribution
          of the test statistic

      nr: the number of equidistant breaks used to divide the range of
          test statistics into intervals for calculating the fdr.

    seed: if specified, the random seed from which the permutations are
          started

    null: optional argument for passing in a pre-calculated null
          distribution, see Details.

    zlim: if no 'p0' is specified, the ratio of densities in the range
          of test statistics between '-zlim' and 'zlim' will be used to
          estimate the proportion of non-differentially expressed
          genes; ignored if 'p0' is specified.

     sv2: positive number controlling the initial degree of smoothing
          for the densities involved, with smaller values indicating
          more smoothing; see Details.

     err: positive number controlling the convergence of the smoothing
          procedure, with smaller values implying more iterations; see
          Details. 

    verb: logical value indicating whether to provide extra information.

     ...: extra arguments to function 'test'.
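     The 'zlim' rule for estimating 'p0' can be sketched in a few lines
     of base R. This is a minimal illustration under stated assumptions,
     not the OCplus code: the helper 'estimate_p0' is hypothetical, it
     compares the fraction of observed and of null statistics with
     |t| <= zlim, and capping the ratio at 1 is a choice made here. Near
     zero almost all statistics come from non-DE genes, which is why
     this ratio approximates the proportion of non-DE genes.

```r
# Hypothetical helper (not part of OCplus): estimate p0 as the ratio of the
# observed to the null probability mass of the statistics in [-zlim, zlim].
estimate_p0 <- function(tstat, tnull, zlim = 1) {
  obs  <- mean(abs(tstat) <= zlim)  # observed fraction near zero
  null <- mean(abs(tnull) <= zlim)  # null fraction near zero
  min(1, obs / null)                # assumption: cap at 1, it is a proportion
}

# Simulated check: 95% null statistics, 5% shifted away from zero
set.seed(1)
tnull  <- rnorm(10000)
tstat  <- c(rnorm(950), rnorm(50, mean = 4))
p0_hat <- estimate_p0(tstat, tnull, zlim = 1)
p0_hat  # should be close to the true value 0.95
```

     A larger 'zlim' uses more statistics (less variance) but admits
     more DE genes into the window (more bias towards smaller 'p0').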

_D_e_t_a_i_l_s:

     This function calculates the local false discovery rate (fdr, as
     opposed to the global FDR) introduced by Efron et al. (2001). The
     underlying model assumes that for a given grouping of samples, we
     study a mixture of differentially expressed (DE) and non-DE genes,
     and that consequently, the observed distribution of test
     statistics is a mixture of the distributions of test statistics
     under the alternative and under the null hypothesis, respectively.
     The densities involved are estimated nonparametrically and
     smoothed, using permutations of the grouping variable to generate
     the null distribution.
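     In formulas (the symbols are introduced here for illustration: f is
     the density of the observed statistics, f_0 the null density, f_1
     the density under the alternative, and p_0 the proportion of non-DE
     genes), the model of Efron et al. (2001) and the resulting local fdr
     read as follows; the last expression is the averaging used in the
     Examples to estimate the global FDR of a set R of genes:

```latex
f(z) = p_0 \, f_0(z) + (1 - p_0) \, f_1(z), \qquad
\operatorname{fdr}(z) = \frac{p_0 \, f_0(z)}{f(z)}, \qquad
\widehat{\mathrm{FDR}}(R) \approx \frac{1}{|R|} \sum_{i \in R} \operatorname{fdr}(z_i)
```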

     By default, the null distribution is generated and stored only
     within the function and is not available outside. To study the
     null distribution, or to re-use the same null distribution
     efficiently while, e.g., varying the smoothing parameter, an
     externally generated null distribution (created, e.g., with the
     'PermNull' function) can be passed in via the argument 'null'.
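     The base-R sketch below illustrates why an external null
     distribution saves work: the permutation null is computed once and
     then reused for two different numbers of breaks. It is a simplified
     illustration, not the OCplus algorithm: the helpers 'row_tstat',
     'perm_null' and 'binned_fdr' are hypothetical (they are not
     'tstatistics' or 'PermNull'), the densities are raw histograms
     without the Poisson smoothing that fdr1d applies, 'p0' defaults to
     1, and capping the fdr at 1 is a choice made here.

```r
# Hypothetical helpers illustrating the permutation approach.
row_tstat <- function(x, grp) {
  g  <- factor(grp)
  a  <- x[, g == levels(g)[1], drop = FALSE]
  b  <- x[, g == levels(g)[2], drop = FALSE]
  na <- ncol(a); nb <- ncol(b)
  sp <- ((na - 1) * apply(a, 1, var) + (nb - 1) * apply(b, 1, var)) /
        (na + nb - 2)                              # pooled variance
  (rowMeans(a) - rowMeans(b)) / sqrt(sp * (1 / na + 1 / nb))
}

# Null statistics: recompute the t-statistics after permuting the labels
perm_null <- function(x, grp, nperm = 20) {
  as.vector(replicate(nperm, row_tstat(x, sample(grp))))
}

# Unsmoothed local fdr per interval: p0 * f0 / f on nr equidistant breaks
binned_fdr <- function(tstat, tnull, p0 = 1, nr = 50) {
  rng    <- range(c(tstat, tnull))
  breaks <- seq(rng[1], rng[2], length.out = nr)
  f  <- hist(tstat, breaks = breaks, plot = FALSE)$density
  f0 <- hist(tnull, breaks = breaks, plot = FALSE)$density
  # NA where no observed statistics fall; values capped at 1 (assumption)
  list(fdr = ifelse(f > 0, pmin(1, p0 * f0 / f), NA), xbreaks = breaks)
}

# Same simulated data as in the Examples section
set.seed(2000)
xdat <- matrix(rnorm(50000), nrow = 1000)
xdat[1:25, 1:25]  <- xdat[1:25, 1:25] - 1
xdat[26:50, 1:25] <- xdat[26:50, 1:25] + 1
grp  <- rep(c("Sample A", "Sample B"), c(25, 25))

tstat  <- row_tstat(xdat, grp)
tnull  <- perm_null(xdat, grp, nperm = 20)  # computed once ...
coarse <- binned_fdr(tstat, tnull, nr = 20) # ... and reused for two
fine   <- binned_fdr(tstat, tnull, nr = 50) # different bin widths
```

     The expensive step is 'perm_null' (nperm full recomputations of the
     statistics); re-binning the stored null values is cheap.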

     In principle, the function should support any test function along
     the lines of 'tstatistics'; however, this has not been tested, and
     the current setup is very much geared towards t-tests.
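     As an illustration of the calling convention described under 'test'
     and '...', here is a hypothetical alternative statistic: a
     two-sample t-statistic with a fudge constant 's0' added to the
     standard error, in the spirit of SAM-type moderated statistics. The
     function name and the argument 's0' are assumptions made for this
     sketch, and it assumes that 'test' must return one statistic per row
     of 'xdat'; as stated above, fdr1d has not been tested with such
     statistics.

```r
# Hypothetical test function with first two arguments 'xdat' and 'grp';
# 's0' is an extra argument that fdr1d would pass on through '...'.
mod_tstat <- function(xdat, grp, s0 = 0) {
  g <- factor(grp)
  stopifnot(nlevels(g) == 2)
  a  <- xdat[, g == levels(g)[1], drop = FALSE]
  b  <- xdat[, g == levels(g)[2], drop = FALSE]
  na <- ncol(a); nb <- ncol(b)
  sp <- ((na - 1) * apply(a, 1, var) + (nb - 1) * apply(b, 1, var)) /
        (na + nb - 2)                             # pooled variance
  se <- sqrt(sp * (1 / na + 1 / nb))
  (rowMeans(a) - rowMeans(b)) / (se + s0)         # s0 > 0 damps small se
}

x   <- matrix(c(1, 2, 3, 4,
                2, 4, 6, 9), nrow = 2, byrow = TRUE)
grp <- c("A", "A", "B", "B")
mod_tstat(x, grp)          # s0 = 0: ordinary pooled two-sample t
mod_tstat(x, grp, s0 = 1)  # shrunk towards zero

# With OCplus installed, the call would look like this (untested):
# fdr1d(xdat, grp, test = mod_tstat, s0 = 1)
```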

     The density estimates are smoothed using non-Gaussian mixed model
     smoothing for Poisson counts; see Pawitan (2001) and 'smooth1d'.
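     To give an idea of what smoothing histogram counts with a Poisson
     model involves, the sketch below fits the log expected count per
     interval by maximizing a Poisson log-likelihood with a second-order
     difference penalty, using Newton-Raphson (iteratively reweighted
     least squares) steps until the largest change falls below 'err'.
     This penalized-likelihood formulation is a stand-in chosen for the
     illustration, not the code of 'smooth1d': 'poisson_smooth' is
     hypothetical, and its fixed weight 'lambda' is assumed to play
     roughly the role of 1/sv2 (larger lambda, more smoothing), whereas
     in fdr1d 'sv2' only sets the initial degree of smoothing.

```r
# Hypothetical smoother: maximize over eta (log expected counts)
#   sum(y * eta - exp(eta)) - lambda / 2 * sum(diff(eta, differences = 2)^2)
poisson_smooth <- function(y, lambda = 100, err = 1e-4, maxit = 100) {
  n   <- length(y)
  D   <- diff(diag(n), differences = 2)  # (n - 2) x n second differences
  P   <- lambda * crossprod(D)           # penalty matrix lambda * D'D
  eta <- log(y + 0.5)                    # start values, safe for zero counts
  for (it in seq_len(maxit)) {
    mu      <- exp(eta)
    z       <- eta + (y - mu) / mu       # IRLS working response
    eta_new <- as.vector(solve(diag(mu) + P, mu * z))  # Newton step
    done    <- max(abs(eta_new - eta)) < err
    eta     <- eta_new
    if (done) break
  }
  exp(eta)                               # smoothed expected counts
}

set.seed(1)
x  <- rnorm(2000)
y  <- hist(x, breaks = seq(min(x), max(x), length.out = 51), plot = FALSE)$counts
ys <- poisson_smooth(y, lambda = 100)
```

     Because constant and linear trends in eta are not penalized, the
     smoothed counts keep the total count of the raw histogram at
     convergence, so they can be rescaled to a density exactly as the
     raw counts would be.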

_V_a_l_u_e:

     A data frame with one row per gene and two columns: 'tstat', the
     test statistic, and 'fdr.local', the local false discovery rate.
     This data frame has the additional class attributes
     'fdr1d.result' and 'fdr.result', see Examples; these S3 classes
     provide the 'plot' and 'summary' methods.

     Additional information is provided by a 'param' attribute, which
     is a list with the following entries: 

      p0: the proportion of non-differentially expressed genes used
          when calculating the fdr.

  p0.est: a logical value indicating whether 'p0' was estimated from
          the data or supplied by the user.

     fdr: the smoothed fdr values calculated for the original
          intervals.

 xbreaks: vector of breaks for the test statistic defining the intervals
          used for calculation.
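     The relation between the per-interval values in 'param' and the
     per-gene column 'fdr.local', and the S3 layout of the result, can be
     mimicked with a small hand-built object. This is a sketch under
     assumptions: 'make_result' and 'gene_fdr' are hypothetical, and
     mapping genes to interval values by linear interpolation between
     interval midpoints is a choice made here, not necessarily what
     fdr1d does.

```r
# Hypothetical: per-gene fdr by linear interpolation between the midpoints
# of the intervals defined by 'xbreaks' (constant beyond the outer midpoints)
gene_fdr <- function(tstat, fdr, xbreaks) {
  mids <- (head(xbreaks, -1) + tail(xbreaks, -1)) / 2
  approx(mids, fdr, xout = tstat, rule = 2)$y
}

# Hypothetical constructor mimicking the documented structure of the value
make_result <- function(tstat, fdr, xbreaks, p0, p0.est) {
  res <- data.frame(tstat = tstat,
                    fdr.local = gene_fdr(tstat, fdr, xbreaks))
  attr(res, "param") <- list(p0 = p0, p0.est = p0.est,
                             fdr = fdr, xbreaks = xbreaks)
  class(res) <- c("fdr1d.result", "fdr.result", class(res))
  res
}

xb  <- seq(-4, 4, length.out = 5)  # 4 intervals with midpoints -3, -1, 1, 3
res <- make_result(tstat = c(-3, 0, 2, 5), fdr = c(0.1, 0.9, 0.9, 0.2),
                   xbreaks = xb, p0 = 0.95, p0.est = TRUE)
res$fdr.local          # 0.10 0.90 0.55 0.20
class(res)             # "fdr1d.result" "fdr.result" "data.frame"
attr(res, "param")$p0  # 0.95
```

     Since "data.frame" stays last in the class vector, methods not
     defined for the two result classes fall back to the data frame
     methods.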

_A_u_t_h_o_r(_s):

     A. Ploner

_R_e_f_e_r_e_n_c_e_s:

     Efron B, Tibshirani R, Storey JD, Tusher V (2001) Empirical Bayes
     Analysis of a Microarray Experiment. _JASA_, 96(456), pp. 1151-1160.

     Pawitan Y (2001) _In All Likelihood_. Oxford University Press,
     ch. 18.11.

_S_e_e _A_l_s_o:

     'plot.fdr1d.result', 'summary.fdr.result', 'OCshow',
     'tstatistics', 'smooth1d', 'fdr2d', 'PermNull'

_E_x_a_m_p_l_e_s:

     # We simulate a small example with 5 percent regulated genes and
     # a rather large effect size
     set.seed(2000)
     xdat = matrix(rnorm(50000), nrow=1000)
     xdat[1:25, 1:25] = xdat[1:25, 1:25] - 1
     xdat[26:50, 1:25] = xdat[26:50, 1:25] + 1
     grp = rep(c("Sample A","Sample B"), c(25,25))

     # A default run
     res1d = fdr1d(xdat, grp)
     res1d[1:20,]

     # Looking at the results
     summary(res1d)
     plot(res1d)
     res1d[res1d$fdr.local < 0.05, ]

     # Averaging estimates the global FDR for a set of genes
     ndx = abs(res1d$tstat) > 3
     mean(res1d$fdr.local[ndx])

     # Extra information
     class(res1d)
     attr(res1d,"param")

