TOC                  package:OCplus                  R Documentation

_T_h_e_o_r_e_t_i_c_a_l _F_D_R _a_n_d _s_e_n_s_i_t_i_v_i_t_y _a_s _a _f_u_n_c_t_i_o_n _o_f _c_u_t_o_f_f _l_e_v_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     Computes and plots the operating characteristics for a two group
     microarray experiment based on a theoretical model. The false
     discovery rate (FDR) is plotted against the cutoff level on the
     t-statistic. Optionally, curves for the classical significance
     level and sensitivity can be added. Different curves for different
     proportions of non-differentially expressed genes can be compared
     in the same plot, and the sample size per group can be varied
     between plots.

_U_s_a_g_e:

     TOC(n = 10, p0 = 0.95, sigma = 1, D, F0, F1, n1 = n, n2 = n, paired = FALSE,
         plot = TRUE, local.show=FALSE, alpha.show = TRUE, sensitivity.show = TRUE,
             nplot = 100, xlim, ylim = c(0, 1), main, legend.show = FALSE, ...)

_A_r_g_u_m_e_n_t_s:

n, n1, n2: number of samples per group, by default equal and specified
          via 'n', but can be set to different values via 'n1' and
          'n2'.

      p0: the proportion of not differentially expressed genes, may be
          vector valued

   sigma: the standard deviation for the log expression values

       D: assumed average log fold change (in units of 'sigma'), by
          default 1; this is a shortcut for specifying a simple
          symmetrical alternative hypothesis through 'F1'.

      F0: the distribution of the log2 expression values under the null
          hypothesis; by default, this is normal with mean zero and
          standard deviation 'sigma', but mixtures of normals can be
          specified, see Details and Examples.

      F1: the distribution of the log2 expression values under the
          alternative hypothesis; by default, this is an equal mixture
          of two normals with means 'D' and -'D' and standard
          deviation 'sigma'; mixtures of normals are again possible, see
          Details and Examples.

  paired: logical value indicating whether two distinct groups of
          observations or one group of paired observations are studied.

    plot: logical value indicating whether the results should be
          plotted.

local.show: logical value indicating whether to show local or global
          false discovery rate (default: global).

alpha.show: logical value indicating whether to show the classical
          significance level for testing one hypothesis as a function
          of the cutoff level.

sensitivity.show: logical value indicating whether to show the
          classical sensitivity for testing one hypothesis as a
          function of the cutoff level.

   nplot: number of points that are evaluated for the curves

    xlim: the usual limits on the horizontal axis

    ylim: the usual limits on the vertical axis

    main: the main title of the plot

legend.show: logical value indicating whether to show a legend for the
          different types of curves in the plot.

     ...: the usual graphical parameters, passed to 'plot'

_D_e_t_a_i_l_s:

     This function plots the FDR as a function of the cutoff level when
     comparing the expression of multiple genes between two groups of
     subjects. We study a gene selection mechanism that declares all
     genes to be differentially expressed whose t-statistics have an
     absolute value greater than a specified cutoff value. The
     comparison is based on a two-sample t-statistic for equal
     variances, for either paired or unpaired observations. 
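
     The selection rule can be illustrated with a small simulation in
     base R. This is only a sketch under assumed settings (2000 genes,
     10 unpaired samples per group, 'sigma' = 1, 'D' = 1, 'p0' = 0.95,
     cutoff 3); none of the variable names come from OCplus, and the
     simulation is not how 'TOC' computes its curves.

```r
set.seed(1)
n <- 10; m <- 2000; p0 <- 0.95; D <- 1; cutoff <- 3  # assumed settings

# TRUE for regulated genes; their group difference is +D or -D (sigma = 1)
regulated <- runif(m) > p0
shift <- ifelse(regulated, sample(c(-D, D), m, replace = TRUE), 0)

# Two groups of n samples per gene (rows = genes); group 2 carries the shift
g1 <- matrix(rnorm(m * n), nrow = m)
g2 <- matrix(rnorm(m * n), nrow = m) + shift

# Equal-variance two-sample t-statistic, computed gene by gene
tstat <- sapply(seq_len(m), function(i) {
  unname(t.test(g2[i, ], g1[i, ], var.equal = TRUE)$statistic)
})

# Declare a gene differentially expressed if |t| exceeds the cutoff
selected <- abs(tstat) > cutoff

# Realized proportion of false discoveries among the selected genes
realized_fdr <- sum(selected & !regulated) / max(sum(selected), 1)
realized_fdr
```

     The realized proportion varies from run to run; the curves
     plotted by 'TOC' describe the corresponding theoretical
     quantity.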

     The underlying model assumes that a proportion 'p0' of genes are
     not differentially expressed between groups, and that 1-'p0' are.
     The logarithmized gene expression values are assumed to be
     generated by mixtures of normal distributions. Both the null and
     the alternative hypothesis are specified through the means of the
     respective mixture components; these means can be interpreted as
     average log2 fold changes in units of the standard deviation
     'sigma'.

     Note that the model does _not_ assume that all genes have the same
     standard deviation 'sigma', only that the mean log2 fold change
     for all regulated genes is proportional to their individual
     variability (standard deviation). 'sigma' generally does not need
     to be specified explicitly and can be left at its default value of
     one, so that 'D' can be interpreted directly as the log2 fold
     change between groups.

     The default null distribution of the log2 expression values is a
     single normal distribution with mean zero (and standard deviation
     'sigma'); the default alternative distribution is an equal
     mixture of two normals with means 'D' and -'D' (and again
     standard deviation 'sigma'). However, general mixtures of normals
     can be specified for both null and alternative distribution
     through 'F0' and 'F1', respectively: both are lists with two
     elements:


        *  'D' is the vector of means (i.e. log2 fold changes),

        *  'p' is the vector of mixing proportions for the means.

     If present, 'p' must be the same length as 'D'; its elements do
     not need to be normalized, i.e. sum to one; if absent, equal
     mixing is assumed, see Examples. A wide (mixture) null hypothesis,
     or an empirical null hypothesis as outlined by Efron (2004), can
     be used if genes with log fold changes close to zero are thought
     to be of no biological interest, and are counted as effectively
     not regulated. Similarly, the alternative hypothesis can be any
     mixture of large and small effects, symmetric or non-symmetric,
     depending on the expected regulation patterns, see Examples.
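
     The conventions for 'F0' and 'F1' described above (equal mixing
     when 'p' is absent, rescaling of 'p' to sum to one) can be
     sketched as follows. 'complete_mixture' is a hypothetical helper
     written for illustration; it is not part of OCplus and may differ
     from what the package does internally.

```r
# Hypothetical helper: fill in and normalize a mixture specification
complete_mixture <- function(spec) {
  D <- spec$D
  p <- spec$p
  # An absent 'p' means equal mixing proportions
  if (is.null(p)) p <- rep(1, length(D))
  stopifnot(length(p) == length(D), all(p >= 0), sum(p) > 0)
  # Rescale the proportions so that they sum to one
  list(D = D, p = p / sum(p))
}

# Wide null: three means, equal weights of 1/3 each
complete_mixture(list(D = c(-0.25, 0, 0.25)))

# Asymmetric alternative: weights 1, 2, 2 become 0.2, 0.4, 0.4
complete_mixture(list(D = c(-2, -1, 1), p = c(1, 2, 2)))
```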

     As a consequence, both the null distribution of the t-statistics
     (for the unregulated genes) and their alternative distribution
     (for the regulated genes) are mixtures of (generally non-central)
     t-distributions, see 'FDR'.
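
     For the default 'F0' and 'F1', the global FDR at a given cutoff
     can be approximated directly from central and non-central
     t-distributions using 'pt' from the stats package. The sketch
     below assumes unpaired groups of equal size 'n' and uses the
     standard non-centrality parameter D * sqrt(n/2) for a mean shift
     of 'D' standard deviations; 'toy_fdr' is a hypothetical name, and
     the code is not taken from OCplus.

```r
# Hypothetical helper: global FDR for the rule |t| > cutoff,
# assuming equal group sizes, unpaired samples, default F0 and F1
toy_fdr <- function(cutoff, n = 10, p0 = 0.95, D = 1) {
  df  <- 2 * n - 2        # degrees of freedom of the two-sample t-test
  ncp <- D * sqrt(n / 2)  # non-centrality for a shift of D (units of sigma)

  # P(|T| > cutoff) for unregulated genes: central t-distribution
  fp <- 2 * pt(-cutoff, df)

  # P(|T| > cutoff) for one non-central component
  tail_prob <- function(delta) {
    pt(-cutoff, df, ncp = delta) +
      pt(cutoff, df, ncp = delta, lower.tail = FALSE)
  }
  # Regulated genes: equal mixture of components centred at +D and -D
  tp <- 0.5 * tail_prob(ncp) + 0.5 * tail_prob(-ncp)

  # Expected share of unregulated genes among all selected genes
  p0 * fp / (p0 * fp + (1 - p0) * tp)
}

# FDR at a few cutoffs for the default settings
toy_fdr(c(1, 2, 3, 4))
```

     Larger cutoffs select fewer unregulated genes relative to
     regulated ones, so the values decrease as the cutoff grows.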

     Sample size 'n' and standard deviation 'sigma' are atomic values,
     but multiple 'p0' can be specified, resulting in multiple curves.
     Additionally, the usual significance level and sensitivity of a
     classical test of a single hypothesis can be displayed.
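
     The single-hypothesis significance level and sensitivity can be
     tabulated over a grid of cutoffs in the same spirit. This is a
     minimal sketch assuming equal, unpaired groups of size 'n' and the
     default symmetric alternative, for which both mixture components
     give the same two-sided tail probability; the variable names are
     illustrative and the columns need not match those returned by
     'TOC'.

```r
n <- 10; D <- 1                     # assumed group size and effect size
df  <- 2 * n - 2                    # degrees of freedom
ncp <- D * sqrt(n / 2)              # non-centrality under the alternative
cutoff <- seq(0, 6, length.out = 100)

# Significance level: P(|T| > cutoff) under the null hypothesis
alpha <- 2 * pt(-cutoff, df)

# Sensitivity: P(|T| > cutoff) under the alternative hypothesis
sens <- pt(-cutoff, df, ncp = ncp) +
  pt(cutoff, df, ncp = ncp, lower.tail = FALSE)

curves <- data.frame(cutoff = cutoff, alpha = alpha, sensitivity = sens)
head(curves)
```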

_V_a_l_u_e:

     This function returns invisibly a data frame with 'nplot' rows
     whose columns contain the information for the individual curves.
     The number of columns and their names will depend on the number
     and value of the 'p0' specified, and whether alpha and sensitivity
     are displayed. Additionally, the returned data frame has an
     attribute 'param', which is a list with all the non-plotting
     arguments to the function.

_N_o_t_e:

     Both the curve labels and the legend may be squashed if the
     plotting device is too small. Increasing the size of the device
     and re-plotting should improve readability.

_A_u_t_h_o_r(_s):

     Y. Pawitan and A. Ploner

_R_e_f_e_r_e_n_c_e_s:

     Pawitan Y, Michiels S, Koscielny S, Gusnanto A, Ploner A. (2005)
     False Discovery Rate, Sensitivity and Sample Size for Microarray
     Studies. _Bioinformatics_, 21, 3017-3024.

     Efron, B. (2004) Large-Scale Simultaneous Hypothesis Testing: The
     Choice of a Null Hypothesis. _JASA_, 99, 96-104.

_S_e_e _A_l_s_o:

     'FDR', 'samplesize', 'EOC'

_E_x_a_m_p_l_e_s:

     # Default null and alternative distributions, assuming different proportions
     # of regulated genes
     TOC(p0=c(0.90, 0.95, 0.99), legend.show=TRUE)

     # The effect of sample size and effect size
     par(mfrow=c(2,2))
     TOC(p0=c(0.90, 0.95, 0.99), n=5, D=1)
     TOC(p0=c(0.90, 0.95, 0.99), n=30, D=1)
     TOC(p0=c(0.90, 0.95, 0.99), n=5, D=2)
     TOC(p0=c(0.90, 0.95, 0.99), n=30, D=2)

     # A wide null distribution that allows genes of small effect to be
     # disregarded; an unspecified p means equal mixing proportions
     ret = TOC(F0=list(D=c(-0.25,0,0.25)), main="Wide F0") 
     attr(ret,"param")$F0 # the null hypothesis

     # An extended (and unsymmetric) alternative
     ret = TOC(F1=list(D=c(-2,-1,1), p=c(1,2,2)), p0=0.95, main="Unsymmetric F1")
     attr(ret,"param")$F1 # F1$p is normalized

     # Unequal sample sizes
     TOC(n1=10, n2=30)

     # Curves for a paired t-test
     TOC(paired=TRUE)

     # The output contains all the x- and y-coordinates
     ret = TOC(p0=c(0.90, 0.95, 0.99), main="Default settings")
     dim(ret)
     colnames(ret)
     ret[1:10,]
     # Additionally, the list of arguments that determine the experiment
     attr(ret,"param")

