MTP                 package:multtest                 R Documentation

_A _f_u_n_c_t_i_o_n _t_o _p_e_r_f_o_r_m _r_e_s_a_m_p_l_i_n_g-_b_a_s_e_d _m_u_l_t_i_p_l_e _h_y_p_o_t_h_e_s_i_s _t_e_s_t_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     A user-level function to perform multiple testing procedures
     (MTP). A variety of t- and f-tests, including robust versions of
     each test, are implemented. Single-step and step-down minP and
     maxT methods are used to control the chosen type I error rate
     (FWER, gFWER, TPPFP, or FDR). Bootstrap and permutation null
     distributions are available. Arguments are provided for user
     control of output. Gene selection in microarray experiments is one
     application.

_U_s_a_g_e:

     MTP(X, W = NULL, Y = NULL, Z = NULL, Z.incl = NULL, Z.test = NULL,
         na.rm = TRUE, test = "t.twosamp.unequalvar", robust = FALSE,
         standardize = TRUE, alternative = "two.sided", psi0 = 0,
         typeone = "fwer", k = 0, q = 0.1, fdr.method = "conservative",
         alpha = 0.05, smooth.null = FALSE, nulldist = "boot",
         csnull = TRUE, B = 1000, method = "ss.maxT", get.cr = FALSE,
         get.cutoff = FALSE, get.adjp = TRUE, keep.nulldist = TRUE,
         seed = NULL, cluster = 1, type = NULL, dispatch = NULL)

_A_r_g_u_m_e_n_t_s:

       X: A matrix, data.frame or ExpressionSet containing the raw
          data. In the case of an ExpressionSet, 'exprs(X)' is the data
          of interest and 'pData(X)' may contain outcomes and
          covariates of interest. For currently implemented tests, one
          hypothesis is tested for each row of the data.

       W: A vector or matrix containing non-negative weights to be used
          in computing the test statistics. If a matrix, 'W' must be
          the same dimension as 'X' with one weight for each value in
          'X'. If a vector, 'W' may contain one weight for each
          observation (i.e. column) of 'X' or one weight for each
          variable (i.e. row) of 'X'. In either case, the weights are
          duplicated appropriately. Weighted f-tests are not
          available. Default is 'NULL'.

       Y: A vector, factor, or 'Surv' object containing the outcome of
          interest. This may be class labels (f-tests and two sample
          t-tests) or a continuous or polychotomous dependent variable
          (linear regression based t-tests), or survival data (Cox
          proportional hazards based t-tests). For 'f.block' and
          'f.twoway' tests, class labels must be ordered by block and
          within each block ordered by group. If 'X' is an
          ExpressionSet, 'Y' can be a character string referring to the
          column of 'pData(X)' to use as outcome. Default is 'NULL'.

       Z: A vector, factor, or matrix containing covariate data to be
          used in the regression (linear and Cox) models. Each variable
          should be in one column, so that 'nrow(Z)=ncol(X)'. If 'X' is
          an ExpressionSet, 'Z' can be a character string referring to
          the column of 'pData(X)' to use as covariates. The variables
          'Z.incl' and 'Z.test' allow one to specify which covariates to
          use in a particular test without modifying the input 'Z'.
          Default is 'NULL'.

  Z.incl: The indices of the columns of 'Z' (i.e. which variables) to
          include in the model. These can be numbers or column names
          (if the columns are named). Default is 'NULL'.

  Z.test: The index or name of the column of 'Z' (i.e. which variable)
          to use to test for association with each row of 'X' in a
          linear model. Only used for 'test="lm.XvsZ"', where it is
          necessary to specify which covariate's regression parameter
          is of interest. Default is 'NULL'.

   na.rm: Logical indicating whether to remove observations with an NA.
          Default is 'TRUE'.

    test: Character string specifying the test statistics to use, by
          default 't.twosamp.unequalvar'. See details (below) for a
          list of tests.

  robust: Logical indicating whether to use the robust version of the
          chosen test, e.g. Wilcoxon signed rank test for robust
          one-sample t-test or 'rlm' instead of 'lm' in linear models.
          Default is 'FALSE'.

standardize: Logical indicating whether to use the standardized version
          of the test statistics (usual t-statistics are standardized).
          Default is 'TRUE'.

alternative: Character string indicating the alternative hypotheses, by
          default 'two.sided'. For one-sided tests, use 'less' or
          'greater' for null hypotheses of 'greater than or equal'
          (i.e. alternative is 'less') and 'less than or equal',
          respectively.

    psi0: The hypothesized null value, typically zero (default).
          Currently, this should be a single value, which is used for
          all hypotheses.

 typeone: Character string indicating which type I error rate to
          control, by default family-wise error rate ('fwer'). Other
          options include generalized family-wise error rate ('gfwer'),
          with parameter 'k' giving the allowed number of false
          positives, and tail probability of the proportion of false
          positives ('tppfp'), with parameter 'q' giving the allowed
          proportion of false positives. The false discovery rate
          ('fdr') can also be controlled.

       k: The allowed number of false positives for gFWER control.
          Default is 0 (FWER).

       q: The allowed proportion of false positives for TPPFP control.
          Default is 0.1.

fdr.method: Character string indicating which FDR controlling method
          should be used when 'typeone="fdr"'. The options are
          "conservative" (default) for the more conservative, general
          FDR controlling procedure and "restricted" for the method
          which requires more assumptions.

   alpha: The target nominal type I error rate, which may be a vector
          of error rates. Default is 0.05.

smooth.null: Indicator of whether to use a kernel density estimate for
          the tail of the null distribution for computing raw p-values
          close to zero. Only used if 'rawp' would be zero without
          smoothing. Default is 'FALSE'.

nulldist: Character string indicating which resampling method to use
          for estimating the joint test statistics null distribution,
          by default non-parametric bootstrap ('boot').

  csnull: Indicator of whether the bootstrap estimated test statistics
          distribution should be centered and scaled (to produce a null
          distribution) or not. If 'csnull==FALSE', the non-null
          bootstrap estimated test statistics distribution is returned.

       B: The number of bootstrap iterations (i.e. how many resampled
          data sets) or the number of permutations (if 'nulldist' is
          'perm'). Can be reduced to increase the speed of computation,
          at a cost to precision. Default is 1000.

  method: The multiple testing procedure to use. Options are
          single-step maxT ('ss.maxT', default), single-step minP
          ('ss.minP'), step-down maxT ('sd.maxT'), and step-down minP
          ('sd.minP').

  get.cr: Logical indicating whether to compute confidence intervals
          for the estimates. Not available for f-tests. Default is
          'FALSE'.

get.cutoff: Logical indicating whether to compute thresholds for the
          test statistics. Default is 'FALSE'.

get.adjp: Logical indicating whether to compute adjusted p-values.
          Default is 'TRUE'.

keep.nulldist: Logical indicating whether to return the computed null
          distribution, by default 'TRUE'. Note that this matrix can be
          quite large. 

    seed: Integer or vector of integers to be used as argument to
          'set.seed' to set the seed for the random number generator
          for bootstrap resampling. This argument can be used to repeat
          exactly a test performed with a given seed. If the seed is
          specified via this argument, the same seed will be returned
          in the seed slot of the MTP object created. Otherwise, random
          seeds are generated, used, and returned. A vector of integers
          specifies the seeds for each node in a cluster used to
          generate a bootstrap null distribution.

 cluster: Integer giving the number of nodes to create, or a cluster
          object created through the package 'snow'. With the default,
          'cluster=1', the bootstrap is executed on a single CPU.
          Supplying a cluster object results in the bootstrap being
          implemented in parallel on the provided nodes. This option is
          only available for the bootstrap procedure.

    type: Interface system to use for computer cluster. See 'snow'
          package for details.

dispatch: The number or percentage of bootstrap iterations to dispatch
          at a time to each node of the cluster if a computer cluster
          is used. If dispatch is a percentage, 'B*dispatch' must be an
          integer. If dispatch is an integer, then 'B/dispatch' must be
          an integer. Default is 5 percent.
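
     The divisibility requirement on 'dispatch' can be checked before
     calling 'MTP'. The following is a minimal sketch in base R. The
     helper names 'is_whole' and 'valid_dispatch' are hypothetical and
     not part of 'multtest', and the sketch assumes that a value below
     1 is read as a fraction of 'B'.

```r
# Hypothetical helper: TRUE if x is a whole number up to rounding error.
is_whole <- function(x) abs(x - round(x)) < 1e-8

# Hypothetical helper: check the rule stated for 'dispatch'.
# Assumption: dispatch < 1 is a fraction of B, otherwise a count.
valid_dispatch <- function(B, dispatch) {
  if (dispatch < 1) {
    is_whole(B * dispatch)   # fraction: B * dispatch must be an integer
  } else {
    is_whole(B / dispatch)   # count: B / dispatch must be an integer
  }
}

ok_fraction <- valid_dispatch(1000, 0.05)  # 50 iterations per dispatch
ok_count    <- valid_dispatch(1000, 50)    # 20 dispatches
bad_count   <- valid_dispatch(1000, 300)   # 1000 / 300 is not whole
```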

_D_e_t_a_i_l_s:

     A multiple testing procedure (MTP) is defined by choices of test
     statistics, type I error rate, null distribution and method for
     error rate control. Each component is described here. See
     references for more detail.

     Test statistics are determined by the values of 'test': 

     _t._o_n_e_s_a_m_p: one-sample t-statistic for tests of means;

     _t._t_w_o_s_a_m_p._e_q_u_a_l_v_a_r: equal variance two-sample t-statistic for
          tests of differences in means (two-sample t-statistic);

     _t._t_w_o_s_a_m_p._u_n_e_q_u_a_l_v_a_r: unequal variance two-sample t-statistic for
          tests of differences in means (two-sample Welch t-statistic);

     _t._p_a_i_r: two-sample paired t-statistic for tests of differences in
          means;

     _f: multi-sample f-statistic for tests of equality of population
          means (assumes constant variance across groups, but not
          normality); 

     _f._b_l_o_c_k: multi-sample f-statistic for tests of equality of
          population means in a block design (assumes constant variance
          across groups, but not normality). This test is not available
          with the bootstrap null distribution;

     _f._t_w_o_w_a_y: multi-sample f-statistic for tests of equality of
          population means in a block design (assumes constant variance
          across groups, but not normality). Differs from 'f.block' in
          requiring multiple observations per group*block combintation.
          This test uses the means of each group*block combination as
          response variable and test for group main effects assuming a
          randomized block design;

     _l_m._X_v_s_Z: t-statistic for tests of regression coefficients for
          variable 'Z.test' in linear models, each with a row of X as
          outcome, possibly adjusted by covariates 'Z.incl' from the
          matrix 'Z' (in the case of no covariates, one recovers the
          one-sample t-statistic, 't.onesamp');

     _l_m._Y_v_s_X_Z: t-statistic for tests of regression coefficients in
          linear models, with outcome Y and each row of X as covariate
          of interest, with possibly other covariates 'Z.incl' from the
          matrix 'Z';

     _c_o_x_p_h._Y_v_s_X_Z: t-statistic for tests of regression coefficients in
          Cox proportional hazards survival models, with outcome Y and
          each row of X as covariate of interest, with possibly other
          covariates 'Z.incl' from the matrix 'Z'.

     When 'robust=TRUE', non-parametric versions of each test are
     performed. For the linear models, this means 'rlm' is used instead
     of 'lm'. There is not currently a robust version of
     'test="coxph.YvsXZ"'. For the t- and f-tests, data values are simply
     replaced by their ranks. This is equivalent to performing the
     following familiar named rank-based tests. The conversion after
     each test is the formula to convert from the MTP test to the
     statistic reported by the listed R function (where num is the
     numerator of the MTP test statistics, n is total sample size, nk
     is group k sample size, K is total number of groups or treatments,
     and rk are the ranks in group k).

     _t._o_n_e_s_a_m_p _o_r _t._p_a_i_r: Wilcoxon signed rank, 'wilcox.test' with
          'y=NULL' or 'paired=TRUE', 
           conversion: num/n

     _t._t_w_o_s_a_m_p._e_q_u_a_l_v_a_r: Wilcoxon rank sum or Mann-Whitney,
          'wilcox.test', 
           conversion: n2*(num+mean(r1)) - n2*(n2+1)/2

     _f: Kruskal-Wallis rank sum, 'kruskal.test', 
           conversion: num*12/(n*(n-1)

     _f._b_l_o_c_k: Friedman rank sum, 'friedman.test', 
           conversion: num*12/(K*(K+1))

     _f._t_w_o_w_a_y: Friedman rank sum, 'friedman.test', 
           conversion: num*12/(K*(K+1))
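
     The two-sample conversion can be checked by hand in base R. The
     sketch below assumes that 'num' is the difference in mean ranks,
     mean(r2) - mean(r1), which is what the stated formula implies. It
     does not call 'MTP', and the variable names are illustrative
     only.

```r
set.seed(1)
g1 <- rnorm(6)             # observations in group 1
g2 <- rnorm(5, mean = 1)   # observations in group 2

# Replace the pooled data values by their ranks, as done for robust tests.
r  <- rank(c(g1, g2))
r1 <- r[seq_along(g1)]     # ranks belonging to group 1
r2 <- r[-seq_along(g1)]    # ranks belonging to group 2
n2 <- length(g2)

# Assumed numerator of the rank-based two-sample statistic.
num <- mean(r2) - mean(r1)

# Conversion given in the text for the Wilcoxon rank sum statistic.
W <- n2 * (num + mean(r1)) - n2 * (n2 + 1) / 2

# Reference value: wilcox.test reports the statistic for its first argument.
W_ref <- unname(wilcox.test(g2, g1)$statistic)
```

     Here 'W' and 'W_ref' agree because n2*(num+mean(r1)) is the sum
     of the ranks in group 2.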

     The implemented MTPs are based on control of the family-wise error
     rate, defined as the probability of any false positives. Let Vn
     denote the (unobserved) number of false positives. Then, control
     of FWER at level alpha means that Pr(Vn>0)<=alpha. The set of
     rejected hypotheses under a FWER controlling procedure can be
     augmented to increase the number of rejections, while controlling
     other error rates. Control of the generalized family-wise error
     rate (gFWER) at level alpha means that Pr(Vn>k)<=alpha, and it is
     clear that one can simply take an FWER controlling procedure,
     reject k more hypotheses and have control of gFWER at level alpha.
     The tail probability of the proportion of false positives depends
     on both the number of false positives (Vn) and the number of
     rejections (Rn). Control of TPPFP at level alpha means
     Pr(Vn/Rn>q)<=alpha, for some proportion q.
     Control of the false discovery rate refers to the expected
     proportion of false positives (rather than a tail probability).
     Control of FDR at level alpha means E(Vn/Rn)<=alpha.
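
     The augmentation idea for gFWER can be illustrated with a short
     base R sketch. The function name 'augment_gfwer' and the example
     adjusted p-values are hypothetical. The sketch only shows the
     "reject k more hypotheses" step described above and is not the
     code used inside 'MTP'.

```r
# Hypothetical helper: augment a FWER rejection set by the k next most
# significant hypotheses to obtain a gFWER(k) rejection set.
augment_gfwer <- function(adjp, k, alpha = 0.05) {
  ord    <- order(adjp)                    # most significant first
  n_fwer <- sum(adjp <= alpha)             # rejections under FWER control
  n_rej  <- min(length(adjp), n_fwer + k)  # cannot reject more than all
  reject <- rep(FALSE, length(adjp))
  reject[ord[seq_len(n_rej)]] <- TRUE
  reject
}

# Made-up FWER-adjusted p-values for five hypotheses.
adjp <- c(0.01, 0.20, 0.03, 0.60, 0.08)

rej_fwer  <- augment_gfwer(adjp, k = 0)  # plain FWER rejections
rej_gfwer <- augment_gfwer(adjp, k = 1)  # one additional rejection
```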

     In practice, one must choose a method for estimating the test
     statistics null distribution. We have implemented an ordinary
     non-parametric bootstrap estimator and a permutation estimator
     (which makes sense in certain settings, see references). The
     non-parametric bootstrap estimator (default) provides asymptotic
     control of the type I error rate for any data generating
     distribution, whereas the permutation estimator requires the
     subset pivotality assumption. One drawback of both methods is the
     discreteness of the estimated null distribution when the sample
     size is small. Furthermore, when the sample size is small enough,
     it is possible that ties will lead to a very small variance
     estimate. Using 'standardize=FALSE' allows one to avoid these
     unusually small test statistic denominators. Parametric bootstrap
     estimators are another option (not yet implemented).
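
     As a rough illustration of how a matrix of bootstrap test
     statistics can be turned into a null distribution by centering
     and scaling, consider the base R sketch below. The function name
     'center_scale', the null mean of 0, the variance bound of 1, and
     the exact form of the transformation are assumptions made for
     this example (see references for the precise construction).

```r
# Hypothetical helper: shift each row (hypothesis) of a matrix of
# bootstrap statistics to the null mean and shrink its variance to at
# most null_var. Rows are hypotheses, columns are bootstrap iterations.
center_scale <- function(boot_stats, null_mean = 0, null_var = 1) {
  m <- rowMeans(boot_stats)                # bootstrap mean per hypothesis
  v <- apply(boot_stats, 1, var)           # bootstrap variance per hypothesis
  scale_factor <- sqrt(pmin(1, null_var / v))
  # Vectors of length nrow recycle down columns, so row i uses m[i].
  scale_factor * (boot_stats - m) + null_mean
}

set.seed(1)
# Simulated non-null bootstrap statistics for 2 hypotheses, 200 draws.
boot <- matrix(rnorm(400, mean = 3, sd = 2), nrow = 2)
z <- center_scale(boot)
```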

     Given observed test statistics, a type I error rate (with nominal
     level), and a test statistics null distribution, MTPs provide
     adjusted p-values, cutoffs for test statistics, and possibly
     confidence regions for estimates. Four methods are implemented,
     based on minima of p-values and maxima of test statistics. Only
     the step-down methods are currently available with the permutation
     null distribution.
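
     To make the single-step maxT idea concrete, the base R sketch
     below computes adjusted p-values from observed statistics and a
     null distribution matrix with hypotheses in rows and resampling
     iterations in columns. The function name 'ss_maxT_adjp' and the
     toy numbers are hypothetical, a two-sided alternative is assumed,
     and the code inside 'MTP' may differ in its details.

```r
# Hypothetical helper: single-step maxT adjusted p-values.
# For each hypothesis, the adjusted p-value is the proportion of null
# draws whose maximum absolute statistic is at least as large as the
# observed absolute statistic.
ss_maxT_adjp <- function(stat, nulldist) {
  max_null <- apply(abs(nulldist), 2, max)   # one maximum per iteration
  vapply(abs(stat), function(t) mean(max_null >= t), numeric(1))
}

# Toy null distribution: 2 hypotheses (rows), 4 iterations (columns).
nulldist <- matrix(c( 0.5, -1.0,
                      2.0,  0.1,
                     -0.3,  0.2,
                      1.5, -2.5), nrow = 2)
stat <- c(2.2, 0.4)   # made-up observed test statistics

adjp <- ss_maxT_adjp(stat, nulldist)
```

     The column maxima are 1, 2, 0.3 and 2.5, so the adjusted p-values
     are 1/4 and 3/4.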

     Computation times using a bootstrap null distribution are slower
     when weights are used for one and two-sample tests. Computation
     times when using a bootstrap null distribution also are slower for
     the tests 'lm.XvsZ', 'lm.YvsXZ', 'coxph.YvsXZ'.

     To execute the bootstrap on a computer cluster, a cluster object
     generated with 'makeCluster' in the package 'snow' may be used as
     the argument for 'cluster'. Alternatively, the number of nodes to
     use in the computer cluster can be used as the argument to
     'cluster'. In this case, 'type' must be specified and a cluster will
     be created. In both cases, 'Biobase' and 'multtest' will be loaded
     onto each cluster node if these libraries are located in a
     directory in the standard search path. If these libraries are in a
     non-standard location, it is necessary to first create the
     cluster, load 'Biobase' and 'multtest' on each node and then to
     use the cluster object as the argument to cluster. See
     documentation for 'snow' package for additional information on
     creating and using a cluster.

_V_a_l_u_e:

     An object of class 'MTP', with the following slots:

'statistic': Object of class 'numeric', observed test statistics for
          each hypothesis, specified by the values of the 'MTP'
          arguments 'test', 'robust', 'standardize', and 'psi0'.

'estimate': For the test of single-parameter null hypotheses using
          t-statistics (i.e., not the F-tests), the numeric vector of
          estimated parameters corresponding to each hypothesis, e.g.
          means, differences in means, regression parameters.

'sampsize': Object of class 'numeric', number of columns (i.e.
          observations) in the input data set.

  'rawp': Object of class 'numeric', unadjusted, marginal p-values for
          each hypothesis.

  'adjp': Object of class 'numeric', adjusted (for multiple testing)
          p-values for each hypothesis (computed only if the 'get.adjp'
          argument is TRUE).

'conf.reg': For the test of single-parameter null hypotheses using
          t-statistics (i.e., not the F-tests), the numeric array of
          lower and upper simultaneous confidence limits for the
          parameter vector, for each value of the nominal Type I error
          rate 'alpha' (computed only if the 'get.cr' argument is
          TRUE).

'cutoff': The numeric matrix of cut-offs for the vector of test
          statistics for each value of the nominal Type I error rate
          'alpha' (computed only if the 'get.cutoff' argument is TRUE).

'reject': Object of class '"matrix"', rejection indicators (TRUE for a
          rejected null hypothesis), for each value of the nominal Type
          I error rate 'alpha'.

'nulldist': The numeric matrix for the estimated test statistics null
          distribution (returned only if 'keep.nulldist=TRUE'; option
          not currently available for permutation null distribution,
          i.e.,  'nulldist="perm"'). By default (i.e., for
          'nulldist="boot"'), the entries of 'nulldist' are the null
          value shifted and scaled bootstrap test statistics, with one
          null test statistic value for each hypothesis (rows) and
          bootstrap iteration (columns).

  'call': Object of class 'call', the call to the MTP function.

  'seed': An integer or vector for specifying the state of the random
          number generator used to create the resampled datasets. The
          seed can be reused for reproducibility in a repeat call to
          'MTP'. The seed is currently used only for the bootstrap
          null distribution (i.e., for 'nulldist="boot"'). See
          '?set.seed' for details.

_N_o_t_e:

     Thank you to Peter Dimitrov for suggestions about the code.

_A_u_t_h_o_r(_s):

     Katherine S. Pollard with design contributions from Sandra Taylor,
     Sandrine Dudoit and Mark J. van der Laan.

_R_e_f_e_r_e_n_c_e_s:

     M.J. van der Laan, S. Dudoit, K.S. Pollard (2004), Augmentation
     Procedures for Control of the Generalized Family-Wise Error Rate
     and Tail Probabilities for the Proportion of False Positives,
     Statistical Applications in Genetics and Molecular Biology, 3(1). 
     <URL: http://www.bepress.com/sagmb/vol3/iss1/art15/>

     M.J. van der Laan, S. Dudoit, K.S. Pollard (2004), Multiple
     Testing. Part II. Step-Down Procedures for Control of the
     Family-Wise Error Rate, Statistical Applications in Genetics and
     Molecular Biology, 3(1). <URL:
     http://www.bepress.com/sagmb/vol3/iss1/art14/>

     S. Dudoit, M.J. van der Laan, K.S. Pollard (2004), Multiple
     Testing. Part I. Single-Step Procedures for Control of General
     Type I Error Rates, Statistical Applications in Genetics and
     Molecular Biology, 3(1). <URL:
     http://www.bepress.com/sagmb/vol3/iss1/art13/>

     Katherine S. Pollard and Mark J. van der Laan, "Resampling-based
     Multiple Testing: Asymptotic Control of Type I Error and
     Applications to Gene Expression Data" (June 24, 2003). U.C.
     Berkeley Division of Biostatistics Working Paper Series. Working
     Paper 121. <URL: http://www.bepress.com/ucbbiostat/paper121>

_S_e_e _A_l_s_o:

     'MTP-class', 'MTP-methods', 'mt.minP', 'mt.maxT', 'ss.maxT',
     'fwer2gfwer'

_E_x_a_m_p_l_e_s:

     library(multtest)

     # simulated data: 9 hypotheses (rows), 10 observations (columns)
     set.seed(99)
     data <- matrix(rnorm(90), nrow = 9)
     group <- c(rep(1, 5), rep(0, 5))

     # FWER control with bootstrap null distribution (B=100 for speed)
     m1 <- MTP(X = data, Y = group, alternative = "less", B = 100,
               method = "sd.minP")
     print(m1)
     summary(m1)
     par(mfrow = c(2, 2))
     plot(m1, top = 9)

