vsn2                   package:vsn                   R Documentation

_F_i_t _t_h_e _v_s_n _m_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     'vsn2' fits the vsn model to the data in 'x' and returns a 'vsn'
     object with the fit parameters and the transformed data matrix.
     The data are, typically, feature intensity readings from a
     microarray, but this function may also be useful for other kinds
     of intensity data that obey an additive-multiplicative error
     model. To obtain an object of the same class as 'x', containing
     the normalised data and the same metadata as 'x', use


         fit = vsn2(x, ...)
         nx = predict(fit, newdata=x)

     or the wrapper 'justvsn'. Please see the vignette _Introduction to
     vsn_.

_U_s_a_g_e:

     vsnMatrix(x,
               reference,
               strata,
               lts.quantile = 0.9,
               subsample    = 0L,
               verbose      = interactive(),
               returnData   = TRUE,
               calib        = "affine",
               pstart,
               minDataPointsPerStratum = 42L,
               optimpar     = list(),
               defaultpar   = list(factr=5e7, pgtol=2e-4, maxit=60000L,
                                   trace=0L, cvg.niter=7L, cvg.eps=0))

     ## S4 method for signature 'ExpressionSet':
     vsn2(x, reference, strata, ...)

     ## S4 method for signature 'AffyBatch':
     vsn2(x, reference, strata, ...)

     ## S4 method for signature 'matrix':
     vsn2(x, reference, strata, ...)

     ## S4 method for signature 'NChannelSet':
     vsn2(x, reference, strata, backgroundsubtract=FALSE,
            foreground=c("R","G"), background=c("Rb", "Gb"), ...)

     ## S4 method for signature 'RGList':
     vsn2(x, reference, strata, backgroundsubtract=FALSE, ...)

_A_r_g_u_m_e_n_t_s:

       x: An object containing the data to which the model is fitted.

reference: Optional, a 'vsn' object from a previous fit. If this
          argument is specified, the data in 'x' are normalized
          "towards" an existing set of reference arrays whose
          parameters are stored in the object 'reference'. If this
          argument is not specified, then the data in 'x' are
          normalized "among themselves". See the section on standalone
          versus reference normalisation for a more precise explanation.

  strata: Optional, a 'factor' or 'integer' whose length is 'nrow(x)'.
          It can be used for stratified normalization (i.e. separate
          offsets a and factors b for each level of 'strata'). If
          missing, all rows of 'x' are assumed to come from one
          stratum. If 'strata' is an integer, its values must cover the
          range 1,...,n, where n is the number of strata.
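     Arbitrary labels can be turned into integer strata that cover
     1,...,n with base R's 'factor'. The sketch below assumes a
     hypothetical per-row label vector 'probeType'; the helper
     'checkStrata' is also hypothetical, and its count check only
     mirrors the intent of 'minDataPointsPerStratum', not vsn's
     internal code.

```r
## Hypothetical per-row labels, one per feature (row of x)
probeType <- c("control", "gene", "gene", "control", "gene", "gene")

## as.integer(factor(...)) gives codes 1..n with every level used,
## so the integer strata cover the range 1,...,n
strata <- as.integer(factor(probeType))

## Hypothetical helper: check coverage of 1..n and a minimum count per stratum
checkStrata <- function(strata, minPerStratum) {
  n <- max(strata)
  coversRange <- setequal(unique(strata), seq_len(n))
  counts <- tabulate(strata, nbins = n)   # number of rows in each stratum
  coversRange && all(counts >= minPerStratum)
}

print(strata)                            # 1 2 2 1 2 2
checkStrata(strata, minPerStratum = 2L)  # TRUE
checkStrata(strata, minPerStratum = 3L)  # FALSE: stratum 1 has only 2 rows
```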

lts.quantile: Numeric of length 1. The quantile that is used for the
          resistant least trimmed sum of squares regression. Allowed
          values are between 0.5 and 1. A value of 1 corresponds to
          ordinary least squares regression.
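     The trimming idea can be illustrated with plain linear regression:
     fit, keep the fraction 'lts.quantile' of points with the smallest
     squared residuals, refit, and repeat. This is a generic sketch with
     a hypothetical function 'ltsFit'; vsn applies trimming to its own
     likelihood-based model rather than to 'lm', but the arguments
     'niter' and 'eps' play roles analogous to 'cvg.niter' and 'cvg.eps'
     in the section on convergence of the iterative likelihood
     optimisation.

```r
## Generic least trimmed squares iteration (illustrative, not vsn's code)
ltsFit <- function(x, y, q = 0.9, niter = 7L, eps = 0) {
  keep <- rep(TRUE, length(y))          # start with all points
  oldCoef <- NULL
  for (iter in seq_len(niter)) {
    newCoef <- coef(lm(y[keep] ~ x[keep]))
    ## stop early once the coefficients change by no more than eps
    if (!is.null(oldCoef) && max(abs(newCoef - oldCoef)) <= eps) break
    oldCoef <- newCoef
    ## squared residuals of all points under the current fit
    r2 <- (y - newCoef[1] - newCoef[2] * x)^2
    ## keep the fraction q of points with the smallest squared residuals
    keep <- r2 <= quantile(r2, probs = q)
  }
  unname(newCoef)
}

set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- 2 + 3 * x + rnorm(100, sd = 0.1)
y[1:5] <- y[1:5] + 50                   # five gross outliers

est <- ltsFit(x, y, q = 0.9)
est                                     # close to c(2, 3)
coef(lm(y ~ x))                         # ordinary fit, distorted by the outliers
```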

subsample: Integer of length 1. If its value is greater than 0, the
          model parameters are estimated from a subsample of the data
          of size 'subsample' only, yet the fitted transformation is
          then applied to all data. For large datasets, this can
          substantially reduce the CPU time and memory consumption at a
          negligible loss of precision. Note that the 'AffyBatch'
          method of 'vsn2' sets a value of '30000' for this parameter
          if it is missing from the function call; this differs from
          the behaviour of the other methods.
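     The subsampling pattern is: estimate parameters on a random subset
     of rows, then apply the resulting transformation to every row. In
     this hypothetical sketch a per-array median scaling stands in for
     the vsn parameter estimation; only the subsampling pattern carries
     over to 'vsn2'.

```r
set.seed(42)
## Hypothetical data: 100000 features (rows) on 3 arrays (columns)
x <- matrix(rexp(3e5, rate = 1 / 500), ncol = 3)
x[, 2] <- 2 * x[, 2]                    # array 2 is twice as bright

subsample <- 30000L
rows <- sample(nrow(x), subsample)      # rows used for estimation only

## Stand-in "parameter estimation" on the subsample: per-array medians
colScale <- apply(x[rows, ], 2, median)

## The fitted transformation is then applied to all rows
nx <- sweep(x, 2, colScale, "/")
dim(nx)                                 # 100000 3
```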

backgroundsubtract: Logical of length 1: should local background
          estimates be subtracted before fitting vsn?

foreground, background: Aligned character vectors of the same length,
          naming the channels of 'x' that should be used as foreground
          and background values.

 verbose: Logical. If TRUE, some messages are printed.

returnData: Logical. If TRUE, the transformed data are returned in a
          slot of the resulting 'vsn' object. Setting this option to
          'FALSE' saves memory if the data are not needed.

   calib: Character of length 1. Allowed values are 'affine' and
          'none'. The default, 'affine', corresponds to the behaviour
          in package versions <= 3.9, and to what is described in
          references [1] and [2]. The option 'none' is an experimental
          new feature, in which no affine calibration is performed and
          only two global variance stabilisation transformation
          parameters 'a' and 'b' are fitted. This functionality might
          be useful in conjunction with other calibration methods, such
          as quantile normalisation - see the vignette _Introduction to
          vsn_.

  pstart: Optional, a three-dimensional numeric array that specifies
          start values for the iterative parameter estimation
          algorithm. If not specified, the function tries to guess
          useful start values. The first dimension corresponds to the
          levels of 'strata', the second dimension to the columns of
          'x' and the third dimension must be 2, corresponding to
          offsets and factors.
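     An array with the required shape can be built as follows. The zeros
     are placeholders chosen for illustration only, not the start values
     vsn would guess itself, and the counts 'nStrata' and 'nArrays' are
     hypothetical.

```r
nStrata <- 2L   # hypothetical: number of levels of 'strata'
nArrays <- 4L   # hypothetical: ncol(x)

## dim = c(strata, arrays, 2); [, , 1] holds offsets, [, , 2] holds factors
pstart <- array(0, dim = c(nStrata, nArrays, 2L))

dim(pstart)     # 2 4 2
pstart[, , 1]   # 2 x 4 matrix of offsets, one per stratum and array
```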

minDataPointsPerStratum: Integer of length 1. The minimum number of
          data points required in each stratum.

optimpar: Optional, a list with parameters for the likelihood
          optimisation algorithm. Parameters not given here are taken
          from 'defaultpar'. See the section on convergence of the
          iterative likelihood optimisation.

defaultpar: The default parameters for the likelihood optimisation
          algorithm. Values in 'optimpar' take precedence over those in
          'defaultpar'. The purpose of this argument is to expose the
          default values in this manual page; it is not intended to be
          changed. Please use 'optimpar' for that.
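     The precedence rule can be mimicked with 'utils::modifyList', which
     replaces entries of its first argument by same-named entries of the
     second. This sketches the documented behaviour only; it is not
     assumed to be how 'vsnMatrix' merges the two lists internally.

```r
defaultpar <- list(factr = 5e7, pgtol = 2e-4, maxit = 60000L,
                   trace = 0L, cvg.niter = 7L, cvg.eps = 0)
optimpar <- list(maxit = 1000L, trace = 1L)   # user overrides

## Entries in 'optimpar' replace those of the same name in 'defaultpar';
## all other defaults are kept
merged <- utils::modifyList(defaultpar, optimpar)
str(merged)
```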

     ...: Arguments that get passed on to 'vsnMatrix'.

_V_a_l_u_e:

     An object of class 'vsn'.

_N_o_t_e _o_n _o_v_e_r_a_l_l _s_c_a_l_e _a_n_d _l_o_c_a_t_i_o_n _o_f _t_h_e _g_l_o_g
  _t_r_a_n_s_f_o_r_m_a_t_i_o_n:

     The data are returned on a glog scale to base 2. More precisely,
     the transformed data are subject to the transformation
     glog_2(f(b)*x+a) - c, where the function glog_2(u) =
     log_2(u+sqrt(u*u+1)) = asinh(u)/log(2) is called the generalised
     logarithm, the offset a and the scaling parameter b are the fitted
     model parameters (see references), and f(t)=exp(t) is a parameter
     transformation that ensures positivity of the factor in front of
     x while allowing an unconstrained optimisation over b [4].
     The overall offset c is computed from the b's such that for large
     x the transformation approximately corresponds to the log_2
     function: for large u, glog_2(u) is approximately log_2(2*u), so
     glog_2(f(b)*x+a) is approximately log_2(x) + log_2(2*f(b)), and
     c removes the second term. This is done separately for each
     stratum, but with the same value across arrays. More precisely, if
     the element 'b[s,i]' of the array _b_ is the scaling parameter for
     the 's'-th stratum and the 'i'-th array, then 'c[s]' is computed
     as 'log2(2*f(mean(b[s,])))'. The offset _c_ is inconsequential for
     all differential expression calculations, but many users like to
     see the data in a range that they are familiar with.
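     The glog formulas can be checked numerically. The sketch writes
     glog_2 in both stated forms, confirms that they agree, and shows
     that glog_2(f(b)*x+a) minus log_2(2*f(b)) tracks log_2(x) for large
     x. The values of a and b are made up for illustration, and a single
     array is used so that mean(b) is simply b.

```r
## Generalised logarithm to base 2, in the two equivalent forms
glog2 <- function(u) log2(u + sqrt(u * u + 1))
glog2asinh <- function(u) asinh(u) / log(2)

f <- function(t) exp(t)   # parameter transformation: f(t) > 0 for any real t

u <- c(-100, -1, 0, 1, 100)
all.equal(glog2(u), glog2asinh(u))    # TRUE

## Made-up parameters for one stratum on one array
a <- -0.5
b <- log(0.01)                        # f(b) = 0.01
c0 <- log2(2 * f(b))                  # the log_2(2 f(b)) term

x <- c(1e4, 1e5, 1e6)
d <- glog2(f(b) * x + a) - c0 - log2(x)
d                                     # small, shrinking as x grows
```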

_S_p_e_c_i_f_i_c _b_e_h_a_v_i_o_u_r _o_f _t_h_e _d_i_f_f_e_r_e_n_t _m_e_t_h_o_d_s:

     'vsn2' methods exist for 'ExpressionSet', 'NChannelSet',
     'AffyBatch' (from the 'affy' package), 'RGList' (from the 'limma'
     package), 'matrix' and 'numeric'. If 'x' is an 'NChannelSet', then
     'vsn2' is applied to the matrix that is obtained by horizontally
     concatenating the color channels. Optionally (see
     'backgroundsubtract'), available background estimates are
     subtracted beforehand. If 'x' is an 'RGList', it is
     converted into an 'NChannelSet' using a copy of Martin Morgan's
     code for 'RGList' to 'NChannelSet' coercion, then the
     'NChannelSet' method is called.
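     In matrix terms, the 'NChannelSet' method amounts to something like
     the following base-R sketch, using hypothetical foreground and
     background matrices 'R', 'G', 'Rb', 'Gb' (rows are features,
     columns are arrays). It illustrates the optional background
     subtraction and the horizontal concatenation only; the actual
     method works on the channels stored in the 'NChannelSet'.

```r
set.seed(7)
nFeatures <- 5L; nArrays <- 3L
R  <- matrix(runif(nFeatures * nArrays, 500, 5000), nFeatures)  # red foreground
G  <- matrix(runif(nFeatures * nArrays, 500, 5000), nFeatures)  # green foreground
Rb <- matrix(runif(nFeatures * nArrays, 50, 100), nFeatures)    # red background
Gb <- matrix(runif(nFeatures * nArrays, 50, 100), nFeatures)    # green background

backgroundsubtract <- TRUE
if (backgroundsubtract) {
  R <- R - Rb
  G <- G - Gb
}

## Horizontal concatenation: one column per (channel, array) pair
xx <- cbind(R, G)
dim(xx)   # 5 6
```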

_S_t_a_n_d_a_l_o_n_e _v_e_r_s_u_s _r_e_f_e_r_e_n_c_e _n_o_r_m_a_l_i_s_a_t_i_o_n:

     If the 'reference' argument is _not_ specified, then the model
     parameters mu_k and sigma are fit from the data in 'x'. This is
     the mode of operation described in [1] and that was the only
     option in versions 1.X of this package. If 'reference' is
     specified, the model parameters mu_k and sigma are taken from it.
     This allows for 'incremental' normalization [4].

_C_o_n_v_e_r_g_e_n_c_e _o_f _t_h_e _i_t_e_r_a_t_i_v_e _l_i_k_e_l_i_h_o_o_d _o_p_t_i_m_i_s_a_t_i_o_n:

     'L-BFGS-B' [3] uses three termination criteria:

        1.  '(f_k - f_{k+1}) / max(|f_k|, |f_{k+1}|, 1) <= factr *
           epsmch' where 'epsmch' is the machine precision.

        2.  '|gradient| < pgtol'

        3.  'iterations > maxit'

     These are set by the elements 'factr', 'pgtol' and 'maxit' of
     'optimpar'. The remaining elements are

     '_t_r_a_c_e' An integer between 0 and 6, indicating the verbosity level
          of 'L-BFGS-B', higher values create more output.

     '_c_v_g._n_i_t_e_r' The number of iterations to be used in the least
          trimmed sum of squares regression.

     '_c_v_g._e_p_s' Numeric. A convergence threshold for the least trimmed
          sum of squares regression.
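     The names 'factr', 'pgtol', 'maxit' and 'trace' match the 'control'
     list of R's own 'optim' with method = "L-BFGS-B", which documents
     the same meanings (including six tracing levels). Whether vsn goes
     through 'optim' or calls the underlying routine directly is not
     assumed here; the sketch just runs 'optim' on a toy quadratic
     objective with the values from 'defaultpar'.

```r
## Toy objective: a quadratic bowl with its minimum at (1, -2)
fn <- function(p) sum((p - c(1, -2))^2)
gr <- function(p) 2 * (p - c(1, -2))

res <- optim(par = c(0, 0), fn = fn, gr = gr, method = "L-BFGS-B",
             control = list(factr = 5e7, pgtol = 2e-4, maxit = 60000L,
                            trace = 0L))

res$par                      # approximately c(1, -2)
res$convergence              # 0: stopped by a criterion other than maxit
5e7 * .Machine$double.eps    # tolerance of criterion 1, about 1.1e-08
```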

_A_u_t_h_o_r(_s):

     Wolfgang Huber <URL: http://www.ebi.ac.uk/huber>

_R_e_f_e_r_e_n_c_e_s:

     [1] Variance stabilization applied to microarray data calibration
     and to the quantification of differential expression, Wolfgang
     Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka,
     Martin Vingron; Bioinformatics (2002) 18 Suppl.1 S96-S104.

     [2] Parameter estimation for the calibration and variance
     stabilization of microarray data, Wolfgang Huber, Anja von
     Heydebreck, Holger Sueltmann, Annemarie Poustka, and Martin
     Vingron; Statistical Applications in Genetics and Molecular
     Biology (2003) Vol. 2 No. 1, Article 3.
     http://www.bepress.com/sagmb/vol2/iss1/art3.

     [3] L-BFGS-B: Fortran Subroutines for Large-Scale Bound
     Constrained Optimization, C. Zhu, R.H. Byrd, P. Lu and J. Nocedal,
     Technical Report, Northwestern University (1996).

     [4] Package vignette: Likelihood Calculations for vsn

_S_e_e _A_l_s_o:

     'justvsn', 'predict'

_E_x_a_m_p_l_e_s:

     library("vsn")                       ## provides vsn2 and the kidney data
     data("kidney")

     fit = vsn2(kidney)                   ## fit
     nkid = predict(fit, newdata=kidney)  ## apply fit

     plot(exprs(nkid), pch=".")
     abline(a=0, b=1, col="red")

