vsn2                   package:vsn                   R Documentation

_F_i_t _t_h_e _v_s_n _m_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     'vsn2' fits the vsn model to the data in 'x' and returns a 'vsn'
     object with the fit parameters and the transformed data matrix.
     The data are, typically, feature intensity readings from a
     microarray, but this function may also be useful for other kinds
     of intensity data that obey an additive-multiplicative error
     model. To obtain an object of the same class as 'x', containing
     the normalised data and the same metadata as 'x', use


         fit = vsn2(x, ...)
         nx = predict(fit, newdata=x)

     or the wrapper 'justvsn'. Please see the vignette _Introduction to
     vsn_ for a description of how to use 'vsn2' for different use
     cases.

_U_s_a_g_e:

     vsnMatrix(x,
               reference,
               strata,
               lts.quantile = 0.9,
               subsample    = 0L,
               verbose      = interactive(),
               returnData   = TRUE,
               pstart,
               minDataPointsPerStratum = 42L,
               optimpar     = list(),
               defaultpar   = list(factr=5e7, pgtol=2e-4, maxit=60000L,
                                   trace=0L, cvg.niter=7L, cvg.eps=0))

     ## S4 method for signature 'ExpressionSet':
     vsn2(x, reference, strata, ...)

     ## S4 method for signature 'AffyBatch':
     vsn2(x, reference, strata, ...)

     ## S4 method for signature 'matrix':
     vsn2(x, reference, strata, ...)

     ## S4 method for signature 'NChannelSet':
     vsn2(x, reference, strata, backgroundsubtract=FALSE,
            foreground=c("R","G"), background=c("Rb", "Gb"), ...)

     ## S4 method for signature 'RGList':
     vsn2(x, reference, strata, backgroundsubtract=FALSE, ...)

_A_r_g_u_m_e_n_t_s:

       x: An object containing the data to which the model is to be
          fitted.

reference: Optional, a 'vsn' object from a previous fit. If this
          argument is specified, the data in 'x' are normalized
          "towards" an existing set of reference arrays whose
          parameters are stored in the object 'reference'. If this
          argument is not specified, then the data in 'x' are
          normalized "among themselves". See Details for a more precise
          explanation.

  strata: Optional, a 'factor' or 'integer' whose length is 'nrow(x)'.
          Can be used for stratified normalization (i.e. separate
          offsets 'a' and factors 'b' for each level of 'strata'). If
          missing, all rows of 'x' are assumed to come from one
          stratum. If 'strata' is an integer, its values must cover the
          range 1..._n_, where _n_ is the number of strata.

lts.quantile: Numeric of length 1. The quantile that is used for the
          resistant least trimmed sum of squares regression. Allowed
          values are between 0.5 and 1. A value of 1 corresponds to
          ordinary least sum of squares regression.

subsample: Integer of length 1. If specified, the model parameters are
          estimated from a subsample of the data of size 'subsample'
          only, yet the fitted transformation is then applied to all
          data. For large datasets, this can substantially reduce the
          CPU time and memory consumption at a negligible loss of
          precision.

backgroundsubtract: Logical of length 1: should local background
          estimates be subtracted before fitting vsn?

foreground, background: Aligned character vectors of the same length,
          naming the channels of 'x' that should be used as foreground
          and background values.

 verbose: Logical. If TRUE, some messages are printed.

returnData: Logical. If TRUE, the transformed data are returned in a
          slot of the resulting 'vsn' object. Setting this option to
          'FALSE' allows saving memory if the data are not needed.

  pstart: Optional, a three-dimensional numeric array that specifies
          start values for the iterative parameter estimation
          algorithm. If not specified, the function tries to guess
          useful start values. The first dimension corresponds to the
          levels of 'strata', the second dimension to the columns of
          'x' and the third dimension must be 2, corresponding to
          offsets and factors.

minDataPointsPerStratum: The minimum number of data points per stratum. 

optimpar: Optional, a list with parameters for the likelihood
          optimisation algorithm. Default parameters are taken from
          'defaultpar'. See Details.

defaultpar: The default parameters for the likelihood optimisation
          algorithm. Values in 'optimpar' take precedence over those in
          'defaultpar'. The purpose of this argument is to expose the
          default values in this manual page; it is not intended to be
          changed. Please use 'optimpar' for that.

     ...: Arguments that get passed on to 'vsnMatrix'.
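
     A minimal sketch of how 'strata' and 'pstart' relate in shape,
     using base R only. The matrix 'x', the labels in 'block' and all
     start values are hypothetical placeholders chosen for
     illustration; they are not recommended settings, and the scale on
     which 'vsn2' expects the factors should be checked in the
     vignette before supplying real start values.

```r
# Hypothetical data: 6 features (rows) measured on 3 arrays (columns).
x <- matrix(seq(100, by = 50, length.out = 18), nrow = 6, ncol = 3)

# Illustrative per-row labels (e.g. print-tip blocks); not part of vsn.
block <- c("tipA", "tipB", "tipA", "tipB", "tipA", "tipB")

# factor() yields integer codes that cover 1..n without gaps.
strata  <- as.integer(factor(block))
nstrata <- length(unique(strata))

# Checks that mirror the documented requirements on 'strata'.
stopifnot(length(strata) == nrow(x),
          setequal(strata, seq_len(nstrata)))

# pstart has dimensions strata x columns of x x 2 (offsets, factors).
pstart <- array(0, dim = c(nstrata, ncol(x), 2L))
pstart[, , 1] <- 0   # offsets: placeholder values
pstart[, , 2] <- 1   # factors: placeholder values
```

     Deriving the integer codes through 'factor' avoids gaps in the
     range 1..._n_, which the description of 'strata' requires.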

_V_a_l_u_e:

     An object of class 'vsn'.

_N_o_t_e _o_n _o_v_e_r_a_l_l _s_c_a_l_e _a_n_d _l_o_c_a_t_i_o_n _o_f _t_h_e _g_l_o_g
  _t_r_a_n_s_f_o_r_m_a_t_i_o_n:

     The data are returned on a _glog_ scale to base 2. More precisely,
     the transformed data are subject to the transformation
     _glog2(f(b)*x+a) - c_, where the function _glog2(u) =
     log2(u+sqrt(u*u+1)) = asinh(u)/log(2)_ is called the generalised
     logarithm, the offset _a_ and the scaling parameter _b_ are the
     fitted model parameters (see references), and _f(x)=exp(x)_ is a
     parameter transformation that allows ensuring positivity of the
     factor in front of _x_ while using an unconstrained optimisation
     over _b_ [4]. Different parameters _a_ and _b_ are fit for each
     array, and, if applicable, for each stratum. The overall offset
     _c_ is computed from the _b_'s such that for large _x_ the
     transformation approximately corresponds to the _log2_ function.
     This is done separately for each stratum, but with the same value
     across arrays. More precisely, if the element 'b[s,i]' of the
     array _b_ is the scaling parameter for the 's'-th stratum and the
     'i'-th array, then 'c[s]' is computed as 'log2(2*f(mean(b[s,])))'.
     The offset _c_ is inconsequential for all differential expression
     calculations, but many users like to see the data in a range that
     they are familiar with.
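
     The relations in this note can be checked numerically with a few
     lines of base R. This is only a sketch: the values in 'b' are
     made up, the names 'glog2', 'f' and 'c.offset' are defined here
     for illustration, and the offset is computed under the reading
     that the mean is taken over the arrays within each stratum.

```r
# Generalised logarithm to base 2, as defined in the text.
glog2 <- function(u) log2(u + sqrt(u * u + 1))

# The two closed forms given in the text agree.
u <- c(0, 1, 10, 1000, 1e6)
stopifnot(isTRUE(all.equal(glog2(u), asinh(u) / log(2))))

# For large x, glog2(exp(b0) * x) exceeds log2(x) by log2(2 * exp(b0)),
# which is why the overall offset is built from this quantity.
b0 <- -5
xl <- 1e9
excess <- glog2(exp(b0) * xl) - log2(xl)

# Offset per stratum from a hypothetical matrix of fitted b values
# (rows: strata, columns: arrays), with f(b) = exp(b).
f <- exp
b <- matrix(c(-5.0, -5.2, -4.8,
              -6.1, -5.9, -6.0), nrow = 2, byrow = TRUE)
c.offset <- log2(2 * f(rowMeans(b)))   # one value per stratum
```

     Because 'c.offset' has one entry per stratum and does not depend
     on the array, it shifts all arrays of a stratum by the same
     amount and so leaves differences between arrays unchanged.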

_S_p_e_c_i_f_i_c _b_e_h_a_v_i_o_u_r _o_f _t_h_e _d_i_f_f_e_r_e_n_t _m_e_t_h_o_d_s:

     'vsn2' methods exist for 'ExpressionSet', 'NChannelSet',
     'AffyBatch' (from the 'affy' package), 'RGList' (from the 'limma'
     package), 'matrix' and 'numeric'. If 'x' is an 'NChannelSet', then
     'vsn2' is applied to the matrix that is obtained by horizontally
     concatenating the color channels. Optionally, available background
     estimates can be subtracted beforehand. If 'x' is an 'RGList', it is
     converted into an 'NChannelSet' using a copy of Martin Morgan's
     code for 'RGList' to 'NChannelSet' coercion, then the
     'NChannelSet' method is called.
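
     The layout of the matrix that results from horizontal
     concatenation can be pictured with plain matrices. The sketch
     below is an illustration in base R, not the code used inside the
     package: the intensity values are invented, the channel names
     follow the defaults of 'foreground' and 'background', and the
     column order (all columns of the first channel, then all columns
     of the second) is an assumption.

```r
# Hypothetical two-colour data: 4 features on 2 arrays.
R  <- matrix(c(500, 800, 1200, 300,  450, 900, 1100, 350), nrow = 4)
G  <- matrix(c(400, 700, 1500, 250,  420, 650, 1400, 260), nrow = 4)
Rb <- matrix(50, nrow = 4, ncol = 2)   # local background, red channel
Gb <- matrix(40, nrow = 4, ncol = 2)   # local background, green channel

backgroundsubtract <- TRUE

# Foreground and background channels are aligned by position.
fg <- list(R = R, G = G)
bg <- list(Rb = Rb, Gb = Gb)
if (backgroundsubtract) {
  fg <- Map(`-`, fg, bg)   # R - Rb and G - Gb
}

# Horizontal concatenation: one column per combination of channel and array.
y <- do.call(cbind, unname(fg))
```

     With two channels and two arrays, 'y' has four columns, and each
     of them is treated like a separate array when the model is fitted.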

_S_t_a_n_d_a_l_o_n_e _v_e_r_s_u_s _r_e_f_e_r_e_n_c_e _n_o_r_m_a_l_i_s_a_t_i_o_n:

     If the 'reference' argument is _not_ specified, then the model
     parameters _mu_k_ and _sigma_ are fit from the data in 'x'. This
     is the mode of operation that is described in [1] and that was the
     only option in versions 1.X of this package. If 'reference' is
     specified, the model parameters _mu_k_ and _sigma_ are taken from
     it. This allows for 'incremental' normalization [4].

_C_o_n_v_e_r_g_e_n_c_e _o_f _t_h_e _i_t_e_r_a_t_i_v_e _l_i_k_e_l_i_h_o_o_d _o_p_t_i_m_i_s_a_t_i_o_n:

     'L-BFGS-B' [3] uses three termination criteria:

        1.  '(f_k - f_{k+1}) / max(|f_k|, |f_{k+1}|, 1) <= factr *
           epsmch' where 'epsmch' is the machine precision.

        2.  '|gradient| < pgtol'

        3.  'iterations > maxit'

     These are set by the elements 'factr', 'pgtol' and 'maxit' of
     'optimpar'. The remaining elements are

     '_t_r_a_c_e' An integer between 0 and 6, indicating the verbosity level
          of 'L-BFGS-B', higher values create more output.

     '_c_v_g._n_i_t_e_r' The number of iterations to be used in the least
          trimmed sum of squares regression.

     '_c_v_g._e_p_s' Numeric. A convergence threshold for the least trimmed
          sum of squares regression.
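
     How user settings override the defaults, and how the 'L-BFGS-B'
     settings reach an optimiser, can be sketched with base R's
     'optim'. This is an illustration under stated assumptions: the
     merge via 'modifyList' stands in for whatever the package does
     internally, the override values are arbitrary, and the Rosenbrock
     function is a toy objective in place of the vsn likelihood.

```r
# Defaults as listed under Usage; user overrides go into 'optimpar'.
defaultpar <- list(factr = 5e7, pgtol = 2e-4, maxit = 60000L,
                   trace = 0L, cvg.niter = 7L, cvg.eps = 0)
optimpar   <- list(maxit = 200L, pgtol = 1e-6)   # arbitrary example values

# Values in 'optimpar' take precedence over those in 'defaultpar'.
ctrl <- modifyList(defaultpar, optimpar)

# Toy objective (Rosenbrock function) and its gradient.
fn <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
gr <- function(p) c(-400 * p[1] * (p[2] - p[1]^2) - 2 * (1 - p[1]),
                    200 * (p[2] - p[1]^2))

# optim() accepts factr, pgtol, maxit and trace for method "L-BFGS-B".
# The cvg.* elements concern the trimmed regression and are left out here.
res <- optim(par = c(-1.2, 1), fn = fn, gr = gr, method = "L-BFGS-B",
             control = ctrl[c("factr", "pgtol", "maxit", "trace")])

res$convergence   # 0 indicates successful completion
```

     Elements that are not overridden, such as 'factr' here, keep
     their default values after the merge.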

_A_u_t_h_o_r(_s):

     Wolfgang Huber <URL: http://www.ebi.ac.uk/huber>

_R_e_f_e_r_e_n_c_e_s:

     [1] Variance stabilization applied to microarray data calibration
     and to the quantification of differential expression, Wolfgang
     Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka,
     Martin Vingron; Bioinformatics (2002) 18 Suppl.1 S96-S104.

     [2] Parameter estimation for the calibration and variance
     stabilization of microarray data, Wolfgang Huber, Anja von
     Heydebreck, Holger Sueltmann, Annemarie Poustka, and Martin
     Vingron; Statistical Applications in Genetics and Molecular
     Biology (2003) Vol. 2 No. 1, Article 3.
     http://www.bepress.com/sagmb/vol2/iss1/art3.

     [3] L-BFGS-B: Fortran Subroutines for Large-Scale Bound
     Constrained Optimization, C. Zhu, R.H. Byrd, P. Lu and J. Nocedal,
     Technical Report, Northwestern University (1996).

     [4] Package vignette: Likelihood Calculations for vsn

_S_e_e _A_l_s_o:

     'justvsn', 'predict'

_E_x_a_m_p_l_e_s:

     data("kidney")

     fit = vsn2(kidney)                   ## fit
     nkid = predict(fit, newdata=kidney)  ## apply fit

     plot(exprs(nkid), pch=".")
     abline(a=0, b=1, col="red")

