vsn2                   package:vsn                   R Documentation

_F_i_t _t_h_e _v_s_n _m_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     'vsn2' fits the vsn model to the data in 'x' and returns a 'vsn'
     object with the fit parameters and the transformed data matrix.
     The data are, typically, feature intensity readings from a
     microarray. To obtain an object of the same class as 'x',
     containing the normalised data and the same metadata as 'x', use


         fit = vsn2(x, ...)
         nx = predict(fit, newdata=x)

     or the wrapper 'justvsn'. Please see the vignette _Introduction to
     vsn_ for a description of how to use 'vsn2' in different use
     cases.

_U_s_a_g_e:

     vsnMatrix(x,
       reference,
       strata,
       lts.quantile = 0.9,
       subsample    = 0L,
       verbose      = interactive(),
       returnData   = TRUE,
       pstart,
       optimpar     = list(),
       defaultpar   = list(factr=5e7, pgtol=2e-4, maxit=60000L, trace=0L, cvg.niter=7L, cvg.eps=0))

     ## S4 method for signature 'ExpressionSet':
     vsn2(x, reference, strata, ...)

     ## S4 method for signature 'AffyBatch':
     vsn2(x, reference, strata, ...)

     ## S4 method for signature 'matrix':
     vsn2(x, reference, strata, ...)

     ## S4 method for signature 'NChannelSet':
     vsn2(x, reference, strata, backgroundsubtract=FALSE,
            foreground=c("R","G"), background=c("Rb", "Gb"), ...)

     ## S4 method for signature 'RGList':
     vsn2(x, reference, strata, backgroundsubtract=FALSE, ...)

_A_r_g_u_m_e_n_t_s:

       x: An object containing the data to which the model is to be
          fitted.

reference: Optional, a 'vsn' object from a previous fit. If this
          argument is specified, the data in 'x' are normalized
          "towards" an existing set of reference arrays whose
          parameters are stored in the object 'reference'. If this
          argument is not specified, then the data in 'x' are
          normalized "among themselves". See the section on standalone
          versus reference normalisation for a more precise
          explanation.

  strata: Optional, a 'factor' or 'integer' whose length is 'nrow(x)'.
          Can be used for stratified normalization (i.e. separate
          offsets 'a' and factors 'b' for each level of 'strata'). If
          missing, all rows of 'x' are assumed to come from one
          stratum. If 'strata' is an integer, its values must cover the
          range 1..._n_, where _n_ is the number of strata.

lts.quantile: Numeric of length 1. The quantile that is used for the
          resistant least trimmed sum of squares regression. Allowed
          values are between 0.5 and 1. A value of 1 corresponds to
          ordinary least sum of squares regression.

subsample: Integer of length 1. If specified, the model parameters are
          estimated from a subsample of the data of size 'subsample'
          only, yet the fitted transformation is then applied to all
          data. For large datasets, this can substantially reduce the
          CPU time and memory consumption at a negligible loss of
          precision.

backgroundsubtract: Logical of length 1: should local background
          estimates be subtracted before fitting vsn?

foreground, background: Aligned character vectors of the same length,
          naming the channels of 'x' that should be used as foreground
          and background values.

 verbose: Logical. If TRUE, some messages are printed.

returnData: Logical. If 'TRUE', the transformed data are returned in a
          slot of the resulting 'vsn' object. Setting this option to
          'FALSE' saves memory if the data are not needed.

  pstart: Optional, a three-dimensional numeric array that specifies
          start values for the iterative parameter estimation
          algorithm. If not specified, the function tries to guess
          useful start values. The first dimension corresponds to the
          levels of 'strata', the second dimension to the columns of
          'x' and the third dimension must be 2, corresponding to
          offsets and factors.

optimpar: Optional, a list with parameters for the likelihood
          optimisation algorithm. Default parameters are taken from
          'defaultpar'. See the section on convergence of the iterative
          likelihood optimisation.

defaultpar: The default parameters for the likelihood optimisation
          algorithm. Values in 'optimpar' take precedence over those in
          'defaultpar'. The purpose of this argument is to expose the
          default values in this manual page. It is not intended to be
          changed; please use 'optimpar' for that.

     ...: Arguments that get passed on to 'vsnMatrix'.
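
     As a sketch of how 'strata' and 'pstart' line up, the base R
     snippet below builds an integer 'strata' vector and a start-value
     array of the documented shape. The matrix 'x', the stratum
     assignment and the numbers placed in 'pstart' are made-up
     placeholders that only show the layout; they are not recommended
     start values, and the toy data are too small for an actual fit.

```r
# Hypothetical toy data: 6 features (rows) on 3 arrays (columns).
set.seed(1)
x <- matrix(rexp(18, rate = 1/500), nrow = 6, ncol = 3)

# Integer strata must cover the range 1..n; here n = 2 strata.
strata <- c(1L, 1L, 1L, 2L, 2L, 2L)
n <- max(strata)
stopifnot(length(strata) == nrow(x),
          setequal(strata, seq_len(n)))

# pstart: (levels of strata) x (columns of x) x 2,
# where the third dimension holds offsets and factors.
pstart <- array(0, dim = c(n, ncol(x), 2L))
pstart[, , 1L] <- 0   # placeholder offsets
pstart[, , 2L] <- 1   # placeholder factors
dim(pstart)

# With real data, the call would then have the shape (not run here):
#   fit <- vsn2(x, strata = strata, pstart = pstart)
```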

_V_a_l_u_e:

     An object of class 'vsn'.

_N_o_t_e _o_n _o_v_e_r_a_l_l _s_c_a_l_e _a_n_d _l_o_c_a_t_i_o_n _o_f _t_h_e _g_l_o_g
  _t_r_a_n_s_f_o_r_m_a_t_i_o_n:

     The data are returned on a glog scale to base 2. More precisely,
     the data are subject to the transformation glog2(f(b)*x+a) + c,
     where glog2(u) = log2(u+sqrt(u*u+1)) = asinh(u)/log(2) is called
     the generalised logarithm, a and b are the fitted model
     parameters (see [1] and [2]), f is a parameter transformation
     [4], and the overall constant offset c is computed from b such
     that for large x the transformation approximately corresponds to
     the log2 function. The offset c is inconsequential for all
     differential expression calculations, but many users like to see
     the data in a range that they are familiar with.
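
     The generalised logarithm itself can be checked in base R. The
     sketch below defines a helper named 'glog2' (a name chosen here
     for illustration, not a function exported by this package) and
     verifies the asinh identity and the behaviour for large
     arguments. It does not reproduce the fitted parameters a and b,
     the parameter transformation f, or the offset c.

```r
# Generalised logarithm to base 2, as written in the note above.
glog2 <- function(u) log2(u + sqrt(u * u + 1))

# The two expressions for glog2 agree.
u <- c(0, 0.5, 1, 10, 1000)
all.equal(glog2(u), asinh(u) / log(2))

# For large u, glog2(u) differs from log2(u) only by a constant
# (log2(2) = 1), which is why a constant offset can align the scales.
big <- 1e6
glog2(big) - log2(big)

# Unlike log2, glog2 is defined at zero and for negative values.
glog2(c(-5, 0))
```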

_S_p_e_c_i_f_i_c _b_e_h_a_v_i_o_u_r _o_f _t_h_e _d_i_f_f_e_r_e_n_t _m_e_t_h_o_d_s:

     'vsn2' methods exist for 'ExpressionSet', 'NChannelSet',
     'AffyBatch' (from the 'affy' package), 'RGList' (from the 'limma'
     package), 'matrix' and 'numeric'. If 'x' is an 'NChannelSet', then
     'vsn2' is applied to the matrix that is obtained by horizontally
     concatenating the color channels. Optionally, available background
     estimates can be subtracted beforehand. If 'x' is an 'RGList', it is
     converted into an 'NChannelSet' using a copy of Martin Morgan's
     code for 'RGList' to 'NChannelSet' coercion, then the
     'NChannelSet' method is called.

_S_t_a_n_d_a_l_o_n_e _v_e_r_s_u_s _r_e_f_e_r_e_n_c_e _n_o_r_m_a_l_i_s_a_t_i_o_n:

     If the 'reference' argument is _not_ specified, then the model
     parameters mu_k and sigma are fitted from the data in 'x'. This
     is the mode of operation described in [1], and it was the only
     option in versions 1.X of this package. If 'reference' is
     specified, the model parameters mu_k and sigma are taken from
     it. This allows for 'incremental' normalization [4].
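
     The two modes can be sketched as follows. The snippet assumes
     that the 'vsn' package and its example data set 'lymphoma' are
     installed, and it is skipped otherwise. The choice of arrays 1 to
     7 as the reference set and array 8 as the new array is arbitrary
     and made only for illustration.

```r
# Sketch only: needs the Bioconductor package 'vsn' and its example
# data set 'lymphoma'; the block is skipped if vsn is not installed.
if (requireNamespace("vsn", quietly = TRUE)) {
  library("vsn")
  data("lymphoma")

  # Standalone mode: parameters are fitted from arrays 1 to 7.
  ref <- vsn2(lymphoma[, 1:7])

  # Reference mode: array 8 is normalised towards the earlier fit.
  f8 <- vsn2(lymphoma[, 8], reference = ref)

  # One transformed column, one row per feature.
  dim(exprs(f8))
}
```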

_C_o_n_v_e_r_g_e_n_c_e _o_f _t_h_e _i_t_e_r_a_t_i_v_e _l_i_k_e_l_i_h_o_o_d _o_p_t_i_m_i_s_a_t_i_o_n:

     'L-BFGS-B' [3] uses three termination criteria:

        1.  '(f_k - f_{k+1}) / max(|f_k|, |f_{k+1}|, 1) <= factr *
           epsmch' where 'epsmch' is the machine precision.

        2.  '|gradient| < pgtol'

        3.  'iterations > maxit'

     These are set by the elements 'factr', 'pgtol' and 'maxit' of
     'optimpar'. The remaining elements are:

     '_t_r_a_c_e' An integer between 0 and 6, indicating the verbosity level
          of 'L-BFGS-B', higher values create more output.

     '_c_v_g._n_i_t_e_r' The number of iterations to be used in the least
          trimmed sum of squares regression.

     '_c_v_g._e_p_s' Numeric. A convergence threshold for the least trimmed
          sum of squares regression.
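
     To illustrate the three termination criteria and the precedence
     of 'optimpar' over 'defaultpar', the sketch below uses base R's
     'optim' with method "L-BFGS-B" as a stand-in. It is not the
     optimiser call made inside 'vsn2': the objective is the
     Rosenbrock test function rather than the vsn likelihood,
     'modifyList' merely mimics the documented precedence rule, and
     the elements 'cvg.niter' and 'cvg.eps' have no counterpart in
     'optim'.

```r
# Documented defaults and a user override, as in the Usage section.
defaultpar <- list(factr = 5e7, pgtol = 2e-4, maxit = 60000L,
                   trace = 0L, cvg.niter = 7L, cvg.eps = 0)
optimpar   <- list(maxit = 2L)

# Values in 'optimpar' take precedence over those in 'defaultpar'.
par_used <- modifyList(defaultpar, optimpar)

# Stand-in objective (Rosenbrock function) and its gradient.
fn <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
gr <- function(p) c(-400 * p[1] * (p[2] - p[1]^2) - 2 * (1 - p[1]),
                     200 * (p[2] - p[1]^2))

# Pass only the elements that 'optim' understands for L-BFGS-B.
run <- function(ctl) {
  optim(c(-1.2, 1), fn, gr, method = "L-BFGS-B",
        control = ctl[c("factr", "pgtol", "maxit")])
}

full  <- run(defaultpar)  # ends via criterion 1 or 2
short <- run(par_used)    # ends via criterion 3 (iteration limit)

# A convergence code of 0 means success; non-zero means it stopped early.
c(full = full$convergence, short = short$convergence)
```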

_A_u_t_h_o_r(_s):

     Wolfgang Huber <URL: http://www.ebi.ac.uk/huber>

_R_e_f_e_r_e_n_c_e_s:

     [1] Variance stabilization applied to microarray data calibration
     and to the quantification of differential expression, Wolfgang
     Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka,
     Martin Vingron; Bioinformatics (2002) 18 Suppl.1 S96-S104.

     [2] Parameter estimation for the calibration and variance
     stabilization of microarray data, Wolfgang Huber, Anja von
     Heydebreck, Holger Sueltmann, Annemarie Poustka, and Martin
     Vingron; Statistical Applications in Genetics and Molecular
     Biology (2003) Vol. 2 No. 1, Article 3.
     http://www.bepress.com/sagmb/vol2/iss1/art3.

     [3] L-BFGS-B: Fortran Subroutines for Large-Scale Bound
     Constrained Optimization, C. Zhu, R.H. Byrd, P. Lu and J. Nocedal,
     Technical Report, Northwestern University (1996).

     [4] Package vignette: Likelihood Calculations for vsn

_S_e_e _A_l_s_o:

     'justvsn', 'predict'

_E_x_a_m_p_l_e_s:

     data("kidney")

     fit = vsn2(kidney)                   ## fit
     nkid = predict(fit, newdata=kidney)  ## apply fit

     plot(exprs(nkid), pch=".")
     abline(a=0, b=1, col="red")

