imputePeaks              package:flagme              R Documentation

_I_m_p_u_t_a_t_i_n _o_f _l_o_c_a_t_i_o_n_s _o_f _p_e_a_k_s _t_h_a_t _w_e_r_e _u_n_d_e_t_e_c_t_e_d

_D_e_s_c_r_i_p_t_i_o_n:

     Using the information in peaks that are matched across several
     runs, impute the locations of peaks that were undetected in a
     subset of runs.

_U_s_a_g_e:

     imputePeaks(pD, obj, type = 1, obj2 = NULL, filterMin = 3, verbose = TRUE)

_A_r_g_u_m_e_n_t_s:

      pD: a 'peaksDataset' object

     obj: the alignment object, either 'multipleAlignment' or
          'progressiveAlignment', that is used to infer the unmatched
          peak locations

    type: type of imputation to do: 1 for simple linear interpolation
          (default); 2 only works if 'obj2' is a 'clusterAlignment'
          object

    obj2: a 'clusterAlignment' object (needed when 'type = 2')

filterMin: minimum number of detected peaks that a merged peak must
          contain for its missing peaks to be imputed

 verbose: logical, whether to print out information
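
     As a rough illustration of what 'filterMin' controls, the sketch
     below counts the detected peaks in each merged peak of a toy
     table and flags those that meet a threshold.  The table, the
     variable names, the use of 'NA' for an undetected peak and the
     '>=' comparison are all assumptions made for illustration; this
     is not the package's internal code.

```r
# Toy alignment table (made-up values): rows are merged peaks,
# columns are runs, NA marks a run in which the peak was not detected.
apex <- rbind(
  c(100, 104, 102),   # detected in 3 runs
  c(150,  NA, 153),   # detected in 2 runs
  c( NA,  NA, 180)    # detected in 1 run
)

filterMin <- 2   # illustrative threshold, not the documented default of 3

# Count detected peaks per merged peak, then flag the merged peaks
# that would be considered for imputation (assumed '>=' comparison).
n_detected <- rowSums(!is.na(apex))
eligible   <- n_detected >= filterMin
```

     With this toy table, the third merged peak falls below the
     threshold and would be left alone.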

_D_e_t_a_i_l_s:

     When several samples are aligned and a peak is undetected in a
     (small) subset of them, the alignment contains information,
     namely the surrounding matched peaks, that helps determine where
     the undetected peak is.  Instead of carrying missing values
     forward into the data matrices, this procedure goes back to the
     raw data and imputes the location of the apex (as well as the
     start and end), so that there is no need for post-hoc imputation
     or for removing data because of missing components.
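
     The idea of locating an undetected peak from the surrounding
     matched peaks can be sketched with plain linear interpolation in
     base R.  This is a minimal sketch under stated assumptions, not
     the code used by 'imputePeaks': the toy table, the helper
     'impute_run' and the choice of row means as a reference position
     are all hypothetical.

```r
# Toy alignment table (made-up values): rows are merged peaks,
# columns are runs, entries are apex scan numbers, NA = undetected.
apex <- rbind(
  c(100, 104, 102),
  c(150,  NA, 153),
  c(200, 206, 203)
)
colnames(apex) <- c("run1", "run2", "run3")

# Hypothetical helper: fill the missing entries of one run by linear
# interpolation between the matched peaks on either side.
impute_run <- function(apex, run) {
  ref <- rowMeans(apex, na.rm = TRUE)   # reference position per merged peak
  obs <- !is.na(apex[, run])            # peaks detected in this run
  out <- apex[, run]
  # approx() maps reference positions to scan numbers in this run
  out[!obs] <- approx(x = ref[obs], y = apex[obs, run],
                      xout = ref[!obs])$y
  round(out)                            # scan numbers are integers
}

imputed_run2 <- impute_run(apex, "run2")
```

     A missing peak that lies outside the range of the matched peaks
     would stay 'NA' here, because 'approx' does not extrapolate by
     default.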

     We realize that imputation is prone to error, and in particular
     to attributing intensity from neighbouring peaks to the unmatched
     peak.  We argue that this is still better than having to deal with
     missing values in statistical models after the fact.  This may be
     an area of future improvement.

_V_a_l_u_e:

     'list' with 3 elements, 'apex', 'start' and 'end', each a masked
     matrix giving the scan numbers of the imputed peaks.
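
     To show how such a result might be inspected, the sketch below
     builds a mock list with the same three element names and
     summarises it.  The values are made up, and the use of 'NA' as
     the mask for entries that were not imputed is an assumption, as
     is the layout of merged peaks in rows and runs in columns.

```r
# Mock of the documented return shape (made-up values, not real output).
# Assumption: NA masks entries that were not imputed.
v_mock <- list(
  apex  = rbind(c(NA, 154, NA), c(NA, NA, 181)),
  start = rbind(c(NA, 151, NA), c(NA, NA, 178)),
  end   = rbind(c(NA, 158, NA), c(NA, NA, 185))
)

# Number of imputed peaks in each run (column)
n_imputed <- colSums(!is.na(v_mock$apex))

# Width, in scans, of each imputed peak (NA where nothing was imputed)
width <- v_mock$end - v_mock$start
```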

_A_u_t_h_o_r(_s):

     Mark Robinson

_R_e_f_e_r_e_n_c_e_s:

     Mark D Robinson (2008).  Methods for the analysis of gas
     chromatography - mass spectrometry data.  _PhD dissertation_,
     University of Melbourne.

_S_e_e _A_l_s_o:

     'multipleAlignment', 'progressiveAlignment', 'peaksDataset'

_E_x_a_m_p_l_e_s:

     require(gcspikelite)

     # paths and files
     gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/")
     cdfFiles<-dir(gcmsPath,"CDF",full.names=TRUE)
     eluFiles<-dir(gcmsPath,"ELU",full.names=TRUE)

     # read data, peak detection results
     pd<-peaksDataset(cdfFiles[1:3],mz=seq(50,550),rtrange=c(7.5,8.5))
     pd<-addAMDISPeaks(pd,eluFiles[1:3])

     # alignments
     ca<-clusterAlignment(pd, gap = .5,D=.05,df=30)
     pa<-progressiveAlignment(pd, ca, gap = .6, D=.1,df=30)

     # impute locations of undetected peaks using the progressive alignment
     v<-imputePeaks(pd,pa,filterMin=1)

