daglad                 package:GLAD                 R Documentation

_A_n_a_l_y_s_i_s _o_f _a_r_r_a_y _C_G_H _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     This function detects breakpoints in genomic profiles obtained by
     array CGH technology and assigns a status (gain, normal or loss)
     to each clone.

_U_s_a_g_e:

     daglad.profileCGH(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE,
                       smoothfunc="lawsglad", lkern="Exponential", model="Gaussian",
                       qlambda=0.999,  bandwidth=10, sigma=NULL, base=FALSE, round=1.5,
                       lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=5,
                       method="centroid", nmin=1, nmax=8,
                       amplicon=1, deletion=-5, deltaN=0.10,  forceGL=c(-0.15,0.15), nbsigma=3,
                       MinBkpWeight=0.35, CheckBkpPos=TRUE,
                       verbose=FALSE, ...)

_A_r_g_u_m_e_n_t_s:

profileCGH: An object of class 'profileCGH'.

mediancenter: If 'TRUE', the LogRatio values are centered on their
          median.

genomestep: If 'TRUE', a smoothing step over the whole genome is
          performed, and a "clustering throughout the genome" step
          identifies the cluster corresponding to the normal DNA level.
          The thresholds used by the 'daglad' function ('deltaN',
          'forceGL', 'amplicon', 'deletion') are then compared to the
          median of this cluster.

normalrefcenter: If 'TRUE', the LogRatio values are centered on the
          median of the cluster identified during the 'genomestep'
          step.

smoothfunc: Type of algorithm used to smooth 'LogRatio' by a piecewise
          constant function. The default is 'lawsglad'; 'aws' or 'laws'
          can also be chosen.

   lkern: lkern determines the location kernel to be used (see 'laws'
          for details).

   model: model determines the distribution type of LogRatio (see
          'laws' for details).

 qlambda: qlambda determines the scale parameter qlambda for the
          stochastic penalty (see 'laws' for details).

    base: If 'TRUE', the position of each clone is its physical position
          on the chromosome; otherwise, its rank position is used.

   sigma: Value passed either to the 'sigma2' argument of the 'aws'
          function or to the 'shape' argument of 'laws'. If 'NULL',
          sigma is estimated from the data.

bandwidth: Sets the maximal bandwidth 'hmax' of the 'aws' or 'laws'
          function. For example, if 'bandwidth=10', 'hmax' is set to
          10*X_N, where X_N is the position of the last clone.

   round: Determines whether and how the smoothing results of the 'aws'
          or 'laws' function are rounded: the 'round' value is passed to
          the 'digits' argument of the 'round' function.

lambdabreak: Penalty term (lambda') used during the "Optimization of
          the number of breakpoints" step.

lambdaclusterGen: Penalty term (lambda*) used during the "clustering
          throughout the genome" step.

   param: Parameter of the kernel used in the penalty term.

   alpha: Risk level alpha used in the "Outlier detection" step.

   msize: MAD outliers are computed only on regions whose cardinality
          (number of clones) is greater than or equal to 'msize'.

  method: The agglomeration method used during the "clustering
          throughout the genome" step.

    nmin: Minimum number of clusters (N*min) allowed during the
          "clustering throughout the genome" step.

    nmax: Maximum number of clusters (N*max) allowed during the
          "clustering throughout the genome" step.

amplicon: Levels (and outliers) with a smoothing value (log-ratio
          value) greater than this threshold are considered amplicons.
          Note that the data are first centered on the normal reference
          value computed during the "clustering throughout the genome"
          step.

deletion: Levels (and outliers) with a smoothing value (log-ratio
          value) lower than this threshold are considered deletions.
          Note that the data are first centered on the normal reference
          value computed during the "clustering throughout the genome"
          step.

  deltaN: Regions with smoothing values within the interval
          [-deltaN, +deltaN] are considered normal.

 forceGL: Levels with a smoothing value greater than 'forceGL[2]' (lower
          than 'forceGL[1]') are considered gains (losses). Note that
          the data are first centered on the normal reference value
          computed during the "clustering throughout the genome" step.

 nbsigma: For each breakpoint, a weight is calculated as a function of
          the absolute value of the gap ('Gap') between the smoothing
          values of the two consecutive regions: Weight = 1 -
          kernelpen(abs(Gap), param=c(d=nbsigma*Sigma)), where Sigma is
          the standard deviation of the LogRatio.

MinBkpWeight: Breakpoints for which 'GNLchange'==0 and 'Weight' is less
          than 'MinBkpWeight' are discarded.

CheckBkpPos: If 'TRUE', the accuracy of each breakpoint position is
          checked.

 verbose: If 'TRUE', additional information is printed.

     ...: 
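
     The breakpoint weighting controlled by 'nbsigma' and the filter
     controlled by 'MinBkpWeight' can be illustrated with the minimal
     base-R sketch below. It is not the GLAD implementation: the helpers
     'tricubic_kernel', 'bkp_weight' and 'filter_bkp' are hypothetical,
     and the tricubic kernel is an assumption standing in for
     'kernelpen', whose exact form may differ.

```r
## Hypothetical helpers, not part of GLAD. The tricubic kernel is an
## assumed stand-in for 'kernelpen'; GLAD's kernel may differ.
tricubic_kernel <- function(x, d) {
  k <- (1 - abs(x / d)^3)^3
  k[abs(x) >= d] <- 0          # zero outside the support (-d, d)
  k
}

## Weight = 1 - K(|Gap|) with d = nbsigma * Sigma:
## small gaps give weights near 0, gaps of at least d give weight 1.
bkp_weight <- function(gap, sigma, nbsigma = 3) {
  1 - tricubic_kernel(abs(gap), d = nbsigma * sigma)
}

## Discard breakpoints with GNLchange == 0 and Weight < MinBkpWeight.
filter_bkp <- function(bkp, MinBkpWeight = 0.35) {
  discard <- bkp$GNLchange == 0 & bkp$Weight < MinBkpWeight
  bkp[!discard, , drop = FALSE]
}

bkp <- data.frame(Gap = c(0.05, 0.05, 0.5), GNLchange = c(0, 1, 0))
bkp$Weight <- bkp_weight(bkp$Gap, sigma = 0.1)
filter_bkp(bkp)   # keeps rows 2 and 3
```

     With this kernel, a small gap only survives the filter when the
     status of the two regions changes (GNLchange = 1).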

_D_e_t_a_i_l_s:

     The function 'daglad' implements a slightly modified version of
     the methodology described in the article: Analysis of array CGH
     data: from signal ratio to gain and loss of DNA regions (Hupé et
     al., Bioinformatics 2004, 20(18):3413-3422). The 'daglad' function
     lets the user choose thresholds that help the algorithm identify
     the status of the genomic regions. The thresholds are given by the
     following arguments:

        *  deltaN

        *  forceGL

        *  deletion

        *  amplicon
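
     As a rough illustration of how these thresholds map smoothing
     values to the ZoneGNL codes (Normal 0, Gain 1, Loss -1, Amplicon 2,
     Deletion -10), here is a threshold-only sketch. It is a
     simplification, not the daglad algorithm: 'classify_zone' is a
     hypothetical helper, the input is assumed to be already centered on
     the normal reference, the clustering results used by daglad are
     ignored, and values falling between 'deltaN' and 'forceGL' are
     simply left as NA.

```r
## Hypothetical helper, not part of GLAD: threshold-only status coding.
## Assumes 'smoothing' is already centred on the normal reference.
classify_zone <- function(smoothing, deltaN = 0.10, forceGL = c(-0.15, 0.15),
                          amplicon = 1, deletion = -5) {
  zone <- rep(NA_integer_, length(smoothing))  # NA: not settled here
  zone[abs(smoothing) <= deltaN] <- 0L         # Normal
  zone[smoothing > forceGL[2]] <- 1L           # Gain
  zone[smoothing < forceGL[1]] <- -1L          # Loss
  zone[smoothing > amplicon] <- 2L             # Amplicon (overrides Gain)
  zone[smoothing < deletion] <- -10L           # Deletion (overrides Loss)
  zone
}

classify_zone(c(0, 0.05, 0.12, 0.3, -0.4, 1.5, -6))
# returns c(0, 0, NA, 1, -1, 2, -10)
```

     Assigning Amplicon and Deletion last lets the more extreme
     thresholds override Gain and Loss.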

_V_a_l_u_e:

          An object of class "profileCGH" with the following
          attributes:

*profileValues*: a data.frame with the following added columns:


        *_S_m_o_o_t_h_i_n_g* The smoothing values correspond to the median of
             each Level

        *_B_r_e_a_k_p_o_i_n_t_s* The last position of a region with identical
             amount of DNA is flagged by 1 otherwise it is 0. Note that
             during the "Optimization of the number of breakpoints"
             step, removed breakpoints are flagged by -1.

        *_L_e_v_e_l* Each position with equal smoothing value are labelled
             the same way with an integer value starting from one. The
             label is incremented by one when a new level occurs or
             when moving to the next chromosome.

        *_O_u_t_l_i_e_r_s_A_w_s* Each AWS outliers are flagged by -1 (if it is in
             the alpha/2 lower tail of the distribution) or 1 (if it is
             in the alpha/2 upper tail of the distribution) otherwise 
             it is 0.

        *_O_u_t_l_i_e_r_s_M_a_d* Each MAD outliers are flagged by -1 (if it is in
             the alpha/2 lower tail of the distribution) or 1 (if it is
             in the alpha/2 upper tail of the distribution) otherwise 
             it is 0.

        *_O_u_t_l_i_e_r_s_T_o_t* OutliersAws + OutliersMad.

        *_N_o_r_m_a_l_R_e_f* Clusters which have been used to set the normal
             reference during the "clustering throughout the genome"
             step are code by 0. Note that if 'genomestep=FALSE', all
             the value are set to 0.

        *_Z_o_n_e_G_N_L* Status of each clone: Gain is coded by 1, Loss by -1,
             Amplicon by 2, deletion by -10  and Normal by 0.


*BkpInfo*: a data.frame summing up the information for each breakpoint:

        *_C_h_r_o_m_o_s_o_m_e* Chromosome name.

        *_S_m_o_o_t_h_i_n_g* Smoothing value for the breakpoint.

        *_G_a_p* absolute value of the gap between the smoothing values of
             the two consecutive regions.

        *_S_i_g_m_a* The estimation of the standard-deviation of the 
             chromosome.

        *_W_e_i_g_h_t* 1 - 'kernelpen'(Gap, type, param=c(d=nbsigma*Sigma))

        *_Z_o_n_e_G_N_L* Status of the level where is the breakpoint.

        *_G_N_L_c_h_a_n_g_e* Takes the value 1 if the ZoneGNL of the two
             consecutive regions are different.

        *_L_o_g_R_a_t_i_o* Test over Reference log-ratio.   


*NormalRef*: If 'genomestep=TRUE' and 'normalrefcenter=FALSE', then
          NormalRef is the median of the cluster which has been used to
          set the normal reference during the "clustering throughout
          the genome" step. Otherwise NormalRef is 0.
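
     The Level, Breakpoints and OutliersMad columns of profileValues can
     be approximated from piecewise-constant smoothing values with the
     sketch below. It assumes clones sorted by chromosome and position,
     that the last region of a chromosome is not flagged as a
     breakpoint, and that MAD outliers are values whose robust z-score
     (from R's 'median' and 'mad') falls in the alpha/2 tails of a
     standard normal distribution. 'label_levels' and
     'flag_mad_outliers' are hypothetical helpers, not GLAD functions.

```r
## Hypothetical helpers, not part of GLAD. Assumes clones are sorted by
## chromosome, then by position.
label_levels <- function(chromosome, smoothing) {
  n <- length(smoothing)
  new_chr <- c(TRUE, chromosome[-1] != chromosome[-n])
  new_level <- new_chr | c(TRUE, smoothing[-1] != smoothing[-n])
  level <- cumsum(new_level)  # incremented at each new level or chromosome
  ## Assumption: the last region of a chromosome is not a breakpoint.
  breakpoints <- as.integer(c(new_level[-1], TRUE) & !c(new_chr[-1], TRUE))
  data.frame(Chromosome = chromosome, Smoothing = smoothing,
             Level = level, Breakpoints = breakpoints)
}

flag_mad_outliers <- function(logratio, level, alpha = 0.001, msize = 5) {
  flags <- integer(length(logratio))  # 0 = not an outlier
  q <- qnorm(1 - alpha / 2)           # two-sided normal critical value
  for (lv in unique(level)) {
    idx <- which(level == lv)
    if (length(idx) < msize) next     # region smaller than msize: skipped
    x <- logratio[idx]
    s <- mad(x)                       # robust scale: 1.4826 * MAD
    if (s == 0) next
    z <- (x - median(x)) / s
    flags[idx[z > q]] <- 1L           # alpha/2 upper tail
    flags[idx[z < -q]] <- -1L         # alpha/2 lower tail
  }
  flags
}

lv <- label_levels(c(1, 1, 1, 1, 2, 2, 2), c(0, 0, 0.5, 0.5, 0.5, 0.5, -0.3))
lv$Level        # 1 1 2 2 3 3 4
lv$Breakpoints  # 0 1 0 0 0 1 0
```

     Note that the first region of chromosome 2 gets a new Level even
     though its smoothing value equals that of the preceding region,
     because the label is also incremented when moving to the next
     chromosome.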

_N_o_t_e:

     People interested in tools dealing with array CGH analysis can
     visit our web-page <URL: http://bioinfo.curie.fr>.

_A_u_t_h_o_r(_s):

     Philippe Hupé, glad@curie.fr.

_S_e_e _A_l_s_o:

     'glad'.

_E_x_a_m_p_l_e_s:

     library(GLAD)
     data(snijders)
     gm13330$Clone <- gm13330$BAC
     profileCGH <- as.profileCGH(gm13330)

     ###########################################################
     ###
     ###  daglad function
     ###
     ###########################################################

     res <- daglad(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE,
                   smoothfunc="lawsglad", lkern="Exponential", model="Gaussian",
                   qlambda=0.999,  bandwidth=10, base=FALSE, round=1.5,
                   lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=5,
                   method="centroid", nmin=1, nmax=8,
                   amplicon=1, deletion=-5, deltaN=0.10,  forceGL=c(-0.15,0.15), nbsigma=3,
                   MinBkpWeight=0.35, CheckBkpPos=TRUE)

     ### Genomic profile on the whole genome
     plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing",
                 main="Breakpoints detection: DAGLAD analysis")


     ### Genomic profile for chromosome 1
     plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1,
                 Smoothing="Smoothing", main="Chromosome 1: DAGLAD analysis")

     ### The standard deviations of the LogRatio are:
     res$SigmaC

     ### The list of breakpoints is:
     res$BkpInfo

