SIM-package               package:SIM               R Documentation

_S_t_a_t_i_s_t_i_c_a_l _I_n_t_e_g_r_a_t_i_o_n _o_f _M_i_c_r_o_a_r_r_a_y_s

_D_e_s_c_r_i_p_t_i_o_n:

     SIM is a statistical model to identify copy number changes that
     affect the expression of genes within the same chromosomal
     region. Copy number is treated as the dependent variable and
     expression as the independent variable. Copy number alterations
     may span many expression probes and affect them in a possibly
     subtle but consistent way. Therefore, we test whether copy number
     is associated with a set of expression levels within a chromosome
     arm (or minimal common region) using a random-effect model.
     Association scores for individual expression levels (z-scores)
     are also calculated. For more information on the random-effect
     model, see '?globaltest'.

     Each sample should be profiled on both a copy number array and an
     expression array. The array platforms used for DNA and RNA
     analysis may differ, as long as the probes have been mapped to
     the genome. RESOURCERER
     (http://compbio.dfci.harvard.edu/tgi/cgi-bin/magic/r1.pl) can be
     used to look up the chromosome and basepair location of
     expression microarray probes. See 'RESOURCERER.annotation.to.ID'
     on how to insert this information as annotation columns.
     Alternatively, the chromosome, basepair location and gene symbol
     can be extracted from AnnotationData packages available in
     Bioconductor or generated using the AnnBuilder package.
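
     As a rough illustration of what "annotation columns" means here,
     the sketch below joins a probe annotation table to an expression
     table in base R so that the annotation ends up in the leading
     columns. All object names, probe IDs, positions and symbols are
     made up for the example. The column names ID, CHROMOSOME,
     STARTPOS and Symbol are borrowed from the Examples section, and
     the four-leading-columns layout is an assumption based on how
     'ann.dep' and 'ann.indep' are passed there.

```r
# Hypothetical expression table: one row per probe, one column per sample
expr.values <- data.frame(ID = c("probe1", "probe2", "probe3"),
                          sampleA = c(0.12, -0.40, 1.05),
                          sampleB = c(0.30, -0.22, 0.87),
                          stringsAsFactors = FALSE)

# Hypothetical annotation table (made-up positions and symbols),
# e.g. as looked up in an annotation resource
probe.ann <- data.frame(ID = c("probe3", "probe1", "probe2"),
                        CHROMOSOME = c(8, 8, 8),
                        STARTPOS = c(3000000, 1000000, 2000000),
                        Symbol = c("GENE_C", "GENE_A", "GENE_B"),
                        stringsAsFactors = FALSE)

# Join on the probe ID; merge() puts the key first, then the remaining
# annotation columns, then the sample columns
merged <- merge(probe.ann, expr.values, by = "ID")

# Order the probes by genomic position
merged <- merged[order(merged$CHROMOSOME, merged$STARTPOS), ]

# Names of the leading annotation columns
ann.cols <- colnames(merged)[1:4]
print(ann.cols)
```

     A table laid out this way has the same shape as the annotation
     arguments used in the Examples section, but check the help of
     'assemble.data' for the exact requirements.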

     When copy number data is run as the dependent variable, we use
     'method.adjust="BY"' for multiple testing correction. This method
     accounts for dependence between measurements and is more
     conservative than "BH". For details on the multiple testing
     correction methods, see '?p.adjust'. In our experience, a fairly
     lenient cut-off of 20% on the BY-adjusted p-values still allows
     the detection of associations in data with a low number of
     samples or a low frequency of aberrations, while false positives
     are rarely observed.
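
     The difference between the two corrections can be seen with
     'p.adjust' from the stats package. The sketch below uses made-up
     raw p-values (they are not SIM output) and compares the "BH" and
     "BY" adjustments at the 20% cut-off mentioned above.

```r
# Hypothetical raw p-values for ten probes (made-up numbers)
p.raw <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201,
           0.0278, 0.0298, 0.0344, 0.0459, 0.3240)

# Adjust with both methods using stats::p.adjust
p.bh <- p.adjust(p.raw, method = "BH")
p.by <- p.adjust(p.raw, method = "BY")

# BY scales the BH correction by the factor sum(1/(1:n)) before
# capping at 1, so a BY-adjusted value is never smaller than the
# BH-adjusted one
print(round(cbind(raw = p.raw, BH = p.bh, BY = p.by), 4))

# Indices of the probes passing a 20% cut-off under each method
sig.bh <- which(p.bh <= 0.2)
sig.by <- which(p.by <= 0.2)
print(list(BH = sig.bh, BY = sig.by))
```

     Because "BY" is the stricter of the two, any probe that passes
     the cut-off after "BY" adjustment also passes it after "BH"
     adjustment.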

     Make sure that the probes of both arrays are mapped to the same
     build of the genome, and that the 'chrom.table' used by
     'integrated.analysis' is from that build as well. See
     'sim.update.chrom.table'.

_D_e_t_a_i_l_s:


       Package:  SIM
       Type:     Package
       Version:  1.9.0
       Date:     2008-02-06
       License:  Open

_A_u_t_h_o_r(_s):

     Marten Boetzer, Melle Sieswerda, Renee X. de Menezes 
     R.X.Menezes@lumc.nl

_R_e_f_e_r_e_n_c_e_s:

     R.X. de Menezes, M. Boetzer, M. Sieswerda, G.J.B. van Ommen, J.M.
     Boer. Integrated statistical analysis to identify associations
     between DNA copy number and gene expression in microarray data.
     Submitted.

_S_e_e _A_l_s_o:

     'assemble.data', 'integrated.analysis', 'sim.plot.zscore.heatmap',
     'sim.plot.pvals.on.region', 'sim.plot.pvals.on.genome',
     'tabulate.pvals', 'tabulate.top.dep.features',
     'tabulate.top.indep.features', 'impute.nas.by.surrounding',
     'sim.update.chrom.table'

_E_x_a_m_p_l_e_s:

     #load the datasets and the samples to run the integrated analysis
     data(expr.data)
     data(acgh.data)
     data(samples) 
              
     #assemble the data
     assemble.data(dep.data = acgh.data, indep.data = expr.data,
                   ann.dep = colnames(acgh.data)[1:4],
                   ann.indep = colnames(expr.data)[1:4],
                   dep.id = "ID", dep.chr = "CHROMOSOME",
                   dep.pos = "STARTPOS", dep.symb = "Symbol",
                   indep.id = "ID", indep.chr = "CHROMOSOME",
                   indep.pos = "STARTPOS", indep.symb = "Symbol",
                   overwrite = TRUE, run.name = "chr8")

     #run the integrated analysis
     integrated.analysis(samples = samples, input.regions = 8, adjust=FALSE, zscores=TRUE, method = "auto", run.name = "chr8")

     # use functions to plot the results of the integrated analysis

     #plot the p-values along the genome
     sim.plot.pvals.on.genome(input.regions = 8,adjust.method = "BY",pdf = FALSE, run.name = "chr8")

     #plot the p-values along the regions
     sim.plot.pvals.on.region(input.regions = 8, adjust.method="BY", run.name = "chr8")

     #plot the z-scores in an association heatmap
     sim.plot.zscore.heatmap(input.regions = 8, significance = 0.2,
                             z.threshold = 3, show.names.dep = TRUE,
                             show.names.indep = TRUE, adjust.method = c("BY"),
                             scale = "auto", plot.method = "smooth",
                             pdf = FALSE, run.name = "chr8")

     #tabulate the p-values per region (prints to screen)
     tabulate.pvals(input.regions = 8, adjust.method = "BY",
                    bins = c(0.001, 0.005, 0.01, 0.025, 0.05, 0.075, 0.10, 0.20, 1.0),
                    significance.idx = 8, order.by = "%",
                    decreasing = TRUE, run.name = "chr8")

     #get the top dependent features sorted by p-value
     tabulate.top.dep.features(input.regions = 8, adjust.method="BY",run.name = "chr8")

     #get the top independent features sorted by mean z-score
     tabulate.top.indep.features(input.regions = 8,adjust.method="BY", significance=0.2, sort.order='positive', run.name = "chr8")

