NCI60                 package:made4                 R Documentation

_M_i_c_r_o_a_r_r_a_y _g_e_n_e _e_x_p_r_e_s_s_i_o_n _p_r_o_f_i_l_e_s _o_f _t_h_e _N_C_I _6_0 _c_e_l_l _l_i_n_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     NCI60 is a dataset of gene expression profiles of 60 National
     Cancer Institute (NCI) cell lines. These 60 human tumour cell
     lines are derived from patients with leukaemia, melanoma,  along
     with, lung, colon, central nervous system, ovarian, renal, breast
     and prostate cancers.  This panel of cell lines have been
     subjected to several different DNA microarray studies using both
     Affymetrix and spotted cDNA array technology. This dataset
     contains subsets from one  cDNA spotted (Ross et al., 2000) and
     one Affymetrix (Staunton et al., 2001) study, and  are
     pre-processed as described by Culhane et al., 2003.

_U_s_a_g_e:

     data(NCI60)

_F_o_r_m_a_t:

     The format is: List of 3

   $_R_o_s_s: 'data.frame' containing 144 rows and 60 columns.  144 gene
        expression log ratio measurements of the NCI60 cell lines.

   $_A_f_f_y: 'data.frame' containing 144 rows and 60 columns.  144
        Affymetrix gene expression average difference measurements of
        the NCI60 cell lines.

   $_c_l_a_s_s_e_s: Data 'matrix' of 60 rows and 2 columns.   The first column
        contains the names of the 60 cell line which were analysed.  
        The second column lists the 9 phenotypes of the cell lines,
        which are  BREAST, CNS, COLON, LEUK, MELAN, NSCLC, OVAR,
        PROSTATE, RENAL.

   $_A_n_n_o_t: Data 'matrix' of 144 rows and 4 columns.   The 144 rows
        contain the 144 genes in the $Ross and $Affy datasets, together
        with their  Unigene IDs, and HUGO Gene Symbols.  The Gene
        Symbols obtained for the $Ross and $Affy datasets differed (see
        note below), hence both are given. The columns of the 'matrix'
        are the IMAGE ID of the clones of the $Ross dataset, the HUGO
        Gene Symbols of these IMAGE clone ID obtained from SOURCE, the
        Affymetrix ID of the $Affy dataset, and the HUGO Gene Symbols
        of these Affymetrix IDs obtained using 'annaffy'. 


_D_e_t_a_i_l_s:

     The datasets were processed as described by Culhane et al., 2003.

     The Ross 'data.frame' contains gene expression profiles of each
     cell lines in the NCI-60 panel,  which were determined using
     spotted cDNA arrays containing 9,703 human cDNAs (Ross et al.,
     2000).  The data were downloaded from The NCI Genomics and
     Bioinformatics Group Datasets resource  <URL:
     http://discover.nci.nih.gov/datasetsNature2000.jsp>. The updated
     version of this dataset  (updated 12/19/01) was retrieved. Data
     were provided as log ratio values. 

     In this study, rows (genes) with greater than 15 and were removed
     from analysis, reducing the dataset to 5643 spot values per cell
     line.  Remaining missing values were imputed using a K nearest
     neighbour method, with 16 neighbours  and a Euclidean distance
     metric (Troyanskaya et al., 2001). The dataset $Ross contains a
     subset of the 144 genes of the 1375 genes set described by Scherf
     et al., 2000.  This datasets is available for download from <URL:
     http://bioinf.ucd.ie/people/aedin/R/>. 

     In order to reduce the size of the example datasets, the Unigene
     ID's for each of the 1375 IMAGE ID's for these genes were obtained
     using SOURCE <URL: http://source.stanford.edu>.  These were
     compared with the Unigene ID's of the 1517 gene subset of the
     $Affy dataset.  144 genes were common between the two datasets and
     these are contained in $Ross.

     The Affy data were derived using high density Hu6800 Affymetrix
     microarrays containing  7129 probe sets (Staunton et al., 2001).
     The dataset was downloaded from the Whitehead Institute  Cancer
     Genomics supplemental data to the paper from Staunton et al.,
     <URL: http://www-genome.wi.mit.edu/mpr/NCI60/>, where the data
     were provided as average difference (perfect match-mismatch)
     values. As described by  Staunton et al.,  an expression value of
     100 units was assigned to all average difference values  less than
     100. Genes whose expression was invariant across all 60 cell lines
     were not considered,  reducing the dataset to 4515 probe sets.
     This dataset NCI60$Affy of 1517 probe set, contains genes  in
     which the minimum change in gene expression across all 60 cell
     lines was greater than 500 average  difference units.  Data were
     logged (base 2) and median centred.  This datasets is available
     for download from <URL: http://bioinf.ucd.ie/people/aedin/R/>. 

     In order to reduce the size of the example datasets, the Unigene
     ID's for each of the 1517 Affymetrix ID of these genes were
     obtained using the function 'aafUniGene' in the 'annaffy'
     Bioconductor package. These 1517 Unigene IDs were compared with
     the Unigene ID's of the 1375 gene subset of the $Ross dataset. 
     144 genes were common between the two datasets and these are
     contained in $Affy.

_S_o_u_r_c_e:

     These pre-processed datasets were available as a supplement to the
     paper:

     Culhane AC, Perriere G, Higgins DG. Cross-platform comparison and
     visualisation of gene expression data  using co-inertia analysis.
     BMC Bioinformatics. 2003 Nov 21;4(1):59. <URL:
     http://www.biomedcentral.com/1471-2105/4/59>

_R_e_f_e_r_e_n_c_e_s:

     Culhane AC, Perriere G, Higgins DG. Cross-platform comparison and
     visualisation of gene expression  data using co-inertia analysis.
     BMC Bioinformatics. 2003 Nov 21;4(1):59.

     Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, Spellman P, Iyer V,
     Jeffrey SS, Van de Rijn M,  Waltham M, Pergamenschikov A, Lee JC,
     Lashkari D, Shalon D, Myers TG, Weinstein JN, Botstein D,  Brown
     PO: Systematic variation in gene expression patterns in human
     cancer cell lines.   Nat Genet 2000, 24:227-235

     Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW,
     Reinhold WC, Myers TG,  Andrews DT, Scudiero DA, Eisen MB,
     Sausville EA, Pommier Y, Botstein D, Brown PO,  Weinstein JN: A
     gene expression database for the molecular pharmacology of
     cancer.Nat Genet 2000,  24:236-244.

     Staunton JE, Slonim DK, Coller HA, Tamayo P, Angelo MJ, Park J,
     Scherf U, Lee JK, Reinhold WO,  Weinstein JN, Mesirov JP, Lander
     ES, Golub TR: Chemosensitivity prediction by transcriptional 
     profiling. Proc Natl Acad Sci U S A 2001, 98:10787-10792. 

     Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani
     R, Botstein D, Altman RB: Missing value estimation methods for DNA
     microarrays. Bioinformatics 2001, 17:520-525.

_E_x_a_m_p_l_e_s:

     data(NCI60)
     summary(NCI60)

