khan                  package:made4                  R Documentation

_M_i_c_r_o_a_r_r_a_y _g_e_n_e _e_x_p_r_e_s_s_i_o_n _d_a_t_a_s_e_t _f_r_o_m _K_h_a_n _e_t _a_l., _2_0_0_1. _S_u_b_s_e_t _o_f _3_0_6 _g_e_n_e_s.

_D_e_s_c_r_i_p_t_i_o_n:

     Khan contains gene expression profiles of four types of small
     round blue cell tumours of childhood (SRBCT) published by Khan et
     al. (2001). It also contains further gene annotation retrieved
     from SOURCE at <URL: http://source.stanford.edu/>.

_U_s_a_g_e:

     data(khan)

_F_o_r_m_a_t:

     Khan is a dataset containing the following:

   $_t_r_a_i_n: 'data.frame' of 306 rows and 64 columns.  The training
        dataset of 64 arrays and 306 gene expression values

   $_t_e_s_t: 'data.frame', of 306 rows and 25 columns.  The test dataset
        of 25 arrays and 306 genes expression values

   $_g_e_n_e._l_a_b_e_l_s._i_m_a_g_e_s_I_D: 'vector' of 306 Image clone identifiers
        corresponding to the rownames of $train and $test.

   $_t_r_a_i_n._c_l_a_s_s_e_s: 'factor' with 4 levels "EWS", "BL-NHL", "NB" and
        "RMS", which correspond to the four groups in  the $train
        dataset

   $_t_e_s_t._c_l_a_s_s_e_s: 'factor' with 5 levels "EWS", "BL-NHL", "NB", "RMS"
        and "Norm" which correspond to the five  groups in the $test
        dataset

   $_a_n_n_o_t_a_t_i_o_n: 'data.frame' of 306 rows and 8 columns.  This table
        contains further gene annotation retrieved from SOURCE  <URL:
        http://SOURCE.stanford.edu> in May 2004.  For each of the 306
        genes,  it contains: 

      $_C_l_o_n_e_I_D Image Clone ID

      $_U_G_C_l_u_s_t_e_r The Unigene cluster to which the gene is assigned

      $_S_y_m_b_o_l The HUGO gene symbol

      $_L_L_I_D The locus ID

      $_U_G_R_e_p_A_c_c Nucleotide sequence accession number

      $_L_L_R_e_p_P_r_o_t_A_c_c Protein sequence accession number

      $_C_h_r_o_m_o_s_o_m_e chromosome location

      $_C_y_t_o_b_a_n_d cytoband location 


_D_e_t_a_i_l_s:

     Khan et al. (2001) used cDNA microarrays containing 6567 clones,
     of which 3789 were known genes and 2778 were ESTs, to study the
     expression of genes in four types of small round blue cell
     tumours of childhood (SRBCT). These were neuroblastoma (NB),
     rhabdomyosarcoma (RMS), Burkitt lymphoma (BL), a subset of
     non-Hodgkin lymphoma, and the Ewing family of tumours (EWS). Gene
     expression profiles from both tumour biopsy and cell line samples
     were obtained and are contained in this dataset. The dataset
     downloaded from the website contained the filtered dataset of
     2308 gene expression profiles as described by Khan et al. (2001).
     This dataset is available from <URL:
     http://bioinf.ucd.ie/people/aedin/R/>.

     In order to reduce the size of the MADE4 package and to produce
     small example datasets, the top 50 genes from the ends of 3 axes
     following 'bga' were selected. This produced a reduced dataset of
     306 genes.
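
     The idea of taking genes from the ends of ordination axes can be
     illustrated in base R. This is only a sketch under stated
     assumptions: 'select_axis_ends' is a hypothetical helper, not a
     made4 function, and the coordinates are random stand-ins rather
     than the output of 'bga'. The exact procedure that yielded the 306
     genes is not specified here, so the sketch is not expected to
     reproduce that gene list.

```r
# Hypothetical helper (not part of made4): pick the genes lying at the
# extremes of selected ordination axes.
# 'coords' holds gene coordinates: rows = genes, columns = axes.
select_axis_ends <- function(coords, n = 50, axes = 1:3) {
  coords <- as.matrix(coords)
  stopifnot(!is.null(rownames(coords)),
            n <= nrow(coords),
            max(axes) <= ncol(coords))
  picked <- lapply(axes, function(a) {
    ranked <- rownames(coords)[order(coords[, a])]
    c(head(ranked, n), tail(ranked, n))   # negative end, positive end
  })
  unique(unlist(picked))                  # a gene may be extreme on several axes
}

# Toy stand-in for real gene coordinates (random values, illustration only).
set.seed(1)
toy <- matrix(rnorm(2308 * 3), ncol = 3,
              dimnames = list(paste0("gene", 1:2308), paste0("Axis", 1:3)))

selected <- select_axis_ends(toy, n = 50, axes = 1:3)
length(selected)   # number of distinct genes selected
```

     Taking the union with 'unique' means that a gene extreme on more
     than one axis is counted once.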

_S_o_u_r_c_e:

     'khan' is derived from the filtered dataset of 2308 gene
     expression profiles published and provided by Khan et al. (2001)
     on the supplementary web site to their publication <URL:
     http://research.nhgri.nih.gov/microarray/Supplement/>.

_R_e_f_e_r_e_n_c_e_s:

     Culhane,A.C., et al. (2002) Between-group analysis of microarray
     data. Bioinformatics, 18(12), 1600-1608.

     Khan,J., Wei,J.S., Ringner,M., Saal,L.H., Ladanyi,M.,
     Westermann,F., Berthold,F., Schwab,M., Antonescu,C.R., Peterson,C.
     et al. (2001) Classification and diagnostic prediction of cancers
     using gene expression profiling and artificial neural networks.
     Nat. Med., 7, 673-679.

_E_x_a_m_p_l_e_s:

     data(khan)
     summary(khan)
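
     A slightly longer example, offered as a sketch: it computes the
     mean expression of each gene within each training class. It
     assumes that 'made4' is installed and that the entries of
     'khan$train.classes' are in the same order as the columns of
     'khan$train'.

```r
# Assumes the Bioconductor package 'made4' is installed.
data(khan, package = "made4")

train <- as.matrix(khan$train)   # genes in rows, arrays in columns
cls   <- khan$train.classes      # assumed to follow the column order of 'train'

# One column of per-gene means for each tumour class
class_means <- sapply(levels(cls), function(k) {
  rowMeans(train[, cls == k, drop = FALSE])
})

dim(class_means)    # genes x classes
head(class_means)
```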

