bga                  package:made4                  R Documentation

_B_e_t_w_e_e_n _g_r_o_u_p _a_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Discrimination of samples using between group analysis as
     described by  Culhane et al., 2002.

_U_s_a_g_e:

     bga(dataset, classvec, type = "coa", ...)
     ## S3 method for class 'bga':
     plot(x, axis1=1, axis2=2, arraycol=NULL, genecol="gray25", nlab=10, 
              genelabels= NULL, ...)

_A_r_g_u_m_e_n_t_s:

 dataset: Training dataset. A 'matrix', 'data.frame', 'ExpressionSet'
          or 'marrayRaw'. If the input is gene expression data in a
          'matrix' or 'data.frame', the rows and columns are expected
          to contain the variables (genes) and cases (array samples)
          respectively.

classvec: A 'factor' or 'vector' which describes the classes in the
          training dataset.

    type: Character, "coa", "pca" or "nsc" indicating which data
          transformation is required. The default value is type="coa".

       x: An object of class 'bga', the output from 'bga' or
          'bga.suppl'. It contains the projection coordinates to be
          plotted ($ls, $co or $li).

arraycol, genecol: Character, colour of points on plot. If arraycol is
          NULL, arraycol will obtain a set of contrasting colours
          using 'getcol', one for each class of cases (microarray
          samples) on the array (case) plot. genecol is the colour of
          the points for each variable (gene) on the gene plot.

    nlab: Numeric. An integer indicating the number of variables
          (genes) at the end of axes to be labelled, on the gene plot.

   axis1: Integer, the column number for the x-axis. The default is 1.

   axis2: Integer, the column number for the y-axis. The default is 2.

genelabels: A vector of variable labels. If 'genelabels=NULL', the
          row.names of the input matrix 'dataset' will be used.

     ...: further arguments passed to or from other methods.

_D_e_t_a_i_l_s:

     'bga' performs a between group analysis on the input dataset. This
     function calls 'between'.  The input format of the dataset  is
     verified using 'array2ade4'. 

     Between group analysis is a supervised method for sample
     discrimination and class prediction.  BGA is carried out by
     ordinating groups (sets of grouped microarray samples), that is, 
     groups of samples are projected into a reduced dimensional space.
     This is most easily done using PCA or COA of the group means.
     The choice of PCA or COA is defined by the parameter 'type'.
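
     As a rough illustration of the idea above, the PCA variant can be
     sketched in base R: compute the mean profile of each group, run a
     PCA on those means, and project the individual samples onto the
     resulting axes. This is not the code used by 'bga' (which calls
     'between'); the toy matrix, the object names and the unweighted
     treatment of the groups are assumptions made for this sketch.

```r
# Toy data (hypothetical): 20 genes (rows) x 12 samples (columns),
# three classes of four samples each.
set.seed(1)
classvec <- factor(rep(c("A", "B", "C"), each = 4))
x <- matrix(rnorm(20 * 12), nrow = 20,
            dimnames = list(paste0("g", 1:20), paste0("s", 1:12)))
# Shift the first five genes by class so that the groups differ.
x[1:5, ] <- sweep(x[1:5, ], 2, rep(c(0, 2, 4), each = 4), "+")

# Work with samples in rows and genes in columns.
samples <- t(x)

# Mean expression profile of each group (one row per class).
group_means <- rowsum(samples, classvec) / as.vector(table(classvec))

# Ordinate the group means with an (unweighted) PCA.
pca <- prcomp(group_means, center = TRUE, scale. = FALSE)

# Keep the first (number of classes - 1) axes.
n_axes <- nlevels(classvec) - 1
axes <- pca$rotation[, seq_len(n_axes), drop = FALSE]

# Coordinates of the group centroids and of the individual samples.
centroid_scores <- pca$x[, seq_len(n_axes), drop = FALSE]
sample_scores <- scale(samples, center = pca$center, scale = FALSE) %*% axes
```

     Because the projection is linear, the mean of the projected
     samples of a class coincides with the projected centroid of that
     class, which is what makes the centroids a natural reference
     point on the plot.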

     The user must define microarray sample groupings in advance. These
     groupings are defined using  the input 'classvec', which is a
     'factor' or 'vector'. 

     *Cross-validation and testing of bga results:*

     bga results should be validated using leave-one-out jack-knife
     cross-validation with 'bga.jackknife' and by projecting a blind
     test dataset onto the bga axes using 'suppl'. 'bga' and
     'suppl' are combined in 'bga.suppl', which requires input of both
     a training and test dataset. It is important to ensure that the
     selection of cases for a training and test set is not biased, and
     generally many cross-validations should be performed.  The
     function 'randomiser' can be used to randomise the selection of
     training and test samples.
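
     To make the point about unbiased selection concrete, the sketch
     below draws a random test set within each class using base R
     only. It is not the 'randomiser' function, whose interface is not
     described here; the class labels, the 25% test fraction and all
     object names are hypothetical.

```r
# Hypothetical class labels for 24 samples in three classes.
set.seed(3)
classvec <- factor(rep(c("A", "B", "C"), times = c(10, 8, 6)))
test_frac <- 0.25   # assumed fraction of each class held out for testing

# Sample indices within each class so that every class is represented
# in the test set (at least one sample per class).
idx_by_class <- split(seq_along(classvec), classvec)
test_idx <- unlist(lapply(idx_by_class, function(i) {
  n_test <- max(1, round(test_frac * length(i)))
  i[sample.int(length(i), n_test)]
}), use.names = FALSE)

# Everything not selected for testing is used for training.
train_idx <- setdiff(seq_along(classvec), test_idx)

table(classvec[train_idx])
table(classvec[test_idx])
```

     The index vectors could then be used to subset the columns of
     'dataset' (and the matching entries of 'classvec') before the
     analysis; repeating the draw with different seeds gives the
     repeated cross-validations recommended above.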

     *Plotting and visualising bga results:*

     _2D plots:_ Use 'plot.bga' to plot results from 'bga'. 'plot.bga'
     calls the functions 's.var' and 's.groups' to draw an xy plot of
     cases ($ls). 's.var' and 's.groups' are modifications of the ADE4
     graphing functions 's.label' and 's.class'. 'plotgenes' is
     used to draw an xy plot of the variables (genes).

     _3D plots:_ 3D graphs can be generated using 'do3D' and 'html3D'. 
     'html3D' produces a web page in which a 3D plot can be
     interactively rotated, zoomed, and in which classes or groups of
     cases can be easily highlighted. 

     _1D plots, show one axis only:_ 1D graphs can be plotted using
     'between.graph' and 'graph1D'. 'between.graph' is used for
     plotting the cases, and requires both the co-ordinates of the
     cases ($ls) and their centroids ($li). It accepts an object of
     class 'bga'. 'graph1D' can be used to plot either cases
     (microarrays) or variables (genes) and only requires a vector of
     coordinates.

     *Analysis of the distribution of variance among axes:*

     It is important to know which cases (microarray samples) are
     discriminated by the axes.   The number of axes or  principal
     components from a 'bga' will equal 'the number of classes - 1', 
     that is length(levels(classvec))-1.
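
     The count of axes can be checked directly, as in the short sketch
     below. The class labels and the random matrix of group means are
     hypothetical; the second part only illustrates why at most
     'number of classes - 1' axes exist, since centring k group means
     removes one dimension.

```r
# Hypothetical class labels with four distinct classes.
classvec <- c("EWS", "BL", "NB", "RMS", "EWS", "BL")
n_axes <- length(levels(factor(classvec))) - 1
n_axes   # 3

# Four group means over ten variables (random, for illustration only).
set.seed(4)
means <- matrix(rnorm(4 * 10), nrow = 4)
# After centring the columns, the four rows sum to zero, so the matrix
# has rank at most 3.
centred <- scale(means, center = TRUE, scale = FALSE)
qr(centred)$rank
```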

     The distribution of variance among axes is described in the
     eigenvalues ($eig) of the 'bga' analysis. These can be visualised
     with a scree plot drawn by 'scatterutil.eigen', as is done in
     'plot.bga'. It is also useful to visualise the principal
     components from a 'bga', a principal components analysis
     'dudi.pca', or a correspondence analysis 'dudi.coa' using a heatmap.
     In MADE4 the function 'heatplot' will plot a heatmap with nicer
     default colours.
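
     As a small worked example of reading the eigenvalues, the sketch
     below converts a vector of eigenvalues into the proportion of
     variance per axis and draws a basic scree plot with base
     graphics. The eigenvalues are made up; with a fitted object they
     would presumably be taken from its $eig component (for example
     'khan.bga$bet$eig', assuming the usual 'dudi' layout).

```r
# Hypothetical eigenvalues for a three-axis analysis.
eig <- c(4.2, 1.7, 0.6)

# Proportion and cumulative proportion of variance per axis.
prop <- eig / sum(eig)
cumprop <- cumsum(prop)
round(100 * prop, 1)   # 64.6 26.2  9.2

# Basic scree plot: one bar per axis, height equal to the eigenvalue.
barplot(eig, names.arg = seq_along(eig),
        xlab = "Axis", ylab = "Eigenvalue", main = "Scree plot")
```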

     *Extracting list of top variables (genes):*

     Use 'topgenes' to get a list of variables or cases at the ends of
     axes. It will return a list of the top n variables (by default
     n=5) at the positive, negative or both ends of an axis.
     'sumstats' can be used to return the angle (slope) and distance
     from the origin of a list of coordinates.
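
     The selection described above can be approximated in base R as
     follows. This is only a sketch of the idea, not the
     implementation of 'topgenes' or 'sumstats'; the coordinates, the
     gene names and the use of degrees for the angle are assumptions.

```r
# Hypothetical coordinates of 50 genes on a single axis.
set.seed(2)
coord <- setNames(rnorm(50), paste0("gene", 1:50))
n <- 5

# Order genes along the axis and take n from each end.
ord <- order(coord)
neg_end <- names(coord)[head(ord, n)]        # most negative first
pos_end <- names(coord)[rev(tail(ord, n))]   # most positive first
list(positive = pos_end, negative = neg_end)

# Angle and distance from the origin for points on two axes
# (hypothetical coordinates).
xy <- cbind(x = c(3, 0), y = c(4, 2))
dist_origin <- sqrt(xy[, "x"]^2 + xy[, "y"]^2)
angle_deg <- atan2(xy[, "y"], xy[, "x"]) * 180 / pi
```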

     For more details see Culhane et al., 2002 and <URL:
     http://bioinf.ucd.ie/research/BGA>.

_V_a_l_u_e:

     A list with a class 'bga' containing:

     ord: Results of initial ordination. A list of class "dudi" (see
          'dudi' )

     bet: Results of between group analysis. A list of class "dudi" 
          (see 'dudi'), "between" (see 'between')

     fac: The input classvec, the 'factor' or 'vector' which described
          the classes in the input dataset

_A_u_t_h_o_r(_s):

     Aedin Culhane

_R_e_f_e_r_e_n_c_e_s:

     Culhane AC, et al., 2002 Between-group analysis of microarray
     data. Bioinformatics. 18(12):1600-8.

_S_e_e _A_l_s_o:

     'bga', 'suppl', 'suppl.bga', 'between', 'bga.jackknife'

_E_x_a_m_p_l_e_s:

     data(khan)

     if (require(ade4, quietly = TRUE)) {
       # Run BGA on the training data (default type = "coa")
       khan.bga <- bga(khan$train, classvec = khan$train.classes)

       # Print a summary and draw the default plots
       print(khan.bga)
       plot(khan.bga, genelabels = khan$annotation$Symbol)

       # Provide a view of the principal components (axes) of the bga
       heatplot(khan.bga$bet$ls, dend = "none")
     }

