xcluster                 package:ctc                 R Documentation

_H_i_e_r_a_r_c_h_i_c_a_l _c_l_u_s_t_e_r_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Performs a hierarchical cluster analysis on a set of
     dissimilarities. This function launches an external program,
     Xcluster.

_U_s_a_g_e:

     xcluster(data,distance="euclidean",clean=FALSE,tmp.in="tmp.txt",tmp.out="tmp.gtr")

_A_r_g_u_m_e_n_t_s:

    data: a matrix (or data frame) which provides the data to analyze

distance: The distance measure used with _Xcluster_. This must be one
          of '"euclidean"', '"pearson"' or '"notcenteredpearson"'. Any
          unambiguous substring can be given.

   clean: a logical value indicating whether you want the true
          distances ('clean=FALSE') or a clean dendrogram
          ('clean=TRUE').

tmp.in, tmp.out: names of the temporary input and output files used
          by Xcluster
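
     The "unambiguous substring" rule for 'distance' can be
     illustrated with base R's 'pmatch'. This is only a sketch: it
     assumes prefix-style partial matching, which this page does not
     confirm as what 'xcluster' does internally, and the vector name
     'dist_names' is introduced here purely for illustration.

```r
# Hypothetical illustration: names taken from the 'distance' argument above
dist_names <- c("euclidean", "pearson", "notcenteredpearson")

# pmatch() returns the index of the unique partial (prefix) match, or NA
i_eu   <- pmatch("eu", dist_names)    # resolves to "euclidean"
i_pear <- pmatch("pear", dist_names)  # resolves to "pearson"
i_not  <- pmatch("not", dist_names)   # resolves to "notcenteredpearson"
i_none <- pmatch("x", dist_names)     # no candidate starts with "x": NA

dist_names[c(i_eu, i_pear, i_not)]
```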

_D_e_t_a_i_l_s:

     Available distance measures are (written for two vectors x and y): 

        *  Euclidean: Usual square distance between the two vectors (2
           norm).

        *  Pearson: 1 - cor(x,y)

        *  Pearson not centered: 1 - [ sum x_i y_i ] / sqrt[ sum x_i^2
           * sum y_i^2 ] 
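
     The three measures above can be reproduced in plain R for two
     small vectors. This is a sketch for checking the formulas, not a
     call into Xcluster: the vectors 'x' and 'y' are made up, and the
     Euclidean line assumes that "2 norm" means the square root of
     the summed squared differences.

```r
# Two made-up vectors, used only for illustration
x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)

# Euclidean: 2-norm of the difference (assumed interpretation)
d_euclid <- sqrt(sum((x - y)^2))

# Pearson: 1 - cor(x, y)
d_pearson <- 1 - cor(x, y)

# Pearson not centered: 1 - [ sum x_i y_i ] / sqrt[ sum x_i^2 * sum y_i^2 ]
d_uncentered <- 1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))

c(euclidean = d_euclid, pearson = d_pearson,
  notcenteredpearson = d_uncentered)
```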

     Xcluster does not use the usual agglomerative methods (single,
     average, complete); instead, it takes the distance between the
     barycenters of two groups as the distance between those groups.

     This causes a problem for data such as:

       A  0    0
       B  0    1
       C  0.9  0.5

     That is, a triangle in R^2: the distance between A and B is
     larger than the distance between the group {A,B} and C (with
     Euclidean distance).

     In that case it can be useful to set 'clean=TRUE', which means
     that A and B must not be considered as a group without C.
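
     The inversion in the triangle example can be checked by hand in
     R. The sketch below only reproduces the arithmetic with
     Euclidean distance and barycenters; it does not call Xcluster,
     and the helper 'euclid' is a name introduced here for
     illustration.

```r
# The three points from the example above
pts <- rbind(A = c(0, 0), B = c(0, 1), C = c(0.9, 0.5))

# Hypothetical helper: Euclidean distance between two vectors
euclid <- function(u, v) sqrt(sum((u - v)^2))

# A and B are the closest pair, so they are grouped first
d_AB <- euclid(pts["A", ], pts["B", ])
d_AC <- euclid(pts["A", ], pts["C", ])

# Distance from the barycenter of {A,B} to C
bary_AB <- colMeans(pts[c("A", "B"), ])
d_AB_C  <- euclid(bary_AB, pts["C", ])

# The second merge happens at a smaller distance than the first
d_AB > d_AB_C
```

     Here the first merge is at distance 1 and the second at 0.9, so
     the merge heights are not monotone.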

_V_a_l_u_e:

     An object of class *hclust* which describes the tree produced by
     the clustering process. The object is a list with components:

   merge: an n-1 by 2 matrix. Row i of 'merge' describes the merging of
          clusters at step i of the clustering. If an element j in the
          row is negative, then observation -j was merged at this
          stage. If j is positive then the merge was with the cluster
          formed at the (earlier) stage j of the algorithm. Thus
          negative entries in 'merge' indicate agglomerations of
          singletons, and positive entries indicate agglomerations of
          non-singletons.

  height: a set of n-1 non-decreasing real values. The clustering
          _height_: that is, the value of the criterion associated with
          the clustering 'method' for the particular agglomeration.

   order: a vector giving the permutation of the original observations
          suitable for plotting, in the sense that a cluster plot using
          this ordering and matrix 'merge' will not have crossings of
          the branches.

  labels: labels for each of the objects being clustered.

    call: the call which produced the result.

  method: the cluster method that has been used.

dist.method: the distance that has been used to create 'd' (only
          returned if the distance object has a '"method"' attribute).
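
     Because Xcluster is an external program, the components listed
     above are illustrated here with 'hclust' from the stats package,
     which returns an object of the same class. This is a stand-in
     under that assumption, not output from 'xcluster'; the points in
     'pts' are made up.

```r
# Three made-up labelled points; A and B are close, C is far away
pts <- matrix(c(0, 0,
                0, 1,
                5, 5), ncol = 2, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), NULL))

# Stand-in for xcluster(): base R hierarchical clustering
hc <- hclust(dist(pts), method = "average")

hc$merge    # (n-1) x 2 matrix; negative entries are single observations
hc$height   # one height per merge
hc$order    # ordering of the observations for plotting
hc$labels   # labels of the clustered objects
hc$method   # clustering method used
```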

_N_o_t_e:

     _Xcluster_ is a C program made by _Gavin Sherlock_ that performs
     hierarchical clustering, K-means and SOM. 

     _Xcluster_ is copyrighted. To obtain _Xcluster_ or information
     about it, see <URL:
     http://genome-www.stanford.edu/~sherlock/cluster.html>

_A_u_t_h_o_r(_s):

     Antoine Lucas, <URL:
     http://mulcyber.toulouse.inra.fr/projects/amap/>

_R_e_f_e_r_e_n_c_e_s:

     Antoine Lucas and Sylvain Jasson, _Using amap and ctc Packages for
     Huge Clustering_, R News, 2006, vol 6, issue 5 pages 58-60.

_S_e_e _A_l_s_o:

     'r2xcluster', 'xcluster2r', 'hclust', 'hcluster'

_E_x_a_m_p_l_e_s:

     #    Create data
     set.seed(1)                        # fix the seed for reproducibility
     m <- matrix(rep(1,3*24),ncol=3)
     m[9:16,3] <- 3 ; m[17:24,] <- 3    # create 3 groups
     m <- m+rnorm(24*3,0,0.5)           # add noise
     m <- floor(10*m)/10                # keep one decimal digit

     # Once the Xcluster program is installed:
     #
     #h <- xcluster(m)
     #
     #plot(h)

