boothopach              package:hopach              R Documentation

_f_u_n_c_t_i_o_n_s _t_o _p_e_r_f_o_r_m _n_o_n-_p_a_r_a_m_e_t_r_i_c _b_o_o_t_s_t_r_a_p _r_e_s_a_m_p_l_i_n_g _o_f _h_o_p_a_c_h _c_l_u_s_t_e_r_i_n_g _r_e_s_u_l_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     The function 'boothopach' takes gene expression data and
     corresponding 'hopach' gene clustering output and performs
     non-parametric bootstrap resampling. The medoid genes (cluster
     profiles) from the original 'hopach' clustering result are fixed,
     and in each bootstrap resampled data set, each gene is assigned to
     the closest medoid. The proportion of bootstrap samples in which
     each gene appears in each cluster is an estimate of the gene's
     membership in each cluster. These membership probabilities can be
     viewed as a "fuzzy" clustering result. The function 'bootmedoids'
     takes medoids and a distance function, rather than a hopach object,
     as input.

_U_s_a_g_e:

     boothopach(data, hopachobj, B = 1000, I, hopachlabels = FALSE)

     bootmedoids(data, medoids, d = "cosangle", B = 1000, I)

_A_r_g_u_m_e_n_t_s:

    data: data matrix, data frame or exprSet of gene expression
          measurements. Each column corresponds to an array, and each
          row corresponds to a gene. All values must be numeric.
          Missing values are ignored.

hopachobj: output of the 'hopach' function.

       B: number of bootstrap resampled data sets.

       I: number of bootstrap resampled data sets (deprecated,
          retained until v1.2 for backward compatibility).

hopachlabels: indicator of whether to use the hopach cluster labels
          'hopachobj$clustering$labels' for the row names (TRUE) versus
          the numbers 0 to 'k-1', where 'k' is the number of clusters
          (FALSE).

 medoids: row indices of 'data' for the cluster medoids.

       d: character string specifying the metric to be used for
          calculating dissimilarities between vectors. The currently
          available options are "cosangle" (cosine angle or uncentered
          correlation distance), "abscosangle" (absolute cosine angle
          or absolute uncentered correlation distance), "euclid"
          (Euclidean distance), "abseuclid" (absolute Euclidean
          distance), "cor" (correlation distance), and "abscor"
          (absolute correlation distance). Advanced users can write
          their own distance functions and add these.
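
     As an illustration of the default metric, the sketch below
     computes a cosine-angle dissimilarity in base R. It assumes the
     distance is taken as 1 minus the uncentered correlation; the
     helper name 'cosangle_dist' is hypothetical and is not part of
     the hopach package, whose own implementation may differ in
     detail.

```r
# Hypothetical helper (not part of hopach): cosine-angle dissimilarity
# between two numeric vectors, assumed here to be 1 - uncentered correlation.
cosangle_dist <- function(x, y) {
  1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

x <- c(1, 2, 3)
d_parallel   <- cosangle_dist(x, 2 * x)          # same direction: 0
d_opposite   <- cosangle_dist(x, -x)             # opposite direction: 2
d_orthogonal <- cosangle_dist(c(1, 0), c(0, 1))  # orthogonal: 1
```

     Under this assumed definition only the direction of a profile
     matters, not its magnitude, which is why rescaling 'x' leaves the
     distance at 0.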

_D_e_t_a_i_l_s:

     The function 'boothopach' requires only data and the corresponding
     output from the HOPACH clustering algorithm produced by the
     'hopach' function. The function 'bootmedoids' is designed to work
     for any clustering result; the user inputs data, medoid row
     indices, and the distance metric. The supplied distance metrics
     are the same as for the 'distancematrix' function. Each
     non-parametric bootstrap resampled data set consists of resampling
     the 'n' columns of 'data' with replacement 'n' times. The distance
     between each element and each of the medoid elements is computed
     using 'd' for each bootstrap data set, and every element is
     assigned (for that resampled data set) to the cluster whose medoid
     is closest. These bootstrap cluster assignments are tabulated over
     all 'B' bootstrap data sets.
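
     The procedure described above can be sketched in base R as
     follows. This is a minimal illustration, not the package's
     implementation: the names 'cosangle' and 'boot_medoids_sketch'
     and the toy data are assumptions made for the example, and the
     cosine-angle distance is assumed to be 1 minus the uncentered
     correlation.

```r
# Assumed form of the cosine-angle distance (hypothetical helper).
cosangle <- function(x, y) 1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Hypothetical sketch of the bootstrap described above; not hopach code.
boot_medoids_sketch <- function(data, medoids, B = 100, dist_fun = cosangle) {
  data <- as.matrix(data)
  n <- ncol(data)
  counts <- matrix(0, nrow = nrow(data), ncol = length(medoids),
                   dimnames = list(rownames(data), NULL))
  for (b in seq_len(B)) {
    cols <- sample(n, n, replace = TRUE)   # resample the n columns with replacement
    boot <- data[, cols, drop = FALSE]
    # distance from every row to each medoid row within this resampled data set
    dists <- sapply(medoids, function(m) apply(boot, 1, dist_fun, y = boot[m, ]))
    nearest <- max.col(-dists, ties.method = "first")  # closest medoid per row
    hit <- cbind(seq_len(nrow(data)), nearest)
    counts[hit] <- counts[hit] + 1         # tabulate the assignment
  }
  counts / B                               # proportions over the B data sets
}

# Toy data: two groups of 5 rows with opposite profiles over 6 columns.
set.seed(1)
pattern <- c(1, 1, 1, -1, -1, -1)
toy <- rbind(
  matrix(rep(pattern, each = 5), nrow = 5) + rnorm(30, sd = 0.2),
  matrix(rep(-pattern, each = 5), nrow = 5) + rnorm(30, sd = 0.2)
)
rownames(toy) <- paste0("g", 1:10)

# Rows 1 and 6 are treated as the (assumed) medoids.
probs <- boot_medoids_sketch(toy, medoids = c(1, 6), B = 50)
```

     Each row of 'probs' sums to 1 because every element is assigned
     to exactly one medoid in every resampled data set.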

_V_a_l_u_e:

     A matrix of bootstrap estimated cluster membership probabilities,
     which sum to 1 (over the clusters) for each element being
     clustered. This matrix has one row for each element being
     clustered and one column for each of the original clusters (one
     cluster for each medoid). The value in row 'j' and column 'i' is
     the proportion of the 'B' bootstrap resampled data sets in which
     element 'j' appeared in cluster 'i' (i.e. was closest to medoid
     'i').
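
     One possible way to work with such a matrix is to reduce it to a
     single cluster per element together with the bootstrap proportion
     supporting it. The matrix 'probs' below is made-up data for
     illustration, with hypothetical row and column names; it is not
     output from 'boothopach'.

```r
# Made-up membership matrix: 4 elements (rows) by 2 clusters (columns).
probs <- matrix(c(1.00, 0.00,
                  0.85, 0.15,
                  0.40, 0.60,
                  0.00, 1.00),
                ncol = 2, byrow = TRUE,
                dimnames = list(paste0("Var", 1:4), c("c0", "c1")))

# Cluster in which each element appeared most often across resamples.
hard <- colnames(probs)[max.col(probs, ties.method = "first")]

# Proportion of resamples supporting that assignment.
support <- apply(probs, 1, max)

data.frame(cluster = hard, support = support)
```

     Elements with low support (such as 'Var3' here) are the ones
     whose cluster membership is least stable under resampling.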

_A_u_t_h_o_r(_s):

     Katherine S. Pollard <kpollard@soe.ucsc.edu> and Mark J. van der
     Laan <laan@stat.berkeley.edu>

_R_e_f_e_r_e_n_c_e_s:

     van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid
     hierarchical clustering with visualization and the bootstrap.
     Journal of Statistical Planning and Inference, 2003, 117, pp.
     275-303.

     <URL: http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/hopach.pdf>

     <URL: http://www.bepress.com/ucbbiostat/paper107/>

     <URL: http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/jsmpaper.pdf>

     Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An
     Introduction to Cluster Analysis. Wiley, New York.

_S_e_e _A_l_s_o:

     'distancematrix', 'hopach'

_E_x_a_m_p_l_e_s:

     library(hopach)

     # 25 variables from two groups with 3 observations per variable
     mydata <- rbind(
       cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
       cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5), rnorm(15, 5, 0.5))
     )
     dimnames(mydata) <- list(paste("Var", 1:25, sep = ""),
                              paste("Exp", 1:3, sep = ""))
     mydist <- distancematrix(mydata, d = "cosangle")  # compute the distance matrix

     # clusters and final tree
     clustresult <- hopach(mydata, dmat = mydist)

     # bootstrap resampling
     myobj <- boothopach(mydata, clustresult)
     table(apply(myobj, 1, sum))               # all 1
     myobj[clustresult$clustering$medoids, ]   # identity matrix

