correlationordering          package:hopach          R Documentation

_f_u_n_c_t_i_o_n _t_o _c_o_m_p_u_t_e _e_m_p_i_r_i_c_a_l _c_o_r_r_e_l_a_t_i_o_n _b_e_t_w_e_e_n _d_i_s_t_a_n_c_e _i_n _a _l_i_s_t _a_n_d _d_i_s_t_a_n_c_e _b_y _a _m_e_t_r_i_c

_D_e_s_c_r_i_p_t_i_o_n:

     Given a matrix of pairwise distances based on a choice of
     distance metric, 'correlationordering' computes the empirical
     correlation (over all pairs of elements) between the distance
     apart in the rows/columns of the matrix and the distance according
     to the metric. Correlation ordering will be high if elements close
     to each other in the matrix have small pairwise distances. If the
     rows/columns of the distance matrix are ordered according to a
     clustering of the elements, then correlation ordering should be
     large compared to a matrix with randomly ordered rows/columns.

_U_s_a_g_e:

     correlationordering(dist)

     improveordering(dist,echo=FALSE)

_A_r_g_u_m_e_n_t_s:

    dist: matrix of all pairwise distances between a set of 'p'
          elements, as produced, for example, by the 'distancematrix'
          function. The value in row 'j' and column 'i' is the
          distance between element 'i' and element 'j'. The matrix must
          be symmetric. The ordering of the rows/columns is compared
          to the values in the matrix.

    echo: logical indicating whether the value of correlation ordering
          before and after rearranging the ordering should be printed.

_D_e_t_a_i_l_s:

     Correlation ordering is defined as the empirical correlation
     between distance in a list and distance according to some other
     metric. The value in row 'i' and column 'j' of 'dist' is compared
     to 'j-i', the separation of elements 'i' and 'j' in the ordering.
     The function 'correlationordering' computes the correlation
     ordering for a matrix 'dist', whereas the function
     'improveordering' swaps the ordering of elements in 'dist' until
     doing so no longer improves correlation ordering. The algorithm
     for 'improveordering' is not optimized, so the function can be
     quite slow for more than 50 elements. These functions are used
     by the 'hopach' clustering function to sensibly order the clusters
     in the first level of the hierarchical tree, and can also be used
     to order elements within clusters when the number of elements is
     not too large.
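
     The definition above can be illustrated with a minimal base R
     sketch. It is not the package's implementation: the helper name
     'corr_ordering_sketch' is hypothetical, and restricting the
     comparison to the upper triangle (each pair counted once, so that
     'j-i' is positive) is an assumption based on the symmetry of
     'dist'.

```r
# Hypothetical helper (not part of hopach): correlation between the
# separation j - i in the ordering and the distance d[i, j],
# taken over all pairs i < j.
corr_ordering_sketch <- function(d) {
  d <- as.matrix(d)
  # row/column indices of the upper triangle, one row per pair i < j
  idx <- which(upper.tri(d), arr.ind = TRUE)
  list_dist <- idx[, "col"] - idx[, "row"]  # distance apart in the list
  cor(list_dist, d[idx])                    # empirical correlation
}

# Points on a line, listed in sorted order: neighbours in the list are
# also close according to the metric.
pts <- c(1, 2, 4, 7, 11)
d_sorted <- as.matrix(dist(pts))
co_sorted <- corr_ordering_sketch(d_sorted)

# The same points in a scrambled order.
shuffled <- c(3, 1, 5, 2, 4)
co_shuffled <- corr_ordering_sketch(d_sorted[shuffled, shuffled])
```

     In this toy example the sorted ordering gives the larger value,
     which matches the behaviour described for clustered versus
     randomly ordered matrices.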

_V_a_l_u_e:

     For 'correlationordering', a number between -1 and 1, as returned
     by the 'cor' function, equal to the correlation ordering for the
     matrix 'dist'.

     For 'improveordering', a vector of length 'p' containing the row
     indices for the new ordering of the rows/columns of 'dist', so
     that, with 'ord <- improveordering(dist)', the reordered matrix
     'dist[ord,ord]' has correlation ordering at least as high as
     'dist'.

_W_a_r_n_i_n_g:

     The function 'improveordering' can be very slow for more than
     about 50 elements. The method employed is a greedy, step-wise
     algorithm that sequentially swaps all pairs of elements and
     accepts any swap that improves correlation ordering.
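
     The kind of greedy pairwise-swap search described here can be
     sketched in base R as follows. This is an illustration under
     stated assumptions, not the code used by 'improveordering': the
     helper names are hypothetical, and the sweep order, the
     strict-improvement test and the stopping rule (stop after a full
     sweep with no accepted swap) are all assumed.

```r
# Hypothetical helper (not part of hopach): correlation between the
# separation j - i and the distance d[i, j] over all pairs i < j.
corr_ordering_sketch <- function(d) {
  idx <- which(upper.tri(d), arr.ind = TRUE)
  cor(idx[, "col"] - idx[, "row"], d[idx])
}

# Hypothetical greedy search: try every pairwise swap in turn and keep
# any swap that strictly increases the correlation ordering.
improve_ordering_sketch <- function(d) {
  d <- as.matrix(d)
  p <- nrow(d)
  ord <- seq_len(p)
  best <- corr_ordering_sketch(d)
  improved <- TRUE
  while (improved) {              # assumed stopping rule
    improved <- FALSE
    for (i in seq_len(p - 1)) {
      for (j in (i + 1):p) {
        cand <- ord
        cand[c(i, j)] <- cand[c(j, i)]        # swap positions i and j
        val <- corr_ordering_sketch(d[cand, cand])
        if (isTRUE(val > best)) {             # isTRUE guards against NA
          ord <- cand
          best <- val
          improved <- TRUE
        }
      }
    }
  }
  ord
}

# Toy usage: points on a line listed in a scrambled order.
pts <- c(4, 1, 11, 2, 7)
d <- as.matrix(dist(pts))
ord <- improve_ordering_sketch(d)
before <- corr_ordering_sketch(d)
after <- corr_ordering_sketch(d[ord, ord])
```

     In this sketch every sweep evaluates 'p(p-1)/2' candidate
     orderings, and each evaluation touches all pairs again, so the
     cost grows quickly with 'p'. That is one plausible reason a
     search of this kind becomes slow for larger inputs.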

_A_u_t_h_o_r(_s):

     Katherine S. Pollard <kpollard@gladstone.ucsf.edu> and Mark J. van
     der Laan <laan@stat.berkeley.edu>

_R_e_f_e_r_e_n_c_e_s:

     van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid
     hierarchical clustering with visualization and the bootstrap.
     Journal of Statistical Planning and Inference, 2003, 117, pp.
     275-303.

     <URL:
     http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/hopach.pdf>

_S_e_e _A_l_s_o:

     'distancematrix', 'hopach'

_E_x_a_m_p_l_e_s:

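     library(hopach)
     # simulate 10 elements (rows) with 5 variables each
     mydata <- matrix(rnorm(50), nrow = 10)
     mydist <- distancematrix(mydata, d = "euclid")
     image(as.matrix(mydist))
     # correlation ordering of the original row/column order
     correlationordering(mydist)
     # greedy reordering; echo=TRUE prints the value before and after
     neword <- improveordering(mydist, echo = TRUE)
     correlationordering(mydist[neword, neword])
     image(as.matrix(mydist[neword, neword]))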

