distancematrix            package:hopach            R Documentation

_f_u_n_c_t_i_o_n_s _t_o _c_o_m_p_u_t_e _p_a_i_r _w_i_s_e _d_i_s_t_a_n_c_e_s _b_e_t_w_e_e_n _v_e_c_t_o_r_s

_D_e_s_c_r_i_p_t_i_o_n:

     The function 'distancematrix' is applied to a matrix of data to
     compute the pairwise distances between all rows of the matrix. In
     hopach versions >= 2.0.0 these distance functions are calculated
     in C, rather than R, to improve run-time performance. The function
     'distancevector' is applied to a matrix and a vector to compute
     the pairwise distances between each row of the matrix and the
     vector. Both functions allow different choices of distance metric.
     The functions 'dissmatrix' and 'dissvector' convert between a
     distance matrix and a vector of its upper triangle. The function
     'vectmatrix' is used internally.
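     The R sketch below illustrates what 'distancematrix' and
     'distancevector' compute, using plain base R loops rather than
     the package's C code. The helper names 'cosangle_dist',
     'dist_vector_sketch' and 'dist_matrix_sketch' are hypothetical,
     and the cosine-angle formula 1 - x.y/(|x||y|) is an assumed
     textbook definition, not a copy of hopach's implementation.

```r
# Hypothetical base-R sketch, not the hopach C implementation.
# Assumed cosine-angle distance: 1 - x.y / (|x| |y|).
cosangle_dist <- function(x, y) {
  1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# Analogue of distancevector(X, y): one distance per row of X.
dist_vector_sketch <- function(X, y, f = cosangle_dist) {
  apply(X, 1, function(row) f(row, y))
}

# Analogue of distancematrix(X): all pairwise distances between rows,
# returned here as a plain symmetric matrix, not an 'hdist' object.
dist_matrix_sketch <- function(X, f = cosangle_dist) {
  n <- nrow(X)
  D <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      D[i, j] <- D[j, i] <- f(X[i, ], X[j, ])
    }
  }
  D
}

set.seed(1)
X <- matrix(rnorm(50), nrow = 10)
D <- dist_matrix_sketch(X)
v <- dist_vector_sketch(X, X[1, ])
all.equal(D[1, ], v)   # row 1 of the matrix matches the vector version
```

     Row 1 of the full matrix agrees with the distances from every row
     to X[1, ], mirroring the 'd1[1,]' and 'd2' comparison in the
     Examples.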

_U_s_a_g_e:

     distancematrix(X, d, na.rm=TRUE)

     distancevector(X, y, d, na.rm=TRUE)

     dissmatrix(v)

     dissvector(M)

     vectmatrix(index, p)

_A_r_g_u_m_e_n_t_s:

       X: a numeric matrix. Missing values will be ignored if
          na.rm=TRUE.

       y: a numeric vector, possibly a row of X. Missing values will be
          ignored if na.rm=TRUE.

   na.rm: an indicator of whether or not to remove missing values. If
          na.rm=TRUE (default), then distances are computed over all
          pairwise non-missing values. Otherwise, missing values are
          propagated through the distance computation.

       d: character string specifying the metric to be used for
          calculating dissimilarities between vectors. The currently
          available options are "cosangle" (cosine angle or uncentered
          correlation distance), "abscosangle" (absolute cosine angle
          or absolute uncentered correlation distance), "euclid"
          (Euclidean distance), "abseuclid" (absolute Euclidean
          distance), "cor" (correlation distance), and "abscor"
          (absolute correlation distance). Advanced users can write
          their own distance functions and add them.

       M: a symmetric matrix of pairwise distances.

       v: a vector of pairwise distances corresponding to the upper
          triangle of a distance matrix, stored by rows.

   index: index in a distance vector, like that returned by
          'dissvector'.

       p: number of elements, e.g. the number of rows in a distance
          matrix.
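     As a rough illustration of the metrics listed under 'd' and of
     the 'na.rm' behaviour, here is a base R sketch. It assumes the
     usual textbook definitions without the square root (see Details);
     'metric_sketch' is a hypothetical name, "abseuclid" is not
     covered, and the Examples compare "euclid" against daisy()
     output divided by the square root of the number of columns, so
     the package's own scaling of "euclid" may differ from this plain
     version.

```r
# Hypothetical sketch of the metrics named under 'd' (assumed textbook
# definitions, no square root); hopach's C code may scale differently.
metric_sketch <- function(x, y, d = "cosangle", na.rm = TRUE) {
  if (na.rm) {
    # keep only positions where both vectors are observed
    ok <- !is.na(x) & !is.na(y)
    x <- x[ok]
    y <- y[ok]
  }
  # uncentered correlation (cosine of the angle between x and y)
  cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
  switch(d,
    cosangle    = 1 - cosine(x, y),
    abscosangle = 1 - abs(cosine(x, y)),
    euclid      = sqrt(sum((x - y)^2)),
    cor         = 1 - cor(x, y),
    abscor      = 1 - abs(cor(x, y)),
    stop("metric not covered by this sketch: ", d)
  )
}

x <- c(1, 2, 3, NA)
y <- c(2, 4, 6, 1)
metric_sketch(x, y, "cor")                 # 0, up to rounding
metric_sketch(x, -y, "abscor")             # 0, up to rounding
metric_sketch(x, y, "euclid")              # sqrt(14)
metric_sketch(x, y, "cor", na.rm = FALSE)  # NA: the missing value propagates
```

     With na.rm=FALSE the NA is left in place, so sum() and cor()
     return NA instead of a distance.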

_D_e_t_a_i_l_s:

     In hopach versions < 2.0.0, these functions returned the square
     root of the usual distance for 'd="cosangle"', 'd="abscosangle"',
     'd="cor"', and 'd="abscor"'. Typically, this transformation makes
     the dissimilarity correspond more closely with the norm. To agree
     with the 'dist' function, the square root is no longer used in
     versions >= 2.0.0.
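     One way to see the link to the norm: for unit-length vectors,
     |x - y|^2 = 2(1 - cos(x, y)), so the square root of the cosine
     angle distance equals |x - y| / sqrt(2). The R sketch below checks
     this numerically; it assumes cosangle = 1 - cos(x, y) and does not
     call hopach itself.

```r
# Sketch: sqrt of the cosine-angle distance (the pre-2.0.0 value) is
# proportional to the Euclidean distance between the unit-scaled vectors.
set.seed(2)
x <- rnorm(5)
y <- rnorm(5)
xu <- x / sqrt(sum(x^2))   # rescale to unit length
yu <- y / sqrt(sum(y^2))
d_new <- 1 - sum(xu * yu)  # assumed cosangle value without square root
d_old <- sqrt(d_new)       # square-rooted value, as in versions < 2.0.0
all.equal(d_old, sqrt(sum((xu - yu)^2)) / sqrt(2))
```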

_V_a_l_u_e:

     For 'distancematrix' (versions >= 2.0.0), an 'hdist' object of
     all pairwise distances between the rows of the data matrix 'X',
     i.e. the value of 'hdist[i,j]' is the distance between rows 'i'
     and 'j' of 'X', as defined by 'd'. An 'hdist' object is an
     instance of an S4 class with four slots:

    Data: the lower triangle of the symmetric distance matrix.

    Size: the number of objects (i.e., rows of the data matrix).

  Labels: labels for the objects, usually the numbers 1 to Size.

    Call: the distance used in the call to 'distancematrix'.


     An 'hdist' object can be converted to a matrix using
     'as.matrix(hdist)'. (See 'hdist' for more details.)
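     Base R's 'dist' objects use a comparable layout, which may help
     in picturing the 'Data' and 'Size' slots: the lower triangle is
     stored column by column as a plain vector, and as.matrix()
     expands it to the full symmetric matrix. This is only an analogy
     using 'stats::dist'; it does not create or inspect an 'hdist'
     object.

```r
# Analogy with base R's 'dist' class (not the 'hdist' class).
set.seed(3)
X <- matrix(rnorm(20), nrow = 5)
d <- dist(X)           # Euclidean distances between the rows of X
M <- as.matrix(d)      # 5 x 5 symmetric matrix with zero diagonal
attr(d, "Size")        # number of rows of X
length(d)              # p * (p - 1) / 2 = 10 stored values
all.equal(as.vector(d), M[lower.tri(M)])  # lower triangle, by columns
```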

     For 'distancevector', a vector of all pair wise distances between
     rows of 'X' and the vector 'y'. Entry 'j' is the distance between
     row 'j' of 'X' and the vector 'y'.

     For 'dissmatrix', the corresponding distance matrix. For
     'dissvector', the corresponding distance vector. If 'M' has 'p'
     rows (and columns), then 'v' has length 'p*(p-1)/2'.
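     The conversion between 'M' and 'v' can be sketched in base R as
     follows, assuming (as stated under 'v') that the vector holds the
     upper triangle stored by rows. The names 'dissvector_sketch' and
     'dissmatrix_sketch' are hypothetical stand-ins, not the package
     functions.

```r
# Hypothetical base-R equivalents of dissvector(M) and dissmatrix(v),
# assuming 'v' is the upper triangle of 'M' stored by rows.
dissvector_sketch <- function(M) {
  # t(M)[lower.tri(M)] visits row 1 (cols 2..p), then row 2 (cols 3..p), ...
  t(M)[lower.tri(M)]
}

dissmatrix_sketch <- function(v) {
  # recover p from length(v) = p * (p - 1) / 2
  p <- round((1 + sqrt(1 + 8 * length(v))) / 2)
  M <- matrix(0, p, p)
  M[lower.tri(M)] <- v   # lower triangle by columns = upper triangle by rows
  M + t(M)               # mirror to make the matrix symmetric
}

set.seed(4)
M <- as.matrix(dist(matrix(rnorm(12), nrow = 4)))
dimnames(M) <- NULL
v <- dissvector_sketch(M)
length(v)                            # 4 * 3 / 2 = 6
all.equal(dissmatrix_sketch(v), M)   # round trip recovers M
```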

     For 'vectmatrix', the row and column indices of the entry in a
     distance matrix that corresponds to entry 'index' of the distance
     vector.
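     A minimal sketch of this index mapping, assuming the row-wise
     upper-triangle ordering described under 'v': row i of the upper
     triangle contributes p - i entries, so the code skips whole rows
     until 'index' falls inside one. 'vectmatrix_sketch' is a
     hypothetical name, not the package function.

```r
# Hypothetical sketch of the mapping performed by vectmatrix(index, p).
vectmatrix_sketch <- function(index, p) {
  i <- 1
  k <- index
  while (k > p - i) {
    k <- k - (p - i)   # skip the p - i entries belonging to row i
    i <- i + 1
  }
  c(i, i + k)          # row and column in the p x p distance matrix
}

vectmatrix_sketch(5, 10)   # row 1, column 6
vectmatrix_sketch(10, 10)  # row 2, column 3
```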

_W_a_r_n_i_n_g:

     The correlation and absolute correlation distance functions call
     the 'cor' function and will therefore fail if there are missing
     values in the data and na.rm=FALSE.

_A_u_t_h_o_r(_s):

     Katherine S. Pollard <kpollard@gladstone.ucsf.edu> and Mark J. van
     der Laan <laan@stat.berkeley.edu>, with Greg Wall

_R_e_f_e_r_e_n_c_e_s:

     van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid
     hierarchical clustering with visualization and the bootstrap.
     Journal of Statistical Planning and Inference, 2003, 117, pp.
     275-303.

     <URL:
     http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/hopach.pdf>

_S_e_e _A_l_s_o:

     'hopach', 'correlationordering', 'disscosangle'

_E_x_a_m_p_l_e_s:

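     library(hopach)
     library(cluster)   # provides daisy()

     mydata <- matrix(rnorm(50), nrow = 10)
     deuclid <- distancematrix(mydata, d = "euclid")
     # old method: vdeuclid <- dissvector(deuclid)
     vdeuclid <- deuclid@Data
     ddaisy <- daisy(mydata)
     vdeuclid
     ddaisy / sqrt(length(mydata[1, ]))   # daisy() distances divided by sqrt(number of columns)

     d1 <- distancematrix(mydata, d = "abscosangle")
     d2 <- distancevector(mydata, mydata[1, ], d = "abscosangle")
     d1[1, ]
     d2   # equal to d1[1, ]

     # old method: d3 <- dissvector(d1)
     d3 <- d1@Data
     pair <- vectmatrix(5, 10)   # row and column for entry 5 when p = 10
     d1[pair[1], pair[2]]
     d3[5]   # should match d1[pair[1], pair[2]]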
     d3[5]

