disscosangle             package:hopach             R Documentation

_F_u_n_c_t_i_o_n_s _t_o _c_o_m_p_u_t_e _p_a_i_r-_w_i_s_e _d_i_s_t_a_n_c_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Given a matrix 'X', these functions compute the 'nrow(X)' by
     'nrow(X)' matrix of pair-wise distances between all variables
     (rows) in 'X', across all observations (columns) of 'X'. Each
     function uses a different distance metric, i.e. definition of what
     it means for two variables to be similar.

_U_s_a_g_e:

     disscosangle(X, na.rm = TRUE)

     disseuclid(X, na.rm = TRUE)

     disscor(X, na.rm = TRUE)

     dissabscosangle(X, na.rm = TRUE)

     dissabseuclid(X, na.rm = TRUE)

     dissabscor(X, na.rm = TRUE)

     vdisscosangle(X, y, na.rm = TRUE)

     vdisseuclid(X, y, na.rm = TRUE)

     vdisscor(X, y, na.rm = TRUE)

     vdissabscosangle(X, y, na.rm = TRUE)

     vdissabseuclid(X, y, na.rm = TRUE)

     vdissabscor(X, y, na.rm = TRUE)

_A_r_g_u_m_e_n_t_s:

       X: A numeric data matrix. Each column corresponds to an
          observation, and each row corresponds to a variable. In the
          gene expression context, observations are arrays and
          variables are genes. All values must be numeric. Missing
          values are ignored when 'na.rm = TRUE'.

   na.rm: Logical indicating whether to remove missing values (i.e.
          compute each distance over non-missing observations only).

       y: A numeric data vector of length 'ncol(X)'.

_D_e_t_a_i_l_s:

     Different choices of distance metric are discussed in the
     references. Briefly, Euclidean distance ('disseuclid') defines two
     variables to be close if they are similar in magnitude across
     observations. Correlation distance ('disscor'), in contrast,
     defines similarity to mean having the same pattern, but not
     necessarily the same magnitude. Cosine-angle ('disscosangle')
     distance is a correlation distance that also accounts for
     magnitude. Cosine-angle distance is also known as uncentered
     correlation distance. The distance metrics with 'abs' in their
     names are absolute versions of each metric; the absolute value is
     applied to the data before computing the distance.
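
     The relationship between cosine-angle and correlation described
     above can be illustrated with a short base R sketch. It does not
     call hopach; the helper 'cos_sim' is a hypothetical name
     introduced here for illustration only, and the sketch works with
     similarities rather than the distances these functions return.

```r
# Base R sketch (not hopach's implementation): cosine of the angle
# between two vectors, i.e. an "uncentered" correlation.
# cos_sim is a hypothetical helper name used only for this example.
cos_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

set.seed(1)
a <- rnorm(10)
b <- rnorm(10)

# Centering each vector first turns the cosine into the Pearson correlation.
centered_cos <- cos_sim(a - mean(a), b - mean(b))
stopifnot(isTRUE(all.equal(centered_cos, cor(a, b))))

# Adding a constant to both vectors leaves the correlation unchanged,
# but generally changes the cosine, because the cosine is sensitive
# to the overall level (magnitude) of the data.
cor_shifted <- cor(a + 5, b + 5)
cos_shifted <- cos_sim(a + 5, b + 5)
stopifnot(isTRUE(all.equal(cor_shifted, cor(a, b))))
```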

     For cosine-angle and correlation (and their absolute values) these
     functions return the square root of the usual distance, for
     example as computed by the dist() function. Typically, this
     transformation makes the dissimilarity correspond more closely
     with the norm. Also, the Euclidean distance is standardized by
     the sample size (n), so that the values produced are equal to the
     values from the 'dist()' function divided by sqrt(n).
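
     The Euclidean standardization can be sketched in base R as
     follows. This is an illustration rather than hopach's own code,
     and it assumes that the sample size n is the number of
     observations (columns) of 'X' and that no values are missing.

```r
# Base R sketch of the standardization described above (assumption:
# n is the number of columns of X; no missing values).
set.seed(1)
X <- matrix(rnorm(50), nrow = 5)   # 5 variables (rows), 10 observations (columns)
n <- ncol(X)

# dist() computes Euclidean distances between the rows of X.
d_scaled <- as.matrix(dist(X)) / sqrt(n)

# The same quantity for rows 1 and 2, computed directly: dividing by
# sqrt(n) turns the root of a sum of squares into a root mean square.
manual <- sqrt(mean((X[1, ] - X[2, ])^2))
stopifnot(isTRUE(all.equal(unname(d_scaled[1, 2]), manual)))
```

     Dividing by sqrt(n) makes the distance a root-mean-square
     difference, so its scale does not grow with the number of
     observations.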

_V_a_l_u_e:

     A numeric 'nrow(X)' by 'nrow(X)' matrix of pair-wise distances
     between all variables (rows) in 'X'. For the vector versions (e.g.
     'vdisscosangle'), a numeric vector of 'nrow(X)' pair-wise
     distances between each variable (row) in 'X' and the vector 'y'.
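
     The shape of the vector-version result can be illustrated with a
     base R sketch of the Euclidean case. It does not call
     'vdisseuclid'; it assumes the same sqrt(n) standardization
     described in the Details section, with n taken to be 'ncol(X)',
     and no missing values.

```r
# Base R sketch (not hopach's implementation): distance between each
# row of X and a vector y of length ncol(X).
set.seed(1)
X <- matrix(rnorm(50), nrow = 5)   # 5 variables (rows), 10 observations (columns)
y <- rnorm(ncol(X))

# sweep() subtracts y[j] from column j, so each row becomes (row - y).
diffs <- sweep(X, 2, y)

# Root-mean-square difference per row: one distance for each variable.
d_vec <- sqrt(rowMeans(diffs^2))
stopifnot(length(d_vec) == nrow(X))
```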

_A_u_t_h_o_r(_s):

     Katherine S. Pollard <kpollard@soe.ucsc.edu> and Mark J. van der
     Laan <laan@stat.berkeley.edu>

_R_e_f_e_r_e_n_c_e_s:

     van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid
     hierarchical clustering with visualization and the bootstrap.
     Journal of Statistical Planning and Inference, 2003, 117, pp.
     275-303.

     <URL:
     http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/hopach.pdf>

     <URL: http://www.bepress.com/ucbbiostat/paper107/>

     <URL:
     http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/jsmpaper.pdf>

_S_e_e _A_l_s_o:

     'distancematrix'

_E_x_a_m_p_l_e_s:

     # 5 variables (rows) by 10 observations (columns)
     X <- matrix(rnorm(50), nrow = 5)
     disscosangle(X)

