pid                package:Biostrings                R Documentation

_P_e_r_c_e_n_t _S_e_q_u_e_n_c_e _I_d_e_n_t_i_t_y

_D_e_s_c_r_i_p_t_i_o_n:

     Calculates the percent sequence identity for a pairwise sequence
     alignment.

_U_s_a_g_e:

     pid(x, type="PID1")

_A_r_g_u_m_e_n_t_s:

       x: a 'PairwiseAlignedXStringSet' object.

    type: the percent sequence identity definition to use. One of
          '"PID1"', '"PID2"', '"PID3"', or '"PID4"'. See Details for
          more information.

_D_e_t_a_i_l_s:

     Since there is no universal definition of percent sequence
     identity, the 'pid' function calculates this statistic in one of
     the following ways, selected by 'type':

     '"_P_I_D_1"': 100 * (identical positions) / (aligned positions +
          internal gap positions)

     '"_P_I_D_2"': 100 * (identical positions) / (aligned positions)

     '"_P_I_D_3"': 100 * (identical positions) / (length shorter sequence)

     '"_P_I_D_4"': 100 * (identical positions) / (average length of the two
          sequences)
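
     The four definitions can be illustrated with a minimal base R
     sketch. It is not the Biostrings implementation. The helper
     'pid_sketch' and its inputs are hypothetical: it takes two gapped
     strings of equal length with '-' as the gap character, counts a
     column as aligned when both strings hold a residue, counts a gap
     column as internal when the gapped string has residues on both
     sides of it, and takes sequence length as the number of residues.

```r
# Hypothetical helper, not part of Biostrings: computes the four
# percent identity variants from two gapped, equal-length strings.
pid_sketch <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))

  gap_x <- x == "-"
  gap_y <- y == "-"

  # Columns where both sequences contribute a residue
  aligned <- !gap_x & !gap_y
  n_identical <- sum(aligned & x == y)
  n_aligned <- sum(aligned)

  # A gap is internal when it lies strictly between the first and
  # last residue of the sequence that contains the gap
  internal <- function(gap) {
    residues <- which(!gap)
    idx <- seq_along(gap)
    gap & idx > min(residues) & idx < max(residues)
  }
  n_internal <- sum(internal(gap_x) | internal(gap_y))

  # Ungapped lengths of the two sequences
  len_x <- sum(!gap_x)
  len_y <- sum(!gap_y)

  c(PID1 = 100 * n_identical / (n_aligned + n_internal),
    PID2 = 100 * n_identical / n_aligned,
    PID3 = 100 * n_identical / min(len_x, len_y),
    PID4 = 100 * n_identical / mean(c(len_x, len_y)))
}

# 6 identical columns, 7 aligned columns, 1 internal gap,
# ungapped lengths 7 and 8
pid_sketch("AGT-ATAG", "AGTCATTG")
```

     On this toy alignment the four definitions give three different
     values (75, about 85.7 twice, and 80), which is why the choice of
     'type' should be stated whenever a percent identity is reported.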


_V_a_l_u_e:

     A numeric vector containing the specified sequence identity
     measures.

_A_u_t_h_o_r(_s):

     P. Aboyoun

_R_e_f_e_r_e_n_c_e_s:

     A. May, Percent Sequence Identity: The Need to Be Explicit,
     Structure 2004, 12(5):737.

     G. Raghava and G. Barton, Quantification of the variation in
     percentage identity for protein sequence alignments, BMC
     Bioinformatics 2006, 7:415.

_S_e_e _A_l_s_o:

     pairwiseAlignment, PairwiseAlignedXStringSet-class, match-utils

_E_x_a_m_p_l_e_s:

       library(Biostrings)

       s1 <- DNAString("AGTATAGATGATAGAT")
       s2 <- DNAString("AGTAGATAGATGGATGATAGATA")

       ## Align with the default scoring and report PID1 (the default type)
       palign1 <- pairwiseAlignment(s1, s2)
       palign1
       pid(palign1)

       ## Align with a custom substitution matrix and report PID4
       palign2 <-
         pairwiseAlignment(s1, s2,
           substitutionMatrix =
             nucleotideSubstitutionMatrix(match = 2, mismatch = 10,
                                          baseOnly = TRUE))
       palign2
       pid(palign2, type = "PID4")

