proteinProperties        package:yeastExpData        R Documentation

_P_r_o_p_e_r_t_i_e_s _o_f _Y_e_a_s_t _p_r_o_t_e_i_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     A data frame which details 33 properties of proteins in the Yeast
     Genome

_U_s_a_g_e:

     data(proteinProperties)

_F_o_r_m_a_t:

     A data frame with 6718 observations on the following 33 variables.

     '_y_O_R_F' a factor representing yeast ORF names, with levels 'Q0010',
          'Q0017', etc. 

     '_S_G_D_I_D' a factor representing SGD IDs 

     '_m_o_l_w_t' a numeric vector giving Molecular Weight in Daltons

     '_p_i' a numeric vector denoting the theoretical isoelectric
          point(pI), the pH at which the protein carries no net charge

     '_c_a_i' a numeric vector denoting Codon Adaptation Index

     '_l_e_n_g_t_h' a numeric vector denoting length of the protein (number
          of amino acids)

     '_n_t_e_r_m' a factor representing N Term Sequence with levels
          'MAAACIC' 'MAAAPWY', etc.

     '_c_t_e_r_m' a factor representing N Term Sequence with levels
          'AAAAMLL' 'AAADKKT', etc. 

     '_c_o_d_o_n_B_i_a_s' a numeric vector denoting Codon Bias

     The next 20 columns, named by three-letter amino acid codes, give
     the number of times each residue occurs in the protein sequence.
     For example, if the 'ALA' value is 2, the protein contains 2
     alanines. These columns should sum to the value in the 'length'
     column.

     '_A_L_A' a numeric vector

     '_A_R_G' a numeric vector

     '_A_S_N' a numeric vector

     '_A_S_P' a numeric vector

     '_C_Y_S' a numeric vector

     '_G_L_N' a numeric vector

     '_G_L_U' a numeric vector

     '_G_L_Y' a numeric vector

     '_H_I_S' a numeric vector

     '_I_L_E' a numeric vector

     '_L_E_U' a numeric vector

     '_L_Y_S' a numeric vector

     '_M_E_T' a numeric vector

     '_P_H_E' a numeric vector

     '_P_R_O' a numeric vector

     '_S_E_R' a numeric vector

     '_T_H_R' a numeric vector

     '_T_R_P' a numeric vector

     '_T_Y_R' a numeric vector

     '_V_A_L' a numeric vector

     The remaining columns are:

     '_f_o_p' FOP score, a numeric vector, denoting Frequency of Optimal
          Codons

     '_g_r_a_v_y' Gravy score, a numeric vector denoting Hydropathicity of
          Protein

     '_a_r_o_m_a_t_i_c_i_t_y' Aromaticity score, a numeric vector denoting
          Frequency of aromatic amino acids: Phe, Tyr, Trp

     '_t_y_p_e' Feature type, a factor with levels 'ORF|Dubious'
          'ORF|Uncharacterized' 'ORF|Verified'
          'ORF|Verified|silenced_gene' 'pseudogene'
          'transposable_element_gene'
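
     Both composition scores can in principle be recomputed from the
     residue-count columns. The sketch below assumes that GRAVY is the
     mean Kyte-Doolittle hydropathy over all residues and that
     aromaticity is (Phe + Tyr + Trp) divided by the total residue
     count. SGD's exact computation may differ, so compare against
     'gravy' and 'aromaticity' with a tolerance rather than expecting
     exact equality. The helper names and toy data are hypothetical.

```r
## Hypothetical helpers (not part of yeastExpData).
## Kyte-Doolittle hydropathy values, keyed by the three-letter column names.
kd <- c(ALA =  1.8, ARG = -4.5, ASN = -3.5, ASP = -3.5, CYS =  2.5,
        GLN = -3.5, GLU = -3.5, GLY = -0.4, HIS = -3.2, ILE =  4.5,
        LEU =  3.8, LYS = -3.9, MET =  1.9, PHE =  2.8, PRO = -1.6,
        SER = -0.8, THR = -0.7, TRP = -0.9, TYR = -1.3, VAL =  4.2)

## GRAVY (assumed definition): count-weighted mean hydropathy per residue
gravy_from_counts <- function(df) {
  counts <- as.matrix(df[, names(kd)])
  drop(counts %*% kd) / rowSums(counts)
}

## Aromaticity (assumed definition): fraction of residues that are F, Y or W
aromaticity_from_counts <- function(df) {
  counts <- as.matrix(df[, names(kd)])
  rowSums(counts[, c("PHE", "TYR", "TRP"), drop = FALSE]) / rowSums(counts)
}

## Two made-up proteins written as residue counts:
## row 1 has one each of Met, Ala, Ile, Leu; row 2 one each of Arg, Phe, Tyr, Trp
toy <- as.data.frame(matrix(0, nrow = 2, ncol = length(kd),
                            dimnames = list(NULL, names(kd))))
toy[1, c("MET", "ALA", "ILE", "LEU")] <- 1
toy[2, c("ARG", "PHE", "TYR", "TRP")] <- 1

gravy_from_counts(toy)        # approximately 3 and -0.975
aromaticity_from_counts(toy)  # 0 and 0.75
```

     Applied to the real data, 'summary(gravy_from_counts(
     proteinProperties) - proteinProperties$gravy)' would show how
     closely this assumed definition matches the published scores.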

_D_e_t_a_i_l_s:

     The data were downloaded directly from the Saccharomyces Genome
     Database (SGD). They give 33 characteristics for 6714 open reading
     frames (ORFs). From the SGD README:

     "Contains basic protein information about each ORF in SGD. This
     file does not include information on deleted or merged ORFs. Note,
     however, that it includes ORFs of all other classifications
     (Verified, Uncharacterized, and Dubious)."
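
     Because Dubious and Uncharacterized ORFs are included, analyses
     restricted to well-established genes need to filter on the 'type'
     column. A minimal sketch, using a hypothetical helper
     'verified_only' and made-up rows that reuse the documented levels:

```r
## Hypothetical helper (not part of yeastExpData): keep rows whose feature
## type begins with "ORF|Verified" (this also keeps 'ORF|Verified|silenced_gene').
verified_only <- function(df) {
  df[startsWith(as.character(df$type), "ORF|Verified"), , drop = FALSE]
}

## Made-up rows; the ORF names are placeholders, not real yeast ORFs
toy <- data.frame(
  yORF = paste0("orf", 1:5),
  type = factor(c("ORF|Verified", "ORF|Dubious", "ORF|Verified|silenced_gene",
                  "ORF|Uncharacterized", "pseudogene")),
  stringsAsFactors = FALSE
)

table(toy$type)          # one row of each level in this toy example
verified_only(toy)$yORF  # "orf1" "orf3"
```

     Matching on the prefix keeps 'ORF|Verified|silenced_gene' as well;
     'df$type == "ORF|Verified"' gives an exact match instead.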

     For more details see <URL:
     http://www.yeastgenome.org/help/protein_page.html>.

_S_o_u_r_c_e:

     <URL:
     ftp://genome-ftp.stanford.edu/pub/yeast/protein_info/protein_properties.tab>.
     This file is updated weekly (on Saturdays). The version used here
     was downloaded on 2006-11-03.

_E_x_a_m_p_l_e_s:

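     data(proteinProperties)
     ## Scatterplot matrix of five summary properties,
     ## with points coloured by feature type (factor level codes)
     pairs(proteinProperties[, c("molwt", "pi", "cai", "gravy", "aromaticity")],
           pch = ".", col = as.integer(proteinProperties$type))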

