gfp               package:yeastExpData               R Documentation

_Y_e_a_s_t _G_F_P _F_u_s_i_o_n _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     This data frame contains data concerning the localization and
     abundance of various yeast proteins.

_U_s_a_g_e:

     data(gfp)

_F_o_r_m_a_t:

     A data frame with 6234 observations on the following 33 variables.

     '_o_r_f_i_d' a numeric vector of identifiers

     '_y_O_R_F' a factor representing yeast ORF names, with levels
          'YAL001C', 'YAL002W', etc.  These are also the row names of
          the data frame.

     '_g_e_n_e__n_a_m_e' a factor representing corresponding yeast gene names,
          with levels 'AAC1', 'AAC3', etc. 

     '_G_F_P__t_a_g_g_e_d' a factor with levels 'not tagged' and 'tagged',
          indicating whether or not the ORF was GFP tagged

     '_G_F_P__v_i_s_u_a_l_i_z_e_d' a factor with levels 'not visualized' and
          'visualized', indicating whether or not GFP fluoresence was
          visualized

     '_T_A_P__v_i_s_u_a_l_i_z_e_d' a factor with levels 'TAP visualized' and 'not
          TAP visualized', indicating success of TAP tag

     '_a_b_u_n_d_a_n_c_e' a numeric vector, giving estimated abundance in units
          of molecules per cell 

     '_e_r_r_o_r' a numeric vector of estimated errors in abundance for a
          subset of proteins, in the same units as 'abundance' (see
          details below)

     '_l_o_c_a_l_i_z_a_t_i_o_n__s_u_m_m_a_r_y' a factor with levels '', 'ER', 'ER to
          Golgi', 'ER,ambiguous', 'ER,ambiguous,bud', etc.  Summarizes
          the information contained in the subsequent columns. 

     The following columns indicate whether or not the protein was
     localized in the specific region of the cell.  A protein can be
     localized in more than one region.

     '_a_m_b_i_g_u_o_u_s' a logical vector

     '_m_i_t_o_c_h_o_n_d_r_i_o_n' a logical vector

     '_v_a_c_u_o_l_e' a logical vector

     '_s_p_i_n_d_l_e__p_o_l_e' a logical vector

     '_c_e_l_l__p_e_r_i_p_h_e_r_y' a logical vector

     '_p_u_n_c_t_a_t_e__c_o_m_p_o_s_i_t_e' a logical vector

     '_v_a_c_u_o_l_a_r__m_e_m_b_r_a_n_e' a logical vector

     '_E_R' a logical vector

     '_n_u_c_l_e_a_r__p_e_r_i_p_h_e_r_y' a logical vector

     '_e_n_d_o_s_o_m_e' a logical vector

     '_b_u_d__n_e_c_k' a logical vector

     '_m_i_c_r_o_t_u_b_u_l_e' a logical vector

     '_G_o_l_g_i' a logical vector

     '_l_a_t_e__G_o_l_g_i' a logical vector

     '_p_e_r_o_x_i_s_o_m_e' a logical vector

     '_a_c_t_i_n' a logical vector

     '_n_u_c_l_e_o_l_u_s' a logical vector

     '_c_y_t_o_p_l_a_s_m' a logical vector

     '_E_R__t_o__G_o_l_g_i' a logical vector

     '_e_a_r_l_y__G_o_l_g_i' a logical vector

     '_l_i_p_i_d__p_a_r_t_i_c_l_e' a logical vector

     '_n_u_c_l_e_u_s' a logical vector

     '_b_u_d' a logical vector

     Explanations for missing abundance values are given by:

     'missingAbundance' a factor with levels 'low signal', 'not
          visualized' and 'technical problem'
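
     Because a protein can be localized in more than one region, it can
     be useful to count how many of the logical columns are 'TRUE' for
     each row.  The following is a minimal sketch, not part of the
     package: 'count_localizations' is a hypothetical helper, and 'toy'
     is a made-up stand-in that only borrows a few column names from
     'gfp'.

```r
# Hypothetical helper (not part of yeastExpData): for each row, count how
# many logical columns are TRUE.  NA entries are treated as "not localized".
count_localizations <- function(df) {
  loc_cols <- names(df)[vapply(df, is.logical, logical(1))]
  rowSums(df[loc_cols], na.rm = TRUE)
}

# Toy stand-in with made-up values; the real data come from data(gfp).
toy <- data.frame(
  yORF      = c("ORF_A", "ORF_B", "ORF_C"),
  cytoplasm = c(TRUE, TRUE, FALSE),
  nucleus   = c(TRUE, FALSE, FALSE),
  bud       = c(FALSE, FALSE, FALSE)
)

n_loc <- count_localizations(toy)
```

     Assuming the package is installed, the same helper could be applied
     to the real data with 'count_localizations(gfp)'.  Note that it
     counts every logical column, including 'ambiguous'.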

_D_e_t_a_i_l_s:

     The information on abundance is available in three columns.
     'abundance' gives (where available) absolute protein abundances
     determined by quantitative Western blot analysis of TAP-tagged
     strains.  Abundances that have a non-'NA' 'error' value were done
     in triplicate with serial dilutions of purified TAP-tagged
     standards included in each gel, which substantially reduces the
     measurement error. In addition, for these strains, the tagged
     genes were confirmed to rescue the loss of function phenotype of
     the corresponding deletion strain.  For rows where 'abundance' is
     missing ('NA'), the 'missingAbundance' column gives the reason. 
     Possible reasons are:

     '"_n_o_t _v_i_s_u_a_l_i_z_e_d"' Either the tagging was unsuccessful or no
          signal was detected.

     '"_l_o_w _s_i_g_n_a_l"' The tagging was successful, but the signal was not
          sufficiently high above background to permit accurate
          quantitation (about 50 molecules/cell).

     '"_t_e_c_h_n_i_c_a_l _p_r_o_b_l_e_m"' The protein was detectable but could not be
          quantitated because it did not migrate as a single band or
          comigrated with the internal standards in the gel.
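
     The reasons above can be tabulated for the rows where 'abundance'
     is missing.  This is a minimal sketch: 'missing_reasons' is a
     hypothetical helper, 'toy' contains made-up values, and it is an
     assumption here that 'missingAbundance' is 'NA' whenever an
     abundance was measured.

```r
# Hypothetical helper (not part of yeastExpData): tabulate the recorded
# reason for every row whose abundance is NA.
missing_reasons <- function(df) {
  is_missing <- is.na(df$abundance)
  table(df$missingAbundance[is_missing], useNA = "ifany")
}

# Toy stand-in with made-up values; the real data come from data(gfp).
toy <- data.frame(
  abundance = c(1200, NA, NA, NA, 350),
  missingAbundance = factor(
    c(NA, "low signal", "not visualized", "low signal", NA),
    levels = c("low signal", "not visualized", "technical problem")
  )
)

reasons <- missing_reasons(toy)
```

     With the package installed, 'missing_reasons(gfp)' would give the
     corresponding counts for the real data.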

     Replicate analysis for a subset of tagged strains found a linear
     correlation coefficient of R = 0.94, with the pairs of proteins
     having a median variation of a factor of 2.0. This error analysis
     does not account for potential alterations in the endogenous
     levels of the proteins caused by the fused tag, which may be
     particularly disruptive for small proteins.
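
     To illustrate what a "median variation of a factor of 2.0" could
     mean, the sketch below computes a fold difference for each pair of
     replicate measurements and takes the median.  This is one plausible
     reading, not necessarily the computation used in the original
     analysis.  The replicate values are made up ('gfp' does not contain
     replicate measurements), 'fold_variation' is a hypothetical helper,
     and computing the correlation on the log2 scale is an assumption.

```r
# Hypothetical helper: fold difference (always >= 1) between paired
# replicate measurements of the same protein.
fold_variation <- function(a, b) pmax(a, b) / pmin(a, b)

# Made-up replicate abundances (molecules per cell), for illustration only.
rep1 <- c(100, 400, 1600, 6400)
rep2 <- c(200, 400, 800, 19200)

fv <- fold_variation(rep1, rep2)
median_fold <- median(fv)

# Pearson correlation between replicates, computed here on the log2 scale.
r_log <- cor(log2(rep1), log2(rep2))
```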

_S_o_u_r_c_e:

     The data were obtained from <URL: http://yeastgfp.ucsf.edu/>,
     which contains a lot more information as well as raw image data. 
     This data frame was specifically generated from <URL:
     http://yeastgfp.ucsf.edu/allOrfData.txt>

_R_e_f_e_r_e_n_c_e_s:

     For the Localization data: Huh, et al., Nature 425, 686-691 (2003)
     - <URL:
     http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=14562095&dopt=Abstract>

     For the Protein abundance data: Ghaemmaghami, et al., Nature 425,
     737-741 (2003) - <URL:
     http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=14562106&dopt=Abstract>

_E_x_a_m_p_l_e_s:

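     data(gfp)

     ## Keep only localization categories with more than 50 proteins.
     keep <- names(which(table(gfp$localization_summary) > 50))

     if (require(lattice)) {
       ## Box-and-whisker plot of log2 abundance, with categories ordered
       ## by median abundance.
       bwplot(reorder(localization_summary, abundance, median, na.rm = TRUE) ~
                log2(abundance),
              data = gfp, varwidth = TRUE,
              subset = localization_summary %in% keep)
     } else {
       ## Fallback using base graphics when lattice is not available.
       opar <- par(las = 2, mar = par("mar") + c(3.5, 0, 0, 0))
       gfp._sub <- subset(gfp, localization_summary %in% keep)
       ## Drop unused factor levels so that empty categories are not plotted.
       gfp._sub$localization_summary <- gfp._sub$localization_summary[, drop = TRUE]
       boxplot(log2(abundance) ~ reorder(localization_summary, abundance, median, na.rm = TRUE),
               data = gfp._sub, varwidth = TRUE)
       rm(gfp._sub)
       par(opar)
     }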

