| gfp {yeastExpData} | R Documentation |
This data frame contains data concerning the localization and abundance of various yeast proteins.
data(gfp)
A data frame with 6234 observations on the following 33 variables.
orfid
yORF: YAL001C, YAL002W, etc. These are also the row names of the data frame.
gene_name: AAC1, AAC3, etc.
GFP_tagged: "not tagged" and "tagged", indicating whether or not the ORF was GFP tagged.
GFP_visualized: "not visualized" and "visualized", indicating whether or not GFP fluorescence was visualized.
TAP_visualized: "TAP visualized" and "not TAP visualized", indicating success of the TAP tag.
abundance, error: abundance (see details below).
localization_summary: "ER", "ER to Golgi", "ER,ambiguous", "ER,ambiguous,bud", etc. Summarizes the information contained in the subsequent columns.

The following columns indicate whether or not the protein was localized in the specific region of the cell. A protein can be localized in more than one region.

ambiguous, mitochondrion, vacuole, spindle_pole, cell_periphery, punctate_composite, vacuolar_membrane, ER, nuclear_periphery, endosome, bud_neck, microtubule, Golgi, late_Golgi, peroxisome, actin, nucleolus, cytoplasm, ER_to_Golgi, early_Golgi, lipid_particle, nucleus, bud

Explanations for missing abundance values are given by

missingAbundance: "low signal", "not visualized" and "technical problem".
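Because localization_summary packs several regions into one comma-separated value, it can be useful to split it back into individual regions. The following is a minimal sketch on a small hand-made data frame, not on gfp itself: the localization values attached to each ORF below are invented for illustration, and the names toy, regions and n_regions are hypothetical.

```r
# Toy stand-in for a few rows of gfp. The ORF/localization pairings
# are invented for illustration and are NOT taken from the real data.
toy <- data.frame(
  yORF = c("YAL001C", "YAL002W", "YAL003W"),
  localization_summary = c("ER", "ER,ambiguous,bud", "ER to Golgi"),
  stringsAsFactors = TRUE
)

# Split each summary on literal commas to recover the individual regions.
regions <- strsplit(as.character(toy$localization_summary), ",", fixed = TRUE)
names(regions) <- as.character(toy$yORF)

# Number of regions reported for each protein.
n_regions <- lengths(regions)
print(n_regions)

# Proteins reported in more than one region.
print(names(n_regions)[n_regions > 1])
```

The same two calls should work on gfp$localization_summary once the package data are loaded, assuming the column uses the comma-separated form shown in the variable list.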
The information on abundance is available in three columns.
abundance gives (where available) absolute protein abundances
determined by quantitative Western blot analysis of TAP-tagged
strains. Abundances that have a non-NA error value were
done in triplicate with serial dilutions of purified TAP-tagged
standards included in each gel, which substantially reduces the
measurement error. In addition, for these strains, the tagged genes
were confirmed to rescue the loss of function phenotype of the
corresponding deletion strain. For rows where abundance is
missing (NA), the missingAbundance column gives the
reason. Possible reasons are:
"not visualized", "low signal", "technical problem"

Replicate analysis for a subset of tagged strains found a linear correlation coefficient of R = 0.94, with the pairs of proteins having a median variation of a factor of 2.0. This error analysis does not account for potential alterations in the endogenous levels of the proteins caused by the fused tag, which may be particularly disruptive for small proteins.
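The relationship between abundance, error and missingAbundance described above can be illustrated with a short sketch. It runs on a hand-made data frame rather than gfp. All numbers are invented, and the assumption that missingAbundance is NA whenever an abundance is present is mine, not something the documentation states.

```r
# Toy stand-in for the three abundance-related columns of gfp.
# All values are invented; missingAbundance is ASSUMED to be NA
# when an abundance is available.
toy <- data.frame(
  abundance = c(1200, NA, 350, NA, NA),
  error = c(150, NA, NA, NA, NA),
  missingAbundance = c(NA, "low signal", NA,
                       "not visualized", "technical problem"),
  stringsAsFactors = FALSE
)

# Tabulate the stated reason for every row with a missing abundance.
missing_rows <- is.na(toy$abundance)
reason_counts <- table(toy$missingAbundance[missing_rows])
print(reason_counts)

# Rows with a non-NA error are the ones measured in triplicate.
triplicate <- !is.na(toy$abundance) & !is.na(toy$error)
print(toy[triplicate, c("abundance", "error")])
```

Applied to the real data, the same logical indexing would separate the more precisely measured abundances from the rest before any downstream comparison.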
The data were obtained from http://yeastgfp.ucsf.edu/, which contains much more information as well as the raw image data. This data frame was generated specifically from http://yeastgfp.ucsf.edu/allOrfData.txt
For the Localization data: Huh, et al., Nature 425, 686-691 (2003) – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=14562095&dopt=Abstract
For the Protein abundance data: Ghaemmaghami, et al., Nature 425, 737-741 (2003) – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=14562106&dopt=Abstract
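data(gfp)
# Keep only localization summaries observed for more than 50 proteins
keep <- names(which(table(gfp$localization_summary) > 50))
if (require(lattice)) {
    # Box-and-whisker plot of log2 abundance, categories ordered by median abundance
    bwplot(reorder(localization_summary, abundance, median, na.rm = TRUE) ~ log2(abundance), gfp,
           varwidth = TRUE,
           subset = localization_summary %in% keep)
} else {
    # Base-graphics fallback: rotate axis labels and enlarge the bottom margin
    opar <- par(las = 2, mar = par("mar") + c(3.5, 0, 0, 0))
    gfp._sub <- subset(gfp, localization_summary %in% keep)
    # Drop factor levels that are no longer present after subsetting
    gfp._sub$localization_summary <- gfp._sub$localization_summary[, drop = TRUE]
    boxplot(log2(abundance) ~ reorder(localization_summary, abundance, median, na.rm = TRUE),
            data = gfp._sub, varwidth = TRUE)
    rm(gfp._sub)
    # Restore the original graphics parameters
    par(opar)
}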