yeastAnn             package:AnnBuilder             R Documentation

_F_u_n_c_t_i_o_n_s _t_o _a_n_n_o_t_a_t_e _y_e_a_s_t _g_e_n_o_m _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     Given a GEO accession number for a yeast data set and the
     extensions of the annotation data file names that are available
     from the Yeast Genome web site, these functions generate a data
     package containing annotation data for the yeast genes in the GEO
     data set.

_U_s_a_g_e:

     yeastAnn(base = "", yGenoUrl,
                      yGenoNames =
                      c("literature_curation/gene_literature.tab",
                      "chromosomal_feature/SGD_features.tab",
                      "literature_curation/gene_association.sgd.gz"), toKeep =
                      list(c(6, 1), c(1, 5, 9, 10, 12, 16, 6), c(2, 5, 7)),
                      colNames = list(c("sgdid", "pmid"), c("sgdid",
                      "genename", "chr", "chrloc", "chrori", "description",
                      "alias"), c("sgdid", "go")), seps = c("\t", "\t",
                      "\t"), by = "sgdid")
     getProbe2SGD(probe2ORF = "", yGenoUrl,
                  fileName = "literature_curation/orf_geneontology.tab",
                  toKeep = c(1, 7), colNames = c("orf", "sgdid"), sep = "\t",
                  by = "orf")
     procYeastGeno(baseURL, fileName, toKeep, colNames, seps = "\t")
     getGEOYeast(GEOAccNum, GEOUrl, geoCols = c(1, 8), yGenoUrl) 
     formatGO(gos, evis)
     formatChrLoc(chr, chrloc, chrori)
     getYGExons(srcUrl,
                yGenoName = "chromosomal_feature/intron_exon.tab", sep = "\t")  

_A_r_g_u_m_e_n_t_s:

    base: 'base' a file name for a matrix with two columns. The first
          column holds probe ids and the second holds the mappings to
          the SGD ids used by all the Yeast Genome data files. If
          'base' = "", the whole genome will be mapped based on a data
          file that contains mappings between all the ORFs and SGD ids

GEOAccNum: 'GEOAccNum' a character string for the accession number
          given by GEO for a yeast data set

  GEOUrl: 'GEOUrl' a character string for the url that contains a
          common CGI for all the GEO data. Currently it is <URL:
          http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?>

 geoCols: 'geoCols' a vector of integers for the column numbers of the
          source file from GEO that maps yeast probe ids to ORF ids

yGenoUrl: 'yGenoUrl' a character string for the url of a directory on
          the Yeast Genome web site that contains directories for yeast
          annotation data. Currently it is <URL:
          ftp://genome-ftp.stanford.edu/pub/yeast/data_download/>

 baseURL: see yGenoUrl

yGenoNames: 'yGenoNames' a vector of character strings for the names of
          yeast annotation data. Each of the strings can be appended to
          yGenoUrl to make a complete url for a data file

fileName: a character string for the extension part of the source data
          file that can be used to map target genes to SGD ids

  toKeep: 'toKeep' a list of vectors of integers giving the column
          numbers of the yeast genome data files that will be kept when
          the data files are processed. The length of toKeep must be
          the same as that of yGenoNames (one vector for each file)

colNames: 'colNames' a list of vectors of character strings for the
          names to be given to the columns to keep when processing the
          data. Again, the length of colNames must be the same as
          yGenoNames

    seps: 'seps' a vector of characters for the separators used by the
          data files included in yGenoNames

     sep: singular version of seps

      by: 'by' a character string for the column that is common in all
          data files to be processed. The column will be used to merge
          separate data files

probe2ORF: 'probe2ORF' a matrix with mappings of yeast target genes to
          ORF ids that in turn can be mapped to SGD ids

     gos: 'gos' a vector of character strings for GO ids retrieved from
          Yeast Genome Project

    evis: 'evis' a vector of character strings for the evidence codes
          associated with the GO ids

     chr: 'chr' a vector of character strings for chromosome numbers

  chrloc: 'chrloc' a vector of integers for chromosomal locations

  chrori: 'chrori' a vector of characters, each either w or c, that
          denote the strand of the yeast chromosome

  srcUrl: 'srcUrl' a character string for the url where source yeast
          genome data are stored

yGenoName: 'yGenoName' a character string for the yeast genome file
          name to be processed

_D_e_t_a_i_l_s:

     To merge files, the system has to map the target genes in the base
     file to SGD ids and then use the SGD ids to map target genes to
     annotation data from different sources.
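
     The merge step can be sketched with base R's 'merge'. This is
     only an illustration under stated assumptions, not the code used
     by 'yeastAnn': the three toy tables and their ids are invented,
     and a left join on "sgdid" (the default value of 'by') is
     assumed.

```r
# Toy stand-ins (invented values) for a probe-to-SGD base mapping
# and two already processed annotation files.
base <- data.frame(probe = c("p1", "p2", "p3"),
                   sgdid = c("S0001", "S0002", "S0003"),
                   stringsAsFactors = FALSE)
features <- data.frame(sgdid = c("S0001", "S0002"),
                       genename = c("GENE1", "GENE2"),
                       stringsAsFactors = FALSE)
literature <- data.frame(sgdid = c("S0001", "S0003"),
                         pmid = c("111", "333"),
                         stringsAsFactors = FALSE)

# Fold each annotation table into the base mapping, keeping every
# probe even when a source has no entry for its SGD id (all.x = TRUE).
annotated <- Reduce(function(x, y) merge(x, y, by = "sgdid", all.x = TRUE),
                    list(features, literature), base)
print(annotated)
```

     Probes without a match in a given source keep their row and get
     'NA' in the columns contributed by that source.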

     'formatGO' adds leading 0s to GO ids when needed and then appends
     the evidence code to the end of each GO id following a "@".
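
     A minimal sketch of the padding and tagging described for
     'formatGO'. The helper name 'formatGOSketch' is hypothetical, and
     the 7-digit width and the "GO:" prefix in the output are
     assumptions based on the usual GO id layout, not taken from the
     AnnBuilder source.

```r
# Hypothetical re-implementation for illustration only.
formatGOSketch <- function(gos, evis) {
    # Drop an optional "GO:" prefix so only the digits remain.
    digits <- sub("^GO:", "", as.character(gos))
    # Left-pad the numeric part with zeros to 7 digits (assumed width).
    padded <- sprintf("%07d", as.integer(digits))
    # Append the evidence code after "@".
    paste0("GO:", padded, "@", evis)
}

formatted <- formatGOSketch(c("5634", "GO:0003677"), c("IDA", "IEA"))
print(formatted)
```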

     'formatChrLoc' assigns a + or - sign to 'chrloc' depending on
     whether the corresponding 'chrori' is w or c and then appends
     'chr' to the end of 'chrloc' following a "@".
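
     The rule for 'formatChrLoc' can be sketched as follows. The
     helper name 'formatChrLocSketch' is hypothetical; writing the +
     sign explicitly and accepting w/c in either case are assumptions
     made for this illustration.

```r
# Hypothetical re-implementation for illustration only.
formatChrLocSketch <- function(chr, chrloc, chrori) {
    # w (Watson strand) maps to "+"; any other value is treated as c ("-").
    strandSign <- ifelse(toupper(chrori) == "W", "+", "-")
    # Signed location first, then "@", then the chromosome.
    paste0(strandSign, chrloc, "@", chr)
}

locs <- formatChrLocSketch(chr = c("1", "12"),
                           chrloc = c(130802L, 5000L),
                           chrori = c("w", "c"))
print(locs)
```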

     'getGEOYeast' gets yeast data from GEO for the columns specified.
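
     How 'toKeep' and 'colNames' select and rename columns of one
     source file can be sketched as below. The helper 'procSketch' is
     hypothetical and reads from an in-memory string rather than
     downloading anything; the toy rows are invented, with the PubMed
     id in column 1 and the SGD id in column 6 as implied by the
     default 'toKeep' and 'colNames' for the first file.

```r
# Hypothetical helper for illustration; not the AnnBuilder implementation.
procSketch <- function(text, toKeep, colNames, sep = "\t") {
    raw <- read.table(text = text, sep = sep, header = FALSE, quote = "",
                      comment.char = "", colClasses = "character",
                      stringsAsFactors = FALSE)
    # Keep the requested columns, in the requested order.
    kept <- as.matrix(raw[, toKeep, drop = FALSE])
    colnames(kept) <- colNames
    kept
}

# A complete file url is the directory url with a file name appended.
yGenoUrl <- "ftp://genome-ftp.stanford.edu/pub/yeast/data_download/"
fileUrl <- paste0(yGenoUrl, "literature_curation/gene_literature.tab")

# Invented two-row stand-in for the contents of such a file.
toy <- "111\tcitA\tx\ty\tz\tS0001\n222\tcitB\tx\ty\tz\tS0002"
result <- procSketch(toy, toKeep = c(6, 1), colNames = c("sgdid", "pmid"))
print(result)
```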

_V_a_l_u_e:

     'yeastAnn' returns a matrix with target genes annotated by data
     from selected data columns in different data sources.

     'getProbe2SGD' returns a matrix with mappings between target genes
     and SGD ids.

     'procYeastGeno' returns a data matrix.

     'formatGO' returns a vector of character strings.

     'formatChrLoc' returns a vector of character strings.

     'getGEOYeast' returns a matrix with the number of columns
     specified.

_A_u_t_h_o_r(_s):

     Jianhua Zhang

_R_e_f_e_r_e_n_c_e_s:

     <URL: ftp://genome-ftp.stanford.edu>

_E_x_a_m_p_l_e_s:

     ## Not run: 
     yeastData <- yeastAnn(base = "",
         yGenoUrl = "ftp://genome-ftp.stanford.edu/pub/yeast/data_download/")
     ## End(Not run)

