getGEO               package:GEOquery               R Documentation

_G_e_t _a _G_E_O _o_b_j_e_c_t _f_r_o_m _N_C_B_I _o_r _f_i_l_e

_D_e_s_c_r_i_p_t_i_o_n:

     This function is the main user-level function in the GEOquery
     package.  It directs the download (if no filename is specified)
     and parsing of a GEO SOFT format file into an R data structure
     designed to make each of the important parts of the GEO SOFT
     format easily accessible.

_U_s_a_g_e:

     getGEO(GEO = NULL, filename = NULL, destdir = tempdir(), GSElimits = NULL)

_A_r_g_u_m_e_n_t_s:

     GEO: A character string representing a GEO object for download and
          parsing (e.g., 'GDS505', 'GSE2', 'GSM2', 'GPL96').

filename: The filename of a previously downloaded GEO SOFT format file
          or its gzipped representation (in which case the filename
          must end in .gz).  Specify either GEO or filename, but not
          both.

 destdir: The destination directory for any downloads.  Defaults to the
          platform-dependent tempdir().  Specify a different directory
          if you want to save the file for later use.  Doing so is a
          good idea if you have a slow connection, as some of the GEO
          files are HUGE!

GSElimits: This argument can be used to load only a contiguous subset
          of the GSMs from a GSE.  It should be specified as a vector
          of length 2 specifying the start and end (inclusive) GSMs to
          load. This could be useful for splitting up large GSEs into
          more manageable parts, for example.
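
     As a rough illustration of the GSElimits idea, the sketch below
     builds contiguous, inclusive (start, end) pairs that could each be
     passed as GSElimits.  The helper gse_limit_ranges is hypothetical
     (it is not part of GEOquery), the sample count is made up, and the
     sketch assumes the limits are 1-based positions within the series.

```r
# Hypothetical helper, not part of GEOquery: split positions 1..n_samples
# into contiguous, inclusive c(start, end) ranges of at most chunk_size.
gse_limit_ranges <- function(n_samples, chunk_size) {
  stopifnot(n_samples >= 1, chunk_size >= 1)
  starts <- seq(1, n_samples, by = chunk_size)
  ends <- pmin(starts + chunk_size - 1, n_samples)
  # One length-2 vector per chunk
  Map(function(s, e) c(s, e), starts, ends)
}

# Illustrative sample count only; look up the real number for your series.
ranges <- gse_limit_ranges(n_samples = 25, chunk_size = 10)

# Each element has the length-2 shape described for GSElimits.
# Not run here (needs GEOquery and a network connection):
# parts <- lapply(ranges, function(r) getGEO("GSE2", GSElimits = r))
```

     Loading one range at a time keeps each parsed piece smaller, at
     the cost of combining the pieces yourself afterwards.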

_D_e_t_a_i_l_s:

     getGEO downloads and parses information available from NCBI GEO
     (<URL: http://www.ncbi.nlm.nih.gov/geo>).  Here are some details
     about what is available from GEO.  All entity types are handled
     by getGEO, and essentially any information in the GEO SOFT format
     is reflected in the resulting data structure.

     From the GEO website:

     The Gene Expression Omnibus (GEO) from NCBI serves as a public
     repository for a wide range of high-throughput experimental data.
     These data include single and dual channel microarray-based
     experiments measuring mRNA, genomic DNA, and protein abundance, as
     well as non-array techniques such as serial analysis of gene
     expression (SAGE), and mass spectrometry proteomic data. At the
     most basic level of organization of GEO, there are three entity
     types that may be supplied by users: Platforms, Samples, and
     Series. Additionally, there is a curated entity called a GEO
     dataset.

     A Platform record describes the list of elements on the array
     (e.g., cDNAs, oligonucleotide probesets, ORFs, antibodies) or the
     list of elements that may be detected and quantified in that
     experiment (e.g., SAGE tags, peptides). Each Platform record is
     assigned a unique and stable GEO accession number (GPLxxx). A
     Platform may reference many Samples that have been submitted by
     multiple submitters. 

     A Sample record describes the conditions under which an individual
     Sample was handled, the manipulations it underwent, and the
     abundance measurement of each element derived from it. Each Sample
     record is assigned a unique and stable GEO accession number
     (GSMxxx). A Sample entity must reference only one Platform and may
     be included in multiple Series.

     A Series record defines a set of related Samples considered to be
     part of a group, how the Samples are related, and if and how they
     are ordered. A Series provides a focal point and description of
     the experiment as a whole. Series records may also contain tables
     describing extracted data, summary conclusions, or analyses. Each
     Series record is assigned a unique and stable GEO accession number
     (GSExxx). 

     GEO DataSets (GDSxxx) are curated sets of GEO Sample data. A GDS
     record represents a collection of biologically and statistically
     comparable GEO Samples and forms the basis of GEO's suite of data
     display and analysis tools. Samples within a GDS refer to the same
     Platform, that is, they share a common set of probe elements.
     Value measurements for each Sample within a GDS are assumed to be
     calculated in an equivalent manner, that is, considerations such
     as background processing and normalization are consistent across
     the dataset. Information reflecting experimental design is
     provided through GDS subsets.

_V_a_l_u_e:

     An object of the appropriate class (GDS, GPL, GSM, or GSE) is
     returned.
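
     Since the four classes mirror the accession prefixes described
     above (GDSxxx, GPLxxx, GSMxxx, GSExxx), the expected class can be
     guessed from the accession before anything is downloaded.  The
     helper geo_entity_class below is a hypothetical sketch, not a
     GEOquery function, and assumes the prefix alone determines the
     class.

```r
# Hypothetical helper, not part of GEOquery: infer the expected entity
# class from the three-letter accession prefix.
geo_entity_class <- function(accession) {
  stopifnot(is.character(accession), length(accession) == 1)
  prefix <- toupper(substr(accession, 1, 3))
  if (!prefix %in% c("GDS", "GPL", "GSM", "GSE")) {
    stop("Unrecognized GEO accession prefix: ", prefix)
  }
  prefix
}

# Classify the example accessions from the Arguments section.
expected <- vapply(c("GDS505", "GSE2", "GSM2", "GPL96"),
                   geo_entity_class, character(1))

# Not run here (needs GEOquery and a network connection):
# obj <- getGEO("GDS505")
# is(obj, geo_entity_class("GDS505"))
```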

_W_a_r_n_i_n_g:

     Some of the files that are downloaded, particularly those
     associated with GSE entries from GEO, are absolutely ENORMOUS, and
     parsing them can take quite some time and memory.  So,
     particularly when working with large GSE entries, expect that you
     may need a good chunk of memory and that coffee may be involved
     when parsing.

_A_u_t_h_o_r(_s):

     Sean Davis

_S_e_e _A_l_s_o:

     'getGEOfile'

_E_x_a_m_p_l_e_s:

     gds <- getGEO("GDS2")
     gds
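
     The sketch below shows one way to combine the GEO, filename, and
     destdir arguments: download into a directory you keep, or parse a
     file you already have.  The wrapper fetch_geo, the directory
     "geo_cache", and the file path are hypothetical.  Rejecting a
     call that supplies neither GEO nor filename is a choice made by
     this sketch, not documented getGEO behaviour.

```r
# Hypothetical wrapper, not part of GEOquery: enforce the
# "either GEO or filename, not both" rule before calling getGEO.
fetch_geo <- function(GEO = NULL, filename = NULL, destdir = tempdir()) {
  # Exactly one of the two inputs must be supplied.
  if (is.null(GEO) == is.null(filename)) {
    stop("Specify exactly one of 'GEO' or 'filename'.")
  }
  if (!is.null(filename)) {
    # Parse a previously downloaded SOFT file (plain or .gz).
    if (!file.exists(filename)) stop("File not found: ", filename)
    return(GEOquery::getGEO(filename = filename))
  }
  # Download into destdir so the file can be kept for later use.
  if (!dir.exists(destdir)) dir.create(destdir, recursive = TRUE)
  GEOquery::getGEO(GEO = GEO, destdir = destdir)
}

# Not run here (needs GEOquery and a network connection):
# gds <- fetch_geo("GDS2", destdir = "geo_cache")
# gds <- fetch_geo(filename = "path/to/downloaded.soft.gz")
```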

