UCSCTableQuery-class       package:rtracklayer       R Documentation

_Q_u_e_r_y_i_n_g _U_C_S_C _T_a_b_l_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     The UCSC genome browser is backed by a large database, which is
     exposed by the Table Browser web interface. Tracks are stored as
     tables, so this is also the mechanism for retrieving tracks. The
     'UCSCTableQuery' class represents a query against the Table
     Browser. Storing the query fields in a formal class facilitates
     incremental construction and adjustment of a query.

_D_e_t_a_i_l_s:

     There are four supported fields for a table query:

     _s_e_s_s_i_o_n The 'UCSCSession' instance from the tables are retrieved.
          Although all sessions are based on the same database, the set
          of user-uploaded tracks, which are represented as tables, is
          not the same, in general.

     _t_r_a_c_k_N_a_m_e The name of a track from which to retrieve a table. Each
          track can have multiple tables. Often there is a primary
          table that is used to display the track, while the other
          tables are supplemental. Sometimes, tracks are displayed by
          aggregating multiple tables.

     _t_a_b_l_e_N_a_m_e The name of the specific table to retrieve. May be
          'NULL', in which case the behavior depends on how the query
          is executed; see below.

     _r_a_n_g_e A 'RangesList' indicating the portion of the table to
          retrieve, in genome coordinates. The genome indicated by the
          'RangesList', which must always be non-'NULL', also
          determines which tracks are available. If the 'RangesList' is
          empty, the table is downloaded for the entire genome.


     A common workflow for querying the UCSC database is to create an
     instance of 'UCSCTableQuery' using the 'ucscTableQuery'
     constructor, invoke 'tableNames' to list the available tables for
     a track, and finally retrieve the desired table either as a
     'data.frame' via 'getTable' or as a 'RangedData' track via
     'track'. See the examples.

     The reason for a formal query class is to facilitate multiple
     queries when the differences between the queries are small. For
     example, one might want to query multiple tables within the same
     track and/or genomic region, or query the same table for multiple
     regions. The 'UCSCTableQuery' instance can be incrementally
     adjusted for each new query. Some caching is also performed, which
     enhances performance.

_C_o_n_s_t_r_u_c_t_o_r:


      'ucscTableQuery(x, track, range = GenomicRanges(), table =
          NULL)':  Creates a 'UCSCTableQuery' with the 'UCSCSession'
          given as 'x' and the track name given by the single string
          'track'. 'range' should be a 'RangesList' instance, and it
          effectively defaults to 'range(x)'. Any missing information
          in 'range', often the genome identifier, is filled in from
          'range(x)'. The table name is given by 'table', which may be
          a single string or 'NULL'.


_E_x_e_c_u_t_i_n_g _Q_u_e_r_i_e_s:

     Below, 'object' is a 'UCSCTableQuery' instance.


      'track(object)': Retrieves the indicated table as a track, i.e. a
          'RangedData' instance. Note that not all tables are available
          as tracks.

      'getTable(object)': Retrieves the indicated table as a
          'data.frame'. Note that not all tables are output in
          parseable form.

      'tableNames(object)': Gets the names of the tables available for
          the session, track and range specified by the query.


_A_c_c_e_s_s_o_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x'/'object' is a 'UCSCTableQuery'
     object.


      'browserSession(object)', 'browserSession(object) <- value': Get
          or set the 'UCSCSession' to query.

      'trackName(x)', 'trackName(x) <- value': Get or set the single
          string indicating the track containing the table of interest.

      'tableName(x)', 'tableName(x) <- value': Get or set the single
          string indicating the name of the table to retrieve. May be
          'NULL', in which case the table is automatically determined.

      'range(x)', 'range(x) <- value': Get or set the 'RangesList'
          indicating the portion of the table to retrieve in genomic
          coordinates. Any missing information, such as the genome
          identifier, is filled in using 'range(browserSession(x))'.


_A_u_t_h_o_r(_s):

     Michael Lawrence

_E_x_a_m_p_l_e_s:

     ## Not run: 
     session <- browserSession()
     genome(session) <- "mm9"
     trackNames(session) ## list the track names
     ## choose the Conservation track for a portion of mm9 chr12
     query <- ucscTableQuery(session, "Conservation",
                             GenomicRanges(57795963, 57815592, "chr12"))
     ## list the table names
     tableNames(query)
     ## select the phastCons30way table
     tableName(query) <- "phastCons30way"
     ## retrieve the track data
     track(query)
     ## get a data.frame summarizing the multiple alignment
     tableName(query) <- "multiz30waySummary"
     getTable(query)
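     ## A sketch of incremental adjustment (an illustration added here,
     ## not part of the original example): reuse the same query object
     ## for a second region instead of constructing a new query.
     ## The coordinates below are arbitrary and assumed only for
     ## illustration; the session, track and table are left unchanged.
     range(query) <- GenomicRanges(57815593, 57835592, "chr12")
     ## inspect the adjusted query fields
     trackName(query)
     tableName(query)
     range(query)
     ## run the same table query against the new region
     getTable(query)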
     ## End(Not run)

