RangedData-class           package:IRanges           R Documentation

_D_a_t_a _o_n _r_a_n_g_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     'RangedData' supports storing data, i.e. a set of variables, on a
     set of ranges spanning multiple spaces (e.g. chromosomes).
     Although the data is split across spaces, it can still be treated
     as one cohesive dataset when desired. In order to handle large
     datasets, the data values are stored externally to avoid copying,
     and the 'rdapply' function facilitates the processing of each
     space separately (divide and conquer).

_D_e_t_a_i_l_s:

     A 'RangedData' object consists of two primary components: a
     'RangesList' holding the ranges over multiple spaces and a
     parallel 'SplitXDataFrameList', holding the split data. There is
     also a 'universe' slot for denoting the source (e.g. the genome)
     of the ranges and/or data.

     There are two different modes of interacting with a 'RangedData'.
     The first mode treats the object as a contiguous "data frame"
     annotated with range information. The accessors 'start', 'end',
     and 'width' get the corresponding fields in the ranges as atomic
     integer vectors, undoing the division over the spaces. The '[['
     and matrix-style '[,' extraction and subsetting functions unroll
     the data in the same way. '[[<-' does the inverse. The number of
     rows is defined as the total number of ranges and the number of
     columns is the number of variables in the data. It is often
     convenient and natural to treat the data this way, at least when
     the data is small and there is no need to distinguish the ranges
     by their space.

     The other mode is to treat the 'RangedData' as a list, with an
     element (a virtual 'Ranges'/'XDataFrame' pair) for each space. The
     length of the object is defined as the number of spaces and the
     value returned by the 'names' accessor gives the names of the
     spaces. The list-style '[' subset function behaves analogously.
     The 'rdapply' function provides a convenient and formal means of
     applying an operation over the spaces separately. This mode is
     helpful when ranges from different spaces must be treated
     separately or when the data is too large to process over all
     spaces at once.
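
     The two modes can be pictured with plain base R objects. The
     sketch below is only an analogy: a 'data.frame' with a 'space'
     column stands in for the 'RangedData', the values are made up,
     and none of this is the IRanges implementation.

```r
# Base-R stand-in: one row per range, with a 'space' column (made-up data).
df <- data.frame(
  space = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  start = c(1L, 2L, 3L, 15L, 45L),
  end   = c(4L, 5L, 6L, 15L, 100L),
  score = c(10L, 2L, NA, 0L, 3L),
  stringsAsFactors = FALSE
)

# "Data frame" mode: one contiguous table whose rows are the ranges.
n_ranges   <- nrow(df)    # total number of ranges
all_starts <- df$start    # starts unrolled across the spaces

# "List" mode: one element per space.
by_space    <- split(df, df$space)
n_spaces    <- length(by_space)   # number of spaces
space_names <- names(by_space)    # names of the spaces
```

     In this analogy 'nrow' counts ranges while 'length' counts
     spaces, which mirrors the distinction drawn above.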

_O_b_j_e_c_t _U_p_d_a_t_i_n_g:


      'updateRangedData(object)': Updates instances of objects that
          inherit from an older 'RangedData' class definition to match
          the current 'RangedData' class definition.

_A_c_c_e_s_s_o_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is a 'RangedData' object.

     The following accessors treat the data as a contiguous dataset,
     ignoring the division into spaces:

      Array accessors:

           'nrow(x)': The number of ranges in 'x'.

           'ncol(x)': The number of data variables in 'x'.

           'dim(x)': An integer vector of length two, essentially
               'c(nrow(x), ncol(x))'.

           'rownames(x)', 'rownames(x) <- value': Gets or sets the
               names of the ranges in 'x'.

           'colnames(x)', 'colnames(x) <- value': Gets or sets the
               names of the variables in 'x'.

           'dimnames(x)': A list with two elements, essentially
               'list(rownames(x), colnames(x))'.

           'dimnames(x) <- value': Sets the row and column names, where
               'value' is a list as described above.


      Range accessors. The type of the return value depends on the type
          of 'Ranges'. For 'IRanges', it is an integer vector.
          Regardless of type, the number of elements always equals
          'nrow(x)'.

           'start(x)': The start value of each range.

           'width(x)': The width of each range.

           'end(x)': The end value of each range.
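
     The three range accessors are redundant with one another. The
     base R sketch below assumes the closed-interval convention of
     'IRanges', in which both endpoints belong to the range; the
     vectors are plain integers, not 'IRanges' objects.

```r
# Start and end values taken from the ranges used in the Examples section.
start <- c(1L, 2L, 3L)
end   <- c(4L, 5L, 6L)

# With both endpoints included, the width counts every covered position.
width <- end - start + 1L

# Any two of the three vectors determine the third.
end_again <- start + width - 1L
```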


     These accessors make the object seem like a list along the spaces:

      'length(x)': The number of spaces (e.g. chromosomes) in 'x'.

      'names(x)', 'names(x) <- value': Gets or sets the names of the
          spaces (e.g. '"chr1"'). The value is 'NULL' or a character
          vector of the same length as 'x'.


     Other accessors:

      'universe(x)', 'universe(x) <- value': Get or set the scalar
          string identifying the scope of the data in some way (e.g.
          genome or experimental platform). The universe may be
          'NULL'.

      'ranges(x)': Gets the ranges in 'x' as a 'RangesList'.

      'space(x)': Gets the spaces from 'ranges(x)'.

      'values(x)': Gets the data values in 'x' as a
          'SplitXDataFrameList'.


_C_o_n_s_t_r_u_c_t_o_r:


      'RangedData(ranges = IRanges(), ..., splitter = NULL, universe =
          NULL)': Creates a 'RangedData' with the ranges in 'ranges'
          and variables given by the arguments in '...'.  See the
          constructor 'XDataFrame' for how the '...' arguments are
          interpreted. If 'splitter' is 'NULL', all of the ranges and
          values are placed into the same space, resulting in a
          single-space (length one) 'RangedData'. Otherwise, the ranges
          and values are split into spaces according to 'splitter',
          which is treated as a factor, like the 'f' argument in
          'split'. The universe may be specified as a scalar string by
          the 'universe' argument.
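
     Since the splitting argument is described as behaving like the
     'f' argument of 'split', base R 'split' gives a rough picture of
     how rows are assigned to spaces. This sketch uses a plain vector
     in place of the ranges and values and does not call the
     'RangedData' constructor.

```r
score    <- c(10L, 2L, NA)
splitter <- c(1, 2, 1)   # coerced to a factor with levels "1" and "2"

# Each factor level becomes one space holding the matching rows.
by_space <- split(score, splitter)

# Unrolling the spaces again groups the rows by space, so the order
# can differ from the input order.
flat <- unlist(by_space, use.names = FALSE)
```

     Here the first and third elements land in space "1" and the
     second in space "2".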


_C_o_e_r_c_i_o_n:


      'as.data.frame(x, row.names=NULL, optional=FALSE, ...)': Copy the
          start, end, width of the ranges and all of the variables as
          columns in a 'data.frame'. This is a bridge to existing
          functionality in R, but of course care must be taken if the
          data is large. Note that 'optional' and '...' are ignored.

      'as(from, "XDataFrame")': Like 'as.data.frame' above, except the
          result is an 'XDataFrame' and it probably involves less
          copying, especially if there is only a single space.

      'as(from, "RangedData")': Coerce an 'Rle' or an 'XRle' to a
          'RangedData' by converting each run to a range and storing
          the run values in a column named "score".
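
     The run-to-range conversion can be illustrated with base R
     'rle', used here as a stand-in for 'Rle'. The sketch builds a
     plain 'data.frame' rather than a 'RangedData'; the real coercion
     is 'as(from, "RangedData")'.

```r
x    <- c(5, 5, 0, 0, 0, 2)
runs <- rle(x)                      # run lengths 2, 3, 1; run values 5, 0, 2

# Each run covers a contiguous block of positions in 'x'.
end   <- cumsum(runs$lengths)       # last position of each run
start <- end - runs$lengths + 1L    # first position of each run

# One range per run, with the run value stored as "score".
as_ranges <- data.frame(start = start, end = end, score = runs$values)
```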


_S_u_b_s_e_t_t_i_n_g _a_n_d _R_e_p_l_a_c_e_m_e_n_t:

     In the code snippets below, 'x' is a 'RangedData' object.


      'x[i]': Subsets 'x' by indexing into its spaces, so the result is
          of the same class, with a different set of spaces. 'i' can be
          numeric, logical, 'NULL' or missing.

      'x[i,j]': Subsets 'x' by indexing into its rows and columns. The
          result is of the same class, with a different set of rows and
          columns. Note that this differs from the subset form above,
          because we are now treating 'x' as one contiguous dataset.

      'x[[i]]': Extracts a variable from 'x', where 'i' can be a
          character, numeric, or logical scalar that indexes into the
          columns. The variable is unlisted over the spaces.

      'x$name': Similar to 'x[[i]]' above, where 'name' is taken
          literally as a column name in the data.

      'x[[i]] <- value': Sets value as column 'i' in 'x', where 'i' can
          be a character, numeric, or logical scalar that indexes into
          the columns. The length of 'value' should equal 'nrow(x)'.
          'x[[i]]' should be identical to 'value' after this operation.

      'x$name <- value': Similar to 'x[[i]] <- value' above, where
          'name' is taken literally as a column name in the data.
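
     Extracting a variable unlists it over the spaces, and replacing
     it does the inverse. A base R sketch of that round trip follows,
     assuming a hypothetical per-space list for one variable; the
     actual storage inside 'RangedData' is not shown here.

```r
# Hypothetical per-space storage of one variable.
score_by_space <- list(chr1 = c(10L, 2L, NA), chr2 = c(0L, 3L))

# Extraction: unlist over the spaces into one vector with one element per row.
score <- unlist(score_by_space, use.names = FALSE)

# Replacement: the new value has one element per row, so it is split
# back into the spaces, keeping the original order of the spaces.
space <- rep(names(score_by_space), lengths(score_by_space))
value <- score * 2L
score_by_space <- split(value, factor(space, levels = names(score_by_space)))
```

     After the replacement, unlisting 'score_by_space' again returns
     'value', matching the rule that 'x[[i]]' should be identical to
     'value'.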


_S_p_l_i_t_t_i_n_g _a_n_d _C_o_m_b_i_n_i_n_g:

     In the code snippets below, 'x' is a 'RangedData' object.


      'split(x, f, drop = FALSE)': Split 'x' according to 'f', which
          should be of length equal to 'nrow(x)'. Note that 'drop' is
          ignored here. The result is a 'RangedDataList' where every
          element has the same length (number of spaces) but different
          sets of ranges within each space.

      'rbind(...)': Matches the spaces from the 'RangedData' objects in
          '...' by name and combines them row-wise. In a way, this is
          the reverse of the 'split' operation described above.

      'c(x, ..., recursive = FALSE)': Combines 'x' with arguments
          specified in '...', which must all be 'RangedData' objects.
          This combination acts as if 'x' is a list of spaces, meaning
          that the result will contain the spaces of the first
          concatenated with the spaces of the second, and so on. This
          function is useful when creating 'RangedData' objects on a
          space-by-space basis and then needing to combine them.
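
     The difference between 'c' and 'rbind' can be sketched with
     named lists of data frames in base R. The lists 'a' and 'b' are
     made-up stand-ins for two 'RangedData' objects, and 'Map' only
     approximates the matching by space name described above.

```r
a <- list(chr1 = data.frame(start = 1L,  score = 10L),
          chr2 = data.frame(start = 15L, score = 0L))
b <- list(chr2 = data.frame(start = 45L, score = 3L),
          chr1 = data.frame(start = 2L,  score = 2L))

# Like 'c': the spaces are concatenated, so the result has four elements.
concatenated <- c(a, b)

# Like 'rbind': spaces are matched by name, then their rows are stacked.
stacked <- Map(rbind, a, b[names(a)])
```

     The concatenated result keeps duplicate space names, whereas the
     stacked result has one element per distinct space.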


_A_p_p_l_y_i_n_g:

     There are two explicitly supported ways to apply a function
     over the spaces of a 'RangedData'. The richest interface is
     'rdapply', which is described in its own man page. The simpler
     interface is an 'lapply' method:

      'lapply(X, FUN, ...)': Applies 'FUN' to each space in 'X' with
          extra parameters in '...'.
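
     As a rough base R analogy for applying a function per space, the
     sketch below splits a made-up 'data.frame' by a 'space' column
     and summarizes each piece. It does not use the 'lapply' method
     for 'RangedData' or 'rdapply'.

```r
df <- data.frame(
  space = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  score = c(10L, 2L, NA, 0L, 3L)
)

# One call of the function per space; the result is a list named by space.
per_space_mean <- lapply(
  split(df, df$space),
  function(s) mean(s$score, na.rm = TRUE)
)
```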


_A_u_t_h_o_r(_s):

     Michael Lawrence

_S_e_e _A_l_s_o:

     RangedData-utils for utilities and the 'rdapply' function for
     applying a function to each space separately.

_E_x_a_m_p_l_e_s:

       ranges <- IRanges(c(1,2,3),c(4,5,6))
       filter <- c(1L, 0L, 1L)
       score <- c(10L, 2L, NA)

       ## constructing RangedData instances

       ## no variables
       rd <- RangedData()
       rd <- RangedData(ranges)
       ranges(rd)
       ## one variable
       rd <- RangedData(ranges, score)
       rd[["score"]]
       ## multiple variables
       rd <- RangedData(ranges, filter, vals = score)
       rd[["vals"]] # same as rd[["score"]] above
       rd$vals
       rd[["filter"]]
       rd <- RangedData(ranges, score + score)
       rd[["score...score"]] # names made valid
       ## use a universe
       rd <- RangedData(ranges, universe = "hg18")
       universe(rd)

       ## split some data over chromosomes

       range2 <- IRanges(start=c(15,45,20,1), end=c(15,100,80,5))
       both <- c(ranges, range2)
       score <- c(score, c(0L, 3L, NA, 22L))
       filter <- c(filter, c(0L, 1L, NA, 0L)) 
       chrom <- paste("chr", rep(c(1,2), c(length(ranges), length(range2))), sep="")

       rd <- RangedData(both, score, filter, space = chrom, universe = "hg18")
       rd[["score"]] # identical to score
       rd[1][["score"]] # identical to score[1:3]
       
       ## subsetting

       ## list style: [i]

       rd[numeric()] # these three are all empty
       rd[logical()]
       rd[NULL]
       rd[] # missing, full instance returned
       rd[FALSE] # logical, supports recycling
       rd[c(FALSE, FALSE)] # same as above
       rd[TRUE] # like rd[]
       rd[c(TRUE, FALSE)]
       rd[1] # numeric index
       rd[c(1,2)]
       rd[-2]

       ## matrix style: [i,j]

       rd[,NULL] # no columns
       rd[NULL,] # no rows
       rd[,1]
       rd[,1:2]
       rd[,"filter"]
       rd[1,] # now by the rows
       rd[c(1,3),]
       rd[1:2, 1] # row and column
       rd[c(1:2,1,3),1] ## repeating rows

       ## dimnames

       colnames(rd)[2] <- "foo"
       colnames(rd)
       rownames(rd) <- head(letters, nrow(rd))
       rownames(rd)

       ## space names

       names(rd)
       names(rd)[1] <- "chr1"

       ## variable replacement

       count <- c(1L, 0L, 2L)
       rd <- RangedData(ranges, count, space = c(1, 2, 1))
       ## adding a variable
       score <- c(10L, 2L, NA)
       rd[["score"]] <- score
       rd[["score"]] # same as 'score'
       ## replacing a variable
       count2 <- c(1L, 1L, 0L)
       rd[["count"]] <- count2
       ## numeric index also supported
       rd[[2]] <- score
       rd[[2]] # gets 'score'
       ## removing a variable
       rd[[2]] <- NULL
       ncol(rd) # is only 1
       rd$score2 <- score
       
       ## combining/splitting

       rd <- RangedData(ranges, score, space = c(1, 2, 1))
       c(rd[1], rd[2]) # equal to 'rd'
       rd2 <- RangedData(ranges, score)
       unlist(split(rd2, c(1, 2, 1))) # same as 'rd'

       ## applying

       lapply(rd, `[[`, 1) # get first column in each space

