XDataFrame-class           package:IRanges           R Documentation

_E_x_t_e_r_n_a_l _D_a_t_a _F_r_a_m_e

_D_e_s_c_r_i_p_t_i_o_n:

     The 'XDataFrame' class emulates the interface of 'data.frame', but
     it supports the storage of any type of object as a column, as long
     as the 'length' and '[' methods are implemented. The X in its name
     indicates that it attempts to coerce its columns to external
     'XSequence' objects in a way that is completely transparent to the
     user. This helps to avoid unnecessary copying.

_D_e_t_a_i_l_s:

     On the whole, the 'XDataFrame' behaves very similarly to
     'data.frame', in terms of construction, subsetting, splitting,
     combining, etc. The most notable exception is that the row names
     are optional. This means calling 'rownames(x)' will return 'NULL'
     if there are no row names. Of course, it could return
     'seq_len(nrow(x))', but returning 'NULL' informs, for example,
     combination functions that no row names are desired (they are
     often a luxury when dealing with large data).

     As 'XDataFrame' derives from 'AnnotatedList', it is possible to
     set an 'annotation' string. Also, another 'XDataFrame' can hold
     metadata on the columns.

_A_c_c_e_s_s_o_r_s:

     In the following code snippets, 'x' is an 'XDataFrame'.

      'dim(x)': Get the length-two integer vector whose first and
          second elements give the number of rows and columns,
          respectively.

      'dimnames(x)', 'dimnames(x) <- value': Get and set the
          two-element list containing the row names (character vector
          of length 'nrow(x)' or 'NULL') and the column names
          (character vector of length 'ncol(x)').


_S_u_b_s_e_t_t_i_n_g:

     In the following code snippets, 'x' is an 'XDataFrame'.

      'x[i,j,drop]': Behaves very similarly to the '[.data.frame'
          method, except 'i' can be a logical 'Rle' object and
          subsetting by 'matrix' indices is not supported. Due to
          limitations in the subsetting of 'XSequence' objects, indices
          containing 'NA' values are not supported.

      'x[[i]]': Behaves very similarly to the '[[.data.frame' method,
          except arguments 'j' and 'exact' are not supported. Column
          name matching is always exact. Subsetting by matrices is not
          supported.

      'x[[i]] <- value': Behaves very similarly to the
          '[[<-.data.frame' method, except the argument 'j' is not
          supported. An attempt is made to coerce 'value' to an
          'XSequence' object.


_C_o_n_s_t_r_u_c_t_o_r:


_S_p_l_i_t_t_i_n_g _a_n_d _C_o_m_b_i_n_i_n_g:

     In the following code snippets, 'x' is an 'XDataFrame'.


      'split(x, f, drop = FALSE)': Splits 'x' into a
          'SplitXDataFrameList', according to 'f', dropping elements
          corresponding to unrepresented levels if 'drop' is 'TRUE'.

      'rbind(...)': Creates a new 'XDataFrame' by combining the rows of
          the 'XDataFrame' objects in '...'. Very similar to
          'rbind.data.frame', except in the handling of row names. If
          all elements have row names, they are concatenated and made
          unique. Otherwise, the result does not have row names.
          Currently, factors are not handled well (their levels are
          dropped). This is not a high priority until there is an
          'XFactor' class.

      'cbind(...)': Creates a new 'XDataFrame' by combining the columns
          of the 'XDataFrame' objects in '...'. Very similar to
          'cbind.data.frame', except row names, if any, are dropped.
          Consider the 'XDataFrame' constructor as an alternative that
          allows one to specify row names.


_C_o_e_r_c_i_o_n:


      'as(from, "XDataFrame")': By default, constructs a new
          'XDataFrame' with 'from' as its only column. If 'from' is a
          'matrix' or 'data.frame', all of its columns become columns
          in the new 'XDataFrame'. If 'from' is a 'list', its elements
          become columns in the same way. In any case, there is an
          attempt to coerce columns to 'XSequence' before inserting
          them into the 'XDataFrame'. Note that for the 'XDataFrame' to
          behave correctly, each column object must support
          element-wise subsetting via the '[' method and return the
          number of elements with 'length'. It is recommended to use
          the 'XDataFrame' constructor, rather than this interface.

      'as.list(x)': Coerces 'x', an 'XDataFrame', to a 'list',
          converting any 'XSequence' objects to vectors along the way.

      'as.data.frame(x, row.names=NULL, optional=FALSE)': Coerces 'x',
          an 'XDataFrame', to a 'data.frame'. Each column is coerced to
          a 'vector' and stored as a column in the 'data.frame'. If
          'row.names' is 'NULL', they are retrieved from 'x', if it has
          any. Otherwise, they are inferred by the 'data.frame'
          constructor.

      'as(from, "data.frame")': Coerces an 'XDataFrame' to a
          'data.frame' by calling 'as.data.frame(from)'.


_N_o_t_e:

     In the future, the general data frame functionality will probably
     be moved to a 'DataFrame' class. 'XDataFrame' will derive from
     'DataFrame' and encapsulate the behavior of attempting to coerce
     or even requiring columns to be 'XSequence'.

_A_u_t_h_o_r(_s):

     Michael Lawrence

_S_e_e _A_l_s_o:

     'RangedData', which makes heavy use of this class.

_E_x_a_m_p_l_e_s:

       score <- c(1L, 3L, NA)
       counts <- c(10L, 2L, NA)
       row.names <- c("one", "two", "three")
       
       xdf <- XDataFrame(score) # single column
       xdf[["score"]]
       xdf <- XDataFrame(score, row.names = row.names) # with row names
       rownames(xdf)
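
       ## a minimal sketch: row names are optional, so an XDataFrame
       ## built without them should report 'NULL' (see Details);
       ## 'xdf0' is an illustrative name, not part of the package
       xdf0 <- XDataFrame(score)
       is.null(rownames(xdf0)) # expected to be TRUE per Details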
       
       xdf <- XDataFrame(vals = score) # explicit naming
       xdf[["vals"]]
       
       # a data.frame
       sw <- XDataFrame(swiss)
       as.data.frame(sw) # swiss, without row names
       # now with row names
       sw <- XDataFrame(swiss, row.names = rownames(swiss))
       as.data.frame(sw) # swiss

       # subsetting
         
       sw[] # identity subset
       sw[,] # same

       sw[NULL] # no columns
       sw[,NULL] # no columns
       sw[NULL,] # no rows

       ## select columns
       sw[1:3]
       sw[,1:3] # same as above
       sw[,"Fertility"]
       sw[,c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)]

       ## select rows and columns
       sw[4:5, 1:3]
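
       ## a minimal sketch of row subsetting by a logical 'Rle' (see
       ## Subsetting); assumes the 'Rle(values, lengths)' constructor
       ## from IRanges and relies on 'swiss' having 47 rows;
       ## 'keep' is an illustrative name
       keep <- Rle(c(TRUE, FALSE), c(5L, 42L))
       sw[keep, ] # should select the first five rows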
       
       sw[1] # one-column XDataFrame
       ## the same
       sw[, 1, drop = FALSE]
       sw[, 1] # an (unnamed) vector
       sw[[1]] # the same
       sw[["Fertility"]]

       sw[["Fert"]] # should return 'NULL'
       
       sw[1,] # a one-row XDataFrame
       sw[1,, drop=TRUE] # a list

       ## duplicate row, unique row names are created
       sw[c(1, 1:2),]

       ## indexing by row names  
       sw["Courtelary",]
       subsw <- sw[1:5,1:4]
       subsw["C",] # partially matches

       ## row and column names
       cn <- paste("X", seq_len(ncol(swiss)), sep = ".")
       colnames(sw) <- cn
       colnames(sw)
       rn <- seq(nrow(sw))
       rownames(sw) <- rn
       rownames(sw)
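
       ## a minimal sketch of the accessors described under Accessors
       dim(sw) # number of rows, then number of columns
       dimnames(sw) # list of row names and column names
       ## assuming a 'NULL' row-name element is accepted by the setter,
       ## as the Accessors section suggests, this should drop row names
       dimnames(sw) <- list(NULL, cn)
       rownames(sw) # should be 'NULL'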

       ## column replacement

       xdf[["counts"]] <- counts
       xdf[["counts"]]
       xdf[[3]] <- score
       xdf[["X"]]
       xdf[[3]] <- NULL # deletion

       ## split

       sw <- XDataFrame(swiss)
       swsplit <- split(sw, sw[["Education"]])
       
       ## rbind

       do.call(rbind, as.list(swsplit))

       ## cbind

       cbind(XDataFrame(score), XDataFrame(counts))
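
       ## coercion: a minimal sketch of the 'as' interface described
       ## under Coercion; 'lst' is an illustrative name
       as(swiss, "XDataFrame") # one column per data.frame column
       lst <- list(a = 1:3, b = 4:6)
       as(lst, "XDataFrame") # list elements become columns
       as(score, "XDataFrame") # 'from' becomes the only column
       as.list(XDataFrame(score)) # back to a plain list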

