IRanges-class           package:Biostrings           R Documentation

_I_R_a_n_g_e_s _a_n_d _N_o_r_m_a_l_I_R_a_n_g_e_s _o_b_j_e_c_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     The IRanges class is a simple container for storing a set of
     integer ranges.

     A NormalIRanges object is an IRanges object that is "normal". See
     the Normality section below for the definition and properties of
     normal IRanges objects.

_D_e_t_a_i_l_s:

     An IRanges object is a data frame-like object where each row
     describes a "range" of integers.

     A "range" of integers is a finite set of consecutive integer
     values. Each range can be fully described with exactly 2 integers
     chosen arbitrarily among the following 3 integers:
     its "start" i.e. its smallest (or first, or leftmost) value; its
     "end" i.e. its greatest (or last, or rightmost) value; and its
     "width" i.e. the number of values in the range. For example the
     set of integers that are greater than or equal to -20 and less
     than or equal to 400 is the range of integers that starts at -20
     and has a width of 421.

     The start can be any integer (see 'start' below) but the width
     must be a nonnegative integer (see 'width' below). The end of a
     range is its start plus its width minus one (see 'end' below). An
     "empty" range is a range that contains no value i.e. a range that
     has a width of zero. Note that for an empty range, the end is
     smaller than the start (it is the start minus one).
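
     The relationship between start, end and width can be checked with
     plain integer arithmetic. The snippet below is a base R sketch
     only: it does not use the IRanges class, and the variable names
     are chosen for illustration.

```r
# A range is fully described by any two of start, end and width.
start <- -20L
end   <- 400L
width <- end - start + 1L      # number of integers in the range
width                          # 421
start + width - 1L             # recovers the end: 400
end - width + 1L               # recovers the start: -20

# An empty range has width 0, so its end is its start minus one.
empty_start <- 5L
empty_end   <- empty_start + 0L - 1L
empty_end < empty_start        # TRUE
```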

     Two ranges are considered equal iff they share the same start and
     width. Note that with this definition, 2 empty ranges are
     generally not equal (they need to share the same start to be
     considered equal).

     The length of an IRanges object is the number of ranges in it i.e.
     the number of rows in the object.

     An IRanges object is considered empty iff all its ranges are
     empty.
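
     The equality and emptiness rules above can be sketched in base R
     on plain start and width vectors. 'ranges_equal' and 'all_empty'
     are hypothetical helpers written for this illustration; they are
     not functions of the package.

```r
# Hypothetical helper: two ranges are equal iff they share the same
# start and the same width.
ranges_equal <- function(start1, width1, start2, width2) {
    start1 == start2 & width1 == width2
}
ranges_equal(3L, 4L, 3L, 4L)   # TRUE: same start and width
ranges_equal(3L, 0L, 7L, 0L)   # FALSE: both empty, different starts
ranges_equal(3L, 0L, 3L, 0L)   # TRUE: empty ranges sharing a start

# Hypothetical helper: a set of ranges is empty iff all widths are 0.
all_empty <- function(width) all(width == 0L)
all_empty(c(0L, 0L))           # TRUE
all_empty(c(0L, 2L))           # FALSE
```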

     Note that it is unlikely that the user will have to directly
     create or manipulate an IRanges instance when using the Biostrings
     package. However, since the IRanges class is a superclass of the
     XStringViews class, any XStringViews object is also an IRanges
     object and can be manipulated as such. Therefore all the methods
     described here also work with an XStringViews object.

_I_R_a_n_g_e_s _o_b_j_e_c_t _v_s _d_a_t_a _f_r_a_m_e:

     An important difference with standard R data frames is that
     IRanges objects only support single subscript subsetting i.e.
     subsetting by row, whereas standard R data frames can be subsetted
     by row and by column. As a consequence, the length of an IRanges
     object is its number of rows, whereas the length of a standard R
     data frame object is its number of columns.
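
     The data frame side of this comparison can be seen with base R
     alone. The data frame below is a made-up stand-in for a set of 3
     ranges, with assumed column names.

```r
# A standard data frame holding 3 ranges, one per row.
df <- data.frame(start = c(1L, 10L, 20L), width = c(5L, 3L, 0L))
length(df)        # 2: the number of columns
nrow(df)          # 3: the number of rows (ranges)
df[2, ]           # subsetting by row
df[ , "width"]    # subsetting by column
```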

_A_c_c_e_s_o_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is an IRanges object.


      'length(x)': The number of ranges in 'x'.

      'start(x)': The start values of the ranges. This is a vector of
          integers of the same length as 'x'.

      'width(x)': The number of integers in each range. This is a
          vector of nonnegative integers of the same length as 'x'.

      'end(x)': 'start(x) + width(x) - 1L'

      'names(x)': 'NULL' or a character vector of the same length as
          'x'.

      'desc(x)': 'desc' is an alias for 'names'.


_C_o_n_s_t_r_u_c_t_o_r:


      'IRanges(start=NULL, end=NULL, width=NULL)': Return the IRanges
          object containing the ranges specified by 'start', 'end' and
          'width'. Exactly two of the 'start', 'end' and 'width'
          arguments must be specified as integer vectors (with no
          'NA's) and the other argument must be 'NULL'. If 'start' and
          'end' are specified, then they must be vectors of the same
          length. If 'start' and 'width' (or 'end' and 'width') are
          specified, then the length of 'width' must be <= the length
          of 'start' (or 'end') and, if it is <, then 'width' is
          expanded cyclically to that length.
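
     The argument rules above can be sketched in base R. The function
     'resolve_ranges' below is a hypothetical helper written for this
     illustration, not the package's constructor: it returns a plain
     data frame, and it assumes that when 'end' and 'width' are given,
     'width' is expanded to the length of 'end'.

```r
# Hypothetical helper: resolve (start, end, width) into start and
# width vectors, following the rules stated for the constructor.
resolve_ranges <- function(start = NULL, end = NULL, width = NULL) {
    n_null <- is.null(start) + is.null(end) + is.null(width)
    if (n_null != 1L)
        stop("exactly two of 'start', 'end' and 'width' must be specified")
    if (anyNA(c(start, end, width)))
        stop("'NA's are not allowed")
    if (is.null(width)) {
        if (length(start) != length(end))
            stop("'start' and 'end' must have the same length")
        width <- end - start + 1L
    } else {
        anchor <- if (is.null(end)) start else end
        if (length(width) > length(anchor) ||
            (length(anchor) > 0L && length(width) == 0L))
            stop("'width' has an incompatible length")
        # Expand 'width' cyclically to the length of the other argument.
        width <- rep(width, length.out = length(anchor))
        if (is.null(start))
            start <- end - width + 1L
    }
    if (any(width < 0L))
        stop("widths must be nonnegative")
    data.frame(start = as.integer(start), width = as.integer(width))
}

r <- resolve_ranges(start = 1:4, width = c(10L, 0L))
r$width                                                    # 10 0 10 0
resolve_ranges(start = c(2L, 5L), end = c(4L, 9L))$width   # 3 5
resolve_ranges(end = c(10L, 20L), width = 5L)$start        # 6 16
```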


_S_u_b_s_e_t_t_i_n_g:

     In the code snippet below, 'x' is an IRanges object.


      'x[i]': Return a new IRanges object (of the same type as 'x')
          made of the selected ranges. 'i' can be a numeric vector, a
          logical vector, 'NULL' or missing. If 'x' is a NormalIRanges
          object and 'i' a positive numeric subscript (i.e. a numeric
          vector of positive values), then 'i' must be strictly
          increasing.


_O_t_h_e_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is an IRanges object.


      'isEmpty(x)': Return a logical value indicating whether 'x' is
          empty or not.

      'as.data.frame(x, row.names=NULL, optional=FALSE, ...)': Converts
          'x' into a standard R data frame object. 'row.names' must be
          'NULL' or a character vector giving the row names for the
          data frame, and 'optional' and any additional arguments
          ('...') are ignored. See '?as.data.frame' for more information
          about these arguments.

      'duplicated(x)': Determines which elements of 'x' are equal to
          elements with smaller subscripts, and returns a logical
          vector indicating which elements are duplicates. It is
          semantically equivalent to 'duplicated(as.data.frame(x))'
          (see '?duplicated' for more information).

      'as.matrix(x, ...)': Converts 'x' into a 2-column integer matrix
          containing 'start(x)' and 'width(x)'. Extra arguments ('...')
          are ignored.
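
     Since 'duplicated(x)' is described as equivalent to
     'duplicated(as.data.frame(x))', its behavior can be previewed on
     an ordinary data frame. The data frame below is a hand-built
     stand-in with assumed 'start' and 'width' columns; the columns
     actually produced by 'as.data.frame(x)' may differ.

```r
# Stand-in for as.data.frame(x): one row per range.
df <- data.frame(start = c(1L, 5L, 1L, 5L), width = c(3L, 0L, 3L, 2L))

# Row 3 repeats row 1; row 4 shares a start with row 2 but not a width.
dup <- duplicated(df)
dup               # FALSE FALSE  TRUE FALSE

# A 2-column integer matrix holding the starts and the widths.
m <- as.matrix(df)
dim(m)            # 4 2
typeof(m)         # "integer"
```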


_N_o_r_m_a_l_i_t_y:

     A NormalIRanges object is an IRanges object that is "normal".

     An IRanges object is said to be "normal" when its ranges are: (a)
     not empty (i.e. they have a non-null width); (b) not overlapping;
     (c) ordered from left to right; (d) not even adjacent (i.e. there
     must be a non-empty gap between 2 consecutive ranges). If 'x' is
     an IRanges object with 2 or more elements (i.e. 'length(x) >= 2'),
     then 'x' is normal iff:

     start(x)[i] <= end(x)[i] < start(x)[i+1] - 1 < end(x)[i+1]

     for every 1 <= 'i' < 'length(x)'. If 'length(x) == 1', then 'x' is
     normal iff 'width(x)[1] >= 1'. If 'length(x) == 0', then 'x' is
     normal.
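
     Conditions (a) to (d) can be tested directly on start and width
     vectors. The function 'is_normal' below is a hypothetical base R
     helper written for this illustration; it is not the package's
     'isNormal' method.

```r
# Hypothetical helper: check conditions (a)-(d) on plain vectors.
is_normal <- function(start, width) {
    n <- length(start)
    if (n == 0L) return(TRUE)              # no ranges: normal
    if (any(width < 1L)) return(FALSE)     # (a) no empty range
    if (n == 1L) return(TRUE)
    end <- start + width - 1L
    # (b), (c), (d): each range must start at least 2 past the end of
    # the previous one, which leaves a non-empty gap between them.
    all(start[-1L] - end[-n] >= 2L)
}

is_normal(integer(0), integer(0))     # TRUE
is_normal(c(1L, 10L), c(3L, 2L))      # TRUE:  1..3 and 10..11
is_normal(c(1L, 4L), c(3L, 2L))       # FALSE: 1..3 and 4..5 are adjacent
is_normal(c(1L, 3L), c(3L, 2L))       # FALSE: 1..3 and 3..4 overlap
is_normal(c(10L, 1L), c(2L, 3L))      # FALSE: not ordered left to right
is_normal(5L, 0L)                     # FALSE: a single empty range
```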

     An IRanges object can be used to represent an arbitrary finite set
     of integers (that are not necessarily consecutive). Now the 2 most
     interesting properties of normal IRanges objects are that: (1)
     they are the "best" (in terms of storage space) IRanges objects
     for representing arbitrary finite sets of integers and (2) the
     mapping between finite sets of integers and normal IRanges objects
     is one-to-one. More precisely, if 'x' is an IRanges object, then
     it can be seen as representing the set of integers obtained by
     taking the union of all its ranges. Conversely, since any finite
     set of integers can be obtained by a finite union of ranges, it
     can be represented by an IRanges object, but this representation
     is clearly not unique. However, among all the
     IRanges objects that represent (or map) the same finite set of
     integers, only one is normal, and this normal representation is
     minimal in terms of length (and therefore in terms of storage
     space).
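
     The uniqueness of the normal representation can be illustrated by
     merging ranges by hand. 'normalize_ranges' below is a hypothetical
     base R helper written for this sketch (it is not a function of the
     package): it drops empty ranges, sorts the rest, and merges ranges
     that overlap or are adjacent.

```r
# Hypothetical helper: compute the normal representation of the set
# of integers covered by the given ranges.
normalize_ranges <- function(start, width) {
    keep  <- width > 0L                    # drop empty ranges
    start <- start[keep]
    width <- width[keep]
    if (length(start) == 0L)
        return(data.frame(start = integer(0), width = integer(0)))
    o     <- order(start)                  # order from left to right
    start <- start[o]
    end   <- start + width[o] - 1L
    run_end <- cummax(end)                 # rightmost end seen so far
    n <- length(start)
    # A new merged range begins where a start lies strictly beyond
    # the running end plus one (i.e. after a non-empty gap).
    is_new    <- c(TRUE, start[-1L] > run_end[-n] + 1L)
    group     <- cumsum(is_new)
    new_start <- start[is_new]
    new_end   <- as.integer(tapply(run_end, group, max))
    data.frame(start = new_start, width = new_end - new_start + 1L)
}

# Two different descriptions of the set {1,...,5, 8, 9}:
a <- normalize_ranges(c(1L, 3L, 8L), c(3L, 3L, 2L))
b <- normalize_ranges(c(8L, 4L, 1L, 2L), c(2L, 2L, 3L, 0L))
a$start   # 1 8
a$width   # 5 2
identical(a, b)   # TRUE: both map to the same normal form
```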

     Subsetting 'x' is currently not supported. It could be, but it
     should then accept only strictly increasing subscripts in order
     to preserve normality.

     Use the 'isNormal' method to check whether an IRanges object is
     normal or not. In the code snippet below, 'x' is an IRanges
     object.


      'isNormal(x)': Return a logical value indicating whether 'x' is
          normal or not.

      'whichFirstNotNormal(x)': Return 'NA' if 'x' is normal, or the
          smallest valid index 'i' in 'x' for which 'x[1:i]' is not
          normal.

      'max(x)': (Defined for NormalIRanges objects only.) The maximum
          value in the finite set of integers represented by 'x'.

      'min(x)': (Defined for NormalIRanges objects only.) The minimum
          value in the finite set of integers represented by 'x'.
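
     The behavior described for 'whichFirstNotNormal', 'max' and 'min'
     can be sketched in base R on start and width vectors.
     'which_first_not_normal' is a hypothetical helper written for this
     illustration, assuming the definition of normality given above; it
     is not the package's method.

```r
# Hypothetical helper: smallest i for which the first i ranges are not
# normal, or NA if all of them are.
which_first_not_normal <- function(start, width) {
    n <- length(start)
    if (n == 0L) return(NA_integer_)
    end <- start + width - 1L
    ok  <- width >= 1L                     # no empty range
    if (n >= 2L)                           # non-empty gap before range i
        ok[-1L] <- ok[-1L] & (start[-1L] - end[-n] >= 2L)
    bad <- which(!ok)
    if (length(bad) == 0L) NA_integer_ else bad[1L]
}

start <- c(1L, 10L, 12L, 30L)
width <- c(3L, 2L, 5L, 1L)
i <- which_first_not_normal(start, width)
i                                          # 3: 10..11 and 12..16 are adjacent

# The first i - 1 ranges are normal, so the extreme values of the set
# they represent are the first start and the last end.
keep <- seq_len(i - 1L)
min_value <- start[keep][1L]
max_value <- (start + width - 1L)[keep][length(keep)]
min_value                                  # 1
max_value                                  # 11
```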


_D_e_p_r_e_c_a_t_e_d _m_e_t_h_o_d_s:

      'first(x)': Deprecated. Use 'start' instead.

      'last(x)': Deprecated. Use 'end' instead.

_A_u_t_h_o_r(_s):

     H. Pages

_S_e_e _A_l_s_o:

     IRanges-utils, XStringViews-class, 'as.data.frame', 'duplicated',
     'as.matrix'

_E_x_a_m_p_l_e_s:

       x <- IRanges(start=c(2:-1, 13:15), width=c(0:3, 2:0))
       x
       length(x)
       start(x)
       width(x)
       end(x)
       isEmpty(x)
       as.data.frame(x)
       as.matrix(x)

       ## Subsetting:
       x[4:2]                  # 3 ranges
       x[-1]                   # 6 ranges
       x[FALSE]                # 0 ranges
       x0 <- x[width(x) == 0]  # 2 ranges
       isEmpty(x0)

       ## Unlock the IRanges instance and use replacement methods to slide
       ## or resize its elements:
       x <- as(x, "UnlockedIRanges")
       width(x) <- width(x) * 2  + 1  # resize elements
       x
       start(x) <- end(x)             # slide elements
       x
       start(x)[4] <- end(x)[4]       # slide the 4th element
       x
       end(x)[1] <- start(x)[3]       # slide the first element
       x
       width(x) <- c(2, 0)            # resize elements
       x
       duplicated(x)

       ## Name the elements:
       names(x)
       names(x) <- c("range1", "range2")
       x
       x[names(x) == ""]  # 5 ranges
       x[names(x) != ""]  # 2 ranges

       ## Using an IRanges object for storing a big set of ranges is more
       ## efficient than using a standard R data frame:
       N <- 2000000L  # nb of ranges
       W <- 180L      # width of each range
       start <- 1L
       end <- 50000000L
       set.seed(777)
       range_starts <- sort(sample(end-W+1L, N))
       range_widths <- rep.int(W, N)
       ## Instantiation is faster
       system.time(x <- IRanges(start=range_starts, width=range_widths))
       system.time(y <- data.frame(start=range_starts, width=range_widths))
       ## Subsetting is faster
       system.time(x16 <- x[c(TRUE, rep.int(FALSE, 15))])
       system.time(y16 <- y[c(TRUE, rep.int(FALSE, 15)), ])
       ## Internal representation is more compact
       object.size(x16)
       object.size(y16)

       ## Normality:
       isNormal(x16)                        # FALSE
       if (interactive())
           x16 <- as(x16, "NormalIRanges")  # Error!
       whichFirstNotNormal(x16)             # 57
       isNormal(x16[1:56])                  # TRUE
       xx <- as(x16[1:56], "NormalIRanges")
       class(xx)
       max(xx)
       min(xx)

