regionOverlap             package:Ringo             R Documentation

_F_u_n_c_t_i_o_n _t_o _c_o_m_p_u_t_e _o_v_e_r_l_a_p _o_f _g_e_n_o_m_i_c _r_e_g_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     Given two data frames of genomic regions, this function computes
     the base-pair overlap, if any, between every pair of regions from
     the two lists.

_U_s_a_g_e:

     regionOverlap(xdf, ydf, chrColumn = "chr", startColumn = "start",
     endColumn = "end", mem.limit=1e8)

_A_r_g_u_m_e_n_t_s:

     xdf: 'data.frame' that holds the first set of genomic regions

     ydf: 'data.frame' that holds the first set of genomic regions

chrColumn: character; what is the name of the column that holds the
          chromosome name of the regions in 'xdf' and 'ydf'

startColumn: character; what is the name of the column that holds the
          start position of the regions in 'xdf' and 'ydf'

endColumn: character; what is the name of the column that holds the
          end position of the regions in 'xdf' and 'ydf'

mem.limit: integer value; what is the maximal allowed size of matrices
          during the computation

_V_a_l_u_e:

     Originally, a matrix with 'nrow(xdf)' rows and 'nrow(ydf)'
     columns, in which entry 'X[i,j]' specifies the length of the
     overlap between region 'i' of the first list ('xdf') and region
     'j' of the second list ('ydf'). Since this matrix is very sparse,
     we use the 'dgCMatrix' representation from the 'Matrix' package
     for it.
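
     To make the meaning of entry 'X[i,j]' concrete, the following is a
     minimal base-R sketch of a dense pairwise overlap computation. The
     helper 'pairwise_overlap' is hypothetical and not part of Ringo,
     and it assumes inclusive coordinates (a region from 100 to 200
     spans 101 bases); whether 'regionOverlap' counts positions the
     same way is an assumption, not something this page states.

```r
# Hypothetical helper, not part of Ringo: dense pairwise overlap in base R.
# Assumes inclusive coordinates, so a region [100, 200] spans 101 bases.
pairwise_overlap <- function(xdf, ydf) {
  # outer() builds nrow(xdf) x nrow(ydf) matrices of candidate boundaries
  lo <- outer(xdf$start, ydf$start, pmax)  # later of the two starts
  hi <- outer(xdf$end, ydf$end, pmin)      # earlier of the two ends
  same_chr <- outer(as.character(xdf$chr), as.character(ydf$chr), "==")
  # overlap is zero when chromosomes differ or the intervals are disjoint
  pmax(hi - lo + 1, 0) * same_chr
}

x <- data.frame(chr = c("chr1", "chr7", "chr8"),
                start = c(510, 100, 60), end = c(520, 200, 80))
y <- data.frame(chr = c("chr1", "chr7", "chr8"),
                start = c(500, 50, 80), end = c(700, 250, 120))
ov <- pairwise_overlap(x, y)
ov
```

     Under these assumptions only the diagonal entries are non-zero,
     because each region in 'x' shares a chromosome with exactly one
     region in 'y'.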

_N_o_t_e:

     The function only return the absolute length of overlapping
     regions in base-pairs. It does not return the position of the
     overlap or the fraction of region 1 and/or region 2 that overlaps
     the other regions.
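
     If overlap fractions are needed, they can be derived from the
     overlap lengths and the region widths. The sketch below uses a
     small hand-written matrix 'ov' as a stand-in for
     'as.matrix(regionOverlap(xdf, ydf))'; the example values and the
     inclusive-coordinate width formula are assumptions made for
     illustration only.

```r
# Hypothetical example data; 'ov' stands in for a dense overlap matrix with
# rows indexed by xdf regions and columns indexed by ydf regions.
xdf <- data.frame(chr = "chr1", start = c(100, 510), end = c(200, 520))
ydf <- data.frame(chr = "chr1", start = c(150, 500), end = c(250, 700))
ov  <- matrix(c(51, 0, 0, 11), nrow = 2)

# Region widths, assuming inclusive coordinates
wx <- xdf$end - xdf$start + 1
wy <- ydf$end - ydf$start + 1

# Fraction of each xdf region covered: row i is divided by wx[i]
# (wx is recycled down the columns because length(wx) == nrow(ov))
frac_x <- ov / wx
# Fraction of each ydf region covered: column j is divided by wy[j]
frac_y <- sweep(ov, 2, wy, "/")
```

     Here 'frac_x[i,j]' is the share of region 'i' in 'xdf' covered by
     region 'j' in 'ydf', and 'frac_y[i,j]' is the share of region 'j'
     in 'ydf' covered by region 'i' in 'xdf'.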

     The argument 'mem.limit' is not a limit on RAM usage as such, but
     rather the maximal size of matrices that is allowed during the
     computation. If larger matrices would arise, the second list of
     regions ('ydf') is split into parts and the overlap with the
     first list ('xdf') is computed for each part separately. During
     the computation, matrices of size 'nrow(xdf)' times 'nrow(ydf)'
     are created.
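
     The splitting idea can be sketched in base R as follows. This is
     an illustration under stated assumptions, not Ringo's actual
     code: 'dense_overlap' and 'chunked_overlap' are hypothetical
     helpers, the matrix size is taken to mean the number of entries,
     and coordinates are treated as inclusive.

```r
# Hypothetical helpers; this is not how Ringo is implemented internally.
# Dense overlap of every xdf region with every ydf region (inclusive coords).
dense_overlap <- function(xdf, ydf) {
  lo <- outer(xdf$start, ydf$start, pmax)
  hi <- outer(xdf$end, ydf$end, pmin)
  same_chr <- outer(as.character(xdf$chr), as.character(ydf$chr), "==")
  pmax(hi - lo + 1, 0) * same_chr
}

chunked_overlap <- function(xdf, ydf, mem.limit = 1e8) {
  # largest number of ydf rows so that nrow(xdf) * rows stays within mem.limit
  chunk_size <- max(1, floor(mem.limit / nrow(xdf)))
  # assign consecutive ydf rows to chunks 1, 2, 3, ...
  chunk_id <- ceiling(seq_len(nrow(ydf)) / chunk_size)
  parts <- lapply(split(seq_len(nrow(ydf)), chunk_id), function(idx) {
    dense_overlap(xdf, ydf[idx, , drop = FALSE])
  })
  # every part has nrow(xdf) rows, so the parts are joined column-wise
  do.call(cbind, unname(parts))
}

regionsH3ac <- data.frame(chr = c("chr1", "chr7", "chr8", "chr1", "chrX", "chr8"),
                          start = c(100, 100, 100, 510, 100, 60),
                          end = c(200, 200, 200, 520, 200, 80))
regionsH4ac <- data.frame(chr = c("chr1", "chr2", "chr7", "chr8", "chr9"),
                          start = c(500, 100, 50, 80, 100),
                          end = c(700, 200, 250, 120, 200))

# A deliberately tiny limit forces chunks of at most two ydf regions
chunked <- chunked_overlap(regionsH3ac, regionsH4ac, mem.limit = 12)
full    <- dense_overlap(regionsH3ac, regionsH4ac)
```

     With 'mem.limit = 12' and six regions in the first data frame,
     each chunk holds at most two regions of the second data frame,
     and the column-wise concatenation of the parts matches the result
     computed in one piece.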

_A_u_t_h_o_r(_s):

     Joern Toedling toedling@ebi.ac.uk

_S_e_e _A_l_s_o:

     'dgCMatrix-class'

_E_x_a_m_p_l_e_s:

       ## toy example:
       regionsH3ac <- data.frame(chr=c("chr1","chr7","chr8","chr1","chrX","chr8"), start=c(100,100,100,510,100,60), end=c(200, 200, 200,520,200,80))
       regionsH4ac <- data.frame(chr=c("chr1","chr2","chr7","chr8","chr9"),
     start=c(500,100,50,80,100), end=c(700, 200, 250, 120,200))

       ## compare the regions first by eye
       ##  which ones do overlap and by what amount?
       regionsH3ac
       regionsH4ac

       ## compare it to the result:
       as.matrix(regionOverlap(regionsH3ac, regionsH4ac))
       nonzero(regionOverlap(regionsH3ac, regionsH4ac))

