cellindexmaps           package:affxparser           R Documentation

_C_e_l_l _i_n_d_e_x _m_a_p_s _f_o_r _r_e_a_d_i_n_g _a_n_d _w_r_i_t_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     This part defines read and write maps that can be used to remap
     cell indices before reading and writing data from and to file,
     respectively.

     This package provides methods to create read and write
     (cell-index) maps from Affymetrix CDF files.  These can be used to
     store the cell data in an optimal order so that when data is read
     it is read in contiguous blocks, which is faster. Currently, this
     package does not provide methods for writing Affymetrix data files
     such as CEL files.  However, other packages might, which is why
     the methods here, e.g. 'readCelUnits'(), support read maps.

     For more details on how cell indices are defined, see 'cellxy'.

_M_o_t_i_v_a_t_i_o_n:

     When reading data from file, it is faster to read the data in the
     order that it is stored compared with, say, in a random order. The
     main reason for this is that the read arm of the hard drive has to
     move more if data is not read consecutively.  The same applies
     when writing data to file.  The read and write cache of the file system
     may compensate a bit for this, but not completely.

     In Affymetrix CEL files, cell data is stored in order of cell
     indices. Moreover, (except for a few early chip types) Affymetrix
     randomizes the locations of the cells such that cells in the same
     unit (probeset) are scattered across the array. Thus, when reading
     CEL data arranged by units using for instance 'readCelUnits'(),
     the order of the cells requested is both random and scattered.

     Since CEL data is often queried unit by unit (except for some
     probe-level normalization methods), one can improve the speed of
     reading data by saving data such that cells in the same unit are
     stored together.  A _write map_ is used to remap cell indices to
     file indices.  When later reading that data back, a _read map_ is
     used to remap file indices to cell indices. Read and write maps are
     described next.
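
     As a rough illustration of the idea, the following sketch uses a
     made-up layout of 3 units with 4 cells each, held in memory
     instead of in a CEL file.  All names ('cells', 'units', 'x',
     'stored') are hypothetical, no function of this package is used,
     and the maps follow the definitions given on this page (a write
     map takes cell indices to file indices).

```r
# Toy layout (assumption): 12 cells, 3 units of 4 cells each,
# with the cells of each unit scattered over the array
set.seed(1)
nbrOfCells <- 12L
cells <- sample(nbrOfCells)                  # cell indices in unit order
units <- split(cells, rep(1:3, each = 4L))   # unit -> its cell indices

# Write map: the k-th cell in unit order is stored at file index k
writeMap <- integer(nbrOfCells)
writeMap[cells] <- seq_len(nbrOfCells)

# Read map: the inverse of the write map
readMap <- order(writeMap)

# Data in the default order, that is, x[i] is the value of cell i
x <- 100 + seq_len(nbrOfCells)

# Rearranged storage: the value of cell i is put at file index writeMap[i]
stored <- numeric(nbrOfCells)
stored[writeMap] <- x

# The cells of unit 2 now occupy one contiguous block of file indices
j <- writeMap[units[[2]]]
print(j)

# Reading that block returns the same values as in the default order
stopifnot(identical(stored[j], x[units[[2]]]))

# The read map recovers which cell is stored at each file index
stopifnot(identical(readMap[j], units[[2]]))
```

     In this toy setting each unit is read from one consecutive run of
     file indices, which is the access pattern the rearrangement is
     meant to achieve.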

_D_e_f_i_n_i_t_i_o_n _o_f _r_e_a_d _a_n_d _w_r_i_t_e _m_a_p_s:

     Consider cell indices i=1, 2, ..., N*K and file indices j=1, 2,
     ..., N*K. A _read map_ is then a _bijective_ (one-to-one) function
     h() such that

                              i = h(j),

     and the corresponding _write map_ is the inverse function h^{-1}()
     such that

                            j = h^{-1}(i).

     Since the mapping is required to be bijective, it holds that i =
     h(h^{-1}(i)) and that j = h^{-1}(h(j)). For example, consider the
     "reversing" read map function h(j)=N*K-j+1.  The write map
     function is h^{-1}(i)=N*K-i+1. To verify the bijective property of
     this map, we see that h(h^{-1}(i)) = h(N*K-i+1) = N*K-(N*K-i+1)+1
     = i as well as h^{-1}(h(j)) = h^{-1}(N*K-j+1) = N*K-(N*K-j+1)+1 =
     j.

_R_e_a_d _a_n_d _w_r_i_t_e _m_a_p_s _i_n _R:

     In this package, read and write maps are represented as 'integer'
     'vector's of length N*K with _unique_ elements in {1,2,...,N*K}.
     Consider cell and file indices as in the previous section.

     For example, the "reversing" read map in the previous section can
     be represented as


          readMap <- (N*K):1

     Given a 'vector' 'j' of file indices, the cell indices are then
     obtained as 'i = readMap[j]'. The corresponding write map is


          writeMap <- (N*K):1

     and given a 'vector' 'i' of cell indices, the file indices are then
     obtained as 'j = writeMap[i]'.

     Note also that the bijective property holds for this mapping, that
     is 'i == readMap[writeMap[i]]' and 'i == writeMap[readMap[i]]' are
     both 'TRUE'.
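
     The statements above can be checked directly for a small case.
     The sketch below assumes N*K = 12 and defines a hypothetical
     helper, 'isValidMap'(), which is not part of this package; it
     only tests that a vector is an integer permutation of 1, 2, ...,
     N*K.

```r
# Hypothetical helper (not part of affxparser): TRUE if 'map' is an
# integer vector whose elements are a permutation of 1, ..., length(map)
isValidMap <- function(map) {
  n <- length(map)
  is.integer(map) && !anyNA(map) &&
    all(map >= 1L & map <= n) && !anyDuplicated(map)
}

# Assumed toy size: N*K = 12 cells
n <- 12L
readMap <- n:1
writeMap <- n:1

# Both maps are valid permutations
stopifnot(isValidMap(readMap), isValidMap(writeMap))

# A vector with a repeated element is not a valid map
stopifnot(!isValidMap(c(1L, 1L, 3L)))

# Applying one map after the other gives back the original indices
i <- seq_len(n)
stopifnot(all(i == readMap[writeMap[i]]))
stopifnot(all(i == writeMap[readMap[i]]))
```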

     Because the mapping is bijective, the write map can be calculated
     from the read map by:


          writeMap <- order(readMap)

     and vice versa:


          readMap <- order(writeMap)

     Note, the 'invertMap'() method is much faster than 'order'().
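
     To see why inverting a map does not require sorting, consider the
     base-R sketch below.  The helper 'invertByAssignment'() is
     hypothetical and is not claimed to be how 'invertMap'() is
     implemented; it only shows that a permutation can be inverted by
     a single indexed assignment and that the result agrees with
     'order'().

```r
# A random read map over 10 indices (toy size, assumption)
set.seed(42)
readMap <- sample(10L)

# Inverse via sorting, as described above
writeMap <- order(readMap)

# Hypothetical alternative: invert by indexed assignment.
# If map[k] == v, then the inverse maps v back to k.
invertByAssignment <- function(map) {
  inv <- integer(length(map))
  inv[map] <- seq_along(map)
  inv
}
writeMap2 <- invertByAssignment(readMap)

# Both approaches give the same write map
stopifnot(identical(writeMap, writeMap2))

# Inverting twice gives back the read map
stopifnot(identical(invertByAssignment(writeMap2), readMap))
```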

     Since most algorithms for Affymetrix data are based on probeset
     (unit) models, it is natural to read data unit by unit.  Thus, to
     optimize the speed, cells should be stored in contiguous blocks of
     units. The method 'readCdfUnitsWriteMap'() can be used to
     generate a _write map_ from a CDF file such that if the units are
     read in order, 'readCelUnits'() will read the cell data in order.
     Example:


          # Find any CDF file
          cdfFile <- findCdf()

          # Get the order of cell indices
          indices <- readCdfCellIndices(cdfFile)
          indices <- unlist(indices, use.names=FALSE)

          # Get an optimal write map for the CDF file
          writeMap <- readCdfUnitsWriteMap(cdfFile)

          # Get the read map
          readMap <- invertMap(writeMap)

          # Validate correctness
          indices2 <- readMap[indices]    # == 1, 2, 3, ..., N*K

     _Warning_, do not misunderstand this example.  It cannot be used
     to improve the reading speed of default CEL files.  For this, the
     data in the CEL files has to be rearranged (by the corresponding
     write map).

_A_u_t_h_o_r(_s):

     Henrik Bengtsson (<URL: http://www.braju.com/R/>)

