readCelUnits           package:affxparser           R Documentation

_R_e_a_d_s _p_r_o_b_e-_l_e_v_e_l _d_a_t_a _o_r_d_e_r_e_d _a_s _u_n_i_t_s (_p_r_o_b_e_s_e_t_s) _f_r_o_m _o_n_e _o_r _s_e_v_e_r_a_l _A_f_f_y_m_e_t_r_i_x _C_E_L _f_i_l_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Reads probe-level data ordered as units (probesets) from one or
     several Affymetrix CEL files by using the unit and group
     definitions in the corresponding Affymetrix CDF file.

_U_s_a_g_e:

     readCelUnits(filenames, units=NULL, ..., transforms=NULL, cdf=NULL, stratifyBy=c("nothing", "pmmm", "pm", "mm"), addDimnames=FALSE, readMap=NULL, dropArrayDim=TRUE, reorder=TRUE, verbose=FALSE)

_A_r_g_u_m_e_n_t_s:

filenames: The filenames of the CEL files.

   units: An 'integer' 'vector' of unit indices specifying which units
          to read.  If 'NULL', all units are read.

     ...: Arguments passed to low-level method 'readCel', e.g. 'readXY'
          and 'readStdvs'.

transforms: A 'list' of exactly 'length(filenames)' 'function's.  If
          'NULL', no transformation is performed. Intensities read are
          passed through the corresponding transform function before
          being returned.

     cdf: A 'character' filename of a CDF file, or a CDF 'list'
          structure.  If 'NULL', the CDF file is searched for by
          'findCdf'(), starting first from the current directory and
          then from the directory of the first CEL file.

stratifyBy: Argument passed to low-level method 'readCdfUnits'.

addDimnames: If 'TRUE', dimension names are added to arrays, otherwise
          not.  The size of the returned CEL structure in bytes
          increases by 30-40% with dimension names.

 readMap: A 'vector' remapping cell indices to file indices. If 'NULL',
          no mapping is used.

 reorder: If 'TRUE', cells are read in sorted index order to speed up
          the reading.  If 'FALSE', cells are read in the order given.
          For more details, see help on the same argument in
          'readCel'().

dropArrayDim: If 'TRUE' and only one array is read, the elements of the
          group field do _not_ have an array dimension.

 verbose: Either a 'logical', a 'numeric', or a 'Verbose' object
          specifying how much verbose/debug information is written to
          standard output.  If a 'Verbose' object, the level of detail
          is set by the threshold level of the object.  If a numeric,
          the value is used to set the threshold of a new 'Verbose'
          object.  If 'TRUE', the threshold is set to -1 (minimal).  If
          'FALSE', no output is written, and the 'R.utils' package is
          not required.
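
     As an illustration of the 'transforms' argument, the sketch below
     builds one function per CEL file and applies each to mock
     intensities using base R only.  The filenames, the intensity
     values, and the choice of transforms are assumptions made for
     illustration; no CEL file is read.

```r
# Hypothetical CEL filenames; nothing is read from disk here.
filenames <- c("a.CEL", "b.CEL", "c.CEL")

# One transform per file: 'transforms' must have exactly
# length(filenames) elements.
transforms <- list(log2, identity, function(y) log2(y) - 1)
stopifnot(length(transforms) == length(filenames))

# Mock intensities, one column per file (made-up values).
y <- matrix(c(64, 128, 256,  64, 128, 256,  64, 128, 256), ncol=3)

# Apply the k:th transform to the intensities of the k:th file.
yT <- sapply(seq_along(filenames), function(k) transforms[[k]](y[, k]))
colnames(yT) <- filenames
print(yT)

# With real data, the list would be passed along as (not run):
#   cel <- readCelUnits(filenames, units=1:10, transforms=transforms)
```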

_V_a_l_u_e:

     A named 'list' with one element for each unit read.  The names
     correspond to the names of the units read.  Each unit element is
     in turn a 'list' structure with groups (aka blocks).  Each group
     contains the requested fields, e.g. 'intensities', 'stdvs', and
     'pixels'.  If more than one CEL file is read, an extra dimension
     is added to each of these fields, which can be used to subset by
     CEL file.

     Note that neither CEL headers nor information about outliers and
     masked cells are returned.  To access these, use 'readCelHeader'()
     and 'readCel'().
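
     To make the nesting concrete, the sketch below mocks the
     unit/group/field structure in base R and subsets it by array.
     The unit and group names and all values are made up, and the
     layout of 'intensities' as a cells-by-arrays matrix is an
     assumption for illustration; check 'dim()' on real output.

```r
# Mock of the documented return structure for two units read from
# three CEL files.  All names and values are made up.
nArrays <- 3
mockGroup <- function(nCells) {
  list(intensities=matrix(seq_len(nCells * nArrays),
                          nrow=nCells, ncol=nArrays))
}
cel <- list(
  unitA=list(groupA=mockGroup(4)),
  unitB=list(groupA=mockGroup(2), groupB=mockGroup(2))
)

# The list is named by unit.
print(names(cel))

# Intensities of one group: assumed cells-by-arrays.
y <- cel$unitB$groupA$intensities
print(dim(y))

# Subset by CEL file: keep only the second array.
y2 <- y[, 2]
print(y2)

# Per-unit mean intensity of the second array, across all groups.
means <- sapply(cel, function(unit) {
  mean(unlist(lapply(unit, function(group) group$intensities[, 2])))
})
print(means)
```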

_A_u_t_h_o_r(_s):

     Henrik Bengtsson (<URL: http://www.braju.com/R/>)

_R_e_f_e_r_e_n_c_e_s:

     [1] Affymetrix Inc, Affymetrix GCOS 1.x compatible file formats,
     June 14, 2005. <URL: http://www.affymetrix.com/support/developer/>

_S_e_e _A_l_s_o:

     Internally, 'readCelHeader'(), 'readCdfUnits'() and 'readCel'()
     are used.

_E_x_a_m_p_l_e_s:

     for (zzz in 0) {

     # Scan current directory for CEL files
     files <- list.files(pattern="[.](c|C)(e|E)(l|L)$")
     files <- files[!file.info(files)$isdir]
     if (length(files) == 0)
       break

     # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     # Benchmarking reading cells in order or not.
     #
     # The difference grows with the number of files read.
     # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     units <- 56:256
     nunits <- length(units)

     # Make sure enough files are read to measure the speed up
     files <- rep(files, length.out=5)
     nfiles <- length(files)

     t1 <- system.time({
       cel <- readCelUnits(files, units=units, reorder=TRUE)
     })[3]
     cat(sprintf("Time   [ordered]: %6.2fs = %.2fms/(unit & array) [1.00x]\n",
                                               t1, 1000*t1/nunits/nfiles))
     rm(cel); gc()

     t2 <- system.time({
       cel <- readCelUnits(files, units=units, reorder=FALSE)
     })[3]
     cat(sprintf("Time [unordered]: %6.2fs = %.2fms/(unit & array) [%.2fx]\n",
                                        t2, 1000*t2/nunits/nfiles, t2/t1))
     rm(cel); gc()

     # Clean up
     rm(files, units, nunits, nfiles, t1, t2)

     } # for (zzz in 0)

