mergeSAGE             package:SAGElyzer             R Documentation

_F_u_n_c_t_i_o_n_s _t_o _m_e_r_g_e _S_A_G_E _l_i_b_r_a_r_i_e_s _b_a_s_e_d _o_n _u_n_i_q_u_e _S_A_G_E _t_a_g_s

_D_e_s_c_r_i_p_t_i_o_n:

     These functions merge individual SAGE libraries based on unique
     SAGE tags and write the merged data into a file and a table in a
     database with the unique SAGE tags as one column and counts from
     all the libraries as the others.

_U_s_a_g_e:

     mergeSAGE(libNames, isDir = TRUE,  skip = 1, pattern = ".sage")
     getLibInfo(fileNames)
     calNormFact(normalize = c("min", "max"), libNNum)
     getLibNNum(fileNames)
     getUniqTags(fileNames, skip = 1, sep = "\t")
     writeSAGE4Win(fileNames, uniqTags, infoData, pace = 1000)
     mapFile2Tag(fileNames, tags, skip, n)
     writeSAGECounts(fileNames, uniqTags, skip, sep = "\t")
     writeSAGE2DB(dbArgs, colNames, keys, numCols, fileName, what =
     c("counts", "map", "info"), charNum = 20, type = "int4")
     getColSQL(colNames, charNum, keys, numCols, type)
     writeSAGE4Unix(countData, infoData)

_A_r_g_u_m_e_n_t_s:

libNames: 'libNames' - a vector of character strings for the names of
          the SAGE libraries to be merged. 'libNames' can also be the
          name of the directory containing the SAGE libraries to be
          merged

   isDir: 'isDir' - a boolean that is TRUE if 'libNames' is the name of
          the directory that contains the SAGE libraries to be merged

    skip: 'skip' - an integer for the number of lines to be skipped
          when the libraries are merged

 pattern: 'pattern' - a character string for the pattern used to select
          the SAGE data files from the directory when 'libNames' is a
          directory. Only files that match the pattern will be merged

fileNames: 'fileNames' a vector of character strings for SAGE libraries
          to be written to DB or used for analysis

normalize: 'normalize' a character string giving the name of a function
          for normalization

 libNNum: 'libNNum' a matrix with columns for SAGE library names and
          maximum and minimum number of counts

uniqTags: 'uniqTags' a vector of character strings for the unique SAGE
          tags

infoData: 'infoData' a matrix containing SAGE library information data

    pace: 'pace' an integer for the maximum number of SAGE tags to be
          processed each run when writing SAGE library data to database
          under Windows

    tags: 'tags' a vector of character strings of SAGE tags

       n: 'n' an integer for the number of neighbors defined for KNN

     sep: 'sep' a character string for the separator used

  dbArgs: 'dbArgs' a list containing arguments for making database
          connections

colNames: 'colNames' a vector of character strings for the names of
          columns of a matrix

    keys: 'keys' a vector of character strings for the names of key
          columns of a database

 numCols: 'numCols' see 'ncol'

fileName: 'fileName' a character string for the name of a file to be
          used to populate a database

    what: 'what' a character string that can be either 'counts', 'map',
          or 'info' to indicate what SAGE data to deal with

 charNum: 'charNum' an integer indicating the number of characters for
          the length of character columns in a database

    type: 'type' a character string for the data type of a database
          column

countData: 'countData' a matrix containing tag counts for SAGE
          libraries
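
     The 'pace' argument above caps how many tags are handled per run.
     As a rough illustration only (this is not the package's internal
     code, and the tag vector is made up), batching a vector of tags
     into runs of at most 'pace' elements could be sketched as:

```r
# Hypothetical tag vector; real tags would come from getUniqTags()
uniqTags <- paste("tag", 1:2500, sep = "")
pace <- 1000

# Assign each tag to a run of at most 'pace' tags
chunkId <- ceiling(seq_along(uniqTags) / pace)
chunks <- split(uniqTags, chunkId)

for (chunk in chunks) {
    # A real implementation would write 'chunk' to the database here
    message(length(chunk), " tags in this run")
}
```

     Smaller values of 'pace' mean more runs with less data held in
     memory per run.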

_D_e_t_a_i_l_s:

     Each SAGE library typically contains two columns with the first
     one being SAGE tags and the second one being their counts.
     'mergeSAGE' merges library files based on the tags. Tags that are
     missing from a given library but exist in others will be assigned
     0s for that library.
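
     The merge rule described above can be illustrated in base R. This
     is a minimal sketch with made-up in-memory libraries, not the code
     'mergeSAGE' runs (which works on files and a database):

```r
# Two toy libraries as named count vectors (tag -> count); hypothetical data
lib1 <- c(tag1 = 3, tag2 = 5, tag3 = 1)
lib2 <- c(tag2 = 7, tag4 = 2)
libs <- list(lib1 = lib1, lib2 = lib2)

# Union of the tags found in any library
uniqTags <- sort(unique(unlist(lapply(libs, names))))

# One column per library; a tag absent from a library gets a count of 0
merged <- sapply(libs, function(lib) {
    counts <- lib[uniqTags]          # NA where the tag is missing
    counts[is.na(counts)] <- 0
    unname(counts)
})
rownames(merged) <- uniqTags
merged
```

     Each row of 'merged' is one unique tag and each column holds the
     counts from one library.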

     'mergeSAGE' generates two files. One contains the merged data.
     The other contains four columns: the column names of the database
     table that stores the SAGE counts, the original SAGE library
     names, the normalization factor based on the library with the
     smallest number of tags, and the normalization factor based on
     the library with the largest number of tags.
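
     To make the two normalization factors concrete, here is a sketch
     under an assumed definition: each library is scaled to the size of
     the smallest (or largest) library. The exact formula used by
     'calNormFact' is not given here, and the library totals and column
     names below are hypothetical:

```r
# Hypothetical total tag counts per library
libTotals <- c(lib1 = 50000, lib2 = 20000, lib3 = 80000)

# Assumed definition: factor that rescales a library to the smallest
# (minFact) or the largest (maxFact) library
minFact <- min(libTotals) / libTotals
maxFact <- max(libTotals) / libTotals

# Four columns mirroring the layout described for the information file
info <- data.frame(
    column  = paste("col", seq_along(libTotals), sep = ""),
    library = names(libTotals),
    minfact = unname(minFact),
    maxfact = unname(maxFact)
)
info
```

     Under this assumption, multiplying a library's counts by its factor
     puts all libraries on the scale of the reference library.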

     'getLibInfo' creates the file that contains the information about
     the data file.

     'calNormFact' calculates the normalization factor.

_V_a_l_u_e:

     'mergeSAGE' returns a list containing two file names 

    data: a character string for the name of the file containing the
          merged data

    info: a character string for the name of the file containing
          information about the merged data


     'getLibInfo' returns a matrix with four columns.

_A_u_t_h_o_r(_s):

     Jianhua Zhang

_R_e_f_e_r_e_n_c_e_s:

     <URL: http://www.ncbi.nlm.nih.gov/geo>

_S_e_e _A_l_s_o:

     'SAGELyzer'

_E_x_a_m_p_l_e_s:

     path <- tempdir()
     # Create two small libraries: tags in column 1, counts in column 2
     lib1 <- cbind(paste("tag", 1:10, sep = ""), 1:10)
     lib2 <- cbind(paste("tag", 5:9, sep = ""), 15:19)
     # Write each library to a tab-delimited file without headers
     files <- file.path(path, c("lib1.sage", "lib2.sage"))
     write.table(lib1, file = files[1], sep = "\t",
                 row.names = FALSE, col.names = FALSE)
     write.table(lib2, file = files[2], sep = "\t",
                 row.names = FALSE, col.names = FALSE)
     # Collect the library information that calNormFact() needs
     libNNum <- getLibNNum(files)
     normFact <- calNormFact("min", libNNum)
     # Unique tags across both libraries (the files have no header line)
     uniqTag <- getUniqTags(files, skip = 0)

