lumiHumanIDMapping_nuID  package:lumiHumanIDMapping  R Documentation

_M_a_p_p_i_n_g _n_u_I_D_s _o_f _I_l_l_u_m_i_n_a _H_u_m_a_n _c_h_i_p_s _t_o _t_h_e _m_o_s_t _r_e_c_e_n_t _H_o_m_o _s_a_p_i_e_n_s _R_e_f_S_e_q _r_e_l_e_a_s_e

_D_e_s_c_r_i_p_t_i_o_n:

     We mapped nuIDs of Illumina Human chips by BLASTing each probe
     sequence (converted from its nuID) against the most recent Homo
     sapiens RefSeq release. The mapping also includes quality
     information such as mapping strength, uniqueness, and the number
     of hits.

_U_s_a_g_e:

       lumiHumanIDMapping_nuID()

_D_e_t_a_i_l_s:

     The nuID mapping information is kept in the nuID_MappingInfo table
     in the ID Mapping library. The table includes the following
     fields (columns):

     1.  nuID: nuID for the probe sequence.

     2.  Strength1: Strength of the best hit. This is the length of the
     longest contiguous match between the probe and the hit sequence
     plus the number of identical bases between the two sequences,
     divided by the total probe length and scaled so that a perfect
     match scores 100.

     3.  Strength2: Strength of the best hit against the second-best
     gene. Hits are grouped by Entrez Gene ID because multiple RefSeq
     accessions may share the same Entrez Gene ID, reflecting
     differing splice sites, conflicting gene model evidence, or
     unresolved curation.

     4.  Uniqueness: (Strength1 - Strength2) / Strength1 * 100.

     5.  Total hits: Total number of gene models (Entrez Gene records)
     hit by the probe with a contiguous match of at least 17
     nucleotides.

     6.  Accession: RefSeq accession number of the best-hit gene model.

     7.  EntrezID: The Entrez Gene ID corresponding to the RefSeq
     accession number shown in field "Accession".

     8.  Accession2: RefSeq accession number of the best hit against
     the second-best gene model (Entrez Gene record).
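     As a minimal sketch of how these fields might be used once the
     table is held in a data frame, the code below keeps only probes
     whose Strength1 and Uniqueness are both at least 95, the cutoff
     suggested in the procedure notes. The rows are made up for
     illustration, and the column names are assumed to match the field
     names listed above; check them with dbListFields() before relying
     on them.

```r
# Toy rows for illustration only; probe labels and scores are made up.
# Column names are assumed to match the field names in the list above.
mapping <- data.frame(
  nuID      = c("probe_A", "probe_B", "probe_C"),
  Strength1 = c(100, 100, 80),
  Strength2 = c(0, 60, 0),
  stringsAsFactors = FALSE
)

# Uniqueness as defined in field 4.
mapping$Uniqueness <-
  (mapping$Strength1 - mapping$Strength2) / mapping$Strength1 * 100

# Keep probes that are both strong and unique.
reliable <- subset(mapping, Strength1 >= 95 & Uniqueness >= 95)
print(reliable$nuID)
```

     Here only probe_A passes: probe_B has a strong second hit, and
     probe_C has a weak best hit.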

     Procedure for nuID mapping:

     Briefly, we BLASTed each probe sequence (converted from its nuID)
     against the corresponding RefSeq genome. We then processed the
     resulting BLAST output files and identified all hits to a probe
     sequence that contain a contiguous match of at least 17
     nucleotides (17 is generally accepted as the minimum number of
     contiguous bases required to get a hybridization signal with
     oligo arrays). Because many RefSeq models map to the same Entrez
     gene, we treat those as a single hit and, for each gene, keep the
     best hit for that probe as defined by expectation value. We then
     count the total number of Entrez genes hit by a probe, and list
     the accession number of the best RefSeq model (if any) and the
     accession number of the second-best RefSeq model (if any) that
     belongs to a second Entrez gene.

     We score the best hit by giving it a strength: the length of the
     matched sequence plus the length of the longest contiguous
     sequence in the hit, divided by the total length of the probe
     sequence and multiplied by 50, which gives a score from 0 to 100.
     The same is done for the best hit against the second gene model.
     A uniqueness score is then calculated as the strength of the best
     hit against the first gene model minus the strength of the best
     hit against the second gene model (in most cases there is no
     second model, so its strength is zero), divided by the strength
     of the first hit and multiplied by 100, again giving a number
     from 0 to 100.

     We anticipate that most groups will only want to use probes for
     which the strength of the best model and the uniqueness score are
     both 95 or above. For more details, please visit:
     https://prod.bioinformatics.northwestern.edu/nuID/
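     The scoring arithmetic described above can be sketched as
     follows. The function and argument names are hypothetical and are
     not part of this package; the numbers are a worked example only.

```r
# Strength: (matched bases + longest contiguous run) / probe length,
# multiplied by 50 so that a perfect full-length match scores 100.
hit_strength <- function(matched_bases, longest_contig, probe_length) {
  (matched_bases + longest_contig) / probe_length * 50
}

# Uniqueness: relative gap between the best and second-best strengths.
# strength2 defaults to 0 for probes that hit no second gene.
uniqueness <- function(strength1, strength2 = 0) {
  (strength1 - strength2) / strength1 * 100
}

# A 50-mer that matches its best gene perfectly, and a second gene
# over 40 bases with a longest contiguous run of 20.
s1 <- hit_strength(50, 50, 50)   # (50 + 50) / 50 * 50 = 100
s2 <- hit_strength(40, 20, 50)   # (40 + 20) / 50 * 50 = 60
u  <- uniqueness(s1, s2)         # (100 - 60) / 100 * 100 = 40
```

     With no second gene hit, uniqueness(s1) evaluates to 100, so such
     a probe passes the suggested cutoff of 95 whenever its strength
     does.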

_V_a_l_u_e:

     'lumiHumanIDMapping_nuID' returns a nuID mapping summary of
     Illumina Human chips.

_R_e_f_e_r_e_n_c_e_s:

     1. https://prod.bioinformatics.northwestern.edu/nuID/

     2. Du, P., Kibbe, W.A. and Lin, S.M., "nuID: a universal naming
     scheme of oligonucleotides for Illumina, Affymetrix, and other
     microarrays", Biology Direct 2007, 2:16 (31 May 2007).

_E_x_a_m_p_l_e_s:

       ## Load the mapping package, and DBI for dbListFields()
       library(lumiHumanIDMapping)
       library(DBI)

       ## List the fields in the nuID_MappingInfo table
       conn <- lumiHumanIDMapping_dbconn()
       dbListFields(conn, 'nuID_MappingInfo')

       ## Summary of nuID mapping
       lumiHumanIDMapping_nuID()

