This folder contains the tools used for making the .rda files contained in
this package from the dbSNP dump files.

Homepage for the dbSNP database:

  http://www.ncbi.nlm.nih.gov/SNP/

Here is how these .rda files were made:

  1. Download all the ds_flat_ch*.flat.gz files from

       ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat

  2. Uncompress the downloaded files.
     These uncompressed files are the "source files".
     NB: The ASN.1 flatfile format (and many other formats used on
     the snp section of the FTP site) is described here:

       ftp://ftp.ncbi.nih.gov/snp/00readme.html

  3. Check the source files with, for example:
       ./prechecking.sh path/to/ds_flat_ch16.flat

  4. Adjust settings in make_rdas.sh and run it.
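
Steps 1 to 3 above could be scripted roughly as follows. This is a minimal
sketch, not part of the tools in this folder: the function names
(fetch_flat_files, uncompress_flat_files, precheck_all) are hypothetical, it
assumes a wget build that supports FTP and wildcard retrieval, and it assumes
prechecking.sh is run from this folder with one source file per call, as in
the example in step 3.

```shell
#!/bin/sh
# Only the URL and the ds_flat_ch*.flat.gz pattern come from the steps above;
# everything else (function names, directory layout) is an assumption.
DBSNP_URL="ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat"

# Step 1: download all the ds_flat_ch*.flat.gz files into directory $1.
# The URL is quoted so that wget, not the shell, expands the wildcard.
fetch_flat_files() {
    dest=$1
    mkdir -p "$dest" &&
    wget --no-verbose --directory-prefix="$dest" \
        "$DBSNP_URL/ds_flat_ch*.flat.gz"
}

# Step 2: uncompress every downloaded file in directory $1.
# gunzip replaces each foo.flat.gz with foo.flat (the "source file").
uncompress_flat_files() {
    dir=$1
    for gz in "$dir"/ds_flat_ch*.flat.gz; do
        [ -e "$gz" ] || continue   # skip the literal pattern if nothing matched
        gunzip "$gz"
    done
}

# Step 3: run prechecking.sh on each source file in directory $1.
precheck_all() {
    dir=$1
    for flat in "$dir"/ds_flat_ch*.flat; do
        [ -e "$flat" ] || continue
        ./prechecking.sh "$flat"
    done
}
```

With these definitions loaded, the three functions would be called in order on
the same directory, for example fetch_flat_files path/to, then
uncompress_flat_files path/to, then precheck_all path/to. Step 4 is left out
because the settings in make_rdas.sh have to be adjusted by hand.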


Notes:
  Not all SNPs are consistent with the hg18 genome, i.e. the ambiguity letter
  for a SNP is not necessarily compatible with the nucleotide found at the SNP
  position. For example, in 'chr1_snplocs.rda', 194 of the 694706 SNPs are
  inconsistent with hg18 chr1.
  To get the list of inconsistent SNPs:
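    # fixed=FALSE lets each ambiguity letter match any nucleotide it stands for
    chr1_bad <- mismatch(paste(chr1_snplocs$alleles_as_ambig, collapse=""),
                         BStringViews(chr1[chr1_snplocs$loc]),
                         fixed=FALSE)[[1]]
    length(chr1_bad)           # number of inconsistent SNPs
    chr1_snplocs[chr1_bad, ]   # the inconsistent SNPs themselves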
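
  To illustrate what "compatible" means in the note above, here is a small
  shell sketch of the rule, using the standard IUPAC nucleotide ambiguity
  codes. It is not the code used to build the .rda files (the check itself is
  the R snippet above); the function names iupac_expand and is_consistent are
  hypothetical.

```shell
#!/bin/sh
# Print the nucleotides that an IUPAC ambiguity letter stands for.
# Returns 1 for a letter that is not an IUPAC nucleotide code.
iupac_expand() {
    case "$1" in
        A|C|G|T) printf '%s\n' "$1" ;;
        M) echo AC ;;  R) echo AG ;;  W) echo AT ;;
        S) echo CG ;;  Y) echo CT ;;  K) echo GT ;;
        V) echo ACG ;; H) echo ACT ;; D) echo AGT ;; B) echo CGT ;;
        N) echo ACGT ;;
        *) return 1 ;;
    esac
}

# is_consistent AMBIG BASE
# Exit status 0 if the reference nucleotide BASE is one of the nucleotides
# covered by AMBIG, 1 if it is not, 2 if either argument is invalid.
is_consistent() {
    case "$2" in A|C|G|T) ;; *) return 2 ;; esac
    bases=$(iupac_expand "$1") || return 2
    case "$bases" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

  For instance, is_consistent R A succeeds because R stands for A or G, while
  is_consistent R C fails: a SNP recorded as R at a position where the
  reference has C would count as inconsistent.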

