This folder contains the tools used to build the .rda files in this package
from the dbSNP dump files.

dbSNP Home Page:

  http://www.ncbi.nlm.nih.gov/SNP/

Here is how these .rda files were made:

  1. Download all the ds_flat_ch*.flat.gz files from

       ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat

  2. Uncompress the downloaded files.
     These uncompressed files are the "source files".
     NB: The ASN.1 flatfile format (and many other formats used on
     the snp section of the FTP site) is described here:

       ftp://ftp.ncbi.nih.gov/snp/00readme.html

  3. Check the source files with, for example:
       ./prechecking.sh path/to/ds_flat_ch16.flat

  4. Adjust settings in make_rdas.sh and run it.
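
Steps 1 to 3 above could be scripted along the following lines. This is
only a sketch, not the script that was actually used. It assumes that wget
(with FTP wildcard support) and gunzip are installed, that the FTP directory
given in step 1 is still served, and that it is run from this folder so that
./prechecking.sh is found. SRC_DIR and the three function names are
hypothetical.

```shell
#!/bin/sh
# Sketch of steps 1-3. Nothing runs until the functions are called.

# URL taken from step 1; it may have moved since this README was written.
BASE_URL="ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat"
# Hypothetical download directory; override by setting SRC_DIR beforehand.
SRC_DIR="${SRC_DIR:-./dbsnp_src}"

# Step 1: download all the ds_flat_ch*.flat.gz files.
# The URL is quoted so that wget, not the shell, expands the wildcard.
fetch_sources() {
    mkdir -p "$SRC_DIR"
    wget --no-verbose --directory-prefix="$SRC_DIR" \
        "$BASE_URL/ds_flat_ch*.flat.gz"
}

# Step 2: uncompress the downloaded files in place.
uncompress_sources() {
    for f in "$SRC_DIR"/ds_flat_ch*.flat.gz; do
        [ -e "$f" ] || continue   # no match: the pattern is left unexpanded
        gunzip "$f"
    done
}

# Step 3: run prechecking.sh on every source file.
precheck_sources() {
    for f in "$SRC_DIR"/ds_flat_ch*.flat; do
        [ -e "$f" ] || continue
        ./prechecking.sh "$f"
    done
}
```

Calling fetch_sources, uncompress_sources and precheck_sources in that order
would then cover steps 1 to 3. Step 4 (make_rdas.sh) is left out because its
settings have to be adjusted by hand.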


Notes:
  Not all SNPs are consistent with the hg19 genome, i.e. the ambiguity letter
  for a SNP is not necessarily compatible with the nucleotide found at the SNP
  position. For example, in 'ch1_snplocs.rda', 3084 of the 1369185 SNPs are
  inconsistent with hg19 chr1.
  To get the number of inconsistent SNPs:

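    library(SNPlocs.Hsapiens.dbSNP.20100427)
    # SNP locations and IUPAC ambiguity letters for chromosome 1
    ch1snps <- getSNPlocs("ch1")
    # Concatenate the ambiguity letters into a single string
    all_alleles <- paste(ch1snps$alleles_as_ambig, collapse="")
    library(BSgenome.Hsapiens.UCSC.hg19)
    # Count the SNP positions where the hg19 base is not covered by the letter
    neditAt(all_alleles, unmasked(Hsapiens$chr1)[ch1snps$loc], fixed=FALSE)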

  The case of chrM is of course hopeless since GRCh37 and hg19 use different
  sequences for this chromosome.
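
To make the notion of consistency used in these notes concrete: a SNP is
consistent with the reference when the reference nucleotide is one of the
bases covered by the SNP's IUPAC ambiguity letter. The sketch below spells
out that rule for a single SNP. It is an illustration only and not part of
the tools in this folder; the function names iupac_bases and is_consistent
are hypothetical.

```shell
#!/bin/sh
# Print the bases covered by an IUPAC nucleotide letter (hypothetical helper).
iupac_bases() {
    case "$1" in
        A|C|G|T) printf '%s\n' "$1" ;;
        M) echo AC ;;  R) echo AG ;;  W) echo AT ;;
        S) echo CG ;;  Y) echo CT ;;  K) echo GT ;;
        V) echo ACG ;; H) echo ACT ;; D) echo AGT ;; B) echo CGT ;;
        N) echo ACGT ;;
        *) return 1 ;;   # not an IUPAC nucleotide letter
    esac
}

# is_consistent AMBIG REF
# Returns 0 if the reference base REF is covered by the letter AMBIG,
# 1 if it is not, and 2 if either argument is invalid.
is_consistent() {
    case "$2" in
        A|C|G|T) ;;
        *) return 2 ;;
    esac
    bases=$(iupac_bases "$1") || return 2
    case "$bases" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, "is_consistent R A" succeeds because R stands for A or G,
whereas "is_consistent R C" fails. The R snippet in these notes counts the
SNPs of chr1 for which this test fails.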

