Athaliana  package:BSgenome.Athaliana.TAIR.04232008  R Documentation

_A_r_a_b_i_d_o_p_s_i_s _t_h_a_l_i_a_n_a _f_u_l_l _g_e_n_o_m_e (_T_A_I_R _v_e_r_s_i_o_n _f_r_o_m _A_p_r_i_l _2_3, _2_0_0_8)

_D_e_s_c_r_i_p_t_i_o_n:

     Arabidopsis thaliana full genome as provided by TAIR (snapshot
     from April 23, 2008) and stored in Biostrings objects.

_N_o_t_e:

     This BSgenome data package was made from the following source data
     files:


     All the chr*.fas files from
     ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/

     WARNING: This is where the files are located today (Oct 1st, 2008),
     but it is probably not meant to be the permanent URL for the
     04232008 snapshot of the genome. TAIR might update the content of
     this folder in the future with a new snapshot and move the 04232008
     snapshot to the OLD/ subfolder.

     See '?BSgenomeForge' and the BSgenomeForge vignette
     ('vignette("BSgenomeForge")') in the BSgenome software package for
     how to make a BSgenome data package.

_A_u_t_h_o_r(_s):

     H. Pages

_S_e_e _A_l_s_o:

     BSgenome-class, DNAString-class, 'available.genomes',
     BSgenomeForge

_E_x_a_m_p_l_e_s:

     Athaliana
     seqlengths(Athaliana)
     Athaliana$chr1  # same as Athaliana[["chr1"]]
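
     ## A minimal sketch, not part of the original examples: base
     ## composition of chromosome 1. It assumes that alphabetFrequency()
     ## from Biostrings accepts the 'baseOnly' argument in the installed
     ## version, and that any active masks on the sequence are simply
     ## left out of the counts. The variable names are illustrative.
     af <- alphabetFrequency(Athaliana$chr1, baseOnly=TRUE)
     af  # counts for A, C, G, T and "other"
     gc_content <- sum(af[c("C", "G")]) / sum(af[c("A", "C", "G", "T")])
     gc_content  # GC fraction among the unambiguous bases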

     if ("AGAPS" %in% masknames(Athaliana)) {

       ## Check that the assembly gaps contain only Ns:
       checkOnlyNsInGaps <- function(seq)
       {
         ## Replace all masks by the inverted AGAPS mask
         masks(seq) <- gaps(masks(seq)["AGAPS"])
         af <- alphabetFrequency(seq)
         found_letters <- names(af)[af != 0]
         if (any(found_letters != "N"))
             stop("assembly gaps contain more than just Ns")
       }

       ## A message will be printed each time a sequence is removed
       ## from the cache:
       options(verbose=TRUE)

       for (seqname in seqnames(Athaliana)) {
         cat("Checking sequence", seqname, "... ")
         seq <- Athaliana[[seqname]]
         checkOnlyNsInGaps(seq)
         cat("OK\n")
       }
     }

     ## See the GenomeSearching vignette in the BSgenome software
     ## package for some examples of genome-wide motif searching using
     ## Biostrings and the BSgenome data packages:
     if (interactive())
         vignette("GenomeSearching", package="BSgenome")

