This document needs to outline a roadmap for the future of
AnnotationHub and AnnotationHubServer.  Specifically: What resources
should be included?  What is the expected time when those resources
should be made available?  And how should we justify the inclusion of
some resources and not others?  

Before adressing the 1st two questions, lets consider the last one.
What considerations should be given to a resource to merit it's
inclusion?  A big part of this is how can we discriminate annotations
from mere data sets?  For while many annotations may start as data
sets, not all data sets eventually become annotations.


1) The annotations should be widely used.

2) The annotations should come from respectable sources.

3) The annotations should be considered valuable by the community.

4) The annotations should have input from multiple different resources.









What resources should be included?

#############
## Huge resources from national centers:

* NCBI: (gene stuff is well supported already, but MANY other kinds of
  data are not)

http://www.ncbi.nlm.nih.gov/guide/all/#downloads_

For other data: possibility of communicating with Eric Sayers Ph.D. to see about
modernizing e-utils?  Does Sean know him?
(OR we can just raid their FTP sites as usual)


For existing NCBI annots: just make our Objs available in AnnotationHub.

Unsupported Annotations:
1000 Genomes (in progress)
ENCODE (in progress?)
ATCC
CDD
dbVar, dbGaP, dbSNP
PopSet, HomoloGene, Protein clusters
OMIM, OMIA
Taxonomy 
(and more)



* ensembl:
(FTP files are in progress.  Most gene oriented stuff is already in biomaRt)

http://uswest.ensembl.org/info/data/ftp/index.html





#############
## Very popular specialized community resources (most of these would
## be covered if we just make packages available).

* UCSC: Contains many tracks that we should probably expose (as GRanges)

* KEGG: Available again as KEGGREST
  
* Reactome: Available as a Database object (but much room for improvement)

* GO: Available as a Database object

* Uniprot: Available as a web service





#############
## Support for more specialized centers?:

* Tair: Available as a OrgDb object

* SGD: Available as a OrgDb object

* Blast2Go: Available as a mappings only (should probably be expanded)

* Allen Brain data: Could make a web service (they claim a restful interface)

* PlasmoDB:  Partially available as a OrgDb object.

* PANTHER:  Might be made available by volunteer?








################################
## Ideas for future deployment:

## GOALS:

1) We want to accomodate a vastly diverse set of annotation types, and
present them to users in a form that is useful.

2) We want to bring our annotation resources (and indeed the resources
that the world has to offer) under one umbrella and have them
accessible to R/Bioc.

3) We want to do this in a way that is easy to maintain over the long
term.  This is tricky because the annotations are rapidly changing,
with new kinds showing up all the time.  Also things are integrating
with each other in new an often unpredictable ways. 

4) We want to make it easy for users to combine different annotations together.

5) We want to make it easy for users to work with these different
annotations (the format they are presented in should seem intuitive).
Sometimes this may mean that some resources may be present in more
than one format.

6) We want to make it easy to add new resources into the repos.  We
want the process to be simple enough that users can contribute in a
semi-automatic way.  The community is the reason why the software
branch of this project is a success, and it's crucial to extend this
philosphy here.  In fact this may be the single most important thing
to get right.  




## One plan:

1) Roll out current Anntotation packages as .sqlite files.  The Hub
can be used to pull those down (caching them for future use) and can
execute the call to spawn the appropriate object (OrgDb object etc.)
and Hand that back to the user.  .Rdas can be used for Org.db objects
(and code can check to pre-download dependencies for these).  I do NOT
think we should support classic mappings in this way.  Just the newer
selectable objects.

2) Long term (very long term as this is a very ambitious goal) some
DBs are just too large to be downloaded as chunks (Reactome comes to
mind).  So for these it might make sense to have a way of calling
select() on the AnnotationHub object itself (with appropriate keys
etc.)  This change would be sticky though!  Because we currently have
keys and keytypes methods defined for this object (and they return
things that are conceptually the same as other keys/keytypes methods,
but they will actually be different values from what we want to have
happen if we want to do: select(AnnHub,k,c,kt).  Currently,
keys/keytypes returns values that control metadata, and what we need
for select(AnnHub,k,c,kt) to work are keys and keytypes that relate to
things you can get back from the DBs in the background.  ALSO: there
are serious questions about how smart it is to define
keys/keytypes/cols when you will have so many kinds of data
represented. This brings me to:

3) If we want to do something like #2, select and its supporting
methods will have to respect filters().  Otherwise there will just be
too many keytypes etc.  So one important filtertype is the TaxonomyId.
Without that being set, it's going to be really hard to manage all the
data.

4) Another implication of #2 is that we would need to be able to
integrate all the resources together behind the scenes.  There are
multiple ways we could do this.  I don't think that OrganismDbi is
necessarily the ultimate answer, but it would allow a quick way to
"get there" for resources that have a select interface already.


## Another plan (I presently like this one better):

1) Roll out current Annotation packages as .sqlite files (same as #1
from other plan).

2) Don't try so hard to reinvent the world of annotations.  Keep
different kinds of annotations modularized so that they can be easily
added/removed.  For some things a select interface will make sense and
that object can be returned, but for other things we won't want that.

3) Re-make the code that generates things like the org packages as
into recipe format (all the code should live in an R package).
Currently a lot of it doesn't, and that's not good.

4) Add new recipes for many other kinds of annotation resources
(depending on what we are talking about).  Code that can make a
database from a set of text files could be generalized.

5) Alternatively for web services, AH could just give the user an
object (installing or informing about any dependencies?) so that they
could access the web service.

6) OrganismDbi would still work as it presently does to connect
several resources together (when appropriate foreign keys are
defined).




#######################################################################
#######################################################################
## Some low hanging fruit
## Things that we should try to add next
## (Avoid things that are existing resources - for now)




## Ensembl regulation? (as annotated GRanges?) 0r maybe even as TxDb?

## UCSC tracks? (as Annotated GRangs)

## And some "hot" stuff

## 1000 Genomes? (maybe just get this via UCSC?)
## (There are a couple tracks already)

## Encode?  (maybe also via UCSC?)
## http://genome.ucsc.edu/ENCODE/downloads.html


## And CDD (Because I worry that we have neglected making it easy for
   users to get this data at the sequence level and not just the gene
   level) 
