CHANGES IN VERSION 1.0.1
--------------------------

BUG FIXES

    o Ensembl BioMart has changed the column 'external_gene_id' to
    'external_gene_name'. Renamed my biomaRt queries accordingly to prevent
    GO_analyse() from crashing during the analysis step.

    o Updated the AlvMac_results variable containing sample results annotated
    with the deprecated 'external_gene_id'. It now includes the new
    'external_gene_name'.

GENERAL UPDATES

    o DESCRIPTION file minor correction.

    o User Guide's chunk about installation using biocLite is not evaluated
    anymore, as advised by Dan Tenenbaum.

CHANGES IN VERSION 0.99.17
--------------------------

BUG FIXES

    o NAs were introduced in the average score and rank of GO terms following
    analysis of microarray data (ANOVA and randomForest) for GO terms without
    associated features in the ExpressionSet. The problem was not found in any
    case for analysis of RNA-Seq data. This was causing issues during the
    subset_scores function and subsequent plots, such as main titles made of
    multiple lines and NAs. GO terms without associated probesets are now
    given average score of 0 and average rank equal to the maximum in the
    dataset plus 1.

CHANGES IN VERSION 0.99.16
--------------------------

GENERAL UPDATES

    o Included co-authors who participated in the generation and analysis of
    data used to test the package.

CHANGES IN VERSION 0.99.15
--------------------------

GENERAL UPDATES

    o Updated man pages GO_analyse to describe the recently added subset
    slot in the result variable.

CHANGES IN VERSION 0.99.14
--------------------------

BUG FIXES

    o The anova method of the GO_analyse method was broken since the
    introduction of ExpressionSet, release 0.99.4.

UPDATED FEATURES

    o The subset argument was also added to the GO_analyse method. This allows
    the identification of genes clustering a subset of groups, at a subset of
    time-points, etc. while plotting the expression profile of the entire
    dataset, if desired.

GENERAL UPDATES

    o Updated man pages GO_analyse to allow only one example to be run. This
    saves time during CMD check, while making the man page more easily
    readable.

CHANGES IN VERSION 0.99.13
--------------------------

UPDATED FEATURES

    o All expression plots were updated to a default ylim range corresponding
    to the minimum and maximum expression values found in the entire 
    ExpressionSet. This is meant to avoid mis-interpretation of the amplitude
    of variation between sample groups, as suggested in the paper:
    Rougier, N.P., Droettboom, M., and Bourne, P.E. (2014). Ten simple rules
    for better figures. PLoS computational biology 10, e1003833.

CHANGES IN VERSION 0.99.12
--------------------------

UPDATED FEATURES

    o All expression plots were given new arguments for more control: xlab
    allows users to change the default title for the X-axis, ylim allows users
    to override the lower and upper boundaries of the Y axis.

GENERAL UPDATES

    o Updated man pages AlvMac, AlvMac_results, microarray2dataset, and
    prefix2dataset. Replaced the LaTeX describe statement by an itemise
    statement to make the man page more readable in a terminal window.

    o Moved functions between the post_analysis script and the toolkit
    script. From now on, only functions accessible to the users should be
    present in the post_analysis script, while toolkit should contain
    method called internally.

    o The User's Guide was updated to redirect the users to the new support
    site for Bioconductor rather than the bioc-devel mailing list.

CHANGES IN VERSION 0.99.11
--------------------------

NEW FEATURES

    o The new subEset() method subsets an ExpressionSet given a list where
    item names are column names of the phenoData slots and item values are
    vectors of values corresponding to sample to retain (e.g.
    list(Time=c("2H", "6H")) will retain samples with value "2H" or "6H" in
    the "Time" column of the phenoData slot).

UPDATED FEATURES

    o All relevant visualisation methods have been added an argument to
    subset samples to plot to those with a given set of values for a given
    column in the phenoData (uses the new function subEset described above).

    o The example analysis results was renamed from "raw_results" to
    "AlvMac_results", so that it can now loaded running data(AlvMac_results).
    The new name is meant to be more specific to the package.

CHANGES IN VERSION 0.99.10
--------------------------

GENERAL UPDATES

    o Replaced \format by \value sections to the following man pages:
    AlvMac.Rd, microarray2dataset.Rd, prefix2dataset.Rd, raw_results.Rd
    to avoid having an empty value section, as recommended by the Bioconductor
    package tracker. Hopefully, it does not require a non-empty \format
    section as well...

    o Restricted lines to less than 80 characters and indentation by multiple
    of four space characters.


CHANGES IN VERSION 0.99.9
-------------------------

UPDATED FEATURES

    o all post analysis functions were given more sanity checks to verify
    that the "result" argument contains the required slots of a GO_analyse()
    output and the the arguments pointing at a phenotypic data column are
    valid column names.

BUG FIXES

    o expression_plot_symbol and expression_profiles symbol used the default
    value of col.palette and colourF instead of forwarding the user-defined
    one to the expression_plot and expression_profiles functions.

GENERAL UPDATES

    o Added cross-reference in UsersGuide to point at examples of usages of
    factor and numeric values for the expression plots


CHANGES IN VERSION 0.99.8
-------------------------

GENERAL UPDATES

    o Resaved R data files to reduced package disk size. No more WARNING in
    R CMD check.

GENERAL UPDATES

    o Removed reference to GitHub in the README file. The weblink is given
    in the DESCRIPTION file anyway if users are interested.


CHANGES IN VERSION 0.99.7
--------------------------

UPDATED FEATURES

    o fixed a typo in the code of heatmap_GO which made it crash for any other
    dataset than the example dataset.

CHANGES IN VERSION 0.99.6
-------------------------

UPDATED FEATURES

    o expression_profiles_symbol() method was missing the "index" argument to select the feature identifier to plot alone.


CHANGES IN VERSION 0.99.5
-------------------------

NEW FEATURES

    o expression_profiles() method plots the expression profiles of individual
    samples series, as opposed to grouped samples series handled by 
    expression_plot functions.

    o expression_profiles_symbol() method plots the expression profiles of
    individual samples series using a gene name instead of an Ensembl gene
    identifier.

UPDATED FEATURES
    
    o overlap_GO can print to screen, if filename argument is set to NULL 
    (Default).

    o heatmap_GO, cluter_GO and plot_design can resize title font and wrap the
    text on multiple lines.

    o expression_plot and expression_plot_symbol can orient X axis labels at
    a given angle.

    o replaced return(NULL) statement by stop() when no close match is found
    to a gene name in the family of expression_plot functions.

GENERAL UPDATES

    o User's Guide updated.

    o List of contributors updated in User's Guide and DESCRIPTION.


CHANGES IN VERSION 0.99.4
-------------------------

UPDATED FEATURES
    
    o Use of ExpressionSet instead of numeric named matrix and
    AnnotatedDataFrame. Better consistency with other Bioconductor packages.

GENERAL UPDATES

    o Implemented corrections requested following the Bioconductor review.
    Includes typos, consistent terminology through the package code and
    metafiles, additional information in help files, no reference to GitHub
    as an alternate installation option, use of arrow signs instead of equal
    signs for value assignment.
    
    o Restricted lines to 80 characters, and used 4-space tabulations.

    o Corrected out-of-date documentation.


CHANGES IN VERSION 0.99.3
-------------------------

UPDATED FEATURES
    
    o Control the size of the legend text in the two expression plot figures.
    Updated help files accordingly.
    
    o Updated vignette with new section "Statistics".
    
    o Complete cleaning of code files for lines shorter than 80 columns.
    
    o Cleanup of help files for lines shorter than 80 columns.
    
    o Enabled filtering of raw results on the average score of a GO term.

CHANGES IN VERSION 0.99.2
-------------------------

UPDATED FEATURES
    
    o Metadata lines in the preamble of the Sweave file


CHANGES IN VERSION 0.99.1:

UPDATED FEATURES
    
    o Sweave vignette implemented.
    
    o Replaced all message() statements by cat() to make Sweave
    output the full message in the vignette.
    
    o Updated a missed F into FALSE
    
    o Updated an invalid biocViews (typo)
    
    o Date field added for a proper citation() method.

CHANGES IN VERSION 0.99.0
-------------------------

UPDATED FEATURES
    
    o Replaced all cat() statements by message() to match the
    Biocondcutor guidelines.


CHANGES IN VERSION 0.6.2
------------------------

UPDATED FEATURES
    
    o Fixed typo in the example of the vignette of GO_anova().
    
    o Advice to use the subset_scores() function in the GO_anova()
    vignette.
    
    o Helpful warning message in the vignette of GO_anova() to make
    sure that the experimental factor to analyse is actually formatted
    as a factor in the true R language meaning.
    
    o Updated license to GPL (>= 3) instead of GPL-2.


CHANGES IN VERSION 0.6.1
------------------------

NEW FEATURES
    
    o GOexpress now implements the randomForest framework to 
    estimate the importance of each gene on the clustering of the samples
    according to the different levels of the desired factor (See section
    UPDATED FEATURES below for more details).
    
    o rerank() function return a re-ordered version of the result
    variable given, ordered using either the rank or the score metrics.
    
    o Sample data raw_result was added to provide an example output
    of the analysis, and also allow to test the visualisation functions
    on a readily available variable instead of having to run the analysis
    function everytime (would have killed the R CMD check duration).
    
UPDATED FEATURES
    
    o GO_analyse() default has been changed to a random forest 
    statistical framework. This framework has various advantages over the
    previous ANOVA approach; namely 1) the bootstrapping and re-sampling
    of gene subsets to measure the importance of each gene in predicting
    the desired factor over many iterations (Default: 1,000 trees), 2)
    a shorter analysis duration due to the Fortran code underlying the
    randomForest package, 3) randomForest does not make assumptions on the
    distribution of the expression level within and between genes, 4)
    randomForest deals implicitely with interactions in multifactorial
    experiments. 
    
    o Genes are now initially ranked according to their rank and GO terms
    according to the average rank of their associated genes. Genes
    associated with a GO term but absent from the dataset are assigned a
    value of max(rank)+1 and a score of 0. Ties between genes are resolved
    by giving all genes the minimal rank, the next gene being given
    rank+length(tied_genes). The average_rank metric is expected to be
    more robust than the average "score". Depending on the statistical
    framework used, "score" may mean "importance" (randomForest) or
    "F.value" (ANOVA).
    
    o More sanity checks at the start of some methods to ensure smooth
    execution of the downstream code and more helpful error messages.
    
    o More synonyms allowed for subsetting and filtering of the result
    variable.
    
    o list_genes() and table_genes() methods now allow to chose whether
    to return all feature identifiers associated with a GO term, or only
    those also present in the expression dataset.


CHANGES IN VERSION 0.5.5
------------------------

UPDATED FEATURES
    
    o heatmap_GO() default colorscale has been changed to
    blue-white-red better suited to the representation of expression
    level. The previous colorscale green-black-red is now suggested if
    differential expression data is used (e.g. log2FC).


CHANGES IN VERSION 0.5.4
------------------------

NEW FEATURES
    
    o Sample data accessible with the data() method.


CHANGES IN VERSION 0.5.3
------------------------

NEW FEATURES
    
    o Method overlap_GO() produces a Venn diagram showing the overlap
    of gene sets associated with two to five GO terms. The Venn diagram
    is saved in a TIFF image file.


CHANGES IN VERSION 0.5.2
------------------------

UPDATED FEATURES
    
    o Method heatmap_GO() updated to color-code the samples by levels
    of a factor, and a better colormap is used to represent the level of
    expression of genes.


CHANGES IN VERSION 0.5.1
------------------------

UPDATED FEATURES
    
    o Package renamed GOexpress after verifying that no other package
    on Bioconductor uses this name.
    
    o Method GO_anova() renamed to GO_analyse() as other metrics than
    ANOVA may be implemented in the future.


CHANGES IN VERSION 0.4.1
------------------------

NEW FEATURES
    
    o Supports microarray probeset identifiers as gene identifiers
    in the expression dataset.
    
    o Argument "result=" is no more optional in all the post-analysis
    functions.
    
UPDATED FEATURES
    
    o expression_ploto() functions now dynamically adapt the colormap
    to the number of groups of samples instead of three hard-coded colors
    used for the colormap.


CHANGES IN VERSION 0.3.1
------------------------

NEW FEATURES
    
    o expression_plot() plots the expression profile of the gene
    corresponding to an Ensembl identifier, given valid variable name
    for the X-axis and a grouping factor for the Y-axis.
    
    o expression_plot_symbol() plots the expression profile of the
    gene(s) with the Ensembl identifier(s) corresponding to a gene
    symbol, given valid variable name for the X-axis and a grouping
    factor for the Y-axis.
    
    o plot_design() plots the univariate effect of each level of each
    factor available in the AnnotatedDataFrame on the expression levels
    of genes associated with a GO term.
    
UPDATED FEATURES
    
    o The scoring function now uses the total count of genes associated
    with a GOterm instead of the count of genes in the dataset and 
    associated with the GO term. Genes absent from the expression dataset
    are assigned a F.value of 0, similarly to the genes with a non-
    significant ANOVA result. This was not found to significantly impact
    the ranking of GO terms, but is a more objective scoring of the
    GO terms if the expression data was generated through objective
    filtering (e.g. filtering out lowly expressed genes).
    
    o subset_scores() now also filters the $mapping and $anova slots
    to retain only mappings and gene information related to the GO terms
    left in the $scores slot after filtering. This reduces significantly
    the memory space of the filtered variable, while the object containing
    the raw results of GO_anova() will still contain the full information
    fetched from the Ensembl BioMart server if the user wants it.
    
    o ggplot2 and grid are now required for the proper installation of 
    anovaGO, even though these packages are only required for a subset
    of non-essential features in anovaGO. This way, all features in 
    anovaGO will be available as soon as the package is installed.
    
    o Messages printed to the user during the executing of the funcions
    were made as clear as possible, particularly when fetching information
    from the Ensembl BioMart server.


CHANGES IN VERSION 0.2
----------------------

UPDATED FEATURES
    
    o subset_scores() can now filter and keep only GO terms of a given
    type, i.e. "Biological process", "Molecular function", or "Cellular
    Component".


CHANGES IN VERSION 0.1
----------------------

OVERVIEW
    
    o This package was designed for the analysis of bioinformatics-related
    data based on gene expression measurements. It requires 3 input
    values: (1) a gene-by-sample table providing the expression level
    of genes (rows) in each sample (columns), (2) an AnnotatedDataFrame
    from the Biobase package providing phenotypic information about the
    samples grouping them by levels of the factor, (3) the name of the
    grouping factor to investigate, which must be a valid column name
    in the AnnotatedDataFrame.
    
    o The analysis will identify all Gene Ontology (GO) terms represented
    by at least one gene in the expression dataset. A one-way ANOVA
    will be performed on the grouping factor for each gene present in
    the expression dataset. Following multiple-testing correction, genes
    below the threshold for significance will be assigned an F.value of
    0. GO terms will be scored and ranked on the average F.value of
    associated genes.
    
    o Functions are provided to investigate and visualise the results of
    the above analysis. The score table can be filtered for rows over
    given thresholds. The distribution of scores can be visualised. The
    quantiles of scores can be obtained. The genes associated with a
    given GO term can be listed, with or without descriptive information.
    Hierarchical clustering of the samples can be performed based on the
    expression levels of genes associated with a given GO term. Heatmaps
    accompanied by hierarchical clustering of samples and genes can be
    drawn and customised.
    
    
FEATURES
    
    o GO_anova() scores all Gene Ontology (GO) terms represented in
    the dataset based on the ability of their associated genes to cluster
    samples according to a predefined grouping factor. It also returns
    the table used to map genes to GO terms, the table summarising the
    one-way ANOVA results for each gene, and finally the predefined
    grouping factor used for ANOVA. Genes annotated to a GO term but
    absent from the expression dataset are ignored.
    
    o get_mart_dataset() returns a connection to the appropriate BioMart
    dataset of the Ensembl server based on the gene name of the first gene in
    the expression dataset. The choice of the dataset can be overriden by the
    user if a valid BioMart Ensembl dataset is specified.
    
    o subset_scores() filters the table of scored GO terms in the
    output of GO_anova() and returns a list formatted identically to the 
    output of GO_anova() with the resulting filtered table of scores.
    
    o hist_scores() plots the distribution of average F scores in the
    output of GO_anova() or subset_scores().
    
    o quantiles_scores() returns the quantile values corresponding
    to defined percentiles.
    
    o list_genes() returns the list of Ensembl gene identifiers
    associated with a given GO term.
    
    o table_genes() returns a table of information about the Ensembl
    gene identifiers associated with  a given GO term.
    
    o cluster_GO() plots a hierarchical clustering of the samples
    based on the expression levels of genes associated with a given
    GO term.
    
    o heatmap_GO() plots a heatmap with hierarchical clustering of
    the samples and genes based on the expression levels of genes
    associated with a given GO term.
