\name{ALL}
\alias{ALL}
\docType{data}
\title{Acute Lymphoblastic Leukemia Data from the Ritz Laboratory}
\description{
 The data consist of microarrays from 128 different individuals with
 acute lymphoblastic leukemia (ALL). A number of additional covariates
 are available. The arrays were normalized jointly (using gcrma), and
 the jointly normalized data are the ones supplied here. The data are
 stored as an \code{exprSet} object.
}
\usage{data(ALL)}
\format{
 The different covariates are:
\itemize{
 \item{\code{cod}}{The patient IDs.}
  \item{\code{diagnosis}}{The date of diagnosis.}
  \item{\code{sex}}{The sex of the patient, coded as \code{M} and \code{F}.}
  \item{\code{age}}{The age of the patient in years.}
  \item{\code{BT}}{The type and stage of the disease; a value starting
                  with \code{B} indicates B-cell ALL and one starting with
                  \code{T} indicates T-cell ALL.}
  \item{\code{remission}}{A factor with two levels: \code{CR}, indicating
   that remission was achieved, or \code{REF}, indicating that the patient
   was refractory and remission was not achieved.}
  \item{\code{CR}}{As above, but with extra information on whether the patient
    died while remission was being attempted.}
  \item{\code{date.cr}}{The date on which remission was achieved.}
  \item{\code{t(4;11)}}{A logical vector indicating whether a t(4;11) 
      translocation was detected.}
  \item{\code{t(9;22)}}{A logical vector indicating whether a t(9;22) 
    translocation was detected.}
  \item{\code{cyto.normal}}{A logical vector indicating whether the 
    cytogenetics were normal.}
  \item{\code{citog}}{A vector indicating the various cytogenetic abnormalities
    that were detected.}
  \item{\code{mol.biol}}{The assigned molecular biology of the cancer (mainly 
    for patients with B-cell ALL), e.g. BCR/ABL, ALL/AF4, E2APBX.}
  \item{\code{fusion protein}}{For patients with BCR/ABL, the fusion
     protein that was detected: \code{p190}, \code{p190/p210} or \code{p210}.}
  \item{\code{mdr}}{The patient's multidrug resistance status, either
   \code{NEG} or \code{POS}.}
  \item{\code{kinet}}{Not yet documented.}
  \item{\code{ccr}}{Not yet documented.}
  \item{\code{relapse}}{Not yet documented.}
  \item{\code{transplant}}{Whether or not the patient received a bone
    marrow transplant.}
  \item{\code{f.u}}{The follow-up status. Observed values include
    \code{"BMT / DEATH IN CR"} and \code{"rel"}; the full set of values
    and their meanings are not yet documented.}
  \item{\code{date last seen}}{The date the patient was last seen.}
}
}
\examples{
data(ALL)
}
\keyword{datasets}

\eof
\name{GOHyperG}
\alias{GOHyperG}
\title{ Hypergeometric Tests for GO  }
\description{
 Given a set of unique LocusLink identifiers, the annotation library for
 a microarray chip and the GO category of interest, this function
 computes hypergeometric p-values for overrepresentation of the
 interesting genes (as indicated by the unique LocusLink identifiers)
 at every node in the induced GO graph.
}
\usage{
GOHyperG(x, lib="hgu95av2", what="MF")
}
\arguments{
  \item{x}{A vector of unique LocusLink identifiers. }
  \item{lib}{The name of the annotation library for the chip that was used. }
  \item{what}{One of "MF", "BP", or "CC" indicating which of the GO
    categories the computations should be made for.}
}
\details{
  Typically, a set of interesting genes or probes has been obtained
  from a microarray experiment. To determine whether these genes are
  overrepresented at particular GO terms, a simple hypergeometric
  calculation is often made. Two substantial issues arise. First, and
  most importantly, it is not clear how to correct the p-values for
  multiple testing in this case: the tests are not independent, and the
  structure of the GO graph presents problems that still need to be
  addressed. Second, the mappings are based on LocusLink identifiers,
  so all computations should also be based on unique LocusLink
  identifiers. \code{GOHyperG} makes every attempt to correct
  appropriately for non-unique mappings.
  
  The user provides a vector of unique LocusLink identifiers, and these
  are used, together with the name of the chip, to create the necessary
  counts. It is important to identify the correct chip, because it
  determines the overall counts; all inference will be incorrect if
  the wrong chip is specified.

  The test performed is a hypergeometric test, using \code{phyper}: at
  each GO node we determine how many LLIDs from the chip are annotated
  there and how many of the supplied LLIDs are annotated there, and
  compute a p-value. This is equivalent to a one-sided Fisher's exact
  test for overrepresentation.
  
}
\value{
  The returned value is a list with components:
  \item{pvalues }{The ordered p-values.}
  \item{goCounts}{The vector of counts of LLIDs from the chip at each node.}
  \item{intCounts}{The vector of counts of the supplied LLIDs annotated
    at each node.}
  \item{numLL}{The number of unique LLIDs on the chip that are mapped to
    some term in the specified GO category.}
  \item{numInt}{The number of unique LLIDs from those supplied that are
    mapped to some term in the specified GO category.}
}
\author{R. Gentleman}

\seealso{\code{\link{phyper}}}

\examples{

library(hgu95av2)
library(GO)
w1 <- as.list(hgu95av2LOCUSID)
w2 <- unique(unlist(w1))
w2 <- w2[!is.na(w2)]   # drop probes without a LocusLink mapping
set.seed(123)
## pick 100 genes at random to stand in for the interesting genes
myLL <- sample(w2, 100)
xx <- GOHyperG(myLL)
xx$numLL
xx$numInt
sum(xx$pvalues < 0.01)
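## The following is a minimal, self-contained sketch of the per-node
## calculation described in Details. It uses base R only; all counts
## are hypothetical and are not taken from any chip, and this is not
## necessarily how GOHyperG computes its p-values internally.
numChip     <- 8000  # hypothetical: unique chip LLIDs mapped in the ontology
numChipNode <- 40    # hypothetical: chip LLIDs annotated at one GO node
numSel      <- 100   # hypothetical: supplied LLIDs mapped in the ontology
numSelNode  <- 5     # hypothetical: supplied LLIDs annotated at that node
## Upper-tail hypergeometric probability P(X >= numSelNode)
pHyper <- phyper(numSelNode - 1, numChipNode, numChip - numChipNode,
                 numSel, lower.tail = FALSE)
## The same p-value from a one-sided Fisher's exact test on the 2 x 2
## table (rows: at node / not at node; columns: supplied / not supplied)
tab <- matrix(c(numSelNode, numSel - numSelNode,
                numChipNode - numSelNode,
                numChip - numChipNode - (numSel - numSelNode)),
              nrow = 2)
pFisher <- fisher.test(tab, alternative = "greater")$p.value
all.equal(pHyper, pFisher)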

}
\keyword{ htest }

\eof
\name{GOLeaves}
\alias{GOLeaves}
\title{Identify the leaves in a GO Graph  }
\description{
  Given a GO graph this function returns the node labels for all leaves
  in the graph. A leaf is defined to be a node with only out-edges and
  no in-edges.
}
\usage{
GOLeaves(inG)
}
\arguments{
  \item{inG}{An instance of a GO graph. }
}
\details{
  All nodes in \code{inG} are inspected for in-edges and those with none
  are returned.

  This should probably be replaced by a function in the graph package
  that identifies leaves; there is nothing GO-specific about it.
  
}
\value{
  A vector of the node labels for those nodes with no in-edges.
}

\author{R. Gentleman}

\seealso{\code{\link{makeGOGraph}}}

\examples{
 g1 <- oneGOGraph("GO:0003680", GOMFPARENTS)
 g2 <- oneGOGraph("GO:0003701", GOMFPARENTS)
 g3 <- combGOGraph(g1, g2)
 GOLeaves(g3)
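 ## A base-R sketch of the leaf rule, for illustration only. The toy
 ## graph below is hypothetical and is stored as an adjacency matrix in
 ## which adj[i, j] == 1 means an edge from node i to node j; edges are
 ## assumed to point from child to parent, so the leaves (the most
 ## specific terms) are the nodes with no in-edges, i.e. zero column sums.
 nds <- c("root", "mid", "leafA", "leafB")
 adj <- matrix(0, 4, 4, dimnames = list(nds, nds))
 adj["mid", "root"] <- 1
 adj["leafA", "mid"] <- 1
 adj["leafB", "mid"] <- 1
 leaves <- nds[colSums(adj) == 0]
 leaves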
}
\keyword{manip}

\eof
\name{Ndists}
\alias{Ndists}
\alias{Bdists}
\docType{data}
\title{Distance matrices for the BCR/ABL and NEG subgroups.}
\description{
  These are precomputed distance matrices between all of the selected
  transcription factors. In the future they will be computed on the
  fly, but that currently takes about three hours, so precomputed
  versions are supplied.
}
\usage{data(Ndists)
  data(Bdists)}
\format{
  These are both distance matrices.
}
\source{
 They are based on the ALL data, \code{\link{ALL}}.
}
\examples{
data(Ndists)
data(Bdists)
}
\keyword{datasets}

\eof
\name{combGOGraph}
\alias{combGOGraph}
\title{Combine GO Graphs  }
\description{
  Given two GO graphs this function combines them into a single graph
  and returns that.
}
\usage{
combGOGraph(g1, g2)
}
\arguments{
  \item{g1}{An instance of a \code{graph}.}
  \item{g2}{An instance of a \code{graph}. }
}
\details{
  A new \code{graph} object is created by linking the two supplied
  graphs together. Since both are GO graphs, they share at least the
  GO root node.

  This could probably be replaced by the \code{union} function in the
  \code{graph} package.
}
\value{
  A graph containing the union of the nodes and edges in the two
  graphs.
}
\author{R. Gentleman}

\seealso{\code{\link{union}}}

\examples{
 g1 <- oneGOGraph("GO:0003680", GOMFPARENTS)
 g2 <- oneGOGraph("GO:0003701", GOMFPARENTS)
 g3 <- combGOGraph(g1, g2)
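 ## A base-R sketch of the union described in Details, assuming each
 ## graph is represented as a two-column edge list; the term labels are
 ## hypothetical placeholders, not real GO identifiers.
 e1 <- data.frame(from = c("t1", "t2"), to = c("t2", "root"),
                  stringsAsFactors = FALSE)
 e2 <- data.frame(from = c("t3", "t2"), to = c("t2", "root"),
                  stringsAsFactors = FALSE)
 ## nodes of the combined graph: the union of both node sets
 unionNodes <- union(c(e1$from, e1$to), c(e2$from, e2$to))
 ## edges of the combined graph: the shared edge t2 -> root is kept once
 unionEdges <- unique(rbind(e1, e2))
 unionEdges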
}
\keyword{manip}

\eof
\name{compCorrGraph}
\alias{compCorrGraph}
\title{A Function to Compute a Correlation-Based Graph from Gene
Expression Data }
\description{
  Given a set of gene expression data (an instance of the
  \code{exprSet} class) this function computes a graph based on
  correlations between the probes.
}
\usage{
compCorrGraph(eSet, k = 1, tau = 0.6)
}
\arguments{
  \item{eSet}{An instance of the \code{exprSet} class. }
  \item{k}{The power to which the correlations are raised. }
  \item{tau}{The lower cutoff for absolute correlations. }
}
\details{
  Zhou et al. describe a method of computing a graph between probes
  (genes) based on estimated correlations between probes. This function
  implements some of their methods.

  Pearson correlations between probes are computed and then these are
  raised to the power \code{k}. Any of the resulting estimates that are
  less than \code{tau} in absolute value are set to zero. 
}
\value{
  An instance of the \code{graph} class, with edges and edge weights
  determined by the algorithm described in the Details section.
}
\references{Transitive functional annotation by shortest-path analysis
  of gene expression data, by X. Zhou and M-C J. Kao and W. H. Wong,
  PNAS, 2002}
\author{R. Gentleman}

\seealso{\code{\link{compGdist}}}
\examples{

 data(ALL)
 set.seed(123)
 gs = sample(1:dim(ALL@exprs)[1], 200)
 Tsub = ALL[gs, grep("^T", as.character(ALL$BT))]

 corrG = compCorrGraph(Tsub)
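 ## A base-R sketch of the weighting rule in Details, on a simulated
 ## matrix 'x' (probes in rows, samples in columns) that stands in for
 ## the expression values of an exprSet; it is not the package code.
 set.seed(1)
 x <- matrix(rnorm(5 * 20), nrow = 5,
             dimnames = list(paste0("p", 1:5), NULL))
 x[2, ] <- x[1, ] + rnorm(20, sd = 0.3)  # make p1 and p2 strongly correlated
 k <- 1
 tau <- 0.6
 w <- cor(t(x))^k         # Pearson correlations between probes, raised to k
 w[abs(w) < tau] <- 0     # estimates below tau in absolute value become 0
 ## each remaining off-diagonal entry is an edge, with w as its weight
 edges <- which(w != 0 & upper.tri(w), arr.ind = TRUE)
 edges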

}
\keyword{ manip }


\eof
\name{compGdist}
\alias{compGdist}
\title{A function to compute the distance between pairs of nodes in a graph. }
\description{
  Given a graph, \code{g}, and a set of nodes in the graph,
  \code{whNodes}, Dijkstra's shortest path algorithm is used to compute
  the distance between all pairs of nodes in \code{whNodes}.
}
\usage{
compGdist(g, whNodes, verbose = FALSE)
}
\arguments{
  \item{g}{ An instance of the \code{graph} class. }
  \item{whNodes}{A vector of labels of the nodes in \code{g} for which
  distances are to be computed. }
  \item{verbose}{If \code{TRUE}, progress is reported while the
  distances are computed. }
}
\details{
  This function can be quite slow: computing each pairwise distance is
  not especially fast, and the number of pairs grows quadratically with
  the length of \code{whNodes}.
}
\value{
  A matrix containing the pairwise distances. It might be worth making
  this an instance of the \code{dist} class at some point.
}

\author{R. Gentleman }

\seealso{ \code{\link{compCorrGraph}} }
\examples{

 example(compCorrGraph)
 compGdist(corrG, nodes(corrG)[1:5])
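 ## A minimal base-R sketch of Dijkstra's algorithm, for illustration
 ## only: 'dijkstraFrom' is a hypothetical helper, not part of this
 ## package, and 'W' is a made-up symmetric weight matrix in which 0
 ## means "no edge". Running it once per node of interest gives the
 ## pairwise distance matrix.
 dijkstraFrom <- function(W, src) {
   d <- setNames(rep(Inf, nrow(W)), rownames(W))
   done <- setNames(rep(FALSE, nrow(W)), rownames(W))
   d[src] <- 0
   repeat {
     cand <- which(!done & is.finite(d))
     if (length(cand) == 0) break
     u <- cand[which.min(d[cand])]          # closest unsettled node
     done[u] <- TRUE
     nb <- which(W[u, ] > 0 & !done)         # its unsettled neighbours
     d[nb] <- pmin(d[nb], d[u] + W[u, nb])   # relax the edges out of u
   }
   d
 }
 nds <- c("a", "b", "c", "d")
 W <- matrix(0, 4, 4, dimnames = list(nds, nds))
 W["a", "b"] <- W["b", "a"] <- 1
 W["b", "c"] <- W["c", "b"] <- 2
 W["a", "c"] <- W["c", "a"] <- 5
 ## row s holds the distances from node s; node "d" is unreachable (Inf)
 D <- t(sapply(nds, function(s) dijkstraFrom(W, s)))
 D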

}
\keyword{manip }


\eof
\name{distCGO}
\alias{distCGO}
\alias{distDGGO}
\title{ Distances between GO graphs }
\description{
  These functions provide different ways of measuring the distance
  between two GO graphs. See \code{oneGOGraph} for a description of how
  these graphs are constructed.
}
\usage{
distCGO(term1, term2, dataenv)
distDGGO(term1, term2, dataenv)
}
\arguments{
  \item{term1}{The term used to construct the first graph. }
  \item{term2}{The term used to construct the second graph. }
  \item{dataenv}{The data used to construct both graphs. }
}
\details{
  A number of distances between induced GO graphs have been considered
  in the literature. The basic idea is to use these measures to convey
  information about the relatedness of the terms (or possibly of the
  genes annotated at those terms).

  For \code{distDGGO} the distance is the number of nodes in the
  intersection of the two induced graphs divided by the number of nodes
  in the union of the two graphs.

  For \code{distCGO} the distance is not yet implemented.
}
\value{
  A numeric value indicating the distance between \code{term1} and
  \code{term2}. 
}
\references{(1) Cheng et al, Affymetrix. (2) B. Ding and R. Gentleman }
\author{R. Gentleman}

\seealso{\code{\link{oneGOGraph}}, \code{\link{makeGOGraph}}}

\examples{
 distDGGO("GO:0005488", "GO:0030528", GOMFPARENTS)

 distDGGO("GO:0003700", "GO:0030528", GOMFPARENTS)
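 ## A base-R sketch of the distDGGO ratio, using two hypothetical node
 ## sets in place of the nodes of the two induced GO graphs. Note that,
 ## as defined, the ratio equals 1 when the two graphs are identical.
 nodes1 <- c("t1", "t2", "root")
 nodes2 <- c("t3", "t2", "root")
 ratio <- length(intersect(nodes1, nodes2)) / length(union(nodes1, nodes2))
 ratio  # 2 shared nodes out of 4 in total: 0.5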

}
\keyword{manip}

\eof
\name{getGOTerm}
\alias{getGOTerm}
\alias{getGOParents}
\alias{getGOChildren}
\alias{getGOCategory}

\title{Functions to Access GO data.  }
\description{
  These functions provide access to data in the GO package. The data are
  assembled from publicly available data from the Gene Ontology
  Consortium (GO), \url{http://www.geneontology.org}. Given a vector of
  GO identifiers, they access the children (more specific terms), the
  parents (less specific terms) and the terms themselves.
}
\usage{
getGOTerm(x)
getGOParents(x)
getGOChildren(x)
getGOCategory(x)
}
\arguments{
  \item{x}{A character vector of valid GO identifiers. }
}
\details{
  GO consists of three (soon to be more) specific hierarchies: Molecular
  Function (MF), Biological Process (BP) and Cellular Component
  (CC). For more details consult the GO website. For each GO identifier
  each of these three hierarchies is searched and depending on the
  function called the appropriate values are obtained and returned.

  It is possible for a GO identifier to have no children or for it to
  have no parents. However, it must have a term associated with it.
}
\value{
  A list of the same length as \code{x}, with one entry for each
  element of \code{x}. Each entry is itself a list with a component
  named \code{Ontology}, whose value is one of MF, BP or CC, and a
  second component whose name matches the suffix of the function
  called, i.e. Children, Parents or Term.
  If there was no match in any of the ontologies, a length zero list
  is returned for that element of \code{x}.

  For \code{getGOCategory}, a vector of categories whose names are the
  supplied GO identifiers. Elements of this vector that are \code{NA}
  indicate identifiers for which there is no category (and hence they
  are not valid GO identifiers).
}
\references{The Gene Ontology Consortium}
\author{R. Gentleman}


\examples{
 library(GO)

 sG <- c("GO:0005515", "GO:0000123", "GO:0000124", "GO:0000125",
             "GO:0000126", "GO:0020033", "GO:0006830", "GO:0009874",
             "GO:0015916", "GO:0015339")


 gT <- getGOTerm(sG)

 gP <- getGOParents(sG)

 gC <- getGOChildren(sG)

 gcat <- getGOCategory(sG)

}
\keyword{manip}

\eof
\name{hasGOannote}
\alias{hasGOannote}
\title{Check for GO annotation  }
\description{
  Given a GO term and an environment this function determines whether
  there is an entry (or symbol) in the environment with the same name as
  the term.
}
\usage{
hasGOannote(x, which="MF")
}
\arguments{
  \item{x}{A length one character vector. }
  \item{which}{Either an environment or the name of an environment. By
    default this is the MF parents environment. }
}
\details{
  The element \code{x} is searched for in the environment. If it is
  found then \code{TRUE} is returned otherwise \code{FALSE} is returned.
}
\value{
  A logical value: \code{TRUE} if \code{x} is found in the environment
  and \code{FALSE} otherwise.
}
\author{R. Gentleman}
\seealso{\code{\link{get}}}

\examples{
 t1 <- "GO:0003680"
 hasGOannote(t1)
 hasGOannote(t1, "BP")
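 ## A base-R sketch of the check in Details, using a made-up environment
 ## in place of the GO data environments; the stored value is a
 ## placeholder and "GO:9999999" is a hypothetical absent term.
 e <- new.env()
 assign("GO:0003680", "placeholder", envir = e)
 found <- exists("GO:0003680", envir = e, inherits = FALSE)
 notFound <- exists("GO:9999999", envir = e, inherits = FALSE)
 c(found, notFound)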
}
\keyword{manip}

\eof
\name{idx2dimnames}
\alias{idx2dimnames}
\title{Index to Dimnames}
\description{
  A function to map from integer offsets in an array to the
corresponding values of the row and column names. There is probably a
better way but I didn't find it.
}
\usage{
idx2dimnames(x, idx)
}
\arguments{
  \item{x}{a \code{matrix} or \code{data.frame}. }
  \item{idx}{An integer vector of offsets into the matrix (values
  between 1 and the \code{length} of the matrix).}
}

\value{
 A list with two components.
  \item{rowNames }{The row names corresponding to the integer index.}
  \item{colNames }{The column names corresponding to the integer index.}
}

\author{R. Gentleman}

\seealso{\code{\link{dimnames}} }
\examples{
 data(Ndists)
 ltInf = which(is.finite(Ndists))
 xx = idx2dimnames(Ndists, ltInf)
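 ## For comparison, a base-R sketch of the same mapping on a small
 ## made-up matrix: arrayInd() turns linear offsets into (row, column)
 ## positions, which then index the dimnames. This is not necessarily
 ## how idx2dimnames is implemented.
 m <- matrix(1:6, nrow = 2,
             dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
 idx <- c(2L, 5L)
 pos <- arrayInd(idx, dim(m))
 res <- list(rowNames = rownames(m)[pos[, 1]],
             colNames = colnames(m)[pos[, 2]])
 res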
}
\keyword{manip}

\eof
\name{makeGOGraph}
\alias{makeGOGraph}
\title{ Construct a GO Graph }
\description{
  This function constructs and returns the directed acyclic graph (DAG)
  induced by the most specific GO terms for the supplied identifiers.
  The construction is done per GO ontology (there are three: MF, BP and
  CC). Once the most specific terms have been identified, all less
  specific terms (the parents of those terms) are found, then their
  parents, and so on, until the root is reached.
}
\usage{
makeGOGraph(x, what="MF", lib="hgu95av2", removeRoot = TRUE)
}
\arguments{
  \item{x}{A vector of identifiers. }
  \item{what}{Which of the GO ontologies to use. }
  \item{lib}{The name of a meta-data package to use for mapping the
    identifiers to GO identifiers. }
  \item{removeRoot}{A logical value indicating whether the GO root node
  should be removed.}
}
\details{
  The mapping of manufacturer identifiers (e.g. Affymetrix) to GO
  identifiers is done on the basis of an initial mapping to LocusLink
  identifiers. For many data sources these mappings are available from
  the Bioconductor Project (\url{www.bioconductor.org}).

  Once that mapping has been made, we obtain from the meta-data package
  the mapping to the most specific terms (again precomputed), and using
  those together with the GO package (also from Bioconductor) we map to
  all of their parents, up to the root node.

  The mappings differ between the ontologies. Typically a GO
  identifier is used in only one ontology.

  The resulting structure is stored in a graph using the \code{graph}
  package, again from Bioconductor.
  
}
\value{
  An object that inherits from the \code{graph} class. The particular
  implementation is not specified.
}
\references{The Gene Ontology Consortium }
\author{R. Gentleman}

\seealso{\code{\link{oneGOGraph}}}

\examples{
 gN <- c("38940_at","2073_s_at", "35580_at",  "34701_at")
 gg1 <- makeGOGraph(gN, "BP", "hgu95av2")
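 ## A base-R sketch of the two-step mapping described in Details, with
 ## made-up named lists standing in for the meta-data package maps
 ## (probe -> LocusLink ID, LocusLink ID -> most specific GO terms).
 ## Duplicated LocusLink IDs are removed before mapping to GO.
 probe2ll <- list(p1 = "100", p2 = "100", p3 = "200")      # hypothetical
 ll2go <- list("100" = "GO:a", "200" = c("GO:b", "GO:c"))  # hypothetical
 probes <- c("p1", "p2", "p3")
 lls <- unique(unlist(probe2ll[probes]))
 terms <- unique(unlist(ll2go[lls]))
 terms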
 
}
\keyword{manip}

\eof
\name{notConn}
\alias{notConn}
\title{Find genes that are not connected to the others. }
\description{
  A function that takes a distance matrix as input and finds those
  entries that are not connected to any others (i.e. those whose
  distance to every other entry is \code{Inf}).
}
\usage{
notConn(dists)
}
\arguments{
  \item{dists}{A distance matrix.}
}
\details{
  This is a very naive implementation. It presumes that an entry with
  infinite distances is not connected to any other entry, which need
  not be true: an entry may be connected to a few others in a small
  component that is disconnected from the rest. Using the
  \code{connComp} function from the \code{graph} package, or the
  \code{RBGL} package, might be a better approach.
}
\value{
 A vector of the names of the items that are not connected.
}
\author{R. Gentleman }
\seealso{\code{\link{connComp}}}
\examples{
 data(Ndists)
 notConn(Ndists)
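 ## A base-R sketch of the naive rule, on a made-up distance matrix: an
 ## entry is reported when its distance to every other entry is Inf.
 ## Here "c" is isolated while "a" and "b" are connected to each other.
 nds <- c("a", "b", "c")
 D <- matrix(c(0, 1, Inf,
               1, 0, Inf,
               Inf, Inf, 0), nrow = 3, dimnames = list(nds, nds))
 isolated <- nds[rowSums(is.infinite(D)) == ncol(D) - 1]
 isolated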
}
\keyword{manip}

\eof
\name{oneGOGraph}
\alias{oneGOGraph}
\title{Build a GO graph for one identifier  }
\description{
  Given a single GO identifier and a set of mappings to the less
  specific terms (the parents), this function constructs the graph
  that includes that node and all of its ancestors, up to the root
  node of the ontology.
}
\usage{
oneGOGraph(x, dataenv)
}
\arguments{
  \item{x}{A length one character vector with the name of the term. }
  \item{dataenv}{ An environment for finding the parents of that term. }
}
\details{
  For a GO term, we define the induced GO graph to be the graph
  consisting of that term and all of its ancestors, with edges given by
  the child-parent relations of the GO DAG.
}
\value{
  The induced GO graph (or NULL) for the given GO identifier.
}
\author{R. Gentleman}

\seealso{\code{\link{makeGOGraph}}}

\examples{

 g1 <- oneGOGraph("GO:0003680", GOMFPARENTS)
 g2 <- oneGOGraph("GO:0003701", GOMFPARENTS)
 g3 <- combGOGraph(g1, g2)

if( require(Rgraphviz) && interactive() )
  plot(g3)
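## A base-R sketch of the ancestor walk that produces an induced GO
## graph. 'parentsEnv' is a made-up stand-in for an environment such as
## GOMFPARENTS (term -> character vector of parent terms); the term
## labels are hypothetical. Each term is visited once and every
## child -> parent pair found along the way is recorded as an edge.
parentsEnv <- new.env()
assign("t3", c("t1", "t2"), envir = parentsEnv)
assign("t1", "root", envir = parentsEnv)
assign("t2", "root", envir = parentsEnv)
assign("root", character(0), envir = parentsEnv)
visited <- character(0)
frontier <- "t3"
edges <- NULL
while (length(frontier) > 0) {
  cur <- frontier[1]
  frontier <- frontier[-1]
  if (is.element(cur, visited)) next
  visited <- c(visited, cur)
  pars <- get(cur, envir = parentsEnv)
  if (length(pars) > 0)
    edges <- rbind(edges, cbind(from = cur, to = pars))
  frontier <- c(frontier, setdiff(pars, visited))
}
edges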
}
\keyword{manip}

\eof
\name{shortestPath}
\alias{shortestPath}
\title{ Shortest Path Analysis }
\description{
 The shortest path analysis was proposed by Zhou et al. The basic
 computation is to find the shortest path in a supplied graph between
 two LocusLink IDs. Zhou et al. claim that other genes on that path
 are likely to have the same GO annotation as the two end points.
}
\usage{
shortestPath(g, GOnode)
}
\arguments{
  \item{g}{An instance of the \code{graph} class. }
  \item{GOnode}{A length one character vector specifying the GO node of
    interest. }
}
\details{
  The algorithm implemented here is quite simple. All LocusLink
  identifiers that are annotated at the GO node of interest are
  obtained. Those that are found as nodes in the graph are retained and
  used for the computation. For every pair of nodes at the GO term the
  shortest path between them is computed using \code{sp.between} from
  the RBGL package.

  The graph is presumed to be \code{undirected}. This restriction could
  probably be lifted if there were a reason to do so; a patch would be
  gratefully accepted.
  
}
\value{
  The return value is a list with the following components:
  \item{shortestpaths }{A list of the output from \code{sp.between}. The
    names are the names of the nodes used as the two endpoints.}
  \item{nodesUsed }{A vector of the LocusLink IDs that were both found
    at the GO term of interest and were nodes in the supplied graph,
    \code{g}. These were used to compute the shortest paths.}
  \item{nodesNotUsed}{A vector of LocusLink IDs that were annotated at
    the GO term, but were not found in the graph \code{g}.}
}
\references{Transitive functional annotation by shortest-path analysis
  of gene expression data, by X. Zhou and M-C J. Kao and W. H. Wong,
  PNAS, 2002}

\author{R. Gentleman }

\seealso{\code{\link{sp.between}}}

\examples{
library(GO)
library(RBGL)

tst <- unique(unlist(mget(c("GO:0005778", "GO:0005779",
                            "GO:0030060"), GOGO2LL)))
set.seed(123)
v1 <- randomGraph(tst, 1:10, .3)


a1 <- shortestPath(v1, "GO:0005779")
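## A base-R sketch of the bookkeeping in Details, with hypothetical
## LocusLink IDs: IDs annotated at the GO node are split into those that
## are nodes of the graph (used) and those that are not (not used), and
## every unordered pair of used IDs becomes a start/end pair for a
## shortest-path query. The "from:to" naming is an illustrative choice.
annotated <- c("101", "102", "103", "999")
graphNodes <- c("101", "102", "103", "104")
nodesUsed <- intersect(annotated, graphNodes)
nodesNotUsed <- setdiff(annotated, graphNodes)
pairs <- combn(nodesUsed, 2)
colnames(pairs) <- apply(pairs, 2, paste, collapse = ":")
pairs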


}
\keyword{ manip }

\eof
\name{triadCensus}
\alias{triadCensus}
\alias{triad}
\alias{enumPairs}
\alias{isTriad}
\alias{reduce2Degreek}
\title{ Triad Functions }
\description{
  These functions provide some tools for finding triads in an undirected
  graph. A triad is a clique of size 3. The function \code{triadCensus}
  returns a list of all triads.
}
\usage{
triadCensus(graph)
isTriad(x, y, z, elz, ely)
reduce2Degreek(graph, k)
enumPairs(iVec)

}
\arguments{
  \item{graph}{An instance of the \code{graph} class. }
  \item{k}{An integer indicating the minimum degree wanted.}
  \item{x}{A node}
  \item{y}{A node}
  \item{z}{A node}
  \item{elz}{The edgelist for \code{z}}
  \item{ely}{The edgelist for \code{y}}
  \item{iVec}{A vector of unique values}
}
\details{
  \code{enumPairs} takes a vector as input and returns a list of length
  \code{choose(length(iVec), 2)} containing all unordered pairs of
  elements.

  \code{isTriad} takes three nodes as arguments. It is already known
  that \code{x} has edges to both \code{y} and \code{z} and we want to
  determine whether these are reciprocated. This is determined by
  examining \code{elz} for both \code{x} and \code{y} and then examining
  \code{ely} for both \code{x} and \code{z}. 

  \code{reduce2Degreek} is a function that takes an undirected graph as
  input and removes all nodes of degree less than \code{k}. This process
  is iterated until either no nodes are left (in which case an error is
  thrown) or all remaining nodes have degree at least \code{k}. The
  resulting subgraph is returned. It is used here because every member
  of a triad must have degree at least 2.

  \code{triadCensus} makes use of the helper functions described above
  and finds all triads in the graph.
  
}
\value{
  A list where each element is a triple giving the members of a
  triad. Order within a triad is not meaningful; the members of each
  triad are reported in alphabetical order.
}

\author{R. Gentleman}
\note{See the graph package, RBGL and Rgraphviz for more details and
  alternatives. }


\examples{
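## A self-contained base-R sketch of the ideas above, on a made-up
## undirected graph stored as a symmetric 0/1 adjacency matrix. It is
## not the package implementation; 'link' and 'kcore' are hypothetical
## helpers that mirror the roles of the functions documented here.
nds <- c("a", "b", "c", "d", "e")
A <- matrix(0, 5, 5, dimnames = list(nds, nds))
link <- function(A, x, y) { A[x, y] <- 1; A[y, x] <- 1; A }
A <- link(A, "a", "b"); A <- link(A, "b", "c"); A <- link(A, "a", "c")
A <- link(A, "c", "d")                  # "d" and "e" cannot be in a triad
nPairs <- ncol(combn(nds, 2))           # choose(5, 2) = 10 unordered pairs
## iteratively drop nodes of degree less than k (cf. reduce2Degreek)
kcore <- function(A, k) {
  repeat {
    keep <- rowSums(A) >= k
    if (all(keep)) return(A)
    if (!any(keep)) stop("no nodes of degree >= k remain")
    A <- A[keep, keep, drop = FALSE]
  }
}
A2 <- kcore(A, 2)
## a triad is a triple of nodes whose three pairs are all adjacent
trip <- combn(rownames(A2), 3)
isTri <- apply(trip, 2, function(v)
  A2[v[1], v[2]] == 1 && A2[v[1], v[3]] == 1 && A2[v[2], v[3]] == 1)
triads <- lapply(which(isTri), function(j) sort(trip[, j]))
triads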
}
\keyword{manip}

\eof
