Class UnconditionedExactTest
- java.lang.Object
-
- org.apache.commons.statistics.inference.UnconditionedExactTest
-
public final class UnconditionedExactTest extends java.lang.ObjectImplements an unconditioned exact test for a contingency table.Performs an exact test for the statistical significance of the association (contingency) between two kinds of categorical classification. A 2x2 contingency table is:
\[ \left[ {\begin{array}{cc} a & b \\ c & d \\ \end{array} } \right] \]
This test applies to the case of a 2x2 contingency table with one margin fixed. Note that if both margins are fixed (the row sums and column sums are not random) then Fisher's exact test can be applied.
This implementation fixes the column sums \( m = a + c \) and \( n = b + d \). All possible tables can be created using \( 0 \le a \le m \) and \( 0 \le b \le n \). The random values \( a \) and \( b \) follow a binomial distribution with probabilities \( p_0 \) and \( p_1 \) such that \( a \sim B(m, p_0) \) and \( b \sim B(n, p_1) \). The p-value of the 2x2 table is the product of two binomials:
\[ \begin{aligned} p &= Pr(a; m, p_0) \times Pr(b; n, p_1) \\ &= \binom{m}{a} p_0^a (1-p_0)^{m-a} \times \binom{n}{b} p_1^b (1-p_1)^{n-b} \end{aligned} \]
For the binomial model, the null hypothesis is the two nuisance parameters are equal \( p_0 = p_1 = \pi\), with \( \pi \) the probability for equal proportions, and the probability of any single table is:
\[ p = \binom{m}{a} \binom{n}{b} \pi^{a+b} (1-\pi)^{m+n-a-b} \]
The p-value of the observed table is calculated by maximising the sum of the as or more extreme tables over the domain of the nuisance parameter \( 0 \lt \pi \lt 1 \):
\[ p(a, b) = \sum_{i,j} \binom{m}{i} \binom{n}{j} \pi^{i+j} (1-\pi)^{m+n-i-j} \]
where table \( (i,j) \) is as or more extreme than the observed table \( (a, b) \). The test can be configured to select more extreme tables using various methods.
Note that the sum of the joint binomial distribution is a univariate function for the nuisance parameter \( \pi \). This function may have many local maxima and the search enumerates the range with a configured number of points. The best candidates are optionally used as the start point for an optimized search for a local maxima.
References:
- Barnard, G.A. (1947). Significance tests for 2x2 tables. Biometrika, 34, Issue 1-2, 123–138.
- Boschloo, R.D. (1970). Raised conditional level of significance for the 2 × 2-table when testing the equality of two probabilities. Statistica neerlandica, 24(1), 1–9.
- Suisaa, A and Shuster, J.J. (1985). Exact Unconditional Sample Sizes for the 2 × 2 Binomial Trial. Journal of the Royal Statistical Society. Series A (General), 148(4), 317-327.
- Since:
- 1.1
- See Also:
FisherExactTest, Boschloo's test (Wikipedia), Barnard's test (Wikipedia)
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static interfaceUnconditionedExactTest.BoschlooStatisticCompute the statistic for Boschloo's test.(package private) static classUnconditionedExactTest.CandidatesA container of (key,value) pairs to store candidate minima.static classUnconditionedExactTest.MethodDefine the method to determine the more extreme tables.static classUnconditionedExactTest.ResultResult for the unconditioned exact test.private static classUnconditionedExactTest.XYListAn expandable list of (x,y) values.
-
Field Summary
Fields Modifier and Type Field Description private AlternativeHypothesisalternativeAlternative hypothesis.private static java.lang.StringCOLUMN_SUMError message text for zero column sums.private static UnconditionedExactTestDEFAULTDefault instance.private static doubleINC_FRACTIONFraction of the increment (interval between enumerated points) to initialise the bracket for the minima.private static doubleLOWER_BOUNDLower bound for the enumerated interval.private static intMAX_CANDIDATESMaximum number of candidate to optimize.private static intMAX_TABLESThe maximum number of tables.private UnconditionedExactTest.MethodmethodMethod to identify more extreme tables.private static doubleMINIMA_EPSRelative distance of candidate minima from the lowest candidate.private booleanoptimizeOption to optimize the best initial point(s).private intpointsNumber of initial points.private static doubleSOLVER_RELATIVE_EPSRelative epsilon for the Brent solver.
-
Constructor Summary
Constructors Modifier Constructor Description privateUnconditionedExactTest(AlternativeHypothesis alternative, UnconditionedExactTest.Method method, int points, boolean optimize)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private voidaddCandidate(UnconditionedExactTest.Candidates minima, double v1, double v2, double v3, double x2)Add point 2 to the list of minima if neither neighbour value is lower.private static voidcheckTable(int[][] table)Check the input is a 2-by-2 contingency table.private double[]computePValue(UnconditionedExactTest.XYList tableList)Compute the nuisance parameter and p-value for the binomial model given the list of possible tables.private static java.util.function.DoubleUnaryOperatorcreateBinomialModel(UnconditionedExactTest.XYList tableList)Creates the binomial model p-value function for the nuisance parameter.private static java.util.function.IntToDoubleFunctioncreateLogBinomialCoefficients(int n)Create the natural logarithm of the binomial coefficient for allk = [0, n].private doublefindExtremeTables(int a, int b, UnconditionedExactTest.XYList tableList)Find all tables that are as or more extreme than the observed table.private doublefindExtremeTablesBoschloo(int a, int b, int m, int n, UnconditionedExactTest.XYList tableList)Find all tables that are as or more extreme than the observed table using the Fisher's p-value as the statistic (also known as Boschloo's test).private doublefindExtremeTablesZ(int a, int b, int m, int n, boolean pooled, UnconditionedExactTest.XYList tableList)Find all tables that are as or more extreme than the observed table using the Z statistic.doublestatistic(int[][] table)Compute the statistic for the unconditioned exact test.private doublestatisticBoschloo(int a, int b, int m, int n)Compute the statistic using Fisher's p-value (also known as Boschloo's test).private static doublestatisticBoschlooTwoSided(Hypergeom distribution, int k)Compute the two-sided statistic using Fisher's p-value (also known as Boschloo's test).private static doublestatisticZ(int a, int b, int m, int n, boolean pooled)Compute the statistic from a Z-test.UnconditionedExactTest.Resulttest(int[][] table)Performs an unconditioned exact test on the 2-by-2 contingency table.UnconditionedExactTestwith(AlternativeHypothesis v)Return an instance with the configured alternative hypothesis.UnconditionedExactTestwith(UnconditionedExactTest.Method v)Return an instance with the configured method.static UnconditionedExactTestwithDefaults()Return an instance using the default options.UnconditionedExactTestwithInitialPoints(int v)Return an instance with the configured number of initial points.UnconditionedExactTestwithOptimize(boolean v)Return an instance with the configured optimization of initial search points.
-
-
-
Field Detail
-
DEFAULT
private static final UnconditionedExactTest DEFAULT
Default instance.SciPy's boschloo_exact and barnard_exact tests use 32 points in the interval [0, 1) The R Exact package uses 100 in the interval [1e-5, 1-1e-5]. Barnards 1947 paper describes the nuisance parameter in the open interval
0 < pi < 1. Here we respect the open-interval for the initial candidates and ignore 0 and 1. The initial bounds used are the same as the R Exact package. We closely match the inner 31 points from SciPy by using 33 points by default.
-
LOWER_BOUND
private static final double LOWER_BOUND
Lower bound for the enumerated interval. The upper bound is1 - lower.- See Also:
- Constant Field Values
-
SOLVER_RELATIVE_EPS
private static final double SOLVER_RELATIVE_EPS
Relative epsilon for the Brent solver. This is limited for a univariate function to approximately sqrt(eps) with eps = 2^-52.- See Also:
- Constant Field Values
-
INC_FRACTION
private static final double INC_FRACTION
Fraction of the increment (interval between enumerated points) to initialise the bracket for the minima. Note the minima should lie between x +/- increment. The bracket should search within this range. Set to 1/8 and so the initial point of the bracket is approximately 1.61 * 1/8 = 0.2 of the increment away from initial points a or b.- See Also:
- Constant Field Values
-
MAX_CANDIDATES
private static final int MAX_CANDIDATES
Maximum number of candidate to optimize. This is a safety limit to avoid excess optimization. Only candidates within a relative tolerance of the best candidate are stored. If the number of candidates exceeds this value then many candidates have a very similar p-value and the top candidates will be optimized. Using a value of 3 allows at least one other candidate to be optimized when there is two-fold symmetry in the energy function.- See Also:
- Constant Field Values
-
MINIMA_EPS
private static final double MINIMA_EPS
Relative distance of candidate minima from the lowest candidate. Used to exclude poor candidates from optimization.- See Also:
- Constant Field Values
-
MAX_TABLES
private static final int MAX_TABLES
The maximum number of tables. This is limited by the maximum number of indices that can be maintained in memory. Potentially up to this number of tables must be tracked during computation of the p-value for as or more extreme tables. The limit is set using the same limit for maximum capacity as java.util.ArrayList. In practice any table anywhere near this limit can be computed using an alternative such as a chi-squared or g test.- See Also:
- Constant Field Values
-
COLUMN_SUM
private static final java.lang.String COLUMN_SUM
Error message text for zero column sums.- See Also:
- Constant Field Values
-
alternative
private final AlternativeHypothesis alternative
Alternative hypothesis.
-
method
private final UnconditionedExactTest.Method method
Method to identify more extreme tables.
-
points
private final int points
Number of initial points.
-
optimize
private final boolean optimize
Option to optimize the best initial point(s).
-
-
Constructor Detail
-
UnconditionedExactTest
private UnconditionedExactTest(AlternativeHypothesis alternative, UnconditionedExactTest.Method method, int points, boolean optimize)
- Parameters:
alternative- Alternative hypothesis.method- Method to identify more extreme tables.points- Number of initial points.optimize- Option to optimize the best initial point(s).
-
-
Method Detail
-
withDefaults
public static UnconditionedExactTest withDefaults()
Return an instance using the default options.- Returns:
- default instance
-
with
public UnconditionedExactTest with(AlternativeHypothesis v)
Return an instance with the configured alternative hypothesis.- Parameters:
v- Value.- Returns:
- an instance
-
with
public UnconditionedExactTest with(UnconditionedExactTest.Method v)
Return an instance with the configured method.- Parameters:
v- Value.- Returns:
- an instance
-
withInitialPoints
public UnconditionedExactTest withInitialPoints(int v)
Return an instance with the configured number of initial points.The search for the nuisance parameter will use \( v \) points in the open interval \( (0, 1) \). The interval is evaluated by including start and end points approximately equal to 0 and 1. Additional internal points are enumerated using increments of approximately \( \frac{1}{v-1} \). The minimum number of points is 2. Increasing the number of points increases the precision of the search at the cost of performance.
To approximately double the number of points so that all existing points are included and additional points half-way between them are sampled requires using
2p - 1wherepis the existing number of points.- Parameters:
v- Value.- Returns:
- an instance
- Throws:
java.lang.IllegalArgumentException- if the value is< 2.
-
withOptimize
public UnconditionedExactTest withOptimize(boolean v)
Return an instance with the configured optimization of initial search points.If enabled then the initial point(s) with the highest probability is/are used as the start for an optimization to find a local maxima.
- Parameters:
v- Value.- Returns:
- an instance
- See Also:
withInitialPoints(int)
-
statistic
public double statistic(int[][] table)
Compute the statistic for the unconditioned exact test. The statistic returned depends on the configured method.- Parameters:
table- 2-by-2 contingency table.- Returns:
- test statistic
- Throws:
java.lang.IllegalArgumentException- if thetableis not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.- See Also:
with(Method),test(int[][])
-
test
public UnconditionedExactTest.Result test(int[][] table)
Performs an unconditioned exact test on the 2-by-2 contingency table. The statistic and p-value returned depends on the configured method and alternative hypothesis.The search for the nuisance parameter that maximises the p-value can be configured to: start with a number of initial points; and optimize the best points.
- Parameters:
table- 2-by-2 contingency table.- Returns:
- test result
- Throws:
java.lang.IllegalArgumentException- if thetableis not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.- See Also:
with(Method),with(AlternativeHypothesis),statistic(int[][])
-
findExtremeTables
private double findExtremeTables(int a, int b, UnconditionedExactTest.XYList tableList)Find all tables that are as or more extreme than the observed table.If the list of tables is full then all tables are more extreme. Some configurations can detect this without performing a search and in this case the list of tables is returned as empty.
- Parameters:
a- Observed value for a.b- Observed value for b.tableList- List to track more extreme tables.- Returns:
- the test statistic
-
statisticZ
private static double statisticZ(int a, int b, int m, int n, boolean pooled)Compute the statistic from a Z-test.- Parameters:
a- Observed value for a.b- Observed value for b.m- Column sum m.n- Column sum n.pooled- true to use a pooled variance.- Returns:
- z
-
findExtremeTablesZ
private double findExtremeTablesZ(int a, int b, int m, int n, boolean pooled, UnconditionedExactTest.XYList tableList)Find all tables that are as or more extreme than the observed table using the Z statistic.- Parameters:
a- Observed value for a.b- Observed value for b.m- Column sum m.n- Column sum n.pooled- true to use a pooled variance.tableList- List to track more extreme tables.- Returns:
- observed z
-
statisticBoschloo
private double statisticBoschloo(int a, int b, int m, int n)Compute the statistic using Fisher's p-value (also known as Boschloo's test).- Parameters:
a- Observed value for a.b- Observed value for b.m- Column sum m.n- Column sum n.- Returns:
- p-value
-
statisticBoschlooTwoSided
private static double statisticBoschlooTwoSided(Hypergeom distribution, int k)
Compute the two-sided statistic using Fisher's p-value (also known as Boschloo's test).- Parameters:
distribution- Hypergeometric distribution.k- Observed value.- Returns:
- p-value
-
findExtremeTablesBoschloo
private double findExtremeTablesBoschloo(int a, int b, int m, int n, UnconditionedExactTest.XYList tableList)Find all tables that are as or more extreme than the observed table using the Fisher's p-value as the statistic (also known as Boschloo's test).- Parameters:
a- Observed value for a.b- Observed value for b.m- Column sum m.n- Column sum n.tableList- List to track more extreme tables.- Returns:
- observed p-value
-
computePValue
private double[] computePValue(UnconditionedExactTest.XYList tableList)
Compute the nuisance parameter and p-value for the binomial model given the list of possible tables.The current method enumerates an initial set of points and stores local extrema as candidates. Any candidate within 2% of the best is optionally optimized; this is limited to the top 3 candidates. These settings could be exposed as configurable options. Currently only the choice to optimize or not is exposed.
- Parameters:
tableList- List of tables.- Returns:
- [nuisance parameter, p-value]
-
createBinomialModel
private static java.util.function.DoubleUnaryOperator createBinomialModel(UnconditionedExactTest.XYList tableList)
Creates the binomial model p-value function for the nuisance parameter. Note: This function computes the negative p-value so is suitable for optimization by a search for a minimum.- Parameters:
tableList- List of tables.- Returns:
- the function
-
createLogBinomialCoefficients
private static java.util.function.IntToDoubleFunction createLogBinomialCoefficients(int n)
Create the natural logarithm of the binomial coefficient for allk = [0, n].- Parameters:
n- Limit N.- Returns:
- ln binom(n, k)
-
addCandidate
private void addCandidate(UnconditionedExactTest.Candidates minima, double v1, double v2, double v3, double x2)
Add point 2 to the list of minima if neither neighbour value is lower.!(v1 < v2 || v3 < v2)
- Parameters:
minima- Candidate minima.v1- First point function value.v2- Second point function value.v3- Third point function value.x2- Second point.
-
checkTable
private static void checkTable(int[][] table)
Check the input is a 2-by-2 contingency table.- Parameters:
table- Contingency table.- Throws:
java.lang.IllegalArgumentException- if thetableis not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.
-
-