Class GTest
The G-test is known in statistical genetics as the McDonald-Kreitman test. The implementation handles both known and unknown distributions.
Two-sample tests can be used when the distribution is unknown a priori but provided by one sample, or when the hypothesis under test is that the two samples come from the same underlying distribution.
- Since:
- 1.1
Field Summary
Fields
| Modifier and Type | Field | Description |
| --- | --- | --- |
| private static final GTest | DEFAULT | Default instance. |
| private final int | degreesOfFreedomAdjustment | Degrees of freedom adjustment. |
-
Constructor Summary
Constructors -
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| private static void | checkNonZero(double value, String name, int index) | Check the array value is non-zero. |
| private static double | computeP(double g, double degreesOfFreedom) | Compute the G-test p-value. |
| double | statistic(double[] expected, long[] observed) | Computes the G-test goodness-of-fit statistic comparing observed and expected frequency counts. |
| double | statistic(long[] observed) | Computes the G-test goodness-of-fit statistic comparing the observed counts to a uniform expected value (each category is equally likely). |
| double | statistic(long[][] counts) | Computes a G-test statistic associated with a G-test of independence based on the input counts array, viewed as a two-way table. |
| | test(double[] expected, long[] observed) | Perform a G-test for goodness-of-fit evaluating the null hypothesis that the observed counts conform to the expected counts. |
| | test(long[] observed) | Perform a G-test for goodness-of-fit evaluating the null hypothesis that the observed counts conform to a uniform distribution (each category is equally likely). |
| | test(long[][] counts) | Perform a G-test of independence based on the input counts array, viewed as a two-way table. |
| static GTest | withDefaults() | Return an instance using the default options. |
| GTest | withDegreesOfFreedomAdjustment(int v) | Return an instance with the configured degrees of freedom adjustment. |
-
Field Details
-
DEFAULT
private static final GTest DEFAULT
Default instance.
-
degreesOfFreedomAdjustment
private final int degreesOfFreedomAdjustment
Degrees of freedom adjustment.
-
-
Constructor Details
-
GTest
private GTest(int degreesOfFreedomAdjustment)
- Parameters:
degreesOfFreedomAdjustment - Degrees of freedom adjustment.
-
-
Method Details
-
withDefaults
Return an instance using the default options.
- Returns:
- default instance
-
withDegreesOfFreedomAdjustment
Return an instance with the configured degrees of freedom adjustment.
The default degrees of freedom for a sample of length n are n - 1. An intrinsic null hypothesis is one where you estimate one or more parameters from the data in order to get the numbers for your null hypothesis. For a distribution with p parameters, where up to p parameters have been estimated from the data, the degrees of freedom is in the range [n - 1 - p, n - 1].
- Parameters:
v - Value.
- Returns:
- an instance
- Throws:
IllegalArgumentException - if the value is negative
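The range above can be illustrated with a small arithmetic sketch. It assumes the adjustment is subtracted from the default n - 1 and that a non-positive result is rejected, as the Throws clause of test(double[], long[]) suggests. The class and method names are hypothetical and are not part of the documented API.

```java
/** Hypothetical helper for illustration; not part of the documented class. */
final class DegreesOfFreedomSketch {

    /**
     * Degrees of freedom for n categories after subtracting an adjustment.
     * Assumption: df = n - 1 - adjustment.
     */
    static int adjustedDegreesOfFreedom(int n, int adjustment) {
        if (adjustment < 0) {
            throw new IllegalArgumentException("Negative adjustment: " + adjustment);
        }
        final int df = n - 1 - adjustment;
        if (df <= 0) {
            throw new IllegalArgumentException("Degrees of freedom not strictly positive: " + df);
        }
        return df;
    }

    public static void main(String[] args) {
        // 6 categories with p = 2 parameters estimated from the data.
        System.out.println(adjustedDegreesOfFreedom(6, 0)); // upper bound n - 1 = 5
        System.out.println(adjustedDegreesOfFreedom(6, 2)); // lower bound n - 1 - p = 3
    }
}
```

With 6 categories and 2 estimated parameters, the degrees of freedom therefore lie in [3, 5].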
-
statistic
public double statistic(long[] observed)
Computes the G-test goodness-of-fit statistic comparing the observed counts to a uniform expected value (each category is equally likely).
Note: This is a specialized version of a comparison of observed with an expected array of uniform values. The computation is faster than calling statistic(double[], long[]) and the statistic is the same, with an allowance for accumulated floating-point error due to the optimized routine.
- Parameters:
observed - Observed frequency counts.
- Returns:
- G-test statistic
- Throws:
IllegalArgumentException - if the sample size is less than 2; observed has negative entries; or all the observations are zero.
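As a rough illustration of the uniform case, the sketch below evaluates the standard G statistic, G = 2 * sum(O_i * ln(O_i / E_i)), with every E_i equal to the mean count. It is a standalone approximation of the documented behaviour under that assumption, not the library's optimized routine; the class name is hypothetical.

```java
/** Hypothetical sketch; not the library's implementation. */
final class UniformGSketch {

    /** G = 2 * sum(O_i * ln(O_i / E)) with E = sum(O) / n for every category. */
    static double statistic(long[] observed) {
        if (observed.length < 2) {
            throw new IllegalArgumentException("Sample size is less than 2");
        }
        long total = 0;
        for (final long o : observed) {
            if (o < 0) {
                throw new IllegalArgumentException("Negative count: " + o);
            }
            total += o;
        }
        if (total == 0) {
            throw new IllegalArgumentException("All observations are zero");
        }
        final double expected = (double) total / observed.length;
        double sum = 0;
        for (final long o : observed) {
            // A zero count contributes nothing: x * ln(x) tends to 0 as x tends to 0.
            if (o > 0) {
                sum += o * Math.log(o / expected);
            }
        }
        return 2 * sum;
    }

    public static void main(String[] args) {
        System.out.println(statistic(new long[] {10, 10, 10})); // perfectly uniform counts
        System.out.println(statistic(new long[] {20, 10}));     // uneven counts
    }
}
```

Counts that are exactly uniform give a statistic of zero; the statistic grows as the counts move away from the mean.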
-
statistic
public double statistic(double[] expected, long[] observed)
Computes the G-test goodness-of-fit statistic comparing observed and expected frequency counts.
Note: This implementation rescales the values if necessary to ensure that the sums of the expected and observed counts are equal.
- Parameters:
expected - Expected frequency counts.
observed - Observed frequency counts.
- Returns:
- G-test statistic
- Throws:
IllegalArgumentException - if the sample size is less than 2; the array sizes do not match; expected has entries that are not strictly positive; observed has negative entries; or all the observations are zero.
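The rescaling note can be sketched as follows. This standalone example assumes the expected values are scaled so that their sum matches the observed total before G = 2 * sum(O_i * ln(O_i / E_i)) is evaluated; how the library performs the rescaling internally is not stated here, and the class name is hypothetical.

```java
/** Hypothetical sketch; not the library's implementation. */
final class ExpectedGSketch {

    /** G statistic with expected values rescaled to the observed total. */
    static double statistic(double[] expected, long[] observed) {
        if (observed.length < 2) {
            throw new IllegalArgumentException("Sample size is less than 2");
        }
        if (expected.length != observed.length) {
            throw new IllegalArgumentException("Array sizes do not match");
        }
        double sumExpected = 0;
        long sumObserved = 0;
        for (int i = 0; i < observed.length; i++) {
            if (!(expected[i] > 0)) {
                throw new IllegalArgumentException("Expected value not strictly positive at " + i);
            }
            if (observed[i] < 0) {
                throw new IllegalArgumentException("Negative observed count at " + i);
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        if (sumObserved == 0) {
            throw new IllegalArgumentException("All observations are zero");
        }
        // Assumed rescaling: multiply each expected value by this ratio.
        final double ratio = sumObserved / sumExpected;
        double sum = 0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] > 0) {
                sum += observed[i] * Math.log(observed[i] / (ratio * expected[i]));
            }
        }
        return 2 * sum;
    }

    public static void main(String[] args) {
        final long[] observed = {20, 10};
        // Proportions and counts describe the same null hypothesis after rescaling.
        System.out.println(statistic(new double[] {0.5, 0.5}, observed));
        System.out.println(statistic(new double[] {15, 15}, observed));
    }
}
```

Because of the rescaling, expected values may be supplied as proportions or as counts; only their relative sizes affect the result.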
-
statistic
public double statistic(long[][] counts)
Computes a G-test statistic associated with a G-test of independence based on the input counts array, viewed as a two-way table. The formula used to compute the test statistic is:
\[ G = 2 \cdot \sum_{ij}{O_{ij}} \cdot \left[ H(r) + H(c) - H(r,c) \right] \]
where \( H \) is the Shannon entropy of the random variable formed by viewing the elements of the argument array as incidence counts:
\[ H(X) = - {\sum_{x \in \text{Supp}(X)} p(x) \ln p(x)} \]
- Parameters:
counts - 2-way table.
- Returns:
- G-test statistic
- Throws:
IllegalArgumentException - if the number of rows or columns is less than 2; the array is non-rectangular; the array has negative entries; or the sum of a row or column is zero.
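The entropy formula above can be sketched directly. In this example H(r) is taken over the row sums, H(c) over the column sums and H(r,c) over the individual cells, each normalised by the table total; that reading of the notation is an assumption, and the class and helper names are hypothetical.

```java
/** Hypothetical sketch of the entropy formulation; not the library's implementation. */
final class IndependenceGSketch {

    /** Shannon entropy (natural log) of counts viewed as incidence counts. */
    static double entropy(long[] counts, long total) {
        double h = 0;
        for (final long c : counts) {
            if (c > 0) {
                final double p = (double) c / total;
                h -= p * Math.log(p);
            }
        }
        return h;
    }

    /** G = 2 * N * [H(r) + H(c) - H(r,c)] where N is the table total. */
    static double statistic(long[][] counts) {
        final int rows = counts.length;
        if (rows < 2 || counts[0].length < 2) {
            throw new IllegalArgumentException("Fewer than 2 rows or columns");
        }
        final int cols = counts[0].length;
        final long[] rowSums = new long[rows];
        final long[] colSums = new long[cols];
        final long[] cells = new long[rows * cols];
        long total = 0;
        for (int i = 0; i < rows; i++) {
            if (counts[i].length != cols) {
                throw new IllegalArgumentException("Non-rectangular table");
            }
            for (int j = 0; j < cols; j++) {
                final long c = counts[i][j];
                if (c < 0) {
                    throw new IllegalArgumentException("Negative entry");
                }
                rowSums[i] += c;
                colSums[j] += c;
                cells[i * cols + j] = c;
                total += c;
            }
        }
        for (final long s : rowSums) {
            if (s == 0) {
                throw new IllegalArgumentException("Row sum is zero");
            }
        }
        for (final long s : colSums) {
            if (s == 0) {
                throw new IllegalArgumentException("Column sum is zero");
            }
        }
        return 2 * total * (entropy(rowSums, total) + entropy(colSums, total) - entropy(cells, total));
    }

    public static void main(String[] args) {
        System.out.println(statistic(new long[][] {{10, 20}, {30, 40}}));
    }
}
```

The bracketed term is the mutual information between the row and column variables, so the result agrees with the more familiar form 2 * sum(O_ij * ln(O_ij / E_ij)), where E_ij is the product of the row and column sums divided by the table total.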
-
test
Perform a G-test for goodness-of-fit evaluating the null hypothesis that the observed counts conform to a uniform distribution (each category is equally likely).
- Parameters:
observed - Observed frequency counts.
- Returns:
- test result
- Throws:
IllegalArgumentException - if the sample size is less than 2; observed has negative entries; or all the observations are zero.
-
test
Perform a G-test for goodness-of-fit evaluating the null hypothesis that the observed counts conform to the expected counts.
The test can be configured to apply an adjustment to the degrees of freedom if the observed data has been used to create the expected counts.
- Parameters:
expected - Expected frequency counts.
observed - Observed frequency counts.
- Returns:
- test result
- Throws:
IllegalArgumentException - if the sample size is less than 2; the array sizes do not match; expected has entries that are not strictly positive; observed has negative entries; all the observations are zero; or the adjusted degrees of freedom are not strictly positive.
-
test
Perform a G-test of independence based on the input counts array, viewed as a two-way table.
- Parameters:
counts - 2-way table.
- Returns:
- test result
- Throws:
IllegalArgumentException - if the number of rows or columns is less than 2; the array is non-rectangular; the array has negative entries; or the sum of a row or column is zero.
-
computeP
private static double computeP(double g, double degreesOfFreedom)
Compute the G-test p-value.
- Parameters:
g - G-test statistic.
degreesOfFreedom - Degrees of freedom.
- Returns:
- p-value
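Under the null hypothesis the G statistic is asymptotically chi-squared distributed, so a p-value can be taken as the upper tail probability of that distribution. The sketch below assumes this is what the p-value represents and covers only even degrees of freedom, where the tail has the closed form exp(-g/2) * sum over i from 0 to k-1 of (g/2)^i / i!, with k = df / 2. The general case needs a regularized incomplete gamma function, which the Java standard library does not provide. The class and method names are hypothetical.

```java
/** Hypothetical sketch; restricted to even degrees of freedom. */
final class ChiSquaredTailSketch {

    /**
     * Upper tail probability of a chi-squared distribution at g.
     * Closed form valid only when degreesOfFreedom is a positive even integer.
     */
    static double pValue(double g, int degreesOfFreedom) {
        if (degreesOfFreedom <= 0 || degreesOfFreedom % 2 != 0) {
            throw new IllegalArgumentException("Sketch supports positive even degrees of freedom only");
        }
        final double half = g / 2;
        double term = 1; // (g/2)^i / i! for i = 0
        double sum = 1;
        for (int i = 1; i < degreesOfFreedom / 2; i++) {
            term *= half / i;
            sum += term;
        }
        return Math.exp(-half) * sum;
    }

    public static void main(String[] args) {
        // With 2 degrees of freedom the tail is exp(-g / 2).
        System.out.println(pValue(2 * Math.log(20), 2));
        System.out.println(pValue(0, 4));
    }
}
```

A statistic of zero gives a p-value of 1, and larger statistics give smaller p-values.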
-
checkNonZero
Check the array value is non-zero.
- Parameters:
value - Value
name - Name of the array
index - Index in the array
- Throws:
IllegalArgumentException - if the value is zero
-