Class HypergeometricDistribution

java.lang.Object
org.apache.commons.statistics.distribution.HypergeometricDistribution
All Implemented Interfaces:
DiscreteDistribution

public final class HypergeometricDistribution extends Object
Implementation of the hypergeometric distribution.

The probability mass function of \( X \) is:

\[ f(k; N, K, n) = \frac{\binom{K}{k} \binom{N - K}{n-k}}{\binom{N}{n}} \]

for \( N \in \{0, 1, 2, \dots\} \) the population size, \( K \in \{0, 1, \dots, N\} \) the number of success states, \( n \in \{0, 1, \dots, N\} \) the number of samples, \( k \in \{\max(0, n+K-N), \dots, \min(n, K)\} \) the number of successes, and

\[ \binom{a}{b} = \frac{a!}{b! \, (a-b)!} \]

is the binomial coefficient.
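
As a worked illustration of the formula above, the following is a minimal, self-contained sketch that evaluates the probability mass function directly from binomial coefficients. The class and method names (HypergeometricPmfSketch, binomial, pmf) are hypothetical and are not part of this library. The direct product is only suitable for small parameters, since the coefficients overflow a double for large ones.

```java
final class HypergeometricPmfSketch {

    /** Binomial coefficient C(a, b) as a double; 0 when b is outside [0, a]. */
    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0.0;
        }
        int m = Math.min(b, a - b); // C(a, b) == C(a, a - b)
        double result = 1.0;
        for (int i = 1; i <= m; i++) {
            result = result * (a - m + i) / i;
        }
        return result;
    }

    /** f(k; N, K, n) from the formula; evaluates to 0 outside the support. */
    static double pmf(int populationSize, int successes, int sampleSize, int k) {
        return binomial(successes, k)
            * binomial(populationSize - successes, sampleSize - k)
            / binomial(populationSize, sampleSize);
    }

    public static void main(String[] args) {
        // N = 10, K = 4, n = 3: the support is {0, 1, 2, 3}.
        for (int k = 0; k <= 3; k++) {
            System.out.println("f(" + k + ") = " + pmf(10, 4, 3, k));
        }
    }
}
```

For N = 10, K = 4, n = 3 the exact values are 20/120, 60/120, 36/120 and 4/120, which sum to 1.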

  • Method Details

    • of

      public static HypergeometricDistribution of(int populationSize, int numberOfSuccesses, int sampleSize)
      Creates a hypergeometric distribution.
      Parameters:
      populationSize - Population size.
      numberOfSuccesses - Number of successes in the population.
      sampleSize - Sample size.
      Returns:
      the distribution
      Throws:
      IllegalArgumentException - if numberOfSuccesses < 0, or populationSize <= 0, or numberOfSuccesses > populationSize, or sampleSize > populationSize.
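
The documented parameter constraints can be sketched as a standalone check. This is an illustration only, not the library's code: the class and method names are hypothetical, and the sketch tests exactly the conditions listed in the Throws clause above and nothing more.

```java
final class HypergeometricParameterCheckSketch {

    /** Throws if any condition listed in the documented Throws clause holds. */
    static void check(int populationSize, int numberOfSuccesses, int sampleSize) {
        if (populationSize <= 0) {
            throw new IllegalArgumentException("populationSize <= 0: " + populationSize);
        }
        if (numberOfSuccesses < 0) {
            throw new IllegalArgumentException("numberOfSuccesses < 0: " + numberOfSuccesses);
        }
        if (numberOfSuccesses > populationSize) {
            throw new IllegalArgumentException("numberOfSuccesses > populationSize");
        }
        if (sampleSize > populationSize) {
            throw new IllegalArgumentException("sampleSize > populationSize");
        }
    }

    /** Convenience wrapper (hypothetical) that reports validity as a boolean. */
    static boolean isValid(int populationSize, int numberOfSuccesses, int sampleSize) {
        try {
            check(populationSize, numberOfSuccesses, sampleSize);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```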
    • getPopulationSize

      public int getPopulationSize()
      Gets the population size parameter of this distribution.
      Returns:
      the population size.
    • getNumberOfSuccesses

      public int getNumberOfSuccesses()
      Gets the number of successes parameter of this distribution.
      Returns:
      the number of successes.
    • getSampleSize

      public int getSampleSize()
      Gets the sample size parameter of this distribution.
      Returns:
      the sample size.
    • probability

      public double probability(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
      Parameters:
      x - Point at which the PMF is evaluated.
      Returns:
      the value of the probability mass function at x.
    • probability

      public double probability(int x0, int x1)
      For a random variable X whose values are distributed according to this distribution, this method returns P(x0 < X <= x1). The default implementation uses the identity P(x0 < X <= x1) = P(X <= x1) - P(X <= x0).

      Special cases:

      • returns 0.0 if x0 == x1;
      • returns probability(x1) if x0 + 1 == x1.
      Specified by:
      probability in interface DiscreteDistribution
      Parameters:
      x0 - Lower bound (exclusive).
      x1 - Upper bound (inclusive).
      Returns:
      the probability that a random variable with this distribution takes a value between x0 and x1, excluding the lower and including the upper endpoint.
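
The identity and the two special cases can be sketched on a fixed probability table. The table below holds the exact PMF for N = 10, K = 4, n = 3; all names are hypothetical, and rejecting x0 > x1 with an exception is an assumption of this sketch rather than something stated above.

```java
final class IntervalProbabilitySketch {

    // PMF for N = 10, K = 4, n = 3 on its support {0, 1, 2, 3}.
    static final double[] PMF = {20.0 / 120, 60.0 / 120, 36.0 / 120, 4.0 / 120};

    static double pmf(int x) {
        return (x < 0 || x >= PMF.length) ? 0.0 : PMF[x];
    }

    /** P(X <= x) as a running sum of the table. */
    static double cdf(int x) {
        double sum = 0.0;
        for (int i = 0; i <= x && i < PMF.length; i++) {
            sum += PMF[i];
        }
        return sum;
    }

    /** P(x0 < X <= x1) with the documented special cases. */
    static double probability(int x0, int x1) {
        if (x0 > x1) {
            // Assumption of this sketch: an inverted interval is rejected.
            throw new IllegalArgumentException("x0 > x1");
        }
        if (x0 == x1) {
            return 0.0;
        }
        if (x0 + 1 == x1) {
            return pmf(x1);
        }
        return cdf(x1) - cdf(x0);
    }
}
```

Here probability(0, 2) covers the values 1 and 2, so it is 0.5 + 0.3 = 0.8 up to rounding.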
    • logProbability

      public double logProbability(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
      Parameters:
      x - Point at which the PMF is evaluated.
      Returns:
      the logarithm of the value of the probability mass function at x.
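
One reason to work with log(P(X = x)) is that the binomial coefficients in the PMF overflow a double for large parameters, while their logarithms do not. The following is a minimal sketch that accumulates the log of each coefficient as a sum of logs. It is not the library's algorithm, the names are hypothetical, and returning negative infinity outside the support is an assumption based on log(0).

```java
final class HypergeometricLogPmfSketch {

    /** Natural log of C(a, b) for 0 <= b <= a, accumulated as a sum of logs. */
    static double logBinomial(int a, int b) {
        int m = Math.min(b, a - b);
        double sum = 0.0;
        for (int i = 1; i <= m; i++) {
            sum += Math.log((double) (a - m + i) / i);
        }
        return sum;
    }

    /** log f(k; N, K, n); negative infinity outside the support. */
    static double logPmf(int populationSize, int successes, int sampleSize, int k) {
        int lower = Math.max(0, sampleSize - (populationSize - successes));
        int upper = Math.min(sampleSize, successes);
        if (k < lower || k > upper) {
            return Double.NEGATIVE_INFINITY;
        }
        return logBinomial(successes, k)
            + logBinomial(populationSize - successes, sampleSize - k)
            - logBinomial(populationSize, sampleSize);
    }

    public static void main(String[] args) {
        // The coefficients here are far too large for a double, but the log is finite.
        System.out.println(logPmf(100_000, 50_000, 50_000, 25_000));
    }
}
```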
    • cumulativeProbability

      public double cumulativeProbability(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
      Parameters:
      x - Point at which the CDF is evaluated.
      Returns:
      the probability that a random variable with this distribution takes a value less than or equal to x.
    • survivalProbability

      public double survivalProbability(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X > x). In other words, this method represents the complementary cumulative distribution function.

      By default, this is defined as 1 - cumulativeProbability(x), but the specific implementation may be more accurate.

      Parameters:
      x - Point at which the survival function is evaluated.
      Returns:
      the probability that a random variable with this distribution takes a value greater than x.
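
To illustrate why a dedicated implementation can be more accurate than 1 - cumulativeProbability(x), the sketch below sums the upper tail of the PMF directly. Far in the tail, 1 minus a value close to 1 loses nearly all significant digits, whereas the direct sum keeps them. This is an illustrative approach with hypothetical names, not a description of how this class computes the result.

```java
final class HypergeometricSurvivalSketch {

    /** Binomial coefficient C(a, b) as a double; 0 when b is outside [0, a]. */
    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0.0;
        }
        int m = Math.min(b, a - b);
        double result = 1.0;
        for (int i = 1; i <= m; i++) {
            result = result * (a - m + i) / i;
        }
        return result;
    }

    static double pmf(int populationSize, int successes, int sampleSize, int k) {
        return binomial(successes, k)
            * binomial(populationSize - successes, sampleSize - k)
            / binomial(populationSize, sampleSize);
    }

    /** P(X > x) as a direct sum over the upper tail, starting from its far end. */
    static double survival(int populationSize, int successes, int sampleSize, int x) {
        int lower = Math.max(0, sampleSize - (populationSize - successes));
        int upper = Math.min(sampleSize, successes);
        double sum = 0.0;
        for (int k = upper; k > x && k >= lower; k--) {
            sum += pmf(populationSize, successes, sampleSize, k);
        }
        return Math.min(1.0, sum);
    }

    public static void main(String[] args) {
        // Mean is 50 here, so x = 90 lies deep in the upper tail.
        System.out.println(survival(1000, 500, 100, 90));
    }
}
```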
    • inverseCumulativeProbability

      public int inverseCumulativeProbability(double p)
      Computes the quantile function of this distribution. For a random variable X distributed according to this distribution, the returned value is:

      \[ x = \begin{cases} \inf \{ x \in \mathbb Z : P(X \le x) \ge p\} & \text{for } 0 \lt p \le 1 \\ \inf \{ x \in \mathbb Z : P(X \le x) \gt 0 \} & \text{for } p = 0 \end{cases} \]

      If the result exceeds the range of the data type int, then Integer.MIN_VALUE or Integer.MAX_VALUE is returned. In this case the result of cumulativeProbability(x) called using the returned p-quantile may not compute the original p.

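      The default implementation returns:

      • getSupportLowerBound() for p = 0,
      • getSupportUpperBound() for p = 1, or
      • the result of a search for a root between the lower and upper bound using cumulativeProbability(x) - p. The bounds may be bracketed for efficiency.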

      Specified by:
      inverseCumulativeProbability in interface DiscreteDistribution
      Parameters:
      p - Cumulative probability.
      Returns:
      the smallest p-quantile of this distribution (largest 0-quantile for p = 0).
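
The piecewise definition of the quantile can be sketched as a linear scan over a fixed probability table (the exact PMF for N = 10, K = 4, n = 3). This is a minimal illustration with hypothetical names, not the search used by the library, and rejecting p outside [0, 1] is an assumption of the sketch.

```java
final class QuantileScanSketch {

    // PMF for N = 10, K = 4, n = 3; index i corresponds to the value LOWER + i.
    static final int LOWER = 0;
    static final double[] PMF = {20.0 / 120, 60.0 / 120, 36.0 / 120, 4.0 / 120};

    /** Smallest x with P(X <= x) >= p, or with P(X <= x) > 0 when p = 0. */
    static int quantile(double p) {
        if (!(p >= 0.0 && p <= 1.0)) {
            // Assumption of this sketch: invalid probabilities are rejected.
            throw new IllegalArgumentException("p out of [0, 1]: " + p);
        }
        double cumulative = 0.0;
        for (int i = 0; i < PMF.length; i++) {
            cumulative += PMF[i];
            boolean reached = (p == 0.0) ? cumulative > 0.0 : cumulative >= p;
            if (reached) {
                return LOWER + i;
            }
        }
        // Rounding can leave the running sum just below 1; fall back to the upper bound.
        return LOWER + PMF.length - 1;
    }
}
```

With this table, P(X <= 0) is 1/6 and P(X <= 1) is 2/3, so quantile(0.5) is 1.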
    • inverseSurvivalProbability

      public int inverseSurvivalProbability(double p)
      Computes the inverse survival probability function of this distribution. For a random variable X distributed according to this distribution, the returned value is:

      \[ x = \begin{cases} \inf \{ x \in \mathbb Z : P(X \gt x) \le p\} & \text{for } 0 \le p \lt 1 \\ \inf \{ x \in \mathbb Z : P(X \gt x) \lt 1 \} & \text{for } p = 1 \end{cases} \]

      If the result exceeds the range of the data type int, then Integer.MIN_VALUE or Integer.MAX_VALUE is returned. In this case the result of survivalProbability(x) called using the returned (1-p)-quantile may not compute the original p.

      By default, this is defined as inverseCumulativeProbability(1 - p), but the specific implementation may be more accurate.

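      The default implementation returns:

      • getSupportLowerBound() for p = 1,
      • getSupportUpperBound() for p = 0, or
      • the result of a search for a root between the lower and upper bound using survivalProbability(x) - p. The bounds may be bracketed for efficiency.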

      Specified by:
      inverseSurvivalProbability in interface DiscreteDistribution
      Parameters:
      p - Survival probability.
      Returns:
      the smallest (1-p)-quantile of this distribution (largest 0-quantile for p = 1).
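
The inverse survival definition can be sketched the same way, by scanning for the smallest x whose upper tail satisfies the condition. The table is the exact PMF for N = 10, K = 4, n = 3; the names are hypothetical, rejecting p outside [0, 1] is an assumption, and the scan is an illustration rather than the library's search.

```java
final class InverseSurvivalScanSketch {

    // PMF for N = 10, K = 4, n = 3 on its support {0, 1, 2, 3}.
    static final double[] PMF = {20.0 / 120, 60.0 / 120, 36.0 / 120, 4.0 / 120};

    /** P(X > x) as a sum of the table entries above x. */
    static double survival(int x) {
        double sum = 0.0;
        for (int i = PMF.length - 1; i > x && i >= 0; i--) {
            sum += PMF[i];
        }
        return sum;
    }

    /** Smallest x with P(X > x) <= p, or with P(X > x) < 1 when p = 1. */
    static int inverseSurvival(double p) {
        if (!(p >= 0.0 && p <= 1.0)) {
            // Assumption of this sketch: invalid probabilities are rejected.
            throw new IllegalArgumentException("p out of [0, 1]: " + p);
        }
        for (int x = 0; x < PMF.length; x++) {
            double sf = survival(x);
            boolean reached = (p == 1.0) ? sf < 1.0 : sf <= p;
            if (reached) {
                return x;
            }
        }
        return PMF.length - 1;
    }
}
```

With this table, P(X > 0) is 5/6 and P(X > 1) is 1/3, so inverseSurvival(0.5) is 1, which agrees with the 0.5-quantile.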
    • getMean

      public double getMean()
      Gets the mean of this distribution.

      For population size \( N \), number of successes \( K \), and sample size \( n \), the mean is:

      \[ n \frac{K}{N} \]

      Returns:
      the mean.
    • getVariance

      public double getVariance()
      Gets the variance of this distribution.

      For population size \( N \), number of successes \( K \), and sample size \( n \), the variance is:

      \[ n \frac{K}{N} \frac{N-K}{N} \frac{N-n}{N-1} \]

      Returns:
      the variance.
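
The closed forms for the mean and the variance can be checked numerically. The sketch below evaluates both formulas and compares them with moments summed over the exact PMF for N = 10, K = 4, n = 3. All names are hypothetical, and the guard for N = 1 (a degenerate distribution, where the formula would divide by zero) is an addition of this sketch.

```java
final class HypergeometricMomentsSketch {

    /** n K / N. */
    static double mean(int populationSize, int successes, int sampleSize) {
        return sampleSize * ((double) successes / populationSize);
    }

    /** n (K / N) ((N - K) / N) ((N - n) / (N - 1)). */
    static double variance(int populationSize, int successes, int sampleSize) {
        if (populationSize == 1) {
            return 0.0; // sketch-only guard: the outcome is constant when N = 1
        }
        double N = populationSize;
        return sampleSize * (successes / N) * ((N - successes) / N)
            * ((N - sampleSize) / (N - 1));
    }

    public static void main(String[] args) {
        // Exact PMF for N = 10, K = 4, n = 3 on {0, 1, 2, 3}.
        double[] pmf = {20.0 / 120, 60.0 / 120, 36.0 / 120, 4.0 / 120};
        double m1 = 0.0;
        double m2 = 0.0;
        for (int k = 0; k < pmf.length; k++) {
            m1 += k * pmf[k];
            m2 += k * k * pmf[k];
        }
        System.out.println("mean: " + mean(10, 4, 3) + " vs summed " + m1);
        System.out.println("variance: " + variance(10, 4, 3) + " vs summed " + (m2 - m1 * m1));
    }
}
```

For these parameters the mean is 3 * 4 / 10 = 1.2 and the variance is 3 * 0.4 * 0.6 * 7 / 9 = 0.56.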
    • getSupportLowerBound

      public int getSupportLowerBound()
      Gets the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) \gt 0 \} \). By convention, Integer.MIN_VALUE should be substituted for negative infinity.

      For population size \( N \), number of successes \( K \), and sample size \( n \), the lower bound of the support is \( \max \{ 0, n + K - N \} \).

      Returns:
      lower bound of the support
    • getSupportUpperBound

      public int getSupportUpperBound()
      Gets the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) = 1 \} \). By convention, Integer.MAX_VALUE should be substituted for positive infinity.

      For number of successes \( K \), and sample size \( n \), the upper bound of the support is \( \min \{ n, K \} \).

      Returns:
      upper bound of the support
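
The two support bounds follow directly from the parameters, as this small sketch shows. The names are hypothetical. The lower bound is written as n - (N - K) rather than n + K - N so that the intermediate value cannot overflow an int for valid parameters.

```java
final class HypergeometricSupportSketch {

    /** max{0, n + K - N}, rearranged to avoid int overflow in n + K. */
    static int lowerBound(int populationSize, int successes, int sampleSize) {
        return Math.max(0, sampleSize - (populationSize - successes));
    }

    /** min{n, K}. */
    static int upperBound(int successes, int sampleSize) {
        return Math.min(sampleSize, successes);
    }

    public static void main(String[] args) {
        // N = 10, K = 7, n = 5: only 3 failures exist, so at least 2 draws are successes.
        System.out.println(lowerBound(10, 7, 5) + " .. " + upperBound(7, 5));
    }
}
```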
    • createSampler

      public DiscreteDistribution.Sampler createSampler(org.apache.commons.rng.UniformRandomProvider rng)
      Creates a sampler.
      Specified by:
      createSampler in interface DiscreteDistribution
      Parameters:
      rng - Generator of uniformly distributed numbers.
      Returns:
      a sampler that produces random numbers according to this distribution.
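
For intuition about what a sample from this distribution represents, the sketch below simulates drawing without replacement from an urn and counts the successes. It uses java.util.Random rather than a UniformRandomProvider, all names are hypothetical, and it does not describe the algorithm behind createSampler.

```java
import java.util.Random;

final class HypergeometricUrnSamplerSketch {

    /** Draws sampleSize items without replacement and counts the successes. */
    static int sample(Random rng, int populationSize, int successes, int sampleSize) {
        int remainingPopulation = populationSize;
        int remainingSuccesses = successes;
        int drawn = 0;
        for (int i = 0; i < sampleSize; i++) {
            // The next item is a success with probability remainingSuccesses / remainingPopulation.
            if (rng.nextInt(remainingPopulation) < remainingSuccesses) {
                remainingSuccesses--;
                drawn++;
            }
            remainingPopulation--;
        }
        return drawn;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(sample(rng, 10, 4, 3));
        }
    }
}
```

Each call costs time proportional to the sample size, so this approach is only practical for small n.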