Class ZipfDistribution
- java.lang.Object
  - org.apache.commons.math3.distribution.AbstractIntegerDistribution
    - org.apache.commons.math3.distribution.ZipfDistribution
- All Implemented Interfaces:
java.io.Serializable, IntegerDistribution
public class ZipfDistribution extends AbstractIntegerDistribution
Implementation of the Zipf distribution.
Parameters: For a random variable X whose values are distributed according to this distribution, the probability mass function is given by
P(X = k) = (1 / k^s) / H(N,s) for k = 1,2,...,N.
H(N,s) is the normalizing constant, which corresponds to the generalized harmonic number of order N of s.
- N is the number of elements
- s is the exponent
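The probability mass function can be illustrated with a small standalone sketch. It divides 1 / k^s by the generalized harmonic number H(N,s) so that the probabilities over k = 1..N sum to 1. The class name `ZipfPmfSketch` and its helper methods are hypothetical names chosen for this illustration; the code uses only the Java standard library and is not the Commons Math implementation.

```java
// Standalone sketch: hypothetical helper names, not the library's code.
final class ZipfPmfSketch {
    /** Generalized harmonic number H(n, s) = sum over k = 1..n of 1 / k^s. */
    static double harmonic(int n, double s) {
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** P(X = k) for k in 1..n; this sketch returns 0 outside the support. */
    static double pmf(int k, int n, double s) {
        if (k < 1 || k > n) {
            return 0.0;
        }
        return 1.0 / (Math.pow(k, s) * harmonic(n, s));
    }

    public static void main(String[] args) {
        int n = 3;
        double s = 1.0;
        double total = 0.0;
        for (int k = 1; k <= n; k++) {
            double p = pmf(k, n, s);
            total += p;
            System.out.println("P(X = " + k + ") = " + p);
        }
        // The probabilities over the whole support should add up to 1.
        System.out.println("sum = " + total);
    }
}
```

For N = 3 and s = 1, H(3,1) = 11/6, so the three probabilities are 6/11, 3/11 and 2/11.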
-
-
Nested Class Summary
Nested Classes
- (package private) static class ZipfDistribution.ZipfRejectionInversionSampler: Utility class implementing a rejection inversion sampling method for a discrete, bounded Zipf distribution that is based on the method described in
-
Field Summary
Fields
- private double exponent: Exponent parameter of the distribution.
- private int numberOfElements: Number of elements.
- private double numericalMean: Cached numerical mean.
- private boolean numericalMeanIsCalculated: Whether or not the numerical mean has been calculated.
- private double numericalVariance: Cached numerical variance.
- private boolean numericalVarianceIsCalculated: Whether or not the numerical variance has been calculated.
- private ZipfDistribution.ZipfRejectionInversionSampler sampler: The sampler to be used for the sample() method.
- private static long serialVersionUID: Serializable version identifier.
Fields inherited from class org.apache.commons.math3.distribution.AbstractIntegerDistribution
random, randomData
-
-
Constructor Summary
Constructors
- ZipfDistribution(int numberOfElements, double exponent): Create a new Zipf distribution with the given number of elements and exponent.
- ZipfDistribution(RandomGenerator rng, int numberOfElements, double exponent): Creates a Zipf distribution.
-
Method Summary
Methods
- protected double calculateNumericalMean(): Used by getNumericalMean().
- protected double calculateNumericalVariance(): Used by getNumericalVariance().
- double cumulativeProbability(int x): For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x).
- private double generalizedHarmonic(int n, double m): Calculates the Nth generalized harmonic number.
- double getExponent(): Get the exponent characterizing the distribution.
- int getNumberOfElements(): Get the number of elements (e.g. corpus size) for the distribution.
- double getNumericalMean(): Use this method to get the numerical value of the mean of this distribution.
- double getNumericalVariance(): Use this method to get the numerical value of the variance of this distribution.
- int getSupportLowerBound(): Access the lower bound of the support.
- int getSupportUpperBound(): Access the upper bound of the support.
- boolean isSupportConnected(): Use this method to get information about whether the support is connected, i.e. whether all integers between the lower and upper bound of the support are included in the support.
- double logProbability(int x): For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
- double probability(int x): For a random variable X whose values are distributed according to this distribution, this method returns P(X = x).
- int sample(): Generate a random value sampled from this distribution.
Methods inherited from class org.apache.commons.math3.distribution.AbstractIntegerDistribution
cumulativeProbability, inverseCumulativeProbability, reseedRandomGenerator, sample, solveInverseCumulativeProbability
-
-
-
-
Field Detail
-
serialVersionUID
private static final long serialVersionUID
Serializable version identifier.
- See Also:
- Constant Field Values
-
numberOfElements
private final int numberOfElements
Number of elements.
-
exponent
private final double exponent
Exponent parameter of the distribution.
-
numericalMean
private double numericalMean
Cached numerical mean
-
numericalMeanIsCalculated
private boolean numericalMeanIsCalculated
Whether or not the numerical mean has been calculated
-
numericalVariance
private double numericalVariance
Cached numerical variance
-
numericalVarianceIsCalculated
private boolean numericalVarianceIsCalculated
Whether or not the numerical variance has been calculated
-
sampler
private transient ZipfDistribution.ZipfRejectionInversionSampler sampler
The sampler to be used for the sample() method
-
-
Constructor Detail
-
ZipfDistribution
public ZipfDistribution(int numberOfElements, double exponent)
Create a new Zipf distribution with the given number of elements and exponent.
Note: this constructor will implicitly create an instance of Well19937c as random generator to be used for sampling only (see sample() and AbstractIntegerDistribution.sample(int)). In case no sampling is needed for the created distribution, it is advised to pass null as random generator via the appropriate constructors to avoid the additional initialisation overhead.
- Parameters:
numberOfElements - Number of elements.
exponent - Exponent.
- Throws:
NotStrictlyPositiveException - if numberOfElements <= 0 or exponent <= 0.
-
ZipfDistribution
public ZipfDistribution(RandomGenerator rng, int numberOfElements, double exponent) throws NotStrictlyPositiveException
Creates a Zipf distribution.
- Parameters:
rng - Random number generator.
numberOfElements - Number of elements.
exponent - Exponent.
- Throws:
NotStrictlyPositiveException - if numberOfElements <= 0 or exponent <= 0.
- Since:
- 3.1
-
-
Method Detail
-
getNumberOfElements
public int getNumberOfElements()
Get the number of elements (e.g. corpus size) for the distribution.
- Returns:
- the number of elements
-
getExponent
public double getExponent()
Get the exponent characterizing the distribution.
- Returns:
- the exponent
-
probability
public double probability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
- Parameters:
x - the point at which the PMF is evaluated
- Returns:
- the value of the probability mass function at x
-
logProbability
public double logProbability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm. In other words, this method represents the logarithm of the probability mass function (PMF) for the distribution. Note that due to floating point precision and under/overflow issues, this method will for some distributions be more precise and faster than computing the logarithm of IntegerDistribution.probability(int).
The default implementation simply computes the logarithm of probability(x).
- Overrides:
logProbability in class AbstractIntegerDistribution
- Parameters:
x - the point at which the PMF is evaluated
- Returns:
- the logarithm of the value of the probability mass function at x
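The note on under/overflow can be made concrete with a standalone sketch. It evaluates the log of the Zipf PMF directly as -s * log(k) - log(H(N,s)) rather than taking the logarithm of a probability that may already have underflowed to zero. `ZipfLogPmfSketch` and its methods are hypothetical names for this illustration, and the formula is an assumption derived from the PMF definition, not a statement of how the library computes the value.

```java
// Standalone sketch: hypothetical names, standard library only.
final class ZipfLogPmfSketch {
    /** Generalized harmonic number H(n, s) = sum over k = 1..n of 1 / k^s. */
    static double harmonic(int n, double s) {
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** Naive route: compute the probability first, then take its logarithm. */
    static double logOfPmf(int k, int n, double s) {
        return Math.log(1.0 / (Math.pow(k, s) * harmonic(n, s)));
    }

    /** Direct route: log P(X = k) = -s * log(k) - log(H(n, s)). */
    static double logPmf(int k, int n, double s) {
        if (k < 1 || k > n) {
            return Double.NEGATIVE_INFINITY; // log(0) outside the support
        }
        return -s * Math.log(k) - Math.log(harmonic(n, s));
    }

    public static void main(String[] args) {
        // With a very large exponent, k^s overflows and the naive route loses the value.
        System.out.println("naive  = " + logOfPmf(10, 10, 400.0));
        System.out.println("direct = " + logPmf(10, 10, 400.0));
    }
}
```

In the extreme case shown in `main`, `Math.pow(10, 400.0)` overflows to infinity, so the naive route yields negative infinity while the direct route stays finite.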
-
cumulativeProbability
public double cumulativeProbability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
- Parameters:
x - the point at which the CDF is evaluated
- Returns:
- the probability that a random variable with this distribution takes a value less than or equal to x
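Because the PMF is 1 / k^s normalized by H(N,s), summing it over k = 1..x gives the ratio H(x,s) / H(N,s). The standalone sketch below shows that relationship. `ZipfCdfSketch` is a hypothetical name, and the clamping to 0 below the support and 1 at or above N is this sketch's assumption about boundary handling.

```java
// Standalone sketch: hypothetical names, standard library only.
final class ZipfCdfSketch {
    /** Generalized harmonic number H(n, s) = sum over k = 1..n of 1 / k^s. */
    static double harmonic(int n, double s) {
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** P(X <= x) = H(x, s) / H(n, s), clamped to 0 and 1 outside 1..n. */
    static double cdf(int x, int n, double s) {
        if (x < 1) {
            return 0.0;
        }
        if (x >= n) {
            return 1.0;
        }
        return harmonic(x, s) / harmonic(n, s);
    }

    public static void main(String[] args) {
        for (int x = 0; x <= 4; x++) {
            System.out.println("P(X <= " + x + ") = " + cdf(x, 3, 1.0));
        }
    }
}
```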
-
getNumericalMean
public double getNumericalMean()
Use this method to get the numerical value of the mean of this distribution. For number of elements N and exponent s, the mean is Hs1 / Hs, where Hs1 = generalizedHarmonic(N, s - 1), Hs = generalizedHarmonic(N, s).
- Returns:
- the mean or Double.NaN if it is not defined
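The closed form Hs1 / Hs follows from the definition of the mean: the sum over k of k * (1 / k^s) / Hs equals the sum of 1 / k^(s - 1), divided by Hs. The standalone sketch below compares both routes. `ZipfMeanSketch` and its methods are hypothetical names for this illustration only.

```java
// Standalone sketch: hypothetical names, standard library only.
final class ZipfMeanSketch {
    /** Generalized harmonic number H(n, s) = sum over k = 1..n of 1 / k^s. */
    static double harmonic(int n, double s) {
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** Mean via the closed form Hs1 / Hs. */
    static double meanClosedForm(int n, double s) {
        return harmonic(n, s - 1) / harmonic(n, s);
    }

    /** Mean via the definition: sum over k of k * P(X = k). */
    static double meanByDefinition(int n, double s) {
        double hs = harmonic(n, s);
        double mean = 0.0;
        for (int k = 1; k <= n; k++) {
            mean += k / (Math.pow(k, s) * hs);
        }
        return mean;
    }

    public static void main(String[] args) {
        System.out.println("closed form   = " + meanClosedForm(3, 1.0));
        System.out.println("by definition = " + meanByDefinition(3, 1.0));
    }
}
```

For N = 3 and s = 1, Hs1 = H(3,0) = 3 and Hs = 11/6, so the mean is 18/11.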
-
calculateNumericalMean
protected double calculateNumericalMean()
Used by getNumericalMean().
- Returns:
- the mean of this distribution
-
getNumericalVariance
public double getNumericalVariance()
Use this method to get the numerical value of the variance of this distribution. For number of elements N and exponent s, the variance is (Hs2 / Hs) - (Hs1^2 / Hs^2), where Hs2 = generalizedHarmonic(N, s - 2), Hs1 = generalizedHarmonic(N, s - 1), Hs = generalizedHarmonic(N, s).
- Returns:
- the variance (possibly Double.POSITIVE_INFINITY or Double.NaN if it is not defined)
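The expression (Hs2 / Hs) - (Hs1^2 / Hs^2) is the usual E[X^2] - (E[X])^2, since Hs2 / Hs is the second moment and Hs1 / Hs is the mean. A minimal standalone sketch of that computation follows. `ZipfVarianceSketch` is a hypothetical name and the code is not taken from the library.

```java
// Standalone sketch: hypothetical names, standard library only.
final class ZipfVarianceSketch {
    /** Generalized harmonic number H(n, s) = sum over k = 1..n of 1 / k^s. */
    static double harmonic(int n, double s) {
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** Variance as E[X^2] - (E[X])^2 = Hs2 / Hs - Hs1^2 / Hs^2. */
    static double variance(int n, double s) {
        double hs = harmonic(n, s);
        double hs1 = harmonic(n, s - 1);
        double hs2 = harmonic(n, s - 2);
        return (hs2 / hs) - (hs1 * hs1) / (hs * hs);
    }

    public static void main(String[] args) {
        System.out.println("variance(N = 3, s = 1) = " + variance(3, 1.0));
    }
}
```

For N = 3 and s = 1, Hs2 = 6, Hs1 = 3 and Hs = 11/6, which gives 36/11 - 324/121 = 72/121.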
-
calculateNumericalVariance
protected double calculateNumericalVariance()
Used by getNumericalVariance().
- Returns:
- the variance of this distribution
-
generalizedHarmonic
private double generalizedHarmonic(int n, double m)
Calculates the Nth generalized harmonic number. See Harmonic Series.
- Parameters:
n - Term in the series to calculate (must be larger than 1)
m - Exponent (special case m = 1 is the harmonic series).
- Returns:
- the nth generalized harmonic number.
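The nth generalized harmonic number with exponent m is the finite sum of 1 / k^m for k = 1..n. The sketch below is one way to compute it. Adding the smallest terms first (from k = n down to 1) is a design choice of this sketch to limit rounding error, not a description of the private method documented here. `HarmonicSketch` is a hypothetical name.

```java
// Standalone sketch: hypothetical names, standard library only.
final class HarmonicSketch {
    /**
     * Generalized harmonic number: sum over k = 1..n of 1 / k^m.
     * Terms are added from smallest to largest (for m > 0) to reduce rounding error.
     */
    static double generalizedHarmonic(int n, double m) {
        double sum = 0.0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, m);
        }
        return sum;
    }

    public static void main(String[] args) {
        // m = 1 gives the ordinary harmonic series: 1 + 1/2 + 1/3 for n = 3.
        System.out.println("H(3, 1) = " + generalizedHarmonic(3, 1.0));
        // m = 0 makes every term equal to 1, so the sum equals n.
        System.out.println("H(4, 0) = " + generalizedHarmonic(4, 0.0));
    }
}
```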
-
getSupportLowerBound
public int getSupportLowerBound()
Access the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0). In other words, this method must return inf {x in Z | P(X <= x) > 0}.
The lower bound of the support is always 1 no matter the parameters.
- Returns:
- lower bound of the support (always 1)
-
getSupportUpperBound
public int getSupportUpperBound()
Access the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1). In other words, this method must return inf {x in R | P(X <= x) = 1}.
The upper bound of the support is the number of elements.
- Returns:
- upper bound of the support
-
isSupportConnected
public boolean isSupportConnected()
Use this method to get information about whether the support is connected, i.e. whether all integers between the lower and upper bound of the support are included in the support. The support of this distribution is connected.
- Returns:
- true
-
sample
public int sample()
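Generate a random value sampled from this distribution. The default implementation uses the inversion method. This class overrides that default; its sampler field, of type ZipfDistribution.ZipfRejectionInversionSampler, is documented as the sampler used by sample().
- Specified by:
sample in interface IntegerDistribution
- Overrides:
sample in class AbstractIntegerDistribution
- Returns: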
- a random value
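For readers unfamiliar with the inversion method mentioned above, the standalone sketch below shows the basic idea: draw a uniform value u in [0, 1) and return the smallest k whose cumulative probability is at least u. It illustrates plain inversion only, not the rejection inversion sampler this class holds in its sampler field. `ZipfInversionSketch` and its methods are hypothetical names, and `java.util.Random` stands in for the library's RandomGenerator.

```java
// Standalone sketch of plain inversion sampling: hypothetical names, standard library only.
final class ZipfInversionSketch {
    /** Generalized harmonic number H(n, s) = sum over k = 1..n of 1 / k^s. */
    static double harmonic(int n, double s) {
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** Smallest k in 1..n with P(X <= k) >= u, for u in [0, 1]. */
    static int inverseCdf(double u, int n, double s) {
        // Compare unnormalized partial sums against u * H(n, s) to avoid repeated division.
        double target = u * harmonic(n, s);
        double running = 0.0;
        for (int k = 1; k < n; k++) {
            running += 1.0 / Math.pow(k, s);
            if (running >= target) {
                return k;
            }
        }
        return n; // Only the upper bound of the support is left.
    }

    /** Draws one value by inverting a uniform random number. */
    static int sample(java.util.Random rng, int n, double s) {
        return inverseCdf(rng.nextDouble(), n, s);
    }

    public static void main(String[] args) {
        java.util.Random rng = new java.util.Random(42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(sample(rng, 3, 1.0));
        }
    }
}
```

This linear scan costs O(N) per draw, which helps explain why a dedicated sampler is attractive when the number of elements is large.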
-
-