Class AliasMethodDiscreteSampler
- java.lang.Object
-
- org.apache.commons.rng.sampling.distribution.AliasMethodDiscreteSampler
-
- All Implemented Interfaces:
DiscreteSampler,SharedStateDiscreteSampler,SharedStateSampler<SharedStateDiscreteSampler>
- Direct Known Subclasses:
AliasMethodDiscreteSampler.SmallTableAliasMethodDiscreteSampler
public class AliasMethodDiscreteSampler extends java.lang.Object implements SharedStateDiscreteSampler
Distribution sampler that uses the Alias method. It can be used to sample fromnvalues each with an associated probability. If all unique items are assigned the same probability it is more efficient to use theDiscreteUniformSampler.This implementation is based on the detailed explanation of the alias method by Keith Schartz and implements Vose's algorithm.
-
Vose, M.D., A linear algorithm for generating random numbers with a given distribution, IEEE Transactions on Software Engineering, 17, 972-975, 1991.
The algorithm will sample values in
O(1)time after a pre-processing step ofO(n)time.The alias tables are constructed using fraction probabilities with an assumed denominator of 253. In the generic case sampling uses
UniformRandomProvider.nextInt(int)and the upper 53-bits fromUniformRandomProvider.nextLong().Zero padding the input probabilities can be used to make more sampling more efficient. Any zero entry will always be aliased removing the requirement to compute a
long. Increased sampling speed comes at the cost of increased storage space. The algorithm requires approximately 12 bytes of storage per input probability, that isn * 12for sizen. Zero-padding only requires 4 bytes of storage per padded value as the probability is known to be zero. A table can be padded to a power of 2 using the utility functionof(UniformRandomProvider, double[], int)to construct the sampler.An optimisation is performed for small table sizes that are a power of 2. In this case the sampling uses 1 or 2 calls from
UniformRandomProvider.nextInt()to generate up to 64-bits for creation of an 11-bit index and 53-bits for thelong. This optimisation requires a generator with a high cycle length for the lower order bits.Larger table sizes that are a power of 2 will benefit from fast algorithms for
UniformRandomProvider.nextInt(int)that exploit the power of 2.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static classAliasMethodDiscreteSampler.SmallTableAliasMethodDiscreteSamplerSample from the computed tables exploiting the small power-of-two table size.
-
Field Summary
Fields Modifier and Type Field Description protected int[]aliasThe alias table.private static doubleCONVERT_TO_NUMERATORThe multiplier to convert adoubleprobability in the range[0, 1]to the numerator of a fraction with denominator 253.private static intDEFAULT_ALPHAThe default alpha factor for zero-padding an input probability table.private static intMAX_SMALL_POWER_2_SIZEThe maximum size of the small alias table.private static longONE_AS_NUMERATORThe value 1.0 represented as the numerator of a fraction with denominator 253.protected long[]probabilityThe probability table.protected UniformRandomProviderrngUnderlying source of randomness.private static doubleZEROThe value zero for adouble.
-
Constructor Summary
Constructors Constructor Description AliasMethodDiscreteSampler(UniformRandomProvider rng, long[] probability, int[] alias)Creates a sampler.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private static intcomputeSize(int length, int alpha)Compute the size after padding.private static intfillRemainingIndices(int length, int[] indices, int small)Allocate the remaining indices from zero padding as small probabilities.private static voidfillTable(long[] probability, int[] alias, int[] indices, int start, int end)Fill the tables using unpaired items that are in the range betweenstartinclusive andendexclusive.private static intfindLastNonZeroIndex(double[] probabilities)Find the last non-zero index in the probabilities.private static booleanisSmallPowerOf2(int n)Checks if the size is a small power of 2 so can be supported by theAliasMethodDiscreteSampler.SmallTableAliasMethodDiscreteSampler.static SharedStateDiscreteSamplerof(UniformRandomProvider rng, double[] probabilities)Creates a sampler.static SharedStateDiscreteSamplerof(UniformRandomProvider rng, double[] probabilities, int alpha)Creates a sampler.intsample()Creates anintsample.java.lang.StringtoString()SharedStateDiscreteSamplerwithUniformRandomProvider(UniformRandomProvider rng)Create a new instance of the sampler with the same underlying state using the given uniform random provider as the source of randomness.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.apache.commons.rng.sampling.distribution.DiscreteSampler
samples, samples
-
-
-
-
Field Detail
-
DEFAULT_ALPHA
private static final int DEFAULT_ALPHA
The default alpha factor for zero-padding an input probability table. The default value will pad the probabilities by to the next power-of-2.- See Also:
- Constant Field Values
-
ZERO
private static final double ZERO
The value zero for adouble.- See Also:
- Constant Field Values
-
ONE_AS_NUMERATOR
private static final long ONE_AS_NUMERATOR
The value 1.0 represented as the numerator of a fraction with denominator 253.- See Also:
- Constant Field Values
-
CONVERT_TO_NUMERATOR
private static final double CONVERT_TO_NUMERATOR
The multiplier to convert adoubleprobability in the range[0, 1]to the numerator of a fraction with denominator 253.- See Also:
- Constant Field Values
-
MAX_SMALL_POWER_2_SIZE
private static final int MAX_SMALL_POWER_2_SIZE
The maximum size of the small alias table. This is 211.- See Also:
- Constant Field Values
-
rng
protected final UniformRandomProvider rng
Underlying source of randomness.
-
probability
protected final long[] probability
The probability table. During sampling a random index into this table is selected. A random probability is compared to the value at this index: if lower then the sample is the index; if higher then the sample uses the corresponding entry in the alias table.This has entries up to the last non-zero element since there is no need to store probabilities of zero. This is an optimisation for zero-padded input. Any zero value will always be aliased so any look-up index outside this table always uses the alias.
Note that a uniform double in the range [0,1) can be generated using 53-bits from a long to sample all the dyadic rationals with a denominator of 253 (e.g. see org.apache.commons.rng.core.utils.NumberFactory.makeDouble(long)). To avoid computation of a double and comparison to the probability as a double the probabilities are stored as 53-bit longs to use integer arithmetic. This is the equivalent of storing the numerator of a fraction with the denominator of 253.
During conversion of the probability to a double it is rounded up to the next integer value. This ensures the functionality of comparing a uniform deviate distributed evenly on the interval 1/2^53 to the unevenly distributed probability is equivalent, i.e. a uniform deviate is either below the probability or above it:
Uniform deviate 1/2^53 2/2^53 3/2^53 4/2^53 --|---------|---------|---------|--- ^ | probability ^ | rounded upRound-up ensures a non-zero probability is always non-zero and zero probability remains zero. Thus any item with a non-zero input probability can always be sampled, and a zero input probability cannot be sampled.
- See Also:
- Dyadic rational
-
alias
protected final int[] alias
The alias table. During sampling if the random probability is not below the entry in the probability table then the sample is the alias.
-
-
Constructor Detail
-
AliasMethodDiscreteSampler
AliasMethodDiscreteSampler(UniformRandomProvider rng, long[] probability, int[] alias)
Creates a sampler.The input parameters are not validated and must be correctly computed alias tables.
- Parameters:
rng- Generator of uniformly distributed random numbers.probability- Probability table.alias- Alias table.
-
-
Method Detail
-
sample
public int sample()
Creates anintsample.- Specified by:
samplein interfaceDiscreteSampler- Returns:
- a sample.
-
toString
public java.lang.String toString()
- Overrides:
toStringin classjava.lang.Object
-
withUniformRandomProvider
public SharedStateDiscreteSampler withUniformRandomProvider(UniformRandomProvider rng)
Create a new instance of the sampler with the same underlying state using the given uniform random provider as the source of randomness.- Specified by:
withUniformRandomProviderin interfaceSharedStateSampler<SharedStateDiscreteSampler>- Parameters:
rng- Generator of uniformly distributed random numbers.- Returns:
- the sampler
-
of
public static SharedStateDiscreteSampler of(UniformRandomProvider rng, double[] probabilities)
Creates a sampler.The probabilities will be normalised using their sum. The only requirement is the sum is strictly positive.
Where possible this method zero-pads the probabilities so the length is the next power-of-two. Padding is bounded by the upper limit on the size of an array.
To avoid zero-padding use the
of(UniformRandomProvider, double[], int)method with a negativealphafactor.- Parameters:
rng- Generator of uniformly distributed random numbers.probabilities- The list of probabilities.- Returns:
- the sampler
- Throws:
java.lang.IllegalArgumentException- ifprobabilitiesis null or empty, a probability is negative, infinite orNaN, or the sum of all probabilities is not strictly positive.- See Also:
of(UniformRandomProvider, double[], int)
-
of
public static SharedStateDiscreteSampler of(UniformRandomProvider rng, double[] probabilities, int alpha)
Creates a sampler.The probabilities will be normalised using their sum. The only requirement is the sum is strictly positive.
Where possible this method zero-pads the probabilities to improve sampling efficiency. Padding is bounded by the upper limit on the size of an array and controlled by the
alphaargument. Set to negative to disable padding.For each zero padded value an entry is added to the tables which is always aliased. This can be sampled with fewer bits required from the
UniformRandomProvider. Increasing the padding of zeros increases the chance of using this fast path to selecting a sample. The penalty is two-fold: initialisation is bounded byO(n)time withnthe size after padding; an additional memory cost of 4 bytes per padded value.Zero padding to any length improves performance; using a power of 2 allows the index into the tables to be more efficiently generated. The argument
alphacontrols the level of padding. Positive values ofalpharepresent a scale factor in powers of 2. The size of the input array will be increased by a factor of 2alpha and then rounded-up to the next power of 2. Padding is bounded by the upper limit on the size of an array.The chance of executing the slow path is upper bounded at 2-alpha when padding is enabled. Each successive doubling of padding will have diminishing performance gains.
- Parameters:
rng- Generator of uniformly distributed random numbers.probabilities- The list of probabilities.alpha- The alpha factor controlling the zero padding.- Returns:
- the sampler
- Throws:
java.lang.IllegalArgumentException- ifprobabilitiesis null or empty, a probability is negative, infinite orNaN, or the sum of all probabilities is not strictly positive.
-
fillRemainingIndices
private static int fillRemainingIndices(int length, int[] indices, int small)Allocate the remaining indices from zero padding as small probabilities. The number to add is from the length of the probability array to the length of the padded probability array (which is the same length as the indices array).- Parameters:
length- Length of probability array.indices- Indices.small- Number of small indices.- Returns:
- the updated number of small indices
-
findLastNonZeroIndex
private static int findLastNonZeroIndex(double[] probabilities)
Find the last non-zero index in the probabilities. This may be smaller than the input length if the probabilities were already padded.- Parameters:
probabilities- The list of probabilities.- Returns:
- the index
-
computeSize
private static int computeSize(int length, int alpha)Compute the size after padding. A value ofalpha < 0disables padding. Otherwise the length will be increased by 2alpha rounded-up to the next power of 2.- Parameters:
length- Length of probability array.alpha- The alpha factor controlling the zero padding.- Returns:
- the padded size
-
fillTable
private static void fillTable(long[] probability, int[] alias, int[] indices, int start, int end)Fill the tables using unpaired items that are in the range betweenstartinclusive andendexclusive.Anything left must fill the entire section so the probability table is set to 1 and there is no alias. This will occur for 1/n samples, i.e. the last remaining unpaired probability. Note: When the tables are zero-padded the remaining indices are from an input probability that is above zero so the index will be allowed in the truncated probability array and no index-out-of-bounds exception will occur.
- Parameters:
probability- Probability table.alias- Alias table.indices- Unpaired indices.start- Start position.end- End position.
-
isSmallPowerOf2
private static boolean isSmallPowerOf2(int n)
Checks if the size is a small power of 2 so can be supported by theAliasMethodDiscreteSampler.SmallTableAliasMethodDiscreteSampler.- Parameters:
n- Size of the alias table.- Returns:
- true if supported by
AliasMethodDiscreteSampler.SmallTableAliasMethodDiscreteSampler
-
-