Class Percentile
- All Implemented Interfaces:
Serializable, UnivariateStatistic, MathArrays.Function
- Direct Known Subclasses:
Median
There are several commonly used methods for estimating percentiles (a.k.a. quantiles) based on sample data. For large samples, the different methods agree closely, but when sample sizes are small, different methods will give significantly different results. The algorithm implemented here works as follows:
- Let n be the length of the (sorted) array and 0 < p <= 100 be the desired percentile.
- If n = 1, return the unique array element (regardless of the value of p); otherwise:
- Compute the estimated percentile position pos = p * (n + 1) / 100 and the difference d between pos and floor(pos) (i.e. the fractional part of pos).
- If pos < 1, return the smallest element in the array.
- Else if pos >= n, return the largest element in the array.
- Else let lower be the element in (1-based) position floor(pos) in the array and let upper be the next element in the array. Return lower + d * (upper - lower).
To compute percentiles, the data must be at least partially ordered. Input
arrays are copied and recursively partitioned using an ordering definition.
The ordering used by Arrays.sort(double[]) is the one determined
by Double.compareTo(Double). This ordering makes
Double.NaN larger than any other value (including
Double.POSITIVE_INFINITY). Therefore, for example, the median
(50th percentile) of
{0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5.
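The ordering described above can be checked with the JDK alone. A small sketch (class name hypothetical) that sorts a copy with Arrays.sort(double[]) and compares NaN against Double.POSITIVE_INFINITY:

```java
import java.util.Arrays;

public class NaNOrderingDemo {
    /** Sorted copy using Arrays.sort(double[]), i.e. the Double.compareTo total order. */
    static double[] sortedCopy(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        double[] data = {Double.NaN, 3.0, Double.POSITIVE_INFINITY, 0.0, -0.0};
        System.out.println(Arrays.toString(sortedCopy(data)));
        // prints [-0.0, 0.0, 3.0, Infinity, NaN]
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY)); // prints 1
        System.out.println(Double.NaN > Double.POSITIVE_INFINITY); // prints false
    }
}
```

The last line shows why the total order matters: the primitive `>` operator treats NaN as unordered, whereas Double.compare places it above every other value, including positive infinity.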
Since percentile estimation usually involves interpolation between array
elements, arrays containing NaN or infinite values will often
produce NaN or infinite results.
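A short sketch of why interpolation propagates these values, using only the final step lower + d * (upper - lower) in plain IEEE 754 double arithmetic. It says nothing about whether the library short-circuits any of these cases; the class name is hypothetical.

```java
public class InterpolationEdgeCases {
    /** The final interpolation step, lower + d * (upper - lower), in plain IEEE 754 arithmetic. */
    static double interpolate(double lower, double upper, double d) {
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println(interpolate(1.0, inf, 0.5));        // Infinity
        System.out.println(interpolate(-inf, inf, 0.5));       // NaN, from -Infinity + Infinity
        System.out.println(interpolate(1.0, Double.NaN, 0.5)); // NaN propagates
        System.out.println(interpolate(1.0, inf, 0.0));        // NaN, from 0.0 * Infinity
    }
}
```

The last case is the least obvious: even with d = 0, an infinite upper neighbour turns the bare formula into NaN.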
Further, to support the different estimation types (such as R1 and R2) described on the Wikipedia Quantile page, a type-specific NaN handling strategy is used so that results closely match those typically produced by popular tools such as R (types R1 to R9) and Excel (R7).
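To illustrate how much the estimation type matters for small samples, here is a sketch of the R7 definition (the type associated with Excel above), which places the estimate at the 1-based position h = (n - 1) * p / 100 + 1 instead of p * (n + 1) / 100. This is a hypothetical stand-alone helper written from the standard R7 definition, not the library's R_7 code, and it applies no NaN strategy.

```java
import java.util.Arrays;

public class R7PercentileSketch {
    /** R7-style estimate: 1-based position h = (n - 1) * p / 100 + 1, with 0 < p <= 100. */
    static double r7(double[] values, double p) {
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must be in (0, 100]: " + p);
        }
        int n = values.length;
        if (n == 0) {
            return Double.NaN;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double h = (n - 1) * p / 100.0 + 1;
        int lo = (int) Math.floor(h); // 1-based, always >= 1
        double d = h - lo;
        if (lo >= n) {
            return sorted[n - 1]; // h == n only when p == 100 (or n == 1)
        }
        return sorted[lo - 1] + d * (sorted[lo] - sorted[lo - 1]);
    }

    public static void main(String[] args) {
        // h = 3 * 25 / 100 + 1 = 1.75, so 1 + 0.75 * (2 - 1)
        System.out.println(r7(new double[] {1, 2, 3, 4}, 25));
    }
}
```

For the same input, the legacy steps listed above give pos = 25 * 5 / 100 = 1.25 and hence 1.25, while R7 gives 1.75.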
Since 2.2, Percentile uses only selection instead of complete sorting
and caches selection algorithm state between calls to the various
evaluate methods. This greatly improves efficiency, both for a single
percentile and multiple percentile computations. To maximize performance when
multiple percentiles are computed based on the same data, users should set the
data array once using either evaluate(double[], double) or
setData(double[]), and thereafter call evaluate(double)
with just the percentile.
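Because lower and upper in the estimation steps are just the floor(pos)-th and (floor(pos) + 1)-th smallest elements, selection is enough and no full sort is needed. The following is a plain quickselect with a median-of-three pivot, written for illustration only: it is not the library's KthSelector, it does not keep the pivot cache the class maintains, and the class name is hypothetical. It reuses one work array across calls, which remains correct because each call only permutes the array.

```java
public class QuickSelectSketch {
    /**
     * Returns the k-th smallest element (0-based) of work, reordering work in place.
     * Uses Double.compare, so NaN sorts after +Infinity, matching Arrays.sort.
     */
    static double select(double[] work, int k) {
        int begin = 0;
        int end = work.length; // current search range is [begin, end)
        while (end - begin > 1) {
            int pivotIndex = medianOf3(work, begin, end);
            double pivot = work[pivotIndex];
            // Lomuto partition: park the pivot at the end, then sweep smaller values left.
            swap(work, pivotIndex, end - 1);
            int store = begin;
            for (int i = begin; i < end - 1; i++) {
                if (Double.compare(work[i], pivot) < 0) {
                    swap(work, i, store++);
                }
            }
            swap(work, store, end - 1); // pivot now sits at its final sorted position
            if (k == store) {
                return work[k];
            } else if (k < store) {
                end = store;       // continue in the left part only
            } else {
                begin = store + 1; // continue in the right part only
            }
        }
        return work[k];
    }

    /** Index of the median of the first, middle and last elements of [begin, end). */
    static int medianOf3(double[] work, int begin, int end) {
        int last = end - 1;
        int middle = begin + (last - begin) / 2;
        double a = work[begin], b = work[middle], c = work[last];
        if (Double.compare(a, b) < 0) {
            if (Double.compare(b, c) < 0) return middle;    // a < b < c
            return Double.compare(a, c) < 0 ? last : begin; // max(a, c) when c <= b
        } else {
            if (Double.compare(a, c) < 0) return begin;     // b <= a < c
            return Double.compare(b, c) < 0 ? last : middle; // max(b, c) when c <= a
        }
    }

    private static void swap(double[] a, int i, int j) {
        double t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        double[] work = {9.0, 2.0, 7.0, 4.0, 5.0, 1.0};
        // 2nd and 3rd smallest values from the same work array, without a full sort.
        double second = select(work, 1);
        double third = select(work, 2);
        System.out.println(second + " " + third); // prints 2.0 4.0
    }
}
```

With the class itself, the corresponding pattern is the one described in the paragraph: call setData(double[]) once, then evaluate(double) for each percentile.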
Note that this implementation is not synchronized. If
multiple threads access an instance of this class concurrently, and at least
one of the threads invokes the increment() or
clear() method, it must be synchronized externally.
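A minimal sketch of external synchronization, assuming a hypothetical StoredDataStat class that stands in for an unsynchronized statistic with stored data (it is not Percentile itself). The point it shows is that storing data and evaluating should happen under one lock; otherwise another thread could replace the stored data between the two calls.

```java
import java.util.Arrays;

public class ExternalSyncSketch {
    /** Hypothetical unsynchronized statistic with stored data (not the real Percentile). */
    static final class StoredDataStat {
        private double[] data = new double[0];

        void setData(double[] values) {
            data = values.clone();
        }

        /** Lower median of the stored data, kept trivial for the sketch. */
        double evaluate() {
            double[] work = data.clone();
            Arrays.sort(work);
            return work.length == 0 ? Double.NaN : work[(work.length - 1) / 2];
        }
    }

    private final StoredDataStat stat = new StoredDataStat();
    private final Object lock = new Object();

    /** Runs setData and evaluate as one atomic step under a single lock. */
    double setDataAndEvaluate(double[] values) {
        synchronized (lock) {
            stat.setData(values);
            return stat.evaluate(); // no other thread can replace the data in between
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExternalSyncSketch shared = new ExternalSyncSketch();
        Thread t = new Thread(() -> System.out.println(shared.setDataAndEvaluate(new double[] {5, 1, 3})));
        t.start();
        System.out.println(shared.setDataAndEvaluate(new double[] {10, 20}));
        t.join();
    }
}
```

The two printed lines may appear in either order, but each result always corresponds to the data its own thread stored.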
-
Nested Class Summary
- static enum Percentile.EstimationType: An enum for the various percentile estimation strategies described in the Wikipedia article on quantiles; the names of the enum constants match the type names used there.
-
Field Summary
- private int[] cachedPivots: Cached pivots.
- private final Percentile.EstimationType estimationType: Any of the Percentile.EstimationTypes such as CM can be used.
- private final KthSelector kthSelector: Default KthSelector used with the default pivoting strategy.
- private static final int MAX_CACHED_LEVELS: Maximum number of partitioning pivots cached (each level doubles the number of pivots).
- private final NaNStrategy nanStrategy: NaN handling of the input as defined by NaNStrategy.
- private static final int PIVOTS_HEAP_LENGTH: Maximum number of cached pivots in the pivots cached array.
- private double quantile: Determines what percentile is computed when evaluate() is called with no quantile argument.
- private static final long serialVersionUID: Serializable version identifier.
-
Constructor Summary
- Percentile(): Constructs a Percentile with the following defaults.
- Percentile(double quantile): Constructs a Percentile with the specific quantile value and the following defaults: method type Percentile.EstimationType.LEGACY, NaN strategy NaNStrategy.REMOVED, and a KthSelector.
- protected Percentile(double quantile, Percentile.EstimationType estimationType, NaNStrategy nanStrategy, KthSelector kthSelector): Constructs a Percentile with the specific quantile value, Percentile.EstimationType, NaNStrategy and KthSelector.
- Percentile(Percentile original): Copy constructor, creates a new Percentile identical to the original.
-
Method Summary
- Percentile copy(): Returns a copy of the statistic with the same internal state.
- static void copy(Percentile source, Percentile dest): Deprecated.
- private static double[] copyOf(double[] values, int begin, int length): Make a copy of the array for the slice defined by array part from [begin, begin+length).
- double evaluate(double p): Returns the result of evaluating the statistic over the stored data.
- double evaluate(double[] values, double p): Returns an estimate of the pth percentile of the values in the values array.
- double evaluate(double[] values, int start, int length): Returns an estimate of the quantileth percentile of the designated values in the values array.
- double evaluate(double[] values, int begin, int length, double p): Returns an estimate of the pth percentile of the values in the values array, starting with the element in (0-based) position begin in the array and including length values.
- Percentile.EstimationType getEstimationType(): Get the estimation type used for computation.
- KthSelector getKthSelector(): Get the kthSelector used for computation.
- NaNStrategy getNaNStrategy(): Get the NaN Handling strategy used for computation.
- PivotingStrategyInterface getPivotingStrategy(): Get the PivotingStrategyInterface used in KthSelector for computation.
- private int[] getPivots(double[] values): Get the pivots, which are either cached or newly created.
- double getQuantile(): Returns the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
- protected double[] getWorkArray(double[] values, int begin, int length): Get the work array to operate on.
- (package private) int medianOf3(double[] work, int begin, int end): Deprecated. Please refrain from using this method (as it won't take effect) and instead use withKthSelector(newKthSelector) if required.
- private static double[] removeAndSlice(double[] values, int begin, int length, double removedValue): Remove every occurrence of a given value in a copied slice of array defined by the array part from [begin, begin+length).
- private static double[] replaceAndSlice(double[] values, int begin, int length, double original, double replacement): Replace every occurrence of a given value with a replacement value in a copied slice of array defined by array part from [begin, begin+length).
- void setData(double[] values): Set the data array.
- void setData(double[] values, int begin, int length): Set the data array.
- void setQuantile(double p): Sets the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
- Percentile withEstimationType(Percentile.EstimationType newEstimationType): Build a new instance similar to the current one except for the estimation type.
- Percentile withKthSelector(KthSelector newKthSelector): Build a new instance similar to the current one except for the kthSelector instance specifically set.
- Percentile withNaNStrategy(NaNStrategy newNaNStrategy): Build a new instance similar to the current one except for the NaN handling strategy.
Methods inherited from class org.apache.commons.math3.stat.descriptive.AbstractUnivariateStatistic
evaluate, evaluate, getData, getDataRef, test, test, test, test
-
Field Details
-
serialVersionUID
private static final long serialVersionUID
Serializable version identifier.
-
MAX_CACHED_LEVELS
private static final int MAX_CACHED_LEVELS
Maximum number of partitioning pivots cached (each level doubles the number of pivots).
-
PIVOTS_HEAP_LENGTH
private static final int PIVOTS_HEAP_LENGTH
Maximum number of cached pivots in the pivots cached array.
-
kthSelector
private final KthSelector kthSelector
Default KthSelector used with the default pivoting strategy.
-
estimationType
private final Percentile.EstimationType estimationType
Any of the Percentile.EstimationTypes such as CM can be used.
-
nanStrategy
private final NaNStrategy nanStrategy
NaN handling of the input as defined by NaNStrategy.
-
quantile
private double quantile
Determines what percentile is computed when evaluate() is called with no quantile argument.
-
cachedPivots
private int[] cachedPivots
Cached pivots.
-
-
Constructor Details
-
Percentile
public Percentile()
Constructs a Percentile with the following defaults.
- default quantile: 50.0, can be reset with setQuantile(double)
- default estimation type: Percentile.EstimationType.LEGACY, can be reset with withEstimationType(EstimationType)
- default NaN strategy: NaNStrategy.REMOVED, can be reset with withNaNStrategy(NaNStrategy)
- a KthSelector that makes use of MedianOf3PivotingStrategy, can be reset with withKthSelector(KthSelector)
-
Percentile
public Percentile(double quantile) throws MathIllegalArgumentException
Constructs a Percentile with the specific quantile value and the following defaults:
- default method type: Percentile.EstimationType.LEGACY
- default NaN strategy: NaNStrategy.REMOVED
- a Kth Selector: KthSelector
- Parameters:
quantile - the quantile
- Throws:
MathIllegalArgumentException - if quantile is not greater than 0 and less than or equal to 100
-
Percentile
public Percentile(Percentile original) throws NullArgumentException
Copy constructor, creates a new Percentile identical to the original.
- Parameters:
original - the Percentile instance to copy
- Throws:
NullArgumentException - if original is null
-
Percentile
protected Percentile(double quantile, Percentile.EstimationType estimationType, NaNStrategy nanStrategy, KthSelector kthSelector) throws MathIllegalArgumentException
Constructs a Percentile with the specific quantile value, Percentile.EstimationType, NaNStrategy and KthSelector.
- Parameters:
quantile - the quantile to be computed
estimationType - one of the percentile estimation types
nanStrategy - one of the NaNStrategy values, used to handle NaNs
kthSelector - a KthSelector to use for pivoting during search
- Throws:
MathIllegalArgumentException - if quantile is not within (0, 100]
NullArgumentException - if estimationType or nanStrategy is null
-
-
Method Details
-
setData
public void setData(double[] values)
Set the data array. The stored value is a copy of the parameter array, not the array itself.
- Overrides:
setData in class AbstractUnivariateStatistic
- Parameters:
values - data array to store (may be null to remove stored data)
-
setData
public void setData(double[] values, int begin, int length) throws MathIllegalArgumentException
Set the data array. The input array is copied, not referenced.
- Overrides:
setData in class AbstractUnivariateStatistic
- Parameters:
values - data array to store
begin - the index of the first element to include
length - the number of elements to include
- Throws:
MathIllegalArgumentException - if values is null or the indices are not valid
-
evaluate
public double evaluate(double p) throws MathIllegalArgumentException
Returns the result of evaluating the statistic over the stored data. The stored array is the one which was set by previous calls to setData(double[]).
- Parameters:
p - the percentile value to compute
- Returns:
- the value of the statistic applied to the stored data
- Throws:
MathIllegalArgumentException - if p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)
-
evaluate
public double evaluate(double[] values, double p) throws MathIllegalArgumentException
Returns an estimate of the pth percentile of the values in the values array. Calls to this method do not modify the internal quantile state of this statistic.
- Returns Double.NaN if values has length 0
- Returns (for any value of p) values[0] if values has length 1
- Throws MathIllegalArgumentException if values is null or p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)
See Percentile for a description of the percentile estimation algorithm used.
- Parameters:
values - input array of values
p - the percentile value to compute
- Returns:
- the percentile value or Double.NaN if the array is empty
- Throws:
MathIllegalArgumentException - if values is null or p is invalid
-
evaluate
public double evaluate(double[] values, int start, int length) throws MathIllegalArgumentException
Returns an estimate of the quantileth percentile of the designated values in the values array. The quantile estimated is determined by the quantile property.
- Returns Double.NaN if length = 0
- Returns (for any value of quantile) values[start] if length = 1
- Throws MathIllegalArgumentException if values is null, or start or length is invalid
See Percentile for a description of the percentile estimation algorithm used.
- Specified by:
evaluate in interface MathArrays.Function
- Specified by:
evaluate in interface UnivariateStatistic
- Specified by:
evaluate in class AbstractUnivariateStatistic
- Parameters:
values - the input array
start - index of the first array element to include
length - the number of elements to include
- Returns:
- the percentile value
- Throws:
MathIllegalArgumentException - if the parameters are not valid
-
evaluate
public double evaluate(double[] values, int begin, int length, double p) throws MathIllegalArgumentException
Returns an estimate of the pth percentile of the values in the values array, starting with the element in (0-based) position begin in the array and including length values. Calls to this method do not modify the internal quantile state of this statistic.
- Returns Double.NaN if length = 0
- Returns (for any value of p) values[begin] if length = 1
- Throws MathIllegalArgumentException if values is null, begin or length is invalid, or p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)
See Percentile for a description of the percentile estimation algorithm used.
- Parameters:
values - array of input values
begin - the first (0-based) element to include in the computation
length - the number of array elements to include
p - the percentile to compute
- Returns:
- the percentile value
- Throws:
MathIllegalArgumentException - if the parameters are not valid or the input array is null
-
medianOf3
@Deprecated int medianOf3(double[] work, int begin, int end)
Deprecated. Please refrain from using this method (as it won't take effect) and instead use withKthSelector(newKthSelector) if required.
Select a pivot index as the median of three.
Note: since a KthSelector can now be set on Percentile instances (and through it, indirectly, the PivotingStrategy), this method no longer takes effect and is therefore unsupported.
- Parameters:
work - data array
begin - index of the first element of the slice
end - index after the last element of the slice
- Returns:
- the index of the median element chosen between the first, the middle and the last element of the array slice
-
getQuantile
public double getQuantile()
Returns the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
- Returns:
- the quantile set during construction or by setQuantile(double)
-
setQuantile
public void setQuantile(double p) throws MathIllegalArgumentException
Sets the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
- Parameters:
p - a value with 0 < p <= 100
- Throws:
MathIllegalArgumentException - if p is not greater than 0 and less than or equal to 100
-
copy
public Percentile copy()
Returns a copy of the statistic with the same internal state.
- Specified by:
copy in interface UnivariateStatistic
- Specified by:
copy in class AbstractUnivariateStatistic
- Returns:
- a copy of the statistic
-
copy
@Deprecated public static void copy(Percentile source, Percentile dest) throws MathUnsupportedOperationException
Deprecated. As of 3.4 this method no longer works: it fails to copy internal state between instances configured with different estimation types, NaN handling strategies and kthSelectors, so it always throws MathUnsupportedOperationException.
Copies source to dest.
- Parameters:
source - Percentile to copy
dest - Percentile to copy to
- Throws:
MathUnsupportedOperationException - always thrown since 3.4
-
getWorkArray
protected double[] getWorkArray(double[] values, int begin, int length)
Get the work array to operate on. Makes use of prior storedData if it exists; otherwise checks for NaNs and copies the subset of the array defined by the begin and length parameters. The configured nanStrategy is used to retain, remove or replace any NaNs present before the resulting array is returned.
- Parameters:
values - the array of numbers
begin - index to start reading the array
length - the length of array to be read from the begin index
- Returns:
- work array sliced from values in the range [begin,begin+length)
- Throws:
MathIllegalArgumentException - if values or indices are invalid
-
copyOf
private static double[] copyOf(double[] values, int begin, int length)
Make a copy of the array for the slice defined by array part from [begin, begin+length).
- Parameters:
values - the input array
begin - start index of the array to include
length - number of elements to include from begin
- Returns:
- copy of a slice of the original array
-
replaceAndSlice
private static double[] replaceAndSlice(double[] values, int begin, int length, double original, double replacement)
Replace every occurrence of a given value with a replacement value in a copied slice of array defined by array part from [begin, begin+length).
- Parameters:
values - the input array
begin - start index of the array to include
length - number of elements to include from begin
original - the value to be replaced
replacement - the value to be used for replacement
- Returns:
- the copy of sliced array with replaced values
-
removeAndSlice
private static double[] removeAndSlice(double[] values, int begin, int length, double removedValue)
Remove every occurrence of a given value in a copied slice of array defined by the array part from [begin, begin+length).
- Parameters:
values - the input array
begin - start index of the array to include
length - number of elements to include from begin
removedValue - the value to be removed from the sliced array
- Returns:
- the copy of the sliced array after removing the removedValue
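The three private slice helpers can be sketched as follows, together with a hypothetical workArray method showing how a NaN handling choice might map onto them, in the spirit of getWorkArray. Assumptions, all labeled in the code: the enum is a stand-in mirroring the NaNStrategy constant names; MAXIMAL and MINIMAL are modeled by replacing NaN with positive or negative infinity; FAILED throws IllegalArgumentException rather than the library's exception type; and equality is exact except that NaN matches NaN. The library's actual comparisons and exceptions may differ.

```java
import java.util.Arrays;

public class SliceHelpersSketch {
    // Hypothetical mirror of the NaNStrategy constant names, for illustration only.
    enum NanHandling { MINIMAL, MAXIMAL, REMOVED, FIXED, FAILED }

    /** Exact equality that also treats NaN as equal to NaN (plain == never matches NaN). */
    static boolean sameValue(double a, double b) {
        return a == b || (Double.isNaN(a) && Double.isNaN(b));
    }

    /** Copy of values[begin, begin + length). */
    static double[] copyOf(double[] values, int begin, int length) {
        return Arrays.copyOfRange(values, begin, begin + length);
    }

    /** Copy of the slice with every occurrence of original replaced by replacement. */
    static double[] replaceAndSlice(double[] values, int begin, int length,
                                    double original, double replacement) {
        double[] slice = copyOf(values, begin, length);
        for (int i = 0; i < slice.length; i++) {
            if (sameValue(slice[i], original)) {
                slice[i] = replacement;
            }
        }
        return slice;
    }

    /** Copy of the slice with every occurrence of removedValue dropped. */
    static double[] removeAndSlice(double[] values, int begin, int length, double removedValue) {
        double[] kept = new double[length];
        int count = 0;
        for (int i = begin; i < begin + length; i++) {
            if (!sameValue(values[i], removedValue)) {
                kept[count++] = values[i];
            }
        }
        return Arrays.copyOf(kept, count);
    }

    /** Assumed mapping from a NaN handling choice to a work array. */
    static double[] workArray(double[] values, int begin, int length, NanHandling strategy) {
        switch (strategy) {
            case MAXIMAL:
                return replaceAndSlice(values, begin, length, Double.NaN, Double.POSITIVE_INFINITY);
            case MINIMAL:
                return replaceAndSlice(values, begin, length, Double.NaN, Double.NEGATIVE_INFINITY);
            case REMOVED:
                return removeAndSlice(values, begin, length, Double.NaN);
            case FAILED: {
                double[] copy = copyOf(values, begin, length);
                for (double v : copy) {
                    if (Double.isNaN(v)) {
                        throw new IllegalArgumentException("NaN not allowed");
                    }
                }
                return copy;
            }
            default: // FIXED: keep NaNs where they are
                return copyOf(values, begin, length);
        }
    }

    public static void main(String[] args) {
        double[] data = {1.0, Double.NaN, 3.0, Double.NaN, 2.0};
        System.out.println(Arrays.toString(removeAndSlice(data, 0, data.length, Double.NaN)));
        // prints [1.0, 3.0, 2.0]
        System.out.println(Arrays.toString(workArray(data, 0, 2, NanHandling.MINIMAL)));
        // prints [1.0, -Infinity]
    }
}
```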
-
getPivots
private int[] getPivots(double[] values)
Get the pivots, which are either cached or newly created.
- Parameters:
values - array containing the input numbers
- Returns:
- cached pivots or newly created ones
-
getEstimationType
public Percentile.EstimationType getEstimationType()
Get the estimation type used for computation.
- Returns:
- the estimationType set
-
withEstimationType
public Percentile withEstimationType(Percentile.EstimationType newEstimationType)
Build a new instance similar to the current one except for the estimation type.
This method is intended to be used as part of a fluent-type builder pattern. Building finely tuned instances should be done as follows:
    Percentile customized = new Percentile(quantile).
                            withEstimationType(estimationType).
                            withNaNStrategy(nanStrategy).
                            withKthSelector(kthSelector);
If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
- Parameters:
newEstimationType - estimation type for the new instance
- Returns:
- a new instance, with changed estimation type
- Throws:
NullArgumentException - when newEstimationType is null
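The withXxx methods follow an immutable, copy-on-change style: each call returns a new instance and leaves the receiver untouched. A minimal sketch of that pattern, using a hypothetical PercentileConfig class and stand-in enums (not the library's types), with the documented defaults LEGACY and REMOVED:

```java
import java.util.Objects;

public class FluentConfigSketch {
    // Hypothetical stand-ins; the real class uses Percentile.EstimationType and NaNStrategy.
    enum EstimationType { LEGACY, R_7 }
    enum NanHandling { REMOVED, FIXED }

    /** Immutable configuration: every withXxx call returns a new instance. */
    static final class PercentileConfig {
        final double quantile;
        final EstimationType estimationType;
        final NanHandling nanHandling;

        /** Defaults mirror the documented ones: LEGACY estimation, NaNs removed. */
        PercentileConfig(double quantile) {
            this(quantile, EstimationType.LEGACY, NanHandling.REMOVED);
        }

        private PercentileConfig(double quantile, EstimationType type, NanHandling nan) {
            if (!(quantile > 0 && quantile <= 100)) {
                throw new IllegalArgumentException("quantile must be in (0, 100]: " + quantile);
            }
            this.quantile = quantile;
            // The library throws NullArgumentException here; requireNonNull throws NullPointerException.
            this.estimationType = Objects.requireNonNull(type, "estimationType");
            this.nanHandling = Objects.requireNonNull(nan, "nanHandling");
        }

        PercentileConfig withEstimationType(EstimationType newType) {
            return new PercentileConfig(quantile, newType, nanHandling);
        }

        PercentileConfig withNanHandling(NanHandling newNan) {
            return new PercentileConfig(quantile, estimationType, newNan);
        }
    }

    public static void main(String[] args) {
        PercentileConfig base = new PercentileConfig(90);
        PercentileConfig tuned = base.withEstimationType(EstimationType.R_7)
                                     .withNanHandling(NanHandling.FIXED);
        System.out.println(base.estimationType + " " + tuned.estimationType); // prints LEGACY R_7
    }
}
```

Keeping every field final means a configured instance can be shared freely; only data-carrying state (such as stored data) would still need the external synchronization discussed earlier.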
-
getNaNStrategy
public NaNStrategy getNaNStrategy()
Get the NaN Handling strategy used for computation.
- Returns:
NaN Handling strategy set during construction
-
withNaNStrategy
public Percentile withNaNStrategy(NaNStrategy newNaNStrategy)
Build a new instance similar to the current one except for the NaN handling strategy.
This method is intended to be used as part of a fluent-type builder pattern. Building finely tuned instances should be done as follows:
    Percentile customized = new Percentile(quantile).
                            withEstimationType(estimationType).
                            withNaNStrategy(nanStrategy).
                            withKthSelector(kthSelector);
If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
- Parameters:
newNaNStrategy - NaN strategy for the new instance
- Returns:
- a new instance, with changed NaN handling strategy
- Throws:
NullArgumentException - when newNaNStrategy is null
-
getKthSelector
public KthSelector getKthSelector()
Get the kthSelector used for computation.
- Returns:
- the kthSelector set
-
getPivotingStrategy
public PivotingStrategyInterface getPivotingStrategy()
Get the PivotingStrategyInterface used in KthSelector for computation.
- Returns:
- the pivoting strategy set
-
withKthSelector
public Percentile withKthSelector(KthSelector newKthSelector)
Build a new instance similar to the current one except for the kthSelector instance specifically set.
This method is intended to be used as part of a fluent-type builder pattern. Building finely tuned instances should be done as follows:
    Percentile customized = new Percentile(quantile).
                            withEstimationType(estimationType).
                            withNaNStrategy(nanStrategy).
                            withKthSelector(newKthSelector);
If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
- Parameters:
newKthSelector - KthSelector for the new instance
- Returns:
- a new instance, with changed KthSelector
- Throws:
NullArgumentException - when newKthSelector is null
-