Class DD
- All Implemented Interfaces:
Serializable, Addition<DD>, Multiplication<DD>, NativeOperators<DD>
A double-double is an unevaluated sum of two IEEE double precision numbers capable of
representing at least 106 bits of significand. A normalized double-double number (x, xx)
satisfies the condition that the parts are non-overlapping in magnitude such that:
|x| > |xx|
x == x + xx
This implementation assumes a normalized representation during operations on a DD
number and computes results as a normalized representation. Any double-double number
can be normalized by summation of the parts (see ofSum).
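The normalization condition can be checked with plain double arithmetic. The following is a minimal sketch, not the library's implementation: the class NormalizeSketch and its methods isNormalized and normalize are hypothetical names, and the summation uses Knuth's two-sum, which is assumed here to be the kind of error-free addition that ofSum performs.

```java
/** Hypothetical helper class for illustration; not part of the DD API. */
class NormalizeSketch {
    /** Returns true if the parts satisfy x == x + xx. */
    static boolean isNormalized(double x, double xx) {
        return x == x + xx;
    }

    /**
     * Sums two parts using Knuth's two-sum. Returns {hi, lo} where hi is the
     * rounded sum and lo is the round-off (finite input, no overflow assumed).
     */
    static double[] normalize(double a, double b) {
        double hi = a + b;
        double bVirtual = hi - a;
        double aVirtual = hi - bVirtual;
        double lo = (a - aVirtual) + (b - bVirtual);
        return new double[] {hi, lo};
    }

    public static void main(String[] args) {
        // The parts (1, 1) overlap: 1 + 1 != 1.
        System.out.println(isNormalized(1.0, 1.0));   // false
        double[] n = normalize(1.0, 1.0);
        System.out.println(n[0] + ", " + n[1]);       // 2.0, 0.0
        // Parts given in the wrong order are re-ordered by the summation.
        double[] m = normalize(0x1.0p-60, 1.0);
        System.out.println(isNormalized(m[0], m[1])); // true
    }
}
```

The two-sum does not require the arguments to be ordered, so the same helper handles parts that are swapped as well as parts that overlap.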
Note that the number (x, xx) may also be referred to using the labels high and low
to indicate the magnitude of the parts as (x_hi, x_lo), or using a numerical suffix
for the parts as (x_0, x_1). The numerical suffix is
typically used when the number has an arbitrary number of parts.
The double-double class is immutable.
Construction
Factory methods that create a DD exactly use the prefix of. Methods
that create the closest possible representation use the prefix from. These methods
may lose precision during conversion.
Primitive values of type double, int and long are
converted exactly to a DD.
The DD class can also be created as the result of an arithmetic operation on a pair
of double operands. The resulting DD has the IEEE754 double result
of the operation in the first part, and the second part contains the round-off lost from the
operation due to rounding. Construction using the add (+), subtract (-) and
multiply (*) operators is exact. Construction using division (/) may be
inexact if the quotient is not representable.
Note that it is more efficient to create a DD from a double operation than
to create two DD values and combine them with the same operation. The result will be
the same for add, subtract and multiply but may lose precision for divide.
// Inefficient
DD a = DD.of(1.23).add(DD.of(4.56));
// Optimal
DD b = DD.ofSum(1.23, 4.56);
// Inefficient and may lose precision
DD c = DD.of(1.23).divide(DD.of(4.56));
// Optimal
DD d = DD.fromQuotient(1.23, 4.56);
It is not possible to directly specify the two parts of the number.
The two parts must be added using ofSum.
If the two parts already represent a number (x, xx) such that x == x + xx
then the magnitudes of the parts will be unchanged; any signed zeros may be subject to a sign
change.
Primitive operands
Operations are provided using a DD operand or a double operand.
Implicit type conversion allows methods with a double operand to be used
with other primitives such as int or long. Note that casting of a long
to a double may result in loss of precision.
To maintain the full precision of a long, first convert the value to a DD using
of(long) and then use the same arithmetic operation with the DD operand.
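The loss of precision from a long to double cast, and one way to keep all 64 bits in two doubles, can be sketched as follows. This is an illustration under assumptions: the class LongSplitSketch and its methods are hypothetical, and splitting the long into high and low 32-bit halves is one possible approach, not necessarily what of(long) does.

```java
import java.math.BigDecimal;

/** Hypothetical helper class for illustration; not part of the DD API. */
class LongSplitSketch {
    /** Splits a long into {hi, lo} doubles such that hi + lo == x exactly. */
    static double[] split(long x) {
        long a = x & 0xffff_ffff_0000_0000L; // high 32 bits: a multiple of 2^32
        long b = x - a;                      // low 32 bits: in [0, 2^32)
        double da = a;                       // exact: at most 32 significant bits
        double db = b;                       // exact: at most 32 significant bits
        double hi = da + db;
        // Round-off of the sum; valid because a == 0 or |a| >= 2^32 > |b|.
        double lo = db - (hi - da);
        return new double[] {hi, lo};
    }

    /** Recombines the parts exactly using BigDecimal. */
    static long toLong(double hi, double lo) {
        return new BigDecimal(hi).add(new BigDecimal(lo)).longValueExact();
    }

    public static void main(String[] args) {
        long v = (1L << 53) + 1;
        System.out.println((long) (double) v == v); // false: the cast drops the low bit
        double[] p = split(v);
        System.out.println(toLong(p[0], p[1]) == v); // true
    }
}
```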
Accuracy
Add and multiply operations using two double operands are computed to an
exact DD result (see ofSum and
ofProduct). Operations involving a DD and another
operand, either double or DD, are not exact.
This class is not intended to perform exact arithmetic. Arbitrary precision arithmetic is
available using BigDecimal. Single operations will compute the DD result within
a tolerance of the 106-bit exact result. This far exceeds the accuracy of double
arithmetic. The reduced accuracy is a compromise to deliver increased performance.
The class is intended to reduce error in equivalent double arithmetic operations where
the double valued result is required to high accuracy. Although it
is possible to reduce error to 2^-106 for all operations, the additional computation
would impact performance and would require multiple chained operations to potentially
observe a different result when the final DD is converted to a double.
Canonical representation
The double-double number is the sum of its parts. The canonical representation of the
number is the explicit value of the parts. The toString() method is provided to
convert to a String representation of the parts formatted as a tuple.
The class implements equals(Object) and hashCode() and allows usage as
a key in a Set or Map. Equality requires binary equivalence of the parts. Note that
representations of zero using different combinations of +/- 0.0 are not considered equal.
Also note that many non-normalized double-double numbers can represent the same number.
Double-double numbers can be normalized before operations that involve equals(Object)
by adding the parts; this is exact for a finite sum
and provides equality support for non-zero numbers. Alternatively exact numerical equality
and comparisons are supported by conversion to a BigDecimal
representation. Note that BigDecimal does not support non-finite values.
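The difference between binary equivalence of the parts and numerical equality of the evaluated sum can be sketched with plain doubles. The class EqualitySketch and its methods are hypothetical, and the use of Double.doubleToLongBits for the binary comparison is an assumption about how such an equality could be implemented.

```java
import java.math.BigDecimal;

/** Hypothetical helper class for illustration; not part of the DD API. */
class EqualitySketch {
    /** Binary equivalence of the parts: distinguishes -0.0 from 0.0. */
    static boolean sameBits(double x, double xx, double y, double yy) {
        return Double.doubleToLongBits(x) == Double.doubleToLongBits(y)
            && Double.doubleToLongBits(xx) == Double.doubleToLongBits(yy);
    }

    /** Exact numerical comparison of the evaluated sums (finite parts only). */
    static int compareValue(double x, double xx, double y, double yy) {
        BigDecimal a = new BigDecimal(x).add(new BigDecimal(xx));
        BigDecimal b = new BigDecimal(y).add(new BigDecimal(yy));
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        // Zeros of different sign: numerically equal, not binary equal.
        System.out.println(sameBits(0.0, 0.0, -0.0, 0.0));      // false
        System.out.println(compareValue(0.0, 0.0, -0.0, 0.0));  // 0
        // Non-normalized (1, 1) and normalized (2, 0) represent the same number.
        System.out.println(sameBits(1.0, 1.0, 2.0, 0.0));       // false
        System.out.println(compareValue(1.0, 1.0, 2.0, 0.0));   // 0
    }
}
```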
Overflow, underflow and non-finite support
A double-double number is limited to the same finite range as a double
(4.9E-324 to 1.7976931348623157E308). This class is intended for use when
the ultimate result is finite and intermediate values do not approach infinity or zero.
This implementation does not support IEEE standards for handling infinite and NaN when used
in arithmetic operations. Computations may split a 64-bit double into two parts and/or use
subtraction of intermediate terms to compute round-off parts. These operations may generate
infinite values due to overflow which then propagate through further operations to NaN,
for example computing the round-off using Inf - Inf = NaN.
Operations that involve splitting a double (multiply, divide) are safe when the base 2 exponent is below 996. This puts an upper limit of approximately +/-6.7e299 on any values to be split; in practice the arguments to multiply and divide operations are further constrained by the expected finite value of the product or quotient.
Likewise the smallest value that can be represented is Double.MIN_VALUE. The full
106-bit accuracy will be lost when intermediates are within 2^53 of
Double.MIN_NORMAL.
The DD result can be verified by checking it is a finite
evaluated sum. Computations expecting to approach overflow or underflow must use scaling of
intermediate terms (see frexp and scalb) and
appropriate management of the current base 2 scale.
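Managing a base 2 scale alongside the intermediate terms can be sketched with plain doubles. This example stands in for the frexp and scalb methods of this class by using Math.getExponent and Math.scalb on a double; the class ScalingSketch and the method scaledProduct are hypothetical, and the inputs are assumed to be finite, non-zero and normal.

```java
/** Hypothetical helper class for illustration; not part of the DD API. */
class ScalingSketch {
    /**
     * Multiplies the values while keeping the running product in [1, 2)
     * and tracking the power of two separately.
     * Assumes finite, non-zero, normal inputs.
     */
    static double scaledProduct(double... values) {
        double f = 1.0;
        int scale = 0;
        for (double v : values) {
            int e = Math.getExponent(v);
            f *= Math.scalb(v, -e);        // |v * 2^-e| is in [1, 2)
            scale += e;
            int ef = Math.getExponent(f);  // |f| is in [1, 4): ef is 0 or 1
            f = Math.scalb(f, -ef);
            scale += ef;
        }
        // Apply the accumulated scale once at the end.
        return Math.scalb(f, scale);
    }

    public static void main(String[] args) {
        double naive = 0x1.8p600 * 0x1.0p600 * 0x1.0p-900;
        System.out.println(naive); // Infinity: the first product overflows
        System.out.println(scaledProduct(0x1.8p600, 0x1.0p600, 0x1.0p-900) == 0x1.8p300); // true
    }
}
```

The final result 1.5 * 2^300 is representable even though the unscaled intermediate product 1.5 * 2^1200 is not.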
References:
- Dekker, T.J. (1971) A floating-point technique for extending the available precision. Numerische Mathematik, 18:224–242.
- Shewchuk, J.R. (1997) Arbitrary Precision Floating-Point Arithmetic.
- Hida, Y., Li, X.S. and Bailey, D.H. (2008) Library for Double-Double and Quad-Double Arithmetic.
- Since:
- 1.2
Field Summary
Fields (Modifier and Type, Field, Description):
- private static final int CMP_UNSIGNED_1022: The value 1022 converted for use if using Integer.compareUnsigned(int, int).
- private static final int CMP_UNSIGNED_2046: The value 2046 converted for use if using Integer.compareUnsigned(int, int).
- private static final int CMP_UNSIGNED_MINUS_1: The value -1 converted for use if using Integer.compareUnsigned(int, int).
- private static final int EXP_MASK: The mask to extract the raw 11-bit exponent.
- private static final int EXPONENT_OFFSET: Exponent offset in IEEE754 representation.
- private static final char FORMAT_END
- private static final char FORMAT_SEP
- private static final char FORMAT_START
- private static final double HALF: 0.5.
- private static final long HIGH32_MASK: Mask to extract the high 32-bits from a long.
- private static final long MANTISSA_MASK: Mask to extract the 52-bit mantissa from a long representation of a double.
- private static final double MULTIPLIER: The multiplier used to split the double value into high and low parts.
- static final DD ONE: A double-double number representing one.
- private static final double SAFE_MULTIPLY: The limit for safe multiplication of x*y, assuming values above 1.
- private static final long serialVersionUID: Serializable version identifier.
- private static final int TO_STRING_SIZE: The size of the buffer for toString().
- private static final double TWO_POW_512: 2^512.
- private static final double TWO_POW_53: 2^53.
- private static final double TWO_POW_M512: 2^-512.
- private static final long UNSIGN_MASK: Mask to remove the sign bit from a long.
- private final double x: The high part of the double-double number.
- private final double xx: The low part of the double-double number.
- static final DD ZERO: A double-double number representing zero.
-
Constructor Summary
Constructors (Modifier, Constructor, Description):
- private DD(double x, double xx): Create a double-double number (x, xx).
-
Method Summary
Methods (Modifier and Type, Method, Description):
- DD abs(): Returns a DD whose value is the absolute value of the number (x, xx). This method assumes that the low part xx is the smaller magnitude.
- (package private) static DD accurateAdd(double x, double xx, double y): Compute the sum of (x, xx) and y.
- (package private) static DD accurateAdd(double x, double xx, double y, double yy): Compute the sum of (x, xx) and (y, yy).
- DD add(double y): Returns a DD whose value is (this + y).
- (package private) static DD add(double x, double xx, double y, double yy): Compute the sum of (x, xx) and (y, yy).
- DD add(DD y): Returns a DD whose value is (this + y).
- BigDecimal bigDecimalValue(): Get the value as a BigDecimal.
- DD ceil(): Returns the smallest (closest to negative infinity) DD value that is greater than or equal to this number (x, xx) and is equal to a mathematical integer.
- private static DD computePow(double x, double xx, int n): Compute the number x (non-zero finite) raised to the power n.
- private static DD computePowScaled(long b, double x, double xx, int n, long[] exp): Compute the number x (non-zero finite) raised to the power n.
- DD divide(double y): Returns a DD whose value is (this / y).
- private static DD divide(double x, double xx, double y): Compute the division of (x, xx) by y.
- private static DD divide(double x, double xx, double y, double yy): Compute the division of (x, xx) by (y, yy).
- DD divide(DD y): Returns a DD whose value is (this / y).
- double doubleValue(): Get the value as a double.
- private static boolean equals(double x, double y): Returns true if the values are numerically equal.
- boolean equals(Object): Test for equality with another object.
- (package private) static DD fastTwoDiff(double a, double b): Compute the difference of two numbers a and b using Dekker's two-sum algorithm.
- private static double fastTwoDiffLow(double a, double b, double x): Compute the round-off of the difference of two numbers a and b using Dekker's two-sum algorithm.
- (package private) static DD fastTwoSum(double a, double b): Compute the sum of two numbers a and b using Dekker's two-sum algorithm.
- (package private) static double fastTwoSumLow(double a, double b, double x): Compute the round-off of the sum of two numbers a and b using Dekker's two-sum algorithm.
- float floatValue(): Get the value as a float.
- DD floor(): Returns the largest (closest to positive infinity) DD value that is less than or equal to this number (x, xx) and is equal to a mathematical integer.
- private static DD floorOrCeil(double x, double xx, DoubleUnaryOperator op): Implementation of the floor and ceiling functions.
- DD frexp(int[] exp): Convert this number x to fractional f and integral 2^exp components.
- static DD from(BigDecimal x): Creates the double-double number (z, zz) using the double representation of the argument x; the low part is the double representation of the round-off error.
- static DD fromQuotient(double x, double y): Returns a DD whose value is (x / y).
- private static int getScale(double a): Returns a scale suitable for use with Math.scalb(double, int) to normalise the number to the interval [1, 2).
- int hashCode(): Gets a hash code for the double-double number.
- double hi(): Gets the first part x of the double-double number (x, xx).
- (package private) static double highPart(double value): Implement Dekker's method to split a value into two parts.
- int intValue(): Get the value as an int.
- boolean isFinite(): Returns true if the evaluated sum of the parts is finite.
- (package private) static boolean isNotNormal(double a): Checks if the number is not normal.
- boolean isOne(): Check if this is a neutral element of multiplication, i.e.
- boolean isZero(): Check if this is a neutral element of addition, i.e.
- double lo(): Gets the second part xx of the double-double number (x, xx).
- long longValue(): Get the value as a long.
- DD multiply(double y): Returns a DD whose value is this * y.
- private static DD multiply(double x, double xx, double y): Compute the multiplication product of (x, xx) and y.
- private static DD multiply(double x, double xx, double y, double yy): Compute the multiplication product of (x, xx) and (y, yy).
- DD multiply(int n): Repeated addition.
- DD multiply(DD y): Returns a DD whose value is this * y.
- DD negate(): Returns a DD whose value is the negation of both parts of double-double number.
- static DD of(double x): Creates the double-double number as the value (x, 0).
- (package private) static DD of(double x, double xx): Creates the double-double number as the value (x, xx).
- static DD of(int x): Creates the double-double number as the value (x, 0).
- static DD of(long x): Creates the double-double number with the high part equal to (double) x and the low part equal to any remaining bits.
- static DD ofDifference(double x, double y): Returns a DD whose value is (x - y).
- static DD ofProduct(double x, double y): Returns a DD whose value is (x * y).
- static DD ofSquare(double x): Returns a DD whose value is (x * x).
- static DD ofSum(double x, double y): Returns a DD whose value is (x + y).
- DD one(): Identity element.
- DD pow(int n): Compute this number (x, xx) raised to the power n.
- DD pow(int n, long[] exp): Compute this number x raised to the power n.
- DD reciprocal(): Compute the reciprocal of this.
- private static DD reciprocal(double y, double yy): Compute the inverse of (y, yy).
- DD scalb(int exp): Multiply this number (x, xx) by an integral power of two.
- DD sqrt(): Compute the square root of this number (x, xx).
- DD square(): Returns a DD whose value is this * this.
- private static DD square(double x, double xx): Compute the square of (x, xx).
- DD subtract(double y): Returns a DD whose value is (this - y).
- DD subtract(DD y): Returns a DD whose value is (this - y).
- String toString(): Returns a string representation of the double-double number.
- (package private) static DD twoDiff(double a, double b): Compute the difference of two numbers a and b using Knuth's two-sum algorithm.
- private static double twoDiffLow(double a, double b, double x): Compute the round-off of the difference of two numbers a and b using Knuth's two-sum algorithm.
- (package private) static double twoPow(int n): Create a normalized double with the value 2^n.
- (package private) static DD twoProd(double x, double y): Compute the double-double number (z, zz) for the exact product of x and y.
- (package private) static double twoProductLow(double x, double y, double xy): Compute the low part of the double length number (z, zz) for the exact product of x and y using Dekker's mult12 algorithm.
- (package private) static double twoProductLow(double hx, double lx, double hy, double ly, double xy): Compute the low part of the double length number (z, zz) for the exact product of x and y using Dekker's mult12 algorithm.
- (package private) static DD twoSquare(double x): Compute the double-double number (z, zz) for the exact square of x.
- (package private) static double twoSquareLow(double x, double x2): Compute the low part of the double length number (z, zz) for the exact square of x using Dekker's mult12 algorithm.
- (package private) static double twoSquareLow(double hx, double lx, double x2): Compute the low part of the double length number (z, zz) for the exact square of x using Dekker's mult12 algorithm.
- (package private) static DD twoSum(double a, double b): Compute the sum of two numbers a and b using Knuth's two-sum algorithm.
- (package private) static double twoSumLow(double a, double b, double x): Compute the round-off of the sum of two numbers a and b using Knuth's two-sum algorithm.
- DD zero(): Identity element.
Methods inherited from class java.lang.Number
byteValue, shortValue
-
Field Details
-
ONE
public static final DD ONE
A double-double number representing one.
-
ZERO
public static final DD ZERO
A double-double number representing zero.
-
MULTIPLIER
private static final double MULTIPLIER
The multiplier used to split the double value into high and low parts. From Dekker (1971): "The constant should be chosen equal to 2^(p - p/2) + 1, where p is the number of binary digits in the mantissa". Here p is 53 and the multiplier is 2^27 + 1.
-
EXP_MASK
private static final int EXP_MASK
The mask to extract the raw 11-bit exponent. The value must be shifted 52-bits to remove the mantissa bits.
-
CMP_UNSIGNED_2046
private static final int CMP_UNSIGNED_2046
The value 2046 converted for use if using Integer.compareUnsigned(int, int). This requires adding Integer.MIN_VALUE to 2046.
-
CMP_UNSIGNED_MINUS_1
private static final int CMP_UNSIGNED_MINUS_1
The value -1 converted for use if using Integer.compareUnsigned(int, int). This requires adding Integer.MIN_VALUE to -1.
-
CMP_UNSIGNED_1022
private static final int CMP_UNSIGNED_1022
The value 1022 converted for use if using Integer.compareUnsigned(int, int). This requires adding Integer.MIN_VALUE to 1022.
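Adding Integer.MIN_VALUE to both sides turns an unsigned comparison into a signed one, which allows a range check to be done with a single compare. The sketch below shows the idea for detecting a raw exponent of 0 or 2047. The class UnsignedCompareSketch is hypothetical, the constant values are reconstructed from the descriptions above, and how the library uses these constants internally is an assumption.

```java
/** Hypothetical helper class for illustration; not part of the DD API. */
class UnsignedCompareSketch {
    /** Mask for the raw 11-bit exponent after shifting out the mantissa. */
    static final int EXP_MASK = 0x7ff;
    /** 2046 offset by Integer.MIN_VALUE for a signed comparison. */
    static final int CMP_UNSIGNED_2046 = Integer.MIN_VALUE + 2046;

    /** Returns true for zero, sub-normal, infinite and NaN using one comparison. */
    static boolean isNotNormal(double a) {
        int rawExp = (int) (Double.doubleToRawLongBits(a) >>> 52) & EXP_MASK;
        // Unsigned (rawExp - 1) >= 2046 is true only for rawExp == 0 or 2047.
        return (rawExp - 1) + Integer.MIN_VALUE >= CMP_UNSIGNED_2046;
    }

    /** The same test written with the standard library method. */
    static boolean isNotNormalReference(double a) {
        int rawExp = (int) (Double.doubleToRawLongBits(a) >>> 52) & EXP_MASK;
        return Integer.compareUnsigned(rawExp - 1, 2046) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isNotNormal(1.0));              // false
        System.out.println(isNotNormal(Double.MIN_VALUE)); // true
        System.out.println(isNotNormal(Double.NaN));       // true
    }
}
```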
-
TWO_POW_512
private static final double TWO_POW_512
2^512.
-
TWO_POW_M512
private static final double TWO_POW_M512
2^-512.
-
TWO_POW_53
private static final double TWO_POW_53
2^53. Any double with a magnitude above this is an even integer.
-
HIGH32_MASK
private static final long HIGH32_MASK
Mask to extract the high 32-bits from a long.
-
UNSIGN_MASK
private static final long UNSIGN_MASK
Mask to remove the sign bit from a long.
-
MANTISSA_MASK
private static final long MANTISSA_MASK
Mask to extract the 52-bit mantissa from a long representation of a double.
-
EXPONENT_OFFSET
private static final int EXPONENT_OFFSET
Exponent offset in IEEE754 representation.
-
HALF
private static final double HALF
0.5.
-
SAFE_MULTIPLY
private static final double SAFE_MULTIPLY
The limit for safe multiplication of x*y, assuming values above 1. Used to maintain positive values during the power computation.
-
TO_STRING_SIZE
private static final int TO_STRING_SIZE
The size of the buffer for toString().
The longest double will require a sign, a maximum of 17 digits, the decimal place and the exponent, e.g. for max value this is 24 chars: -1.7976931348623157e+308. Set the buffer size to twice this and round up to a power of 2 thus allowing for formatting characters. The size is 64.
-
FORMAT_START
private static final char FORMAT_START
-
FORMAT_END
private static final char FORMAT_END
-
FORMAT_SEP
private static final char FORMAT_SEP
-
serialVersionUID
private static final long serialVersionUID
Serializable version identifier.
-
x
private final double x
The high part of the double-double number.
-
xx
private final double xx
The low part of the double-double number.
-
-
Constructor Details
-
DD
private DD(double x, double xx)
Create a double-double number (x, xx).
- Parameters:
x - High part.
xx - Low part.
-
-
Method Details
-
of
public static DD of(double x)
Creates the double-double number as the value (x, 0).
- Parameters:
x - Value.
- Returns:
- the double-double
-
of
static DD of(double x, double xx)
Creates the double-double number as the value (x, xx).
Warning
The arguments are used directly. No checks are made that they represent a normalized double-double number: x == x + xx.
This method is exposed for testing.
- Parameters:
x - High part.
xx - Low part.
- Returns:
- the double-double
-
of
public static DD of(int x)
Creates the double-double number as the value (x, 0).
Note this method exists to avoid using of(long) for integer arguments; the long variation is slower as it preserves all 64-bits of information.
- Parameters:
x - Value.
- Returns:
- the double-double
-
of
public static DD of(long x)
Creates the double-double number with the high part equal to (double) x and the low part equal to any remaining bits.
Note this method preserves all 64-bits of precision. Faster construction can be achieved using up to 53-bits of precision using of((double) x).
- Parameters:
x - Value.
- Returns:
- the double-double
-
from
public static DD from(BigDecimal x)
Creates the double-double number (z, zz) using the double representation of the argument x; the low part is the double representation of the round-off error.
double z = x.doubleValue();
double zz = x.subtract(new BigDecimal(z)).doubleValue();
If the value cannot be represented as a finite value the result will have an infinite high part and the low part is undefined.
Note: This conversion can lose information about the precision of the BigDecimal value. The result is the closest double-double representation to the value.
- Parameters:
x - Value.
- Returns:
- the double-double
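The two-line conversion shown for from(BigDecimal) can be run on its own to see the size of the round-off part. This is a minimal sketch: the class FromBigDecimalSketch is hypothetical, and the guard for an infinite high part is an assumption added because new BigDecimal(double) rejects non-finite values; the low part returned in that case is an arbitrary choice since the documented result is undefined.

```java
import java.math.BigDecimal;

/** Hypothetical helper class for illustration; not part of the DD API. */
class FromBigDecimalSketch {
    /** Returns {z, zz}: the nearest double and the double of the round-off. */
    static double[] from(BigDecimal x) {
        double z = x.doubleValue();
        if (Double.isInfinite(z)) {
            // new BigDecimal(z) would throw; 0.0 is an arbitrary placeholder.
            return new double[] {z, 0.0};
        }
        double zz = x.subtract(new BigDecimal(z)).doubleValue();
        return new double[] {z, zz};
    }

    public static void main(String[] args) {
        BigDecimal tenth = new BigDecimal("0.1");
        double[] dd = from(tenth);
        // The double nearest to 0.1 is slightly too large, so the low part is negative.
        System.out.println(dd[0] + ", " + dd[1]);
        BigDecimal error = tenth.subtract(new BigDecimal(dd[0]).add(new BigDecimal(dd[1])));
        System.out.println(error.abs().compareTo(new BigDecimal("1e-33")) < 0); // true
    }
}
```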
-
ofSum
public static DD ofSum(double x, double y)
Returns a DD whose value is (x + y). The values are not required to be ordered by magnitude, i.e. the result is commutative: x + y == y + x.
This method ignores special handling of non-normal numbers and overflow within the extended precision computation. This creates the following special cases:
- If x + y is infinite then the low part is NaN.
- If x or y is infinite or NaN then the low part is NaN.
- If x + y is sub-normal or zero then the low part is +/-0.0.
An invalid result can be identified using isFinite().
The result is the exact double-double representation of the sum.
- Parameters:
x - Addend.
y - Addend.
- Returns:
- the sum x + y.
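The special cases listed for ofSum follow directly from computing the round-off by subtraction. The sketch below reproduces them with Knuth's two-sum on plain doubles; the class TwoSumSketch is hypothetical and the code is assumed to mirror, not copy, the library's computation.

```java
/** Hypothetical helper class for illustration; not part of the DD API. */
class TwoSumSketch {
    /** Knuth's two-sum: returns {sum, roundOff} with no special-case handling. */
    static double[] twoSum(double a, double b) {
        double s = a + b;
        double bVirtual = s - a;
        double aVirtual = s - bVirtual;
        double lo = (a - aVirtual) + (b - bVirtual);
        return new double[] {s, lo};
    }

    /** The validity check: the evaluated sum of the parts must be finite. */
    static boolean isFinite(double[] dd) {
        return Double.isFinite(dd[0] + dd[1]);
    }

    public static void main(String[] args) {
        // Exact: the round-off 2^-60 is recovered in the low part.
        double[] exact = twoSum(1.0, 0x1.0p-60);
        System.out.println(exact[0] + ", " + exact[1]);
        // Overflow: the sum is infinite and Inf - Inf creates a NaN low part.
        double[] over = twoSum(Double.MAX_VALUE, Double.MAX_VALUE);
        System.out.println(over[0] + ", " + over[1]); // Infinity, NaN
        System.out.println(isFinite(over));           // false
    }
}
```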
-
ofDifference
public static DD ofDifference(double x, double y)
Returns a DD whose value is (x - y). The values are not required to be ordered by magnitude, i.e. the result matches a negation and addition: x - y == -y + x.
Computes the same results as ofSum(x, -y). See that method for details of special cases.
An invalid result can be identified using isFinite().
The result is the exact double-double representation of the difference.
- Parameters:
x - Minuend.
y - Subtrahend.
- Returns:
x - y.
-
ofProduct
public static DD ofProduct(double x, double y)
Returns a DD whose value is (x * y).
This method ignores special handling of non-normal numbers and intermediate overflow within the extended precision computation. This creates the following special cases:
- If either |x| or |y| multiplied by 1 + 2^27 is infinite (intermediate overflow) then the low part is NaN.
- If x * y is infinite then the low part is NaN.
- If x or y is infinite or NaN then the low part is NaN.
- If x * y is sub-normal or zero then the low part is +/-0.0.
An invalid result can be identified using isFinite().
Note: Ignoring special cases is a design choice for performance. The method is therefore not a drop-in replacement for roundOff = Math.fma(x, y, -x * y).
The result is the exact double-double representation of the product.
- Parameters:
x - Factor.
y - Factor.
- Returns:
- the product x * y.
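The relationship to Math.fma can be illustrated with a plain-double version of Dekker's product round-off. This is a sketch under assumptions: the class TwoProductSketch is hypothetical, the formula is the standard Dekker mult12 round-off rather than a copy of the library source, and Math.fma requires Java 9 or later.

```java
/** Hypothetical helper class for illustration; not part of the DD API. */
class TwoProductSketch {
    /** Dekker's split constant for a 53-bit significand: 2^27 + 1. */
    static final double MULTIPLIER = 0x1.0p27 + 1;

    /** High part of Dekker's split; no scaling, so large values overflow. */
    static double highPart(double value) {
        double c = MULTIPLIER * value;
        return c - (c - value);
    }

    /** Round-off of x * y computed from the split parts of the factors. */
    static double productLow(double x, double y) {
        double xy = x * y;
        double hx = highPart(x);
        double lx = x - hx;
        double hy = highPart(y);
        double ly = y - hy;
        return lx * ly - (((xy - hx * hy) - lx * hy) - hx * ly);
    }

    public static void main(String[] args) {
        double x = 1.23;
        double y = 4.56;
        // In the normal range the two methods agree.
        System.out.println(productLow(x, y) == Math.fma(x, y, -(x * y))); // true
        // The product 2^1000 * 2^-1000 is 1, but the split of 2^1000 overflows.
        System.out.println(productLow(0x1.0p1000, 0x1.0p-1000));          // NaN
        System.out.println(Math.fma(0x1.0p1000, 0x1.0p-1000, -1.0));      // 0.0
    }
}
```

The second case shows why the result is not a drop-in replacement: the product itself is finite and exact, yet the intermediate split produces NaN.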
-
ofSquare
public static DD ofSquare(double x)
Returns a DD whose value is (x * x).
This method is an optimisation of ofProduct(x, x). See that method for details of special cases.
An invalid result can be identified using isFinite().
The result is the exact double-double representation of the square.
- Parameters:
x - Factor.
- Returns:
- the square x * x.
-
fromQuotient
public static DD fromQuotient(double x, double y)
Returns a DD whose value is (x / y). If y = 0 the result is undefined.
This method ignores special handling of non-normal numbers and intermediate overflow within the extended precision computation. This creates the following special cases:
- If either |x / y| or |y| multiplied by 1 + 2^27 is infinite (intermediate overflow) then the low part is NaN.
- If x / y is infinite then the low part is NaN.
- If x or y is infinite or NaN then the low part is NaN.
- If x / y is sub-normal or zero, excluding the previous cases, then the low part is +/-0.0.
An invalid result can be identified using isFinite().
The result is the closest double-double representation to the quotient.
- Parameters:
x - Dividend.
y - Divisor.
- Returns:
- the quotient x / y.
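A quotient with a round-off part can be sketched by dividing the exact remainder by the divisor. The example below swaps in Math.fma (Java 9 or later) to obtain the remainder, which is a different technique from the Dekker split that the special cases above imply; the class QuotientSketch is hypothetical and finite, normal, non-zero arguments are assumed.

```java
import java.math.BigDecimal;

/** Hypothetical helper class for illustration; not part of the DD API. */
class QuotientSketch {
    /** Returns {q, qq} where q = x / y rounded and qq approximates the round-off. */
    static double[] quotient(double x, double y) {
        double q = x / y;
        // Remainder x - q * y with a single rounding (exact for normal values).
        double r = Math.fma(-q, y, x);
        return new double[] {q, r / y};
    }

    public static void main(String[] args) {
        double[] third = quotient(1.0, 3.0);
        System.out.println(third[0] + ", " + third[1]);
        // Check 3 * (q + qq) against 1 using exact decimal arithmetic.
        BigDecimal sum = new BigDecimal(third[0]).add(new BigDecimal(third[1]));
        BigDecimal error = sum.multiply(BigDecimal.valueOf(3)).subtract(BigDecimal.ONE).abs();
        System.out.println(error.compareTo(new BigDecimal("1e-31")) < 0); // true
    }
}
```

Unlike the sum and product, the low part here is itself rounded, which is why the result is described as the closest representation and not an exact one.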
-
hi
public double hi()
Gets the first part x of the double-double number (x, xx). In a normalized double-double number this part will have the greatest magnitude.
This is equivalent to returning the high-part x_hi for the number (x_hi, x_lo).
- Returns:
- the first part
-
lo
public double lo()
Gets the second part xx of the double-double number (x, xx). In a normalized double-double number this part will have the smallest magnitude.
This is equivalent to returning the low part x_lo for the number (x_hi, x_lo).
- Returns:
- the second part
-
isFinite
public boolean isFinite()
Returns true if the evaluated sum of the parts is finite.
This method is provided as a utility to check the result of a DD computation. Note that for performance the DD class does not follow IEEE754 arithmetic for infinite and NaN, and does not protect from overflow of intermediate values in multiply and divide operations. If this method returns false following DD arithmetic then the computation is not supported to extended precision.
Note: Any number that returns true may be converted to the exact BigDecimal value.
- Returns:
true if this instance represents a finite double value.
-
doubleValue
public double doubleValue()
Get the value as a double. This is the evaluated sum of the parts.
Note that even when the return value is finite, this conversion can lose information about the precision of the DD value.
Conversion of a finite DD can also be performed using the BigDecimal representation.
- Specified by:
doubleValue in class Number
- Returns:
- the value converted to a double
-
floatValue
public float floatValue()
Get the value as a float. This is the narrowing primitive conversion of the doubleValue(). This conversion can lose range, resulting in a float zero from a nonzero double and a float infinity from a finite double. A double NaN is converted to a float NaN and a double infinity is converted to the same-signed float infinity.
Note that even when the return value is finite, this conversion can lose information about the precision of the DD value.
Conversion of a finite DD can also be performed using the BigDecimal representation.
- Specified by:
floatValue in class Number
- Returns:
- the value converted to a float
-
intValue
public int intValue()
Get the value as an int. This conversion discards the fractional part of the number and effectively rounds the value to the closest whole number in the direction of zero. This is the equivalent of a cast of a floating-point number to an integer, for example (int) -2.75 => -2.
Note that this conversion can lose information about the precision of the DD value.
Special cases:
- If the DD value is infinite the result is Integer.MAX_VALUE.
- If the DD value is -infinite the result is Integer.MIN_VALUE.
- If the DD value is NaN the result is 0.
Conversion of a finite DD can also be performed using the BigDecimal representation. Note that BigDecimal conversion rounds to the BigInteger whole number representation and returns the low-order 32-bits. Numbers too large for an int may change sign. This method ensures the sign is correct by directly rounding to an int and returning the respective upper or lower limit for numbers too large for an int.
-
longValue
public long longValue()
Get the value as a long. This conversion discards the fractional part of the number and effectively rounds the value to the closest whole number in the direction of zero. This is the equivalent of a cast of a floating-point number to an integer, for example (long) -2.75 => -2.
Note that this conversion can lose information about the precision of the DD value.
Special cases:
- If the DD value is infinite the result is Long.MAX_VALUE.
- If the DD value is -infinite the result is Long.MIN_VALUE.
- If the DD value is NaN the result is 0.
Conversion of a finite DD can also be performed using the BigDecimal representation. Note that BigDecimal conversion rounds to the BigInteger whole number representation and returns the low-order 64-bits. Numbers too large for a long may change sign. This method ensures the sign is correct by directly rounding to a long and returning the respective upper or lower limit for numbers too large for a long.
-
bigDecimalValue
public BigDecimal bigDecimalValue()
Get the value as a BigDecimal. This is the evaluated sum of the parts; the conversion is exact.
The conversion will raise a NumberFormatException if the number is non-finite.
- Returns:
- the double-double as a BigDecimal.
- Throws:
NumberFormatException - if any part of the number is infinite or NaN
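The exact conversion and its failure mode can be reproduced with the BigDecimal(double) constructor, which is exact for finite values and rejects non-finite ones. The class BigDecimalValueSketch is hypothetical; that the library sums the parts in this way is an assumption consistent with the description above.

```java
import java.math.BigDecimal;

/** Hypothetical helper class for illustration; not part of the DD API. */
class BigDecimalValueSketch {
    /** Exact sum of the parts; throws NumberFormatException for non-finite parts. */
    static BigDecimal toBigDecimal(double x, double xx) {
        return new BigDecimal(x).add(new BigDecimal(xx));
    }

    public static void main(String[] args) {
        // The low part 2^-60 is far below the precision of a double near 1.
        System.out.println(toBigDecimal(1.0, 0x1.0p-60));
        try {
            toBigDecimal(Double.POSITIVE_INFINITY, 0.0);
        } catch (NumberFormatException e) {
            System.out.println("non-finite value rejected");
        }
    }
}
```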
-
fastTwoSum
static DD fastTwoSum(double a, double b)
Compute the sum of two numbers a and b using Dekker's two-sum algorithm. The values are required to be ordered by magnitude: |a| >= |b|.
If a is zero and b is non-zero the returned value is (b, 0).
- Parameters:
a - First part of sum.
b - Second part of sum.
- Returns:
- the sum
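The ordering requirement can be demonstrated by calling a plain-double version of Dekker's fast two-sum with the arguments in both orders. The class FastTwoSumSketch is hypothetical and the code is a minimal sketch of the published algorithm, not the library source.

```java
/** Hypothetical helper class for illustration; not part of the DD API. */
class FastTwoSumSketch {
    /** Dekker's fast two-sum: returns {sum, roundOff}; requires |a| >= |b|. */
    static double[] fastTwoSum(double a, double b) {
        double x = a + b;
        double bVirtual = x - a;
        return new double[] {x, b - bVirtual};
    }

    public static void main(String[] args) {
        // Correct order: the round-off 2^-60 is recovered.
        double[] ok = fastTwoSum(1.0, 0x1.0p-60);
        System.out.println(ok[1] == 0x1.0p-60);  // true
        // Wrong order: the round-off is silently lost.
        double[] bad = fastTwoSum(0x1.0p-60, 1.0);
        System.out.println(bad[1]);              // 0.0
        // A zero first argument returns (b, 0).
        double[] zero = fastTwoSum(0.0, 3.0);
        System.out.println(zero[0] + ", " + zero[1]); // 3.0, 0.0
    }
}
```

The method saves three floating-point operations compared with Knuth's two-sum, at the cost of requiring the caller to know the order of the magnitudes.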
-
fastTwoSumLow
static double fastTwoSumLow(double a, double b, double x)
Compute the round-off of the sum of two numbers a and b using Dekker's two-sum algorithm. The values are required to be ordered by magnitude: |a| >= |b|.
If a is zero and b is non-zero the returned value is zero.
- Parameters:
a - First part of sum.
b - Second part of sum.
x - Sum.
- Returns:
- the sum round-off
-
fastTwoDiff
static DD fastTwoDiff(double a, double b)
Compute the difference of two numbers a and b using Dekker's two-sum algorithm. The values are required to be ordered by magnitude: |a| >= |b|.
Computes the same results as fastTwoSum(a, -b).
- Parameters:
a - Minuend.
b - Subtrahend.
- Returns:
- the difference
-
fastTwoDiffLow
private static double fastTwoDiffLow(double a, double b, double x)
Compute the round-off of the difference of two numbers a and b using Dekker's two-sum algorithm. The values are required to be ordered by magnitude: |a| >= |b|.
- Parameters:
a - Minuend.
b - Subtrahend.
x - Difference.
- Returns:
- the difference round-off
-
twoSum
static DD twoSum(double a, double b)
Compute the sum of two numbers a and b using Knuth's two-sum algorithm. The values are not required to be ordered by magnitude, i.e. the result is commutative: s = a + b == b + a.
- Parameters:
a - First part of sum.
b - Second part of sum.
- Returns:
- the sum
-
twoSumLow
static double twoSumLow(double a, double b, double x)
Compute the round-off of the sum of two numbers a and b using Knuth's two-sum algorithm. The values are not required to be ordered by magnitude, i.e. the result is commutative: s = a + b == b + a.
- Parameters:
a - First part of sum.
b - Second part of sum.
x - Sum.
- Returns:
- the sum round-off
-
twoDiff
static DD twoDiff(double a, double b)
Compute the difference of two numbers a and b using Knuth's two-sum algorithm. The values are not required to be ordered by magnitude.
Computes the same results as twoSum(a, -b).
- Parameters:
a - Minuend.
b - Subtrahend.
- Returns:
- the difference
-
twoDiffLow
private static double twoDiffLow(double a, double b, double x)
Compute the round-off of the difference of two numbers a and b using Knuth's two-sum algorithm. The values are not required to be ordered by magnitude.
- Parameters:
a - Minuend.
b - Subtrahend.
x - Difference.
- Returns:
- the difference round-off
-
twoProd
Compute the double-double number(z,zz)for the exact product ofxandy.The high part of the number is equal to the product
z = x * y. The low part is set to the round-off of thedoubleproduct.This method ignores special handling of non-normal numbers and intermediate overflow within the extended precision computation. This creates the following special cases:
- If
x * yis sub-normal or zero then the low part is +/-0.0. - If
x * yis infinite then the low part is NaN. - If
xoryis infinite or NaN then the low part is NaN. - If either
|x|or|y|multiplied by1 + 2^27is infinite (intermediate overflow) then the low part is NaN.
Note: Ignoring special cases is a design choice for performance. The method is therefore not a drop-in replacement for
round_off = Math.fma(x, y, -x * y).- Parameters:
x- First factor.y- Second factor.- Returns:
- the product
- If
-
twoProductLow
static double twoProductLow(double x, double y, double xy) Compute the low part of the double length number(z,zz)for the exact product ofxandyusing Dekker's mult12 algorithm. The standard precision productx*ymust be provided. The numbersxandyare split into high and low parts using Dekker's algorithm.Warning: This method does not perform scaling in Dekker's split and large finite numbers can create NaN results.
- Parameters:
x- First factor.y- Second factor.xy- Product of the factors (x * y).- Returns:
- the low part of the product double length number
- See Also:
-
twoProductLow
static double twoProductLow(double hx, double lx, double hy, double ly, double xy) Compute the low part of the double length number(z,zz)for the exact product ofxandyusing Dekker's mult12 algorithm. The standard precision productx*y, and the high and low parts of the factors must be provided.- Parameters:
hx- High-part of first factor.lx- Low-part of first factor.hy- High-part of second factor.ly- Low-part of second factor.xy- Product of the factors (x * y).- Returns:
- the low part of the product double length number
-
twoSquare
Compute the double-double number (z,zz) for the exact square of x.
The high part of the number is equal to the square z = x * x. The low part is set to the round-off of the double square.
This method is an optimisation of twoProd(x, x). See that method for details of special cases.
- Parameters:
x - Factor.
- Returns:
- the square
- See Also:
-
twoSquareLow
static double twoSquareLow(double x, double x2)
Compute the low part of the double length number (z,zz) for the exact square of x using Dekker's mult12 algorithm. The standard precision square x*x must be provided. The number x is split into high and low parts using Dekker's algorithm.
Warning: This method does not perform scaling in Dekker's split and large finite numbers can create NaN results.
- Parameters:
x - Factor.
x2 - Square of the factor (x * x).
- Returns:
- the low part of the square double length number
- See Also:
-
twoSquareLow
static double twoSquareLow(double hx, double lx, double x2)
Compute the low part of the double length number (z,zz) for the exact square of x using Dekker's mult12 algorithm. The standard precision square x*x, and the high and low parts of the factor must be provided.
- Parameters:
hx - High-part of factor.
lx - Low-part of factor.
x2 - Square of the factor (x * x).
- Returns:
- the low part of the square double length number
-
highPart
static double highPart(double value)
Implement Dekker's method to split a value into two parts. Multiplying by (2^s + 1) creates a big value from which to derive the two split parts.
c = (2^s + 1) * a
a_big = c - a
a_hi = c - a_big
a_lo = a - a_hi
a = a_hi + a_lo
The multiplicand allows a p-bit value to be split into a (p-s)-bit value a_hi and a non-overlapping (s-1)-bit value a_lo. Combined they have (p-1) bits of significand but the sign bit of a_lo contains a bit of information. The constant is chosen so that s is ceil(p/2) where the precision p for a double is 53-bits (1-bit of the mantissa is assumed to be 1 for a non sub-normal number) and s is 27.
This conversion does not use scaling and the result of overflow is NaN. Overflow may occur when the exponent of the input value is above 996.
Splitting a NaN or infinite value will return NaN.
- Parameters:
value - Value.
- Returns:
- the high part of the value.
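The five steps of the split can be followed literally in code. The sketch below is an illustration only: the class DekkerSplitSketch and the method split are hypothetical names, and the method returns both parts rather than just the high part.

```java
final class DekkerSplitSketch {
    /** Split multiplier 2^s + 1 with s = 27. */
    private static final double MULTIPLIER = 0x1.0p27 + 1.0;

    /** Returns {a_hi, a_lo} for the unscaled split of a. */
    static double[] split(double a) {
        final double c = MULTIPLIER * a;
        final double aBig = c - a;
        final double aHi = c - aBig;
        final double aLo = a - aHi;
        return new double[] {aHi, aLo};
    }

    public static void main(String[] args) {
        final double[] parts = split(Math.PI);
        // The parts sum back to the input.
        System.out.println(parts[0] + parts[1] == Math.PI);
        // The high part is short enough that its square has no round-off.
        System.out.println(Math.fma(parts[0], parts[0], -(parts[0] * parts[0])) == 0.0);
        // An exponent above 996 overflows in the multiplication and gives NaN.
        System.out.println(Double.isNaN(split(0x1.0p997)[0]));
    }
}
```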
- See Also:
-
negate
Returns a DD whose value is the negation of both parts of the double-double number.
abs
Returns a DD whose value is the absolute value of the number (x, xx).
This method assumes that the low part xx is the smaller magnitude.
Cases:
- If the x value is negative the result is (-x, -xx).
- If the x value is +/- 0.0 the result is (0.0, 0.0); this will remove sign information from the round-off component assumed to be zero.
- Otherwise the result is this.
- Returns:
- the absolute value
- See Also:
-
floor
Returns the largest (closest to positive infinity) DD value that is less than or equal to this number (x, xx) and is equal to a mathematical integer.
This method may change the representation of zero and non-finite values; the result is equivalent to Math.floor(x) and the xx part is ignored.
Cases:
- If x is NaN, then the result is (NaN, 0).
- If x is infinite, then the result is (x, 0).
- If x is +/-0.0, then the result is (x, 0).
- If x != Math.floor(x), then the result is (Math.floor(x), 0).
- Otherwise the result is the DD value equal to the sum Math.floor(x) + Math.floor(xx).
The result may generate a high part smaller (closer to negative infinity) than Math.floor(x) if x is a representable integer and the xx value is negative.
- Returns:
- the largest (closest to positive infinity) value that is less than or equal to this and is equal to a mathematical integer
- See Also:
-
ceil
Returns the smallest (closest to negative infinity) DD value that is greater than or equal to this number (x, xx) and is equal to a mathematical integer.
This method may change the representation of zero and non-finite values; the result is equivalent to Math.ceil(x) and the xx part is ignored.
Cases:
- If x is NaN, then the result is (NaN, 0).
- If x is infinite, then the result is (x, 0).
- If x is +/-0.0, then the result is (x, 0).
- If x != Math.ceil(x), then the result is (Math.ceil(x), 0).
- Otherwise the result is the DD value equal to the sum Math.ceil(x) + Math.ceil(xx).
The result may generate a high part larger (closer to positive infinity) than Math.ceil(x) if x is a representable integer and the xx value is positive.
- Returns:
- the smallest (closest to negative infinity) value that is greater than or equal to this and is equal to a mathematical integer
- See Also:
-
floorOrCeil
Implementation of the floor and ceiling functions.
Cases:
- If x is non-finite or zero, then the result is (x, 0).
- If x is rounded by the operator to a new value y, then the result is (y, 0).
- Otherwise the result is the DD value equal to the sum op(x) + op(xx).
- Parameters:
x - High part of x.
xx - Low part of x.
op - Floor or ceiling operator.
- Returns:
- the result
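The three cases for floor and ceiling can be sketched as follows. This is an illustrative reading of the documented cases, not the library source; the class FloorCeilSketch is hypothetical and returns the parts as a double[] {hi, lo}.

```java
import java.util.function.DoubleUnaryOperator;

final class FloorCeilSketch {
    /** Applies op (Math::floor or Math::ceil) to the double-double (x, xx). */
    static double[] floorOrCeil(double x, double xx, DoubleUnaryOperator op) {
        // Case 1: non-finite or zero. The low part is ignored.
        if (x == 0 || !Double.isFinite(x)) {
            return new double[] {x, 0};
        }
        // Case 2: x is not an integer, so the smaller xx cannot change the result.
        final double y = op.applyAsDouble(x);
        if (y != x) {
            return new double[] {y, 0};
        }
        // Case 3: x is an integer. Round the low part and sum the two integers,
        // capturing the round-off of the sum (fast two-sum, |y| >= |yy|).
        final double yy = op.applyAsDouble(xx);
        final double hi = y + yy;
        final double lo = yy - (hi - y);
        return new double[] {hi, lo};
    }

    public static void main(String[] args) {
        // x is the integer 3 but the number is slightly below 3, so the floor is 2.
        final double[] r = floorOrCeil(3.0, -0x1.0p-60, Math::floor);
        System.out.println("(" + r[0] + "," + r[1] + ")");
    }
}
```

The example in main shows the situation noted for floor: the high part of the result is smaller than Math.floor(x) because x is a representable integer and xx is negative.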
-
add
Returns a DD whose value is (this + y).
This computes the same result as add(DD.of(y)).
The computed result is within 2 eps of the exact result where eps is 2^-106.
- Parameters:
y - Value to be added to this number.
- Returns:
this + y.
- See Also:
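For orientation, one common way to add a double to a double-double is an error-free two-sum of the high parts followed by a renormalisation. The sketch below shows that textbook approach; it is an assumption for illustration and is not claimed to be the algorithm used by this class or to meet its stated error bound. The class AddDoubleSketch is a hypothetical name.

```java
import java.math.BigDecimal;

final class AddDoubleSketch {
    /** Returns {hi, lo} approximating (x, xx) + y. */
    static double[] add(double x, double xx, double y) {
        // Two-sum: s + e equals x + y exactly, barring overflow.
        final double s = x + y;
        final double yVirtual = s - x;
        final double e = (x - (s - yVirtual)) + (y - yVirtual);
        // Fold in the low part (this addition may round).
        final double t = e + xx;
        // Fast two-sum to renormalise so the parts do not overlap.
        final double hi = s + t;
        final double lo = t - (hi - s);
        return new double[] {hi, lo};
    }

    public static void main(String[] args) {
        // 1e16 + 1 is not representable as a double; the low part keeps the remainder.
        final double[] r = add(1e16, 0.0, 1.0);
        System.out.println(new BigDecimal(r[0]).add(new BigDecimal(r[1])));
    }
}
```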
-
add
Returns a DD whose value is (this + y).
The computed result is within 4 eps of the exact result where eps is 2^-106.
-
add
Compute the sum of (x, xx) and (y, yy).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
yy - Low part of y.
- Returns:
- the sum
- See Also:
-
accurateAdd
Compute the sum of (x, xx) and y.
This computes the same result as accurateAdd(x, xx, y, 0).
Note: This is an internal helper method used when accuracy is required. The computed result is within 1 eps of the exact result where eps is 2^-106. The performance is approximately 1.5-fold slower than add(double).
- Parameters:
x - High part of x.
xx - Low part of x.
y - y.
- Returns:
- the sum
-
accurateAdd
Compute the sum of (x, xx) and (y, yy).
The high-part of the result is within 1 ulp of the true sum e. The low-part of the result is within 1 ulp of the result of the high-part subtracted from the true sum e - hi.
Note: This is an internal helper method used when accuracy is required. The computed result is within 1 eps of the exact result where eps is 2^-106. The performance is approximately 2-fold slower than add(DD).
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
yy - Low part of y.
- Returns:
- the sum
-
subtract
Returns a DD whose value is (this - y).
This computes the same result as add(-y).
The computed result is within 2 eps of the exact result where eps is 2^-106.
- Parameters:
y - Value to be subtracted from this number.
- Returns:
this - y.
- See Also:
-
subtract
Returns a DD whose value is (this - y).
This computes the same result as add(y.negate()).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Specified by:
subtract in interface NativeOperators<DD>
- Parameters:
y - Value to be subtracted from this number.
- Returns:
this - y.
-
multiply
Returns a DD whose value is this * y.
This computes the same result as multiply(DD.of(y)).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
y - Factor.
- Returns:
this * y.
- See Also:
-
multiply
Compute the multiplication product of (x, xx) and y.
This computes the same result as multiply(x, xx, y, 0).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
- Returns:
- the product
- See Also:
-
multiply
Returns a DD whose value is this * y.
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Specified by:
multiply in interface Multiplication<DD>
- Parameters:
y - Factor.
- Returns:
this * y.
-
multiply
Compute the multiplication product of (x, xx) and (y, yy).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
yy - Low part of y.
- Returns:
- the product
-
square
Returns a DD whose value is this * this.
This method is an optimisation of multiply(this).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Returns:
this^2
- See Also:
-
square
Compute the square of (x, xx).
- Parameters:
x - High part of x.
xx - Low part of x.
- Returns:
- the square
-
divide
Returns a DD whose value is (this / y). If y = 0 the result is undefined.
The computed result is within 1 eps of the exact result where eps is 2^-106.
- Parameters:
y - Divisor.
- Returns:
this / y.
-
divide
Compute the division of (x, xx) by y. If y = 0 the result is undefined.
The computed result is within 1 eps of the exact result where eps is 2^-106.
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
- Returns:
- the quotient
-
divide
Returns a DD whose value is (this / y). If y = 0 the result is undefined.
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Specified by:
divide in interface NativeOperators<DD>
- Parameters:
y - Divisor.
- Returns:
this / y.
-
divide
Compute the division of (x, xx) by (y, yy). If y = 0 the result is undefined.
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
yy - Low part of y.
- Returns:
- the quotient
-
reciprocal
Compute the reciprocal of this. If this value is zero the result is undefined.
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Specified by:
reciprocal in interface Multiplication<DD>
- Returns:
this^-1
-
reciprocal
Compute the inverse of (y, yy). If y = 0 the result is undefined.
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
y - High part of y.
yy - Low part of y.
- Returns:
- the inverse
-
sqrt
Compute the square root of this number (x, xx).
Uses the result Math.sqrt(x) if that result is not a finite normalized double.
Special cases:
- If x is NaN or less than zero, then the result is (NaN, 0).
- If x is positive infinity, then the result is (+infinity, 0).
- If x is positive zero or negative zero, then the result is (x, 0).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Returns:
sqrt(this)
- See Also:
-
isNotNormal
static boolean isNotNormal(double a)
Checks if the number is not normal. This is functionally equivalent to:
final double abs = Math.abs(a);
return (abs <= Double.MIN_NORMAL || !(abs <= Double.MAX_VALUE));
- Parameters:
a - The value.
- Returns:
- true if the value is not normal
-
scalb
Multiply this number (x, xx) by an integral power of two.
(y, yy) = (x, xx) * 2^exp
The result is rounded as if performed by a single correctly rounded floating-point multiply. This computes the same result as:
y = Math.scalb(x, exp);
yy = Math.scalb(xx, exp);
The implementation computes using a single multiplication if exp is in [-1022, 1023]. Otherwise the parts (x, xx) are scaled by repeated multiplication by power-of-two factors. The result is exact unless the scaling generates sub-normal parts; in this case precision may be lost by a single rounding.
- Parameters:
exp - Power of two scale factor.
- Returns:
- the result
- See Also:
-
twoPow
static double twoPow(int n)
Create a normalized double with the value 2^n.
Warning: Do not call with n = -1023. This will create zero.
- Parameters:
n - Exponent (in the range [-1022, 1023]).
- Returns:
- the double
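One plausible way to build such a value is to write the biased exponent directly into the IEEE 754 bit pattern. The sketch below is an assumption about how this could be done, chosen because it reproduces the documented warning for n = -1023; the class TwoPowSketch is hypothetical.

```java
final class TwoPowSketch {
    /** Creates 2^n for n in [-1022, 1023] by setting only the exponent bits. */
    static double twoPow(int n) {
        // The biased exponent occupies bits 52 to 62; the mantissa bits stay zero.
        return Double.longBitsToDouble((n + 1023L) << 52);
    }

    public static void main(String[] args) {
        System.out.println(twoPow(10));
        // A biased exponent of zero with a zero mantissa is the encoding of 0.0.
        System.out.println(twoPow(-1023));
    }
}
```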
-
frexp
Convert this number x to fractional f and integral 2^exp components.
x = f * 2^exp
The combined fractional part (f, ff) is in the range [0.5, 1).
Special cases:
- If x is zero, then the normalized fraction is zero and the exponent is zero.
- If x is NaN, then the normalized fraction is NaN and the exponent is unspecified.
- If x is infinite, then the normalized fraction is infinite and the exponent is unspecified.
- If high-part x is an exact power of 2 and the low-part xx has an opposite signed non-zero magnitude then fraction high-part f will be +/-1 such that the double-double number is in the range [0.5, 1).
This is named using the equivalent function in the standard C math.h library.
- Parameters:
exp - Power of two scale factor (integral exponent).
- Returns:
- Fraction part.
- See Also:
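The power-of-2 special case is easier to see in code. The sketch below is a simplified illustration that assumes the high part x is a normal double (sub-normal inputs are not handled) and writes 0 to the exponent for non-finite input, where the documentation leaves it unspecified. The class FrexpSketch and its signature are hypothetical.

```java
final class FrexpSketch {
    /** Returns {f, ff} and writes the exponent to exp[0]. Assumes x is normal, zero or non-finite. */
    static double[] frexp(double x, double xx, int[] exp) {
        if (x == 0 || !Double.isFinite(x)) {
            exp[0] = 0;
            return new double[] {x, xx};
        }
        // Scale the high part into [0.5, 1).
        int scale = Math.getExponent(x) + 1;
        double f = Math.scalb(x, -scale);
        double ff = Math.scalb(xx, -scale);
        // If |f| is exactly 0.5 and ff has the opposite sign, the combined
        // magnitude is below 0.5. Rescale so that f is +/-1.
        if (Math.abs(f) == 0.5 && ff != 0 && Math.signum(f) != Math.signum(ff)) {
            f *= 2;
            ff *= 2;
            scale--;
        }
        exp[0] = scale;
        return new double[] {f, ff};
    }

    public static void main(String[] args) {
        final int[] exp = new int[1];
        final double[] r = frexp(1.0, -0x1.0p-60, exp);
        System.out.println("(" + r[0] + "," + r[1] + ") * 2^" + exp[0]);
    }
}
```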
-
getScale
private static int getScale(double a)
Returns a scale suitable for use with Math.scalb(double, int) to normalise the number to the interval [1, 2).
In contrast to Math.getExponent(double) this handles sub-normal numbers by computing the number of leading zeros in the mantissa and shifting the unbiased exponent. The result is that for all finite, non-zero, numbers, the magnitude of scalb(x, -getScale(x)) is always in the range [1, 2).
This method is a functional equivalent of the C function ilogb(double).
The result is to be used to scale a number using Math.scalb(double, int). Hence the special case of a zero argument is handled using the return value for NaN as zero cannot be scaled. This is different from Math.getExponent(double).
Special cases:
- If the argument is NaN or infinite, then the result is Double.MAX_EXPONENT + 1.
- If the argument is zero, then the result is Double.MAX_EXPONENT + 1.
- Parameters:
a - Value.
- Returns:
- The unbiased exponent of the value to be used for scaling, or 1024 for 0, NaN or Inf
- See Also:
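The sub-normal handling can be sketched by inspecting the raw bits. This is a minimal illustration of the described behaviour under the assumption of the standard IEEE 754 binary64 layout; the class GetScaleSketch is hypothetical and is not the library source.

```java
final class GetScaleSketch {
    /** Returns the unbiased exponent for scaling, or 1024 for 0, NaN or infinity. */
    static int getScale(double a) {
        // Drop the sign bit, then separate the exponent and mantissa fields.
        final long bits = Double.doubleToRawLongBits(a) & 0x7fffffffffffffffL;
        final int biasedExp = (int) (bits >>> 52);
        final long mantissa = bits & 0xfffffffffffffL;
        if (biasedExp == 0x7ff) {
            // NaN or infinite
            return Double.MAX_EXPONENT + 1;
        }
        if (biasedExp == 0) {
            if (mantissa == 0) {
                // Zero cannot be scaled; use the same value as NaN.
                return Double.MAX_EXPONENT + 1;
            }
            // Sub-normal: the 52-bit mantissa sits below 12 zero bits of the long.
            // Each extra leading zero lowers the exponent by one from -1023.
            return -1023 - (Long.numberOfLeadingZeros(mantissa) - 12);
        }
        return biasedExp - 1023;
    }

    public static void main(String[] args) {
        final double x = Double.MIN_VALUE;
        System.out.println(getScale(x));
        // The scaled magnitude is in [1, 2).
        System.out.println(Math.scalb(x, -getScale(x)));
    }
}
```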
-
pow
Compute this number (x, xx) raised to the power n.
Special cases:
- If x is not a finite normalized double, the low part xx is ignored and the result is Math.pow(x, n).
- If n = 0 the result is (1, 0).
- If n = 1 the result is (x, xx).
- If n = -1 the result is the reciprocal.
- If the computation overflows the result is undefined.
Computation uses multiplication by factors generated by repeated squaring of the value. These multiplications have no special case handling for overflow; in the event of overflow the result is undefined. The pow(int, long[]) method can be used to generate a scaled fraction result for any finite DD number and exponent.
The computed result is within approximately 16 * (n - 1) * eps of the exact result where eps is 2^-106.
- Specified by:
pow in interface NativeOperators<DD>
- Parameters:
n - Exponent.
- Returns:
this^n
- See Also:
-
computePow
Compute the number x (non-zero finite) raised to the power n.
The input power is treated as an unsigned integer. Thus the negative value Integer.MIN_VALUE is 2^31.
- Parameters:
x - Fractional high part of x.
xx - Fractional low part of x.
n - Power (in [2, 2^31]).
- Returns:
- x^n.
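Repeated squaring with an unsigned power can be sketched as below. To keep the example short it uses plain double arithmetic rather than double-double multiplication, so it illustrates only the control flow; the class PowSketch and the method powUnsigned are hypothetical names.

```java
final class PowSketch {
    /** Computes x^n where the bits of n are read as an unsigned 32-bit integer. */
    static double powUnsigned(double x, int n) {
        double result = 1;
        double factor = x;
        // The unsigned shift lets Integer.MIN_VALUE act as 2^31 and ensures termination.
        for (int m = n; m != 0; m >>>= 1) {
            if ((m & 1) != 0) {
                result *= factor;
            }
            // Next factor: x^2, x^4, x^8, ...
            factor *= factor;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(powUnsigned(3.0, 5));
        // 2^31 squarings collapse to 31 loop iterations plus one multiply.
        System.out.println(powUnsigned(1.0, Integer.MIN_VALUE));
    }
}
```

As in the documented method, there is no special handling of overflow in the intermediate factors.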
-
pow
Compute this number x raised to the power n.
The value is returned as fractional f and integral 2^exp components.
(x+xx)^n = (f+ff) * 2^exp
The combined fractional part (f, ff) is in the range [0.5, 1).
Special cases:
- If (x, xx) is zero the high part of the fractional part is computed using Math.pow(x, n) and the exponent is 0.
- If n = 0 the fractional part is 0.5 and the exponent is 1.
- If (x, xx) is an exact power of 2 the fractional part is 0.5 and the exponent is the power of 2 minus 1.
- If the result high-part is an exact power of 2 and the low-part has an opposite signed non-zero magnitude then the fraction high-part f will be +/-1 such that the double-double number is in the range [0.5, 1).
- If the argument is not finite then a fractional representation is not possible. In this case the fraction and the scale factor are undefined.
The computed result is within approximately 16 * (n - 1) * eps of the exact result where eps is 2^-106.
- Parameters:
n - Power.
exp - Result power of two scale factor (integral exponent).
- Returns:
- Fraction part.
- See Also:
-
computePowScaled
Compute the number x (non-zero finite) raised to the power n.
The input power is treated as an unsigned integer. Thus the negative value Integer.MIN_VALUE is 2^31.
- Parameters:
b - Integral component 2^exp of x.
x - Fractional high part of x.
xx - Fractional low part of x.
n - Power (in [2, 2^31]).
exp - Result power of two scale factor (integral exponent).
- Returns:
- Fraction part.
-
equals
Test for equality with another object. If the other object is a DD then a comparison is made of the parts; otherwise false is returned.
If both parts of two double-double numbers are numerically equivalent the two DD objects are considered to be equal. For this purpose, two double values are considered to be the same if and only if the method call Double.doubleToLongBits(value + 0.0) returns the identical long when applied to each value. This provides numeric equality of different representations of zero as per -0.0 == 0.0, and equality of NaN values.
Note that in most cases, for two instances of class DD, x and y, the value of x.equals(y) is true if and only if
x.hi() == y.hi() && x.lo() == y.lo()
also has the value true. However, there are exceptions:
- Instances that contain NaN values in the same part are considered to be equal for that part, even though Double.NaN == Double.NaN has the value false.
- Instances that share a NaN value in one part but have different values in the other part are not considered equal.
The behavior is the same as if the components of the two double-double numbers were passed to Arrays.equals(double[], double[]):
Arrays.equals(new double[]{x.hi() + 0.0, x.lo() + 0.0}, new double[]{y.hi() + 0.0, y.lo() + 0.0});
Note: Addition of 0.0 converts signed representations of zero values -0.0 and 0.0 to a canonical 0.0.
-
hashCode
public int hashCode()
Gets a hash code for the double-double number.
The behavior is the same as if the parts of the double-double number were passed to Arrays.hashCode(double[]):
Arrays.hashCode(new double[] {hi() + 0.0, lo() + 0.0})
Note: Addition of 0.0 provides the same hash code for different signed representations of zero values -0.0 and 0.0.
equals
private static boolean equals(double x, double y)
Returns true if the values are numerically equal.
Two double values are considered to be the same if and only if the method call Double.doubleToLongBits(value + 0.0) returns the identical long when applied to each value. This provides numeric equality of different representations of zero as per -0.0 == 0.0, and equality of NaN values.
- Parameters:
x - Value
y - Value
- Returns:
true if the values are numerically equal
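The effect of adding 0.0 before comparing bit patterns can be checked with a small stand-alone sketch. The class NumericEqualitySketch and its method names are hypothetical; only Double.doubleToLongBits and Arrays.hashCode are taken from the description above.

```java
import java.util.Arrays;

final class NumericEqualitySketch {
    /** Numeric equality of two doubles: -0.0 equals 0.0 and NaN equals NaN. */
    static boolean numericEquals(double x, double y) {
        // Adding 0.0 maps -0.0 to 0.0; doubleToLongBits collapses all NaN encodings.
        return Double.doubleToLongBits(x + 0.0) == Double.doubleToLongBits(y + 0.0);
    }

    /** Equality of two double-double numbers given as parts. */
    static boolean partsEqual(double hi1, double lo1, double hi2, double lo2) {
        return numericEquals(hi1, hi2) && numericEquals(lo1, lo2);
    }

    /** Hash code consistent with partsEqual. */
    static int partsHashCode(double hi, double lo) {
        return Arrays.hashCode(new double[] {hi + 0.0, lo + 0.0});
    }

    public static void main(String[] args) {
        System.out.println(numericEquals(-0.0, 0.0));
        System.out.println(numericEquals(Double.NaN, Double.NaN));
        // A shared NaN in one part does not make the numbers equal.
        System.out.println(partsEqual(Double.NaN, 1.0, Double.NaN, 2.0));
    }
}
```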
-
toString
Returns a string representation of the double-double number.
The string will represent the numeric values of the parts. The values are split by a separator and surrounded by parentheses.
The format for a double-double number is "(x,xx)", with x and xx converted as if using Double.toString(double).
Note: A numerical string representation of a finite double-double number can be generated by conversion to a BigDecimal before formatting.
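Both representations can be illustrated from the two parts alone. The sketch below is a stand-alone example under the assumption that the parts are available as plain double values; the class DoubleDoubleFormatSketch and its methods are hypothetical. The exact sum relies on the BigDecimal(double) constructor, which converts the binary value without rounding.

```java
import java.math.BigDecimal;

final class DoubleDoubleFormatSketch {
    /** Formats the parts as "(x,xx)" using Double.toString semantics. */
    static String format(double x, double xx) {
        return "(" + x + "," + xx + ")";
    }

    /** Exact decimal value of a finite double-double as the sum of its parts. */
    static BigDecimal toBigDecimal(double x, double xx) {
        return new BigDecimal(x).add(new BigDecimal(xx));
    }

    public static void main(String[] args) {
        final double x = 1.0;
        final double xx = 0x1.0p-60;
        System.out.println(format(x, xx));
        System.out.println(toBigDecimal(x, xx));
    }
}
```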
zero
Identity element.
Note: Addition of this value with any element a may not create an element equal to a if the element contains signed zeros. In this case the magnitude of the result will be identical.
isZero
public boolean isZero()
Check if this is a neutral element of addition, i.e. this.add(a) returns a or an element representing the same value as a.
The default implementation calls equals(zero()). Implementations may want to employ a more efficient method. This may even be required if an implementation has multiple representations of zero and its equals method differentiates between them.
one
Identity element.
Note: Multiplication of this value with any element a may not create an element equal to a if the element contains signed zeros. In this case the magnitude of the result will be identical.
- Specified by:
one in interface Multiplication<DD>
- Returns:
- the field element such that for all a, one().multiply(a).equals(a) is true.
-
isOne
public boolean isOne()
Check if this is a neutral element of multiplication, i.e. this.multiply(a) returns a or an element representing the same value as a.
The default implementation calls equals(one()). Implementations may want to employ a more efficient method. This may even be required if an implementation has multiple representations of one and its equals method differentiates between them.
- Specified by:
isOne in interface Multiplication<DD>
- Returns:
true if this is a neutral element of multiplication.
- See Also:
-
multiply
Repeated addition.
This computes the same result as multiply((double) n).
- Specified by:
multiply in interface NativeOperators<DD>
- Parameters:
n - Number of times to add this to itself.
- Returns:
n * this.
- See Also:
-