Package org.apfloat.aparapi
Class LongKernel
- java.lang.Object
-
- com.aparapi.Kernel
-
- org.apfloat.aparapi.LongKernel
-
- All Implemented Interfaces:
java.lang.Cloneable
class LongKernel extends com.aparapi.Kernel
Kernel for the long element type. Contains everything needed for the NTT. The data is organized in columns, not rows, for efficient processing on the GPU.
Due to the extreme parallelization requirements (global size should be at least 1024), this algorithm works efficiently only for calculations of 8 million decimal digits or more. However, at 8 million digits it is only approximately as fast as the pure-Java version (depending on the GPU and CPU hardware). Depending on the total amount of memory available to the GPU, this algorithm will fail (or revert to the very slow software emulation), e.g. at one-billion-digit calculations if your GPU has 1 GB of memory. The maximum power-of-two size for a Java array is about one billion elements (2^30), so if your GPU has more than 8 GB of memory the algorithm can never fail (as any Java long[] will always fit in the GPU memory).
Some notes about the aparapi-specific requirements for code that must be converted to OpenCL:
- assert() does not work
- Can't check for null
- Can't get array length
- Arrays referenced by the kernel can't be null even if they are not accessed
- Arrays referenced by the kernel can't be zero-length even if they are not accessed
- Can't invoke methods in other classes e.g. enclosing class of an inner class
- Early return statements do not work
- Variables used inside loops must be initialized before the loop
- Must compile the class with full debug information i.e. with -g
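The restrictions above shape how kernel code has to be written. The following is a minimal sketch in plain Java, with no aparapi dependency; the class `KernelStyleSketch` and its members are hypothetical and are not part of LongKernel. It shows one way a method could respect the list: the array is never null or empty, its length is carried in a separate field, the loop variable is initialized before the loop, and a single exit point replaces an early return.

```java
// Hypothetical illustration only; not taken from the apfloat sources.
class KernelStyleSketch {
    private long[] data = new long[1]; // never null and never zero-length
    private int length;                // carried separately, since data.length can't be read

    void setData(long[] data, int length) {
        this.data = data;
        this.length = length;
    }

    // Returns the index of the first element equal to value, or -1 if absent.
    int indexOf(long value) {
        int result = -1; // single result variable instead of an early return
        int i = 0;       // initialized before the loop
        while (i < length && result < 0) {
            if (data[i] == value) {
                result = i;
            }
            i++;
        }
        return result;
    }

    public static void main(String[] args) {
        KernelStyleSketch k = new KernelStyleSketch();
        k.setData(new long[] {5, 7, 9}, 3);
        System.out.println(k.indexOf(7)); // prints 1
    }
}
```

The same logic written with `return i;` inside the loop, or with `data.length` in the loop condition, would be ordinary Java but would run into the restrictions listed above.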
- Since:
- 1.8.3
- Version:
- 1.9.0
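The 8 GB bound quoted in the class description follows from simple arithmetic: a long occupies 8 bytes, and the largest power-of-two array has 2^30 elements. A small sketch that checks the figure (the class and method names are hypothetical, chosen for illustration only):

```java
// Hypothetical helper; only reproduces the arithmetic behind the 8 GB figure.
class GpuMemoryBoundSketch {
    // Number of bytes needed to hold the given number of long elements.
    static long bytesForLongArray(long elements) {
        return elements * Long.BYTES; // Long.BYTES is 8
    }

    public static void main(String[] args) {
        long maxPowerOfTwoLength = 1L << 30; // 2^30 elements
        System.out.println(bytesForLongArray(maxPowerOfTwoLength)); // prints 8589934592
    }
}
```

8589934592 bytes is 2^33 bytes, i.e. 8 GB when 1 GB is taken as 2^30 bytes.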
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class com.aparapi.Kernel
com.aparapi.Kernel.Constant, com.aparapi.Kernel.Entry, com.aparapi.Kernel.EXECUTION_MODE, com.aparapi.Kernel.KernelState, com.aparapi.Kernel.Local, com.aparapi.Kernel.NoCL, com.aparapi.Kernel.OpenCLDelegate, com.aparapi.Kernel.OpenCLMapping, com.aparapi.Kernel.PrivateMemorySpace
-
-
Field Summary
Fields
Modifier and Type, Field:
- private int columns
- private long[] data
- private int[] index
- private int indexCount
- static int INVERSE_TRANSFORM_COLUMNS
- static int INVERSE_TRANSFORM_ROWS
- private double inverseModulus
- private static java.lang.ThreadLocal<LongKernel> kernel
- private int length
- private long modulus
- static int MULTIPLY_ELEMENTS
- private int n2
- private int offset
- private int op
- private int[] permutationTable
- private int permutationTableLength
- static int PERMUTE
- private int rows
- private long scaleFactor
- private int startColumn
- private int startRow
- private int stride
- static int TRANSFORM_COLUMNS
- static int TRANSFORM_ROWS
- static int TRANSPOSE
- private long w
- private long w1
- private long w2
- private long[] wTable
- private long ww
-
Constructor Summary
Constructors
Modifier, Constructor:
- private LongKernel()
-
Method Summary
Methods
Modifier and Type, Method:
- private void columnScramble(int offset)
- private void columnTableFNT()
- static LongKernel getInstance()
- long getModulus()
- private void inverseColumnTableFNT()
- private long modAdd(long a, long b)
- private long modMultiply(long a, long b)
- private long modPow(long a, long n)
- private long modSubtract(long a, long b)
- private void multiplyElements()
- private void permute()
- void run()
- void setArrayAccess(ArrayAccess arrayAccess)
- void setColumns(int columns)
- void setIndex(int[] index)
- void setIndexCount(int indexCount)
- void setLength(int length)
- void setModulus(long modulus)
- void setN2(int n2)
- void setOp(int op)
- void setPermutationTable(int[] permutationTable)
- void setRows(int rows)
- void setScaleFactor(long scaleFactor)
- void setStartColumn(int startColumn)
- void setStartRow(int startRow)
- void setW(long w)
- void setW1(long w1)
- void setW2(long w2)
- void setWTable(long[] wTable)
- void setWw(long ww)
- private void transformColumns()
- private void transpose()
-
Methods inherited from class com.aparapi.Kernel
abs, abs, abs, abs, acos, acos, acospi, acospi, addExecutionModes, asin, asin, asinpi, asinpi, atan, atan, atan2, atan2, atan2pi, atan2pi, atanpi, atanpi, atomicAdd, atomicAdd, atomicAnd, atomicCmpXchg, atomicDec, atomicGet, atomicInc, atomicMax, atomicMin, atomicOr, atomicSet, atomicSub, atomicXchg, atomicXor, cancelMultiPass, cbrt, cbrt, ceil, ceil, cleanUpArrays, clone, clz, clz, compile, compile, cos, cos, cosh, cosh, cospi, cospi, createRange, dispose, execute, execute, execute, execute, execute, execute, executeFallbackAlgorithm, exp, exp, exp10, exp10, exp2, exp2, expm1, expm1, floor, floor, fma, fma, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, getAccumulatedExecutionTime, getAccumulatedExecutionTimeAllThreads, getAccumulatedExecutionTimeCurrentThread, getCancelState, getConversionTime, getCurrentPass, getExecutionMode, getExecutionTime, getGlobalId, getGlobalId, getGlobalSize, getGlobalSize, getGroupId, getGroupId, getKernelCompileWorkGroupSize, getKernelLocalMemSizeInUse, getKernelMaxWorkGroupSize, getKernelMinimumPrivateMemSizeInUsePerWorkItem, getKernelPreferredWorkGroupSizeMultiple, getKernelState, getLocalId, getLocalId, getLocalSize, getLocalSize, getMappedMethodName, getNumGroups, getNumGroups, getPassId, getProfileInfo, getProfileReportCurrentThread, getProfileReportLastThread, getTargetDevice, globalBarrier, hasFallbackAlgorithm, hasNextExecutionMode, hypot, hypot, IEEEremainder, IEEEremainder, invalidateCaches, isAllowDevice, isAutoCleanUpArrays, isExecuting, isExplicit, isMappedMethod, isOpenCLDelegateMethod, isRunningCL, localBarrier, localGlobalBarrier, log, log, log10, log10, log1p, log1p, log2, log2, mad, mad, max, max, max, max, min, min, min, min, nextAfter, nextAfter, popcount, popcount, pow, pow, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, registerProfileReportObserver, rint, rint, round, round, rsqrt, rsqrt, 
setAutoCleanUpArrays, setExecutionMode, setExecutionModeWithoutFallback, setExplicit, setFallbackExecutionMode, sin, sin, sinh, sinh, sinpi, sinpi, sqrt, sqrt, tan, tan, tanh, tanh, tanpi, tanpi, toDegrees, toDegrees, toRadians, toRadians, toString, tryNextExecutionMode, usesAtomic32, usesAtomic64
-
-
-
-
Field Detail
-
kernel
private static java.lang.ThreadLocal<LongKernel> kernel
-
TRANSFORM_ROWS
public static final int TRANSFORM_ROWS
- See Also:
- Constant Field Values
-
INVERSE_TRANSFORM_ROWS
public static final int INVERSE_TRANSFORM_ROWS
- See Also:
- Constant Field Values
-
stride
private int stride
-
length
private int length
-
data
private long[] data
-
offset
private int offset
-
wTable
private long[] wTable
-
permutationTable
private int[] permutationTable
-
permutationTableLength
private int permutationTableLength
-
modulus
private long modulus
-
inverseModulus
private double inverseModulus
-
TRANSPOSE
public static final int TRANSPOSE
- See Also:
- Constant Field Values
-
PERMUTE
public static final int PERMUTE
- See Also:
- Constant Field Values
-
n2
private int n2
-
index
private int[] index
-
indexCount
private int indexCount
-
MULTIPLY_ELEMENTS
public static final int MULTIPLY_ELEMENTS
- See Also:
- Constant Field Values
-
startRow
private int startRow
-
startColumn
private int startColumn
-
rows
private int rows
-
columns
private int columns
-
w
private long w
-
scaleFactor
private long scaleFactor
-
TRANSFORM_COLUMNS
public static final int TRANSFORM_COLUMNS
- See Also:
- Constant Field Values
-
INVERSE_TRANSFORM_COLUMNS
public static final int INVERSE_TRANSFORM_COLUMNS
- See Also:
- Constant Field Values
-
op
private int op
-
ww
private long ww
-
w1
private long w1
-
w2
private long w2
-
-
Method Detail
-
getInstance
public static LongKernel getInstance()
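The static `ThreadLocal<LongKernel>` field together with the private constructor suggests that `getInstance()` hands out one kernel instance per thread. This page does not show how the field is initialized, so the following is only a sketch of that pattern under that assumption; `PerThreadKernelSketch` is a hypothetical stand-in class, not LongKernel itself.

```java
// Hypothetical stand-in; only the per-thread instance pattern is the point.
class PerThreadKernelSketch {
    // One lazily created instance per thread (assumed initialization strategy).
    private static final ThreadLocal<PerThreadKernelSketch> kernel =
        ThreadLocal.withInitial(PerThreadKernelSketch::new);

    private PerThreadKernelSketch() {
    }

    static PerThreadKernelSketch getInstance() {
        return kernel.get();
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadKernelSketch a = getInstance();
        PerThreadKernelSketch b = getInstance();
        System.out.println(a == b); // same thread, same instance: prints true

        PerThreadKernelSketch[] other = new PerThreadKernelSketch[1];
        Thread t = new Thread(() -> other[0] = getInstance());
        t.start();
        t.join();
        System.out.println(a == other[0]); // different thread, different instance: prints false
    }
}
```

With this arrangement the setters can be called without synchronization, since no two threads share the same instance.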
-
setLength
public void setLength(int length)
-
setArrayAccess
public void setArrayAccess(ArrayAccess arrayAccess) throws ApfloatRuntimeException
- Throws:
ApfloatRuntimeException
-
setWTable
public void setWTable(long[] wTable)
-
setPermutationTable
public void setPermutationTable(int[] permutationTable)
-
columnTableFNT
private void columnTableFNT()
-
inverseColumnTableFNT
private void inverseColumnTableFNT()
-
columnScramble
private void columnScramble(int offset)
-
modMultiply
private long modMultiply(long a, long b)
-
modAdd
private long modAdd(long a, long b)
-
modSubtract
private long modSubtract(long a, long b)
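The three modular helpers above carry no description on this page. As a rough illustration of what such operations compute, here is a minimal sketch that assumes operands already reduced to the range 0 to modulus - 1 and a modulus below 2^31, so that a product never overflows a long. That size limit is an assumption of the sketch, not of LongKernel: the `inverseModulus` field of type double hints that the real class uses a floating-point reciprocal technique for larger moduli, which is not reproduced here. The class name `ModMathSketch` is hypothetical.

```java
// Hypothetical sketch; assumes 0 <= a, b < modulus and modulus < 2^31.
class ModMathSketch {
    private final long modulus;

    ModMathSketch(long modulus) {
        this.modulus = modulus;
    }

    long modAdd(long a, long b) {
        long r = a + b;
        if (r >= modulus) {
            r -= modulus; // wrap back into [0, modulus)
        }
        return r;
    }

    long modSubtract(long a, long b) {
        long r = a - b;
        if (r < 0) {
            r += modulus; // wrap back into [0, modulus)
        }
        return r;
    }

    long modMultiply(long a, long b) {
        return a * b % modulus; // safe only because a * b < 2^62 under the assumption above
    }

    public static void main(String[] args) {
        ModMathSketch m = new ModMathSketch(2013265921L); // a prime below 2^31, chosen for the demo
        System.out.println(m.modAdd(2013265920L, 1L));              // prints 0
        System.out.println(m.modSubtract(0L, 1L));                  // prints 2013265920
        System.out.println(m.modMultiply(2013265920L, 2013265920L)); // prints 1
    }
}
```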
-
setModulus
public void setModulus(long modulus)
-
getModulus
public long getModulus()
-
setN2
public void setN2(int n2)
-
setIndex
public void setIndex(int[] index)
-
setIndexCount
public void setIndexCount(int indexCount)
-
transpose
private void transpose()
-
permute
private void permute()
-
setStartRow
public void setStartRow(int startRow)
-
setStartColumn
public void setStartColumn(int startColumn)
-
setRows
public void setRows(int rows)
-
setColumns
public void setColumns(int columns)
-
setW
public void setW(long w)
-
setScaleFactor
public void setScaleFactor(long scaleFactor)
-
multiplyElements
private void multiplyElements()
-
modPow
private long modPow(long a, long n)
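Modular exponentiation is commonly done by square-and-multiply; whether LongKernel uses exactly this loop is not stated on this page. The sketch below is a hypothetical standalone version: it takes the modulus as a third parameter (the documented method reads it from a field), assumes n is non-negative, and assumes a modulus below 2^31 so that intermediate products fit in a long. In line with the restrictions in the class description, the loop variables are initialized before the loop and there is a single return.

```java
// Hypothetical sketch; assumes n >= 0 and modulus < 2^31.
class ModPowSketch {
    static long modPow(long a, long n, long modulus) {
        long result = 1 % modulus; // handles modulus == 1
        long base = a % modulus;
        long exponent = n;
        while (exponent > 0) {
            if ((exponent & 1) == 1) {
                result = result * base % modulus; // multiply in the current bit
            }
            base = base * base % modulus;         // square for the next bit
            exponent >>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(modPow(2, 10, 1000003L)); // prints 1024
    }
}
```

The loop runs once per bit of n, so raising to a power near 2^30 takes about 30 squarings rather than a billion multiplications.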
-
setOp
public void setOp(int op)
-
setWw
public void setWw(long ww)
-
setW1
public void setW1(long w1)
-
setW2
public void setW2(long w2)
-
run
public void run()
- Specified by:
run in class com.aparapi.Kernel
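The combination of `setOp(int op)`, the public operation constants and the private worker methods suggests that `run()` selects the operation to perform from the `op` field. This page does not confirm that, nor does it show the constant values, so the following is only a sketch of such a dispatch. `OpDispatchSketch`, the constant values and the `lastCalled` field are all hypothetical.

```java
// Hypothetical sketch of dispatching on an operation code; not the LongKernel source.
class OpDispatchSketch implements Runnable {
    static final int TRANSPOSE = 1;         // constant values are made up for the demo
    static final int PERMUTE = 2;
    static final int MULTIPLY_ELEMENTS = 3;

    private int op;
    String lastCalled = ""; // records which branch ran, for demonstration only

    void setOp(int op) {
        this.op = op;
    }

    @Override
    public void run() {
        // if/else chain with no early return
        if (op == TRANSPOSE) {
            transpose();
        } else if (op == PERMUTE) {
            permute();
        } else if (op == MULTIPLY_ELEMENTS) {
            multiplyElements();
        }
    }

    private void transpose() {
        lastCalled = "transpose";
    }

    private void permute() {
        lastCalled = "permute";
    }

    private void multiplyElements() {
        lastCalled = "multiplyElements";
    }

    public static void main(String[] args) {
        OpDispatchSketch k = new OpDispatchSketch();
        k.setOp(PERMUTE);
        k.run();
        System.out.println(k.lastCalled); // prints permute
    }
}
```

A caller would set the operation and its parameters through the setters and then execute the kernel, so one kernel class can serve several steps of the transform.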
-
transformColumns
private void transformColumns()
-
-