Class Kernel
- java.lang.Object
-
- com.aparapi.Kernel
-
- All Implemented Interfaces:
java.lang.Cloneable
public abstract class Kernel extends java.lang.Object implements java.lang.Cloneable

A kernel encapsulates a data parallel algorithm that will execute either on a GPU (through conversion to OpenCL) or on a CPU via a Java Thread Pool.

To write a new kernel, a developer extends the Kernel class and overrides the Kernel.run() method. To execute this kernel, the developer creates a new instance of it and calls Kernel.execute(int globalSize) with a suitable 'global size'. At runtime Aparapi will attempt to convert the Kernel.run() method (and any method called directly or indirectly by Kernel.run()) into OpenCL for execution on GPU devices made available via the OpenCL platform.

Note that Kernel.run() is not called directly. Instead, the Kernel.execute(int globalSize) method will cause the overridden Kernel.run() method to be invoked once for each value in the range 0 to globalSize - 1.

On the first call to Kernel.execute(int _globalSize), Aparapi will determine the EXECUTION_MODE of the kernel. This decision is made dynamically based on two factors:
- Whether OpenCL is available (appropriate drivers are installed and the OpenCL and Aparapi dynamic libraries are included on the system path).
- Whether the bytecode of the run() method (and every method that can be called directly or indirectly from the run() method) can be converted into OpenCL.
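The "invoked once for each value in the range" contract can be pictured with a small plain-Java model. This is only a sketch of the idea and not how Aparapi is implemented: `MiniKernelModel` and its `execute` helper are hypothetical names, and a parallel stream stands in for the Java Thread Pool path.

```java
import java.util.Arrays;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

// Plain-Java model only: MiniKernelModel is a hypothetical name and does not use Aparapi.
class MiniKernelModel {
    // Invokes body once for every id in 0..globalSize-1, in no guaranteed order.
    static void execute(int globalSize, IntConsumer body) {
        IntStream.range(0, globalSize).parallel().forEach(body);
    }

    // Mirrors the idea of a squaring kernel: each id writes only its own slot.
    static int[] squares(int[] values) {
        int[] squares = new int[values.length];
        execute(values.length, gid -> squares[gid] = values[gid] * values[gid]);
        return squares;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(squares(new int[] {1, 2, 3, 4})));
    }
}
```

Because every invocation touches only the element at its own id, the order in which the ids run does not affect the result.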
Below is an example Kernel that calculates the square of a set of input values.

    class SquareKernel extends Kernel{
        private int values[];
        private int squares[];
        public SquareKernel(int values[]){
            this.values = values;
            squares = new int[values.length];
        }
        public void run() {
            int gid = getGlobalId();
            squares[gid] = values[gid]*values[gid];
        }
        public int[] getSquares(){
            return(squares);
        }
    }

To execute this kernel, first create a new instance of it and then call execute(Range _range).

    int[] values = new int[1024];
    // fill values array
    Range range = Range.create(values.length); // create a range covering ids 0..1023
    SquareKernel kernel = new SquareKernel(values);
    kernel.execute(range);

When execute(Range) returns, all the executions of Kernel.run() have completed and the results are available in the squares array.

    int[] squares = kernel.getSquares();
    for (int i=0; i< values.length; i++){
        System.out.printf("%4d %4d %8d\n", i, values[i], squares[i]);
    }

A different approach to creating kernels that avoids extending Kernel is to write an anonymous inner class:

    final int[] values = new int[1024];
    // fill the values array
    final int[] squares = new int[values.length];
    final Range range = Range.create(values.length);
    Kernel kernel = new Kernel(){
        public void run() {
            int gid = getGlobalId();
            squares[gid] = values[gid]*values[gid];
        }
    };
    kernel.execute(range);
    for (int i=0; i< values.length; i++){
        System.out.printf("%4d %4d %8d\n", i, values[i], squares[i]);
    }
- Version:
- Alpha, 21/09/2010
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static interface | Kernel.Constant | We can use this Annotation to 'tag' intended constant buffers.
class | Kernel.Entry |
static class | Kernel.EXECUTION_MODE | Deprecated. It is no longer recommended that EXECUTION_MODEs are used, as a more sophisticated Device preference mechanism is in place, see KernelManager.
class | Kernel.KernelState | This class is for internal Kernel state management
static interface | Kernel.Local | We can use this Annotation to 'tag' intended local buffers.
static interface | Kernel.NoCL | Annotation which can be applied to either a getter (with usual java bean naming convention relative to an instance field), or to any method with void return type, which prevents both the method body and any calls to the method being emitted in the generated OpenCL.
protected static interface | Kernel.OpenCLDelegate | This annotation is for internal use only
protected static interface | Kernel.OpenCLMapping | This annotation is for internal use only
static interface | Kernel.PrivateMemorySpace | We can use this Annotation to 'tag' __private (unshared) array fields.
-
Field Summary
Fields
Modifier and Type | Field | Description
private static java.util.function.IntBinaryOperator | andOperator |
private static ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException> | atomic32Cache |
private static ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException> | atomic64Cache |
private boolean | autoCleanUpArrays |
static java.lang.String | CONSTANT_SUFFIX | We can use this suffix to 'tag' intended constant buffers.
private java.util.Iterator<Kernel.EXECUTION_MODE> | currentMode | Deprecated.
private Kernel.EXECUTION_MODE | executionMode | Deprecated.
private java.util.LinkedHashSet<Kernel.EXECUTION_MODE> | executionModes | Deprecated.
private KernelRunner | kernelRunner |
private Kernel.KernelState | kernelState |
static java.lang.String | LOCAL_SUFFIX | We can use this suffix to 'tag' intended local buffers.
private static double | LOG_2_RECIPROCAL |
private static java.util.logging.Logger | logger |
private static ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException> | mappedMethodFlags |
private static ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.String>,java.lang.RuntimeException> | mappedMethodNamesCache |
private static java.util.function.IntBinaryOperator | maxOperator |
private static java.util.function.IntBinaryOperator | minOperator |
private static ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException> | openCLDelegateMethodFlags |
private static java.util.function.IntBinaryOperator | orOperator |
private static double | PI_RECIPROCAL |
static java.lang.String | PRIVATE_SUFFIX | We can use this suffix to 'tag' __private buffers.
(package private) static java.util.Map<java.lang.String,java.lang.String> | typeToLetterMap |
(package private) boolean | useNullForLocalSize |
private static java.util.function.IntBinaryOperator | xorOperator |
-
Constructor Summary
Constructors Constructor Description Kernel()
-
Method Summary
All Methods Static Methods Instance Methods Abstract Methods Concrete Methods Deprecated Methods Modifier and Type Method Description protected doubleabs(double _d)Delegates to eitherMath.abs(double)(Java) orfabs(double)(OpenCL).protected floatabs(float _f)Delegates to eitherMath.abs(float)(Java) orfabs(float)(OpenCL).protected intabs(int n)Delegates to eitherMath.abs(int)(Java) orabs(int)(OpenCL).protected longabs(long n)Delegates to eitherMath.abs(long)(Java) orabs(long)(OpenCL).protected doubleacos(double a)Delegates to eitherMath.acos(double)(Java) oracos(double)(OpenCL).protected floatacos(float a)Delegates to eitherMath.acos(double)(Java) oracos(float)(OpenCL).protected doubleacospi(double a)protected floatacospi(float a)voidaddExecutionModes(Kernel.EXECUTION_MODE... platforms)Deprecated.protected doubleasin(double _d)Delegates to eitherMath.asin(double)(Java) orasin(double)(OpenCL).protected floatasin(float _f)Delegates to eitherMath.asin(double)(Java) orasin(float)(OpenCL).protected doubleasinpi(double a)protected floatasinpi(float a)protected doubleatan(double _d)Delegates to eitherMath.atan(double)(Java) oratan(double)(OpenCL).protected floatatan(float _f)Delegates to eitherMath.atan(double)(Java) oratan(float)(OpenCL).protected doubleatan2(double _d1, double _d2)Delegates to eitherMath.atan2(double, double)(Java) oratan2(double, double)(OpenCL).protected floatatan2(float _f1, float _f2)Delegates to eitherMath.atan2(double, double)(Java) oratan2(float, float)(OpenCL).protected doubleatan2pi(double y, double x)protected floatatan2pi(float y, double x)protected doubleatanpi(double a)protected floatatanpi(float a)protected intatomicAdd(int[] _arr, int _index, int _delta)Atomically adds_deltavalue to_indexelement of array_arr(Java) or delegates toatomic_add(volatile int*, int)(OpenCL).protected intatomicAdd(java.util.concurrent.atomic.AtomicInteger p, int val)protected intatomicAnd(java.util.concurrent.atomic.AtomicInteger p, int val)protected 
intatomicCmpXchg(java.util.concurrent.atomic.AtomicInteger p, int expectedVal, int newVal)protected intatomicDec(java.util.concurrent.atomic.AtomicInteger p)protected intatomicGet(java.util.concurrent.atomic.AtomicInteger p)protected intatomicInc(java.util.concurrent.atomic.AtomicInteger p)protected intatomicMax(java.util.concurrent.atomic.AtomicInteger p, int val)protected intatomicMin(java.util.concurrent.atomic.AtomicInteger p, int val)protected intatomicOr(java.util.concurrent.atomic.AtomicInteger p, int val)protected voidatomicSet(java.util.concurrent.atomic.AtomicInteger p, int val)protected intatomicSub(java.util.concurrent.atomic.AtomicInteger p, int val)protected intatomicXchg(java.util.concurrent.atomic.AtomicInteger p, int newVal)protected intatomicXor(java.util.concurrent.atomic.AtomicInteger p, int val)private static <K,V,T extends java.lang.Throwable>
ValueCache<java.lang.Class<?>,java.util.Map<K,V>,T>cacheProperty(ValueCache.ThrowingValueComputer<java.lang.Class<?>,java.util.Map<K,V>,T> throwingValueComputer)voidcancelMultiPass()Invoking this method flags that once the current pass is complete execution should be abandoned.protected doublecbrt(double a)protected floatcbrt(float a)protected doubleceil(double _d)Delegates to eitherMath.ceil(double)(Java) orceil(double)(OpenCL).protected floatceil(float _f)Delegates to eitherMath.ceil(double)(Java) orceil(float)(OpenCL).voidcleanUpArrays()Frees the bulk of the resources used by this kernel, by setting array sizes in non-primitiveKernelArgs to 1 (0 size is prohibited) and invoking kernel execution on a zero size range.Kernelclone()When using a Java Thread Pool Aparapi uses clone to copy the initial instance to each thread.protected intclz(int _i)Delegates to eitherInteger.numberOfLeadingZeros(int)(Java) orclz(int)(OpenCL).protected longclz(long _l)Delegates to eitherLong.numberOfLeadingZeros(long)(Java) orclz(long)(OpenCL).Kernelcompile(Device _device)Force pre-compilation of the kernel for a given device, without executing it.Kernelcompile(java.lang.String _entrypoint, Device _device)Force pre-compilation of the kernel for a given device, without executing it.protected doublecos(double _d)Delegates to eitherMath.cos(double)(Java) orcos(double)(OpenCL).protected floatcos(float _f)Delegates to eitherMath.cos(double)(Java) orcos(float)(OpenCL).protected doublecosh(double x)protected floatcosh(float x)protected doublecospi(double a)protected floatcospi(float a)protected RangecreateRange(int _range)private static java.lang.StringdescriptorToReturnTypeLetter(java.lang.String desc)voiddispose()Release any resources associated with this Kernel.Kernelexecute(int _range)Start execution of_rangekernels.Kernelexecute(int _range, int _passes)Start execution of_passesiterations over the_rangeof kernels.Kernelexecute(Range _range)Start execution 
of_rangekernels.Kernelexecute(Range _range, int _passes)Start execution of_passesiterations of_rangekernels.Kernelexecute(java.lang.String _entrypoint, Range _range)Start execution ofglobalSizekernels for the given entrypoint.Kernelexecute(java.lang.String _entrypoint, Range _range, int _passes)Start execution ofglobalSizekernels for the given entrypoint.voidexecuteFallbackAlgorithm(Range _range, int _passId)IfhasFallbackAlgorithm()has been overriden to return true, this method should be overriden so as to apply a single pass of the kernel's logic to the entire _range.protected doubleexp(double _d)Delegates to eitherMath.exp(double)(Java) orexp(double)(OpenCL).protected floatexp(float _f)Delegates to eitherMath.exp(double)(Java) orexp(float)(OpenCL).protected doubleexp10(double a)protected floatexp10(float a)protected doubleexp2(double a)protected floatexp2(float a)protected doubleexpm1(double x)protected floatexpm1(float x)protected doublefloor(double _d)Delegates to eitherMath.floor(double)(Java) orfloor(double)(OpenCL).protected floatfloor(float _f)Delegates to eitherMath.floor(double)(Java) orfloor(float)(OpenCL).protected doublefma(double a, double b, double c)Delegates to either {code}a*b+c{code} (Java) orfma(double, double, double)(OpenCL).protected floatfma(float a, float b, float c)Delegates to either {code}a*b+c{code} (Java) orfma(float, float, float)(OpenCL).Kernelget(boolean[] array)Enqueue a request to return this buffer from the GPU.Kernelget(boolean[][] array)Enqueue a request to return this buffer from the GPU.Kernelget(boolean[][][] array)Enqueue a request to return this buffer from the GPU.Kernelget(byte[] array)Enqueue a request to return this buffer from the GPU.Kernelget(byte[][] array)Enqueue a request to return this buffer from the GPU.Kernelget(byte[][][] array)Enqueue a request to return this buffer from the GPU.Kernelget(char[] array)Enqueue a request to return this buffer from the GPU.Kernelget(char[][] array)Enqueue a request to return 
this buffer from the GPU.Kernelget(char[][][] array)Enqueue a request to return this buffer from the GPU.Kernelget(double[] array)Enqueue a request to return this buffer from the GPU.Kernelget(double[][] array)Enqueue a request to return this buffer from the GPU.Kernelget(double[][][] array)Enqueue a request to return this buffer from the GPU.Kernelget(float[] array)Enqueue a request to return this buffer from the GPU.Kernelget(float[][] array)Enqueue a request to return this buffer from the GPU.Kernelget(float[][][] array)Enqueue a request to return this buffer from the GPU.Kernelget(int[] array)Enqueue a request to return this buffer from the GPU.Kernelget(int[][] array)Enqueue a request to return this buffer from the GPU.Kernelget(int[][][] array)Enqueue a request to return this buffer from the GPU.Kernelget(long[] array)Enqueue a request to return this buffer from the GPU.Kernelget(long[][] array)Enqueue a request to return this buffer from the GPU.Kernelget(long[][][] array)Enqueue a request to return this buffer from the GPU.doublegetAccumulatedExecutionTime()Determine the total execution time of all previous Kernel.execute(range) calls for all threads that ran this kernel for the device used in the last kernel execution.doublegetAccumulatedExecutionTimeAllThreads(Device device)Determine the total execution time of all produced profile reports from all threads that executed the current kernel on the specified device.doublegetAccumulatedExecutionTimeCurrentThread(Device device)Determine the total execution time of all previous kernel executions called from the current thread, calling this method, that executed the current kernel on the specified device.private static java.lang.StringgetArgumentsLetters(java.lang.reflect.Method method)private static booleangetBoolean(ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException> methodNamesCache, ClassModel.ConstantPool.MethodReferenceEntry 
methodReferenceEntry)intgetCancelState()doublegetConversionTime()Determine the time taken to convert bytecode to OpenCL for first Kernel.execute(range) call.intgetCurrentPass()Kernel.EXECUTION_MODEgetExecutionMode()Deprecated.doublegetExecutionTime()Determine the execution time of the previous Kernel.execute(range) called from the last thread that ran and executed on the most recently used device.protected intgetGlobalId()Determine the globalId of an executing kernel.protected intgetGlobalId(int _dim)protected intgetGlobalSize()Determine the value that was passed toKernel.execute(int globalSize)method.protected intgetGlobalSize(int _dim)protected intgetGroupId()Determine the groupId of an executing kernel.protected intgetGroupId(int _dim)int[]getKernelCompileWorkGroupSize(Device device)Retrieves the specified work-group size in the compiled kernel for the specified device or intermediate language for the device.longgetKernelLocalMemSizeInUse(Device device)Retrieves the amount of local memory used in the specified device by this kernel instance.intgetKernelMaxWorkGroupSize(Device device)Retrieves the maximum work-group size allowed for this kernel when running on the specified device.longgetKernelMinimumPrivateMemSizeInUsePerWorkItem(Device device)Retrieves that minimum private memory in use per work item for this kernel instance and the specified device.intgetKernelPreferredWorkGroupSizeMultiple(Device device)Retrieves the preferred work-group multiple in the specified device for this kernel instance.Kernel.KernelStategetKernelState()protected intgetLocalId()Determine the local id of an executing kernel.protected intgetLocalId(int _dim)protected intgetLocalSize()Determine the size of the group that an executing kernel is a member of.protected intgetLocalSize(int _dim)static java.lang.StringgetMappedMethodName(ClassModel.ConstantPool.MethodReferenceEntry _methodReferenceEntry)protected intgetNumGroups()Determine the number of groups that will be used to execute a 
kernelprotected intgetNumGroups(int _dim)protected intgetPassId()Determine the passId of an executing kernel.java.util.List<ProfileInfo>getProfileInfo()Get the profiling information from the last successful call to Kernel.execute().java.lang.ref.WeakReference<ProfileReport>getProfileReportCurrentThread(Device device)Retrieves the most recent complete report available for the current thread calling this method for the current kernel instance and executed on the given device.java.lang.ref.WeakReference<ProfileReport>getProfileReportLastThread(Device device)Retrieves a profile report for the last thread that executed this kernel on the given device.private static <V,T extends java.lang.Throwable>
VgetProperty(ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,V>,T> cache, ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry, V defaultValue)private static java.lang.StringgetReturnTypeLetter(java.lang.reflect.Method meth)DevicegetTargetDevice()protected voidglobalBarrier()Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
It will also enforce memory ordering, such that modifications made by each thread in the work-group, to the memory, before entering into this barrier call will be visible by all threads leaving the barrier.booleanhasFallbackAlgorithm()False by default.booleanhasNextExecutionMode()Deprecated.protected doublehypot(double a, double b)protected floathypot(float a, float b)protected doubleIEEEremainder(double _d1, double _d2)Delegates to eitherMath.IEEEremainder(double, double)(Java) orremainder(double, double)(OpenCL).protected floatIEEEremainder(float _f1, float _f2)Delegates to eitherMath.IEEEremainder(double, double)(Java) orremainder(float, float)(OpenCL).static voidinvalidateCaches()booleanisAllowDevice(Device _device)booleanisAutoCleanUpArrays()booleanisExecuting()booleanisExplicit()For dev purposes (we should remove this for production) determine whether this Kernel uses explicit memory managementstatic booleanisMappedMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)static booleanisOpenCLDelegateMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)private static booleanisRelevant(java.lang.reflect.Method method)booleanisRunningCL()protected voidlocalBarrier()Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
It will also enforce memory ordering, such that modifications made by each thread in the work-group, to the memory, before entering into this barrier call will be visible by all threads leaving the barrier.protected voidlocalGlobalBarrier()Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
It will also enforce memory ordering, such that modifications made by each thread in the work-group, to the memory, before entering into this barrier call will be visible by all threads leaving the barrier.protected doublelog(double _d)Delegates to eitherMath.log(double)(Java) orlog(double)(OpenCL).protected floatlog(float _f)Delegates to eitherMath.log(double)(Java) orlog(float)(OpenCL).protected doublelog10(double a)protected floatlog10(float a)protected doublelog1p(double x)protected floatlog1p(float x)protected doublelog2(double a)protected floatlog2(float a)protected doublemad(double a, double b, double c)protected floatmad(float a, float b, float c)private static <A extends java.lang.annotation.Annotation>
ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException>markedWith(java.lang.Class<A> annotationClass)protected doublemax(double _d1, double _d2)Delegates to eitherMath.max(double, double)(Java) orfmax(double, double)(OpenCL).protected floatmax(float _f1, float _f2)Delegates to eitherMath.max(float, float)(Java) orfmax(float, float)(OpenCL).protected intmax(int n1, int n2)Delegates to eitherMath.max(int, int)(Java) ormax(int, int)(OpenCL).protected longmax(long n1, long n2)Delegates to eitherMath.max(long, long)(Java) ormax(long, long)(OpenCL).protected doublemin(double _d1, double _d2)Delegates to eitherMath.min(double, double)(Java) orfmin(double, double)(OpenCL).protected floatmin(float _f1, float _f2)Delegates to eitherMath.min(float, float)(Java) orfmin(float, float)(OpenCL).protected intmin(int n1, int n2)Delegates to eitherMath.min(int, int)(Java) ormin(int, int)(OpenCL).protected longmin(long n1, long n2)Delegates to eitherMath.min(long, long)(Java) ormin(long, long)(OpenCL).private floatnative_rsqrt(float _f)private floatnative_sqrt(float _f)protected doublenextAfter(double start, double direction)protected floatnextAfter(float start, float direction)protected intpopcount(int _i)Delegates to eitherInteger.bitCount(int)(Java) orpopcount(int)(OpenCL).protected longpopcount(long _i)Delegates to eitherLong.bitCount(long)(Java) orpopcount(long)(OpenCL).protected doublepow(double _d1, double _d2)Delegates to eitherMath.pow(double, double)(Java) orpow(double, double)(OpenCL).protected floatpow(float _f1, float _f2)Delegates to eitherMath.pow(double, double)(Java) orpow(float, float)(OpenCL).private KernelRunnerprepareKernelRunner()Kernelput(boolean[] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(boolean[][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(boolean[][][] array)Tag this array so that it is explicitly 
enqueued before the kernel is executedKernelput(byte[] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(byte[][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(byte[][][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(char[] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(char[][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(char[][][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(double[] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(double[][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(double[][][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(float[] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(float[][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(float[][][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(int[] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(int[][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(int[][][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(long[] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(long[][] array)Tag this array so that it is explicitly enqueued before the kernel is executedKernelput(long[][][] array)Tag this array so that it is explicitly enqueued before the kernel is executedvoidregisterProfileReportObserver(IProfileReportObserver observer)Registers a new profile report 
observer to receive profile reports as they're produced.protected doublerint(double _d)Delegates to eitherMath.rint(double)(Java) orrint(double)(OpenCL).protected floatrint(float _f)Delegates to eitherMath.rint(double)(Java) orrint(float)(OpenCL).protected longround(double _d)Delegates to eitherMath.round(double)(Java) orround(double)(OpenCL).protected intround(float _f)Delegates to eitherMath.round(float)(Java) orround(float)(OpenCL).protected doublersqrt(double _d)Computes inverse square root usingMath.sqrt(double)(Java) or delegates torsqrt(double)(OpenCL).protected floatrsqrt(float _f)Computes inverse square root usingMath.sqrt(double)(Java) or delegates torsqrt(double)(OpenCL).abstract voidrun()The entry point of a kernel.voidsetAutoCleanUpArrays(boolean autoCleanUpArrays)Property which if true enables automatic calling ofcleanUpArrays()following each execution.voidsetExecutionMode(Kernel.EXECUTION_MODE _executionMode)Deprecated.voidsetExecutionModeWithoutFallback(Kernel.EXECUTION_MODE _executionMode)voidsetExplicit(boolean _explicit)For dev purposes (we should remove this for production) allow us to define that this Kernel uses explicit memory managementvoidsetFallbackExecutionMode()Deprecated.protected doublesin(double _d)Delegates to eitherMath.sin(double)(Java) orsin(double)(OpenCL).protected floatsin(float _f)Delegates to eitherMath.sin(double)(Java) orsin(float)(OpenCL).protected doublesinh(double x)Delegates to eitherMath.sinh(double)(Java) orsinh(double)(OpenCL).protected floatsinh(float x)Delegates to eitherMath.sinh(double)(Java) orsinh(float)(OpenCL).protected doublesinpi(double a)Backed by eitherMath.sin(double)(Java) orsinpi(double)(OpenCL).protected floatsinpi(float a)Backed by eitherMath.sin(double)(Java) orsinpi(float)(OpenCL).protected doublesqrt(double _d)Delegates to eitherMath.sqrt(double)(Java) orsqrt(double)(OpenCL).protected floatsqrt(float _f)Delegates to eitherMath.sqrt(double)(Java) orsqrt(float)(OpenCL).protected doubletan(double 
_d)Delegates to eitherMath.tan(double)(Java) ortan(double)(OpenCL).protected floattan(float _f)Delegates to eitherMath.tan(double)(Java) ortan(float)(OpenCL).protected doubletanh(double x)Delegates to eitherMath.tanh(double)(Java) ortanh(double)(OpenCL).protected floattanh(float x)Delegates to eitherjava.lang.Math#tanh(float)(Java) ortanh(float)(OpenCL).protected doubletanpi(double a)Backed by eitherMath.tan(double)(Java) ortanpi(double)(OpenCL).protected floattanpi(float a)Backed by eitherMath.tan(double)(Java) ortanpi(float)(OpenCL).private static java.lang.StringtoClassShortNameIfAny(java.lang.Class<?> retClass)protected doubletoDegrees(double _d)Delegates to eitherMath.toDegrees(double)(Java) ordegrees(double)(OpenCL).protected floattoDegrees(float _f)Delegates to eitherMath.toDegrees(double)(Java) ordegrees(float)(OpenCL).protected doubletoRadians(double _d)Delegates to eitherMath.toRadians(double)(Java) orradians(double)(OpenCL).protected floattoRadians(float _f)Delegates to eitherMath.toRadians(double)(Java) orradians(float)(OpenCL).private static java.lang.StringtoSignature(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)(package private) static java.lang.StringtoSignature(java.lang.reflect.Method method)java.lang.StringtoString()voidtryNextExecutionMode()Deprecated.static booleanusesAtomic32(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)static booleanusesAtomic64(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
-
-
-
Field Detail
-
logger
private static java.util.logging.Logger logger
-
LOCAL_SUFFIX
public static final java.lang.String LOCAL_SUFFIX
We can use this suffix to 'tag' intended local buffers. So either name the buffer
    int[] buffer_$local$ = new int[1024];
Or use the Annotation form
    @Local int[] buffer = new int[1024];
- See Also:
- Constant Field Values
-
CONSTANT_SUFFIX
public static final java.lang.String CONSTANT_SUFFIX
We can use this suffix to 'tag' intended constant buffers. So either name the buffer
    int[] buffer_$constant$ = new int[1024];
Or use the Annotation form
    @Constant int[] buffer = new int[1024];
- See Also:
- Constant Field Values
-
PRIVATE_SUFFIX
public static final java.lang.String PRIVATE_SUFFIX
We can use this suffix to 'tag' __private buffers. So either name the buffer
    int[] buffer_$private$32 = new int[32];
Or use the Annotation form
    @PrivateMemorySpace(32) int[] buffer = new int[32];
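To make the naming convention concrete, here is a small plain-Java sketch that classifies a field name by its suffix. It is illustrative only: `BufferTagModel` and its methods are hypothetical, the suffix strings are inferred from the field-name examples above rather than read from the constants, and Aparapi's own detection logic may differ.

```java
// Hypothetical helper, not part of Aparapi. Suffix strings are inferred from the examples above.
class BufferTagModel {
    static final String LOCAL = "_$local$";
    static final String CONSTANT = "_$constant$";
    static final String PRIVATE = "_$private$";

    // Returns "local", "constant", "private" or "untagged" for a field name.
    static String tagOf(String fieldName) {
        if (fieldName.endsWith(LOCAL)) return "local";
        if (fieldName.endsWith(CONSTANT)) return "constant";
        if (fieldName.contains(PRIVATE)) return "private";
        return "untagged";
    }

    // Returns the size written after the private suffix, or -1 if there is none.
    static int privateSizeOf(String fieldName) {
        int at = fieldName.lastIndexOf(PRIVATE);
        if (at < 0) return -1;
        String digits = fieldName.substring(at + PRIVATE.length());
        return digits.matches("\\d+") ? Integer.parseInt(digits) : -1;
    }

    public static void main(String[] args) {
        System.out.println(tagOf("buffer_$local$"));
        System.out.println(tagOf("buffer_$private$32") + " " + privateSizeOf("buffer_$private$32"));
    }
}
```

The private form differs from the other two in that the array size follows the suffix, which is why it is matched with `contains` rather than `endsWith` in this sketch.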
-
kernelRunner
private KernelRunner kernelRunner
-
autoCleanUpArrays
private boolean autoCleanUpArrays
-
kernelState
private Kernel.KernelState kernelState
-
LOG_2_RECIPROCAL
private static final double LOG_2_RECIPROCAL
-
PI_RECIPROCAL
private static final double PI_RECIPROCAL
- See Also:
- Constant Field Values
-
minOperator
private static final java.util.function.IntBinaryOperator minOperator
-
maxOperator
private static final java.util.function.IntBinaryOperator maxOperator
-
andOperator
private static final java.util.function.IntBinaryOperator andOperator
-
orOperator
private static final java.util.function.IntBinaryOperator orOperator
-
xorOperator
private static final java.util.function.IntBinaryOperator xorOperator
-
typeToLetterMap
static final java.util.Map<java.lang.String,java.lang.String> typeToLetterMap
-
useNullForLocalSize
boolean useNullForLocalSize
-
executionModes
@Deprecated private final java.util.LinkedHashSet<Kernel.EXECUTION_MODE> executionModes
Deprecated.
-
currentMode
@Deprecated private java.util.Iterator<Kernel.EXECUTION_MODE> currentMode
Deprecated.
-
executionMode
@Deprecated private Kernel.EXECUTION_MODE executionMode
Deprecated.
-
mappedMethodFlags
private static final ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException> mappedMethodFlags
-
openCLDelegateMethodFlags
private static final ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException> openCLDelegateMethodFlags
-
atomic32Cache
private static final ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException> atomic32Cache
-
atomic64Cache
private static final ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException> atomic64Cache
-
mappedMethodNamesCache
private static final ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.String>,java.lang.RuntimeException> mappedMethodNamesCache
-
-
Method Detail
-
getGlobalId
protected final int getGlobalId()
Determine the globalId of an executing kernel.

The kernel implementation uses the globalId to determine which of the executing kernels (in the global domain space) this invocation is expected to deal with.

For example in a SquareKernel implementation:

    class SquareKernel extends Kernel{
        private int values[];
        private int squares[];
        public SquareKernel(int values[]){
            this.values = values;
            squares = new int[values.length];
        }
        public void run() {
            int gid = getGlobalId();
            squares[gid] = values[gid]*values[gid];
        }
        public int[] getSquares(){
            return(squares);
        }
    }

Each invocation of SquareKernel.run() retrieves its globalId by calling getGlobalId(), and then computes the value of squares[gid] for a given value of values[gid].
- Returns:
- The globalId for the Kernel being executed
- See Also:
getLocalId(), getGroupId(), getGlobalSize(), getNumGroups(), getLocalSize()
-
getGlobalId
protected final int getGlobalId(int _dim)
-
getGroupId
protected final int getGroupId()
Determine the groupId of an executing kernel.

When a Kernel.execute(int globalSize) is invoked for a particular kernel, the runtime will break the work into various 'groups'.

A kernel can use getGroupId() to determine which group a kernel is currently dispatched to.

The following code would capture the groupId for each kernel and map it against globalId.

    final int[] groupIds = new int[1024];
    Kernel kernel = new Kernel(){
        public void run() {
            int gid = getGlobalId();
            groupIds[gid] = getGroupId();
        }
    };
    kernel.execute(groupIds.length);
    for (int i=0; i< groupIds.length; i++){
        System.out.printf("%4d %4d\n", i, groupIds[i]);
    }
- Returns:
- The groupId for this Kernel being executed
- See Also:
getLocalId(),getGlobalId(),getGlobalSize(),getNumGroups(),getLocalSize()
-
getGroupId
protected final int getGroupId(int _dim)
-
getPassId
protected final int getPassId()
Determine the passId of an executing kernel. When Kernel.execute(int globalSize, int passes) is invoked for a particular kernel, the runtime will execute the kernel over the global range once per pass. A kernel can use getPassId() to determine which pass it is currently in. This is ideal for 'reduce' type phases.
- Returns:
- The passId for this Kernel being executed
- See Also:
getLocalId(),getGlobalId(),getGlobalSize(),getNumGroups(),getLocalSize()
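To make the 'reduce' use case concrete, here is a minimal plain-Java sketch that emulates a multi-pass launch sequentially. It does not use any Aparapi classes: the outer loop variable stands in for getPassId(), the inner loop variable stands in for getGlobalId(), and the class and method names are hypothetical. It assumes the array length is a power of two.

```java
// Plain-Java emulation of a multi-pass launch; no Aparapi classes are used.
final class MultiPassSketch {

    /** Sums data in place using log2(n) passes; data.length must be a power of two. */
    static int reduce(int[] data) {
        int n = data.length;
        int passes = Integer.numberOfTrailingZeros(n); // log2(n) for a power of two
        for (int passId = 0; passId < passes; passId++) { // stands in for getPassId()
            int stride = 1 << passId;
            for (int gid = 0; gid < n; gid++) {            // stands in for getGlobalId()
                // Only every (2 * stride)-th work item is active in this pass.
                if (gid % (2 * stride) == 0) {
                    data[gid] += data[gid + stride];
                }
            }
        }
        return data[0];
    }

    public static void main(String[] args) {
        System.out.println(reduce(new int[] {1, 2, 3, 4, 5, 6, 7, 8})); // prints 36
    }
}
```

Each pass halves the number of active work items, which is why the pass index, rather than the global id alone, decides what an invocation does.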
-
getLocalId
protected final int getLocalId()
Determine the local id of an executing kernel. When Kernel.execute(int globalSize) is invoked for a particular kernel, the runtime will break the work into various 'groups'. getLocalId() can be used to determine the relative id of the current kernel within a specific group.
The following code captures the local id for each kernel invocation and maps it against its globalId.
final int[] localIds = new int[1024];
Kernel kernel = new Kernel(){
    public void run() {
        int gid = getGlobalId();
        localIds[gid] = getLocalId();
    }
};
kernel.execute(localIds.length);
for (int i=0; i< localIds.length; i++){
    System.out.printf("%4d %4d\n", i, localIds[i]);
}
- Returns:
- The local id for this Kernel being executed
- See Also:
getGroupId(),getGlobalId(),getGlobalSize(),getNumGroups(),getLocalSize()
-
getLocalId
protected final int getLocalId(int _dim)
-
getLocalSize
protected final int getLocalSize()
Determine the size of the group that an executing kernel is a member of. When Kernel.execute(int globalSize) is invoked for a particular kernel, the runtime will break the work into various 'groups'. getLocalSize() allows a kernel to determine the size of the current group.
Note that groups may not all be the same size. In particular, if (global size)%(# of compute devices)!=0, the runtime can choose to dispatch kernels to groups with differing sizes.
- Returns:
- The size of the currently executing group.
- See Also:
getGroupId(),getGlobalId(),getGlobalSize(),getNumGroups(),getLocalSize()
-
getLocalSize
protected final int getLocalSize(int _dim)
-
getGlobalSize
protected final int getGlobalSize()
Determine the value that was passed to the Kernel.execute(int globalSize) method.
- Returns:
- The value passed to Kernel.execute(int globalSize) causing the current execution.
- See Also:
getGroupId(),getGlobalId(),getNumGroups(),getLocalSize()
-
getGlobalSize
protected final int getGlobalSize(int _dim)
-
getNumGroups
protected final int getNumGroups()
Determine the number of groups that will be used to execute a kernel. When Kernel.execute(int globalSize) is invoked, the runtime will split the work into multiple 'groups'. getNumGroups() returns the total number of groups that will be used.
- Returns:
- The number of groups that kernels will be dispatched into.
- See Also:
getGroupId(),getGlobalId(),getGlobalSize(),getNumGroups(),getLocalSize()
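The relationship between the global, group and local ids can be illustrated with a small plain-Java sketch. It assumes a one-dimensional range, a zero global offset, and a uniform group size that divides the global size evenly (the usual OpenCL decomposition); as noted for getLocalSize(), the runtime may choose differing group sizes, in which case this arithmetic does not apply. The class and method names are hypothetical and no Aparapi classes are used.

```java
// Illustrative arithmetic only; assumes uniform groups in a 1-D range.
final class WorkItemIds {

    /** Group that a global id falls into when every group holds localSize items. */
    static int groupId(int globalId, int localSize) {
        return globalId / localSize;
    }

    /** Position of a global id within its group. */
    static int localId(int globalId, int localSize) {
        return globalId % localSize;
    }

    /** Number of groups when localSize divides globalSize evenly. */
    static int numGroups(int globalSize, int localSize) {
        return globalSize / localSize;
    }

    public static void main(String[] args) {
        int globalSize = 1024;
        int localSize = 256;
        int gid = 300;
        System.out.println(groupId(gid, localSize));          // prints 1
        System.out.println(localId(gid, localSize));          // prints 44
        System.out.println(numGroups(globalSize, localSize)); // prints 4
    }
}
```

Under these assumptions, globalId == groupId * localSize + localId holds for every work item.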
-
getNumGroups
protected final int getNumGroups(int _dim)
-
run
public abstract void run()
The entry point of a kernel. Every kernel must override this method.
-
hasFallbackAlgorithm
public boolean hasFallbackAlgorithm()
False by default. In the event that all preferred devices fail to execute a kernel, it is possible to supply an alternate (possibly non-parallel) execution algorithm by overriding this method to return true, and overriding executeFallbackAlgorithm(Range, int) with the alternate algorithm.
-
executeFallbackAlgorithm
public void executeFallbackAlgorithm(Range _range, int _passId)
If hasFallbackAlgorithm() has been overridden to return true, this method should be overridden so as to apply a single pass of the kernel's logic to the entire _range. This is not normally required, as fallback to JavaDevice.THREAD_POOL will implement the algorithm in parallel. However, in the event that thread pool execution may be prohibitively slow, this method might implement a "quick and dirty" approximation to the desired result (for example, a simple box-blur as opposed to a Gaussian blur in an image processing application).
-
cancelMultiPass
public void cancelMultiPass()
Invoking this method flags that once the current pass is complete execution should be abandoned. Due to the complexity of intercommunication between Java (or C) and executing OpenCL, this is the best we can do for general cancellation of execution at present. OpenCL 2.0 should introduce pipe mechanisms which will support mid-pass cancellation easily.
Note that in the case of thread-pool/pure Java execution we could do better already, using Thread.interrupt() (and/or other means) to abandon execution mid-pass. However, at present this is not attempted.
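The "finish the current pass, then stop" behaviour described above can be sketched in plain Java as a flag that is only checked between passes. This is an illustrative emulation, not Aparapi's implementation: the class CancellablePasses and its execute method are hypothetical stand-ins, and only the method name cancelMultiPass mirrors the documented API.

```java
import java.util.function.IntConsumer;

// Hypothetical stand-in showing cancellation that takes effect between passes.
final class CancellablePasses {
    private volatile boolean cancelRequested = false;

    /** Requests that no further passes start; the pass in progress still completes. */
    void cancelMultiPass() {
        cancelRequested = true;
    }

    /** Runs up to the given number of passes and returns how many completed. */
    int execute(int passes, IntConsumer passBody) {
        int completed = 0;
        // The flag is only consulted here, before a pass starts, never mid-pass.
        for (int passId = 0; passId < passes && !cancelRequested; passId++) {
            passBody.accept(passId);
            completed++;
        }
        return completed;
    }
}
```

For example, if the body of pass 2 calls cancelMultiPass() on a 10-pass run, passes 0, 1 and 2 complete and execute returns 3.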
-
getCancelState
public int getCancelState()
-
getCurrentPass
public int getCurrentPass()
- See Also:
KernelRunner.getCurrentPass()
-
isExecuting
public boolean isExecuting()
- See Also:
KernelRunner.isExecuting()
-
clone
public Kernel clone()
When using a Java Thread Pool, Aparapi uses clone to copy the initial instance to each thread. If you choose to override clone() you are responsible for delegating to super.clone().
- Overrides:
clone in class java.lang.Object
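The delegation rule can be illustrated with a plain-Java class that overrides clone() and gives each copy its own scratch buffer. ScratchHolder is a hypothetical stand-in for a Kernel subclass; because it extends Object directly it has to handle CloneNotSupportedException, which a Kernel subclass would not need to do since Kernel.clone() is declared without a checked exception.

```java
// Hypothetical stand-in for a Kernel subclass with per-thread scratch state.
class ScratchHolder implements Cloneable {
    int[] scratch = new int[4];

    @Override
    public ScratchHolder clone() {
        try {
            // Always delegate to super.clone() so fields are copied correctly.
            ScratchHolder copy = (ScratchHolder) super.clone();
            // Deep-copy mutable state that must not be shared between threads.
            copy.scratch = scratch.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen: this class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}
```

Without the deep copy, every cloned instance would share the same scratch array, which is rarely what a per-thread copy is meant to do.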
-
acos
protected float acos(float a)
Delegates to either Math.acos(double) (Java) or acos(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
- Parameters:
a - value to delegate to Math.acos(double)/acos(float)
- Returns:
Math.acos(double) cast to float/acos(float)
- See Also:
Math.acos(double),acos(float)
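The "Returns" entry above indicates that, on the Java path, the float overload computes in double and narrows the result. A minimal sketch of that pattern follows; the class and method names are hypothetical, and the same shape applies to the other float overloads documented as "cast to float".

```java
// Sketch of the Java-side behaviour documented for the float overloads.
final class FloatDelegation {

    /** Widens to double, calls Math.acos, then narrows back to float. */
    static float acosJavaStyle(float a) {
        return (float) Math.acos(a);
    }

    public static void main(String[] args) {
        System.out.println(acosJavaStyle(1.0f)); // prints 0.0
        System.out.println(acosJavaStyle(0.5f)); // close to pi/3 at float precision
    }
}
```

A device that evaluates acos(float) natively in single precision may differ from this result in the last bits, which is the precision difference the documentation warns about.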
-
acos
protected double acos(double a)
Delegates to eitherMath.acos(double)(Java) oracos(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
a- value to delegate toMath.acos(double)/acos(double)- Returns:
Math.acos(double)/acos(double)- See Also:
Math.acos(double),acos(double)
-
asin
protected float asin(float _f)
Delegates to eitherMath.asin(double)(Java) orasin(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.asin(double)/asin(float)- Returns:
Math.asin(double)casted to float/asin(float)- See Also:
Math.asin(double),asin(float)
-
asin
protected double asin(double _d)
Delegates to eitherMath.asin(double)(Java) orasin(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.asin(double)/asin(double)- Returns:
Math.asin(double)/asin(double)- See Also:
Math.asin(double),asin(double)
-
atan
protected float atan(float _f)
Delegates to eitherMath.atan(double)(Java) oratan(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.atan(double)/atan(float)- Returns:
Math.atan(double)casted to float/atan(float)- See Also:
Math.atan(double),atan(float)
-
atan
protected double atan(double _d)
Delegates to eitherMath.atan(double)(Java) oratan(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.atan(double)/atan(double)- Returns:
Math.atan(double)/atan(double)- See Also:
Math.atan(double),atan(double)
-
atan2
protected float atan2(float _f1, float _f2)Delegates to eitherMath.atan2(double, double)(Java) oratan2(float, float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f1- value to delegate to first argument ofMath.atan2(double, double)/atan2(float, float)_f2- value to delegate to second argument ofMath.atan2(double, double)/atan2(float, float)- Returns:
Math.atan2(double, double)casted to float/atan2(float, float)- See Also:
Math.atan2(double, double),atan2(float, float)
-
atan2
protected double atan2(double _d1, double _d2)Delegates to eitherMath.atan2(double, double)(Java) oratan2(double, double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d1- value to delegate to first argument ofMath.atan2(double, double)/atan2(double, double)_d2- value to delegate to second argument ofMath.atan2(double, double)/atan2(double, double)- Returns:
Math.atan2(double, double)/atan2(double, double)- See Also:
Math.atan2(double, double),atan2(double, double)
-
ceil
protected float ceil(float _f)
Delegates to eitherMath.ceil(double)(Java) orceil(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.ceil(double)/ceil(float)- Returns:
Math.ceil(double)casted to float/ceil(float)- See Also:
Math.ceil(double),ceil(float)
-
ceil
protected double ceil(double _d)
Delegates to eitherMath.ceil(double)(Java) orceil(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.ceil(double)/ceil(double)- Returns:
Math.ceil(double)/ceil(double)- See Also:
Math.ceil(double),ceil(double)
-
cos
protected float cos(float _f)
Delegates to eitherMath.cos(double)(Java) orcos(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.cos(double)/cos(float)- Returns:
Math.cos(double)casted to float/cos(float)- See Also:
Math.cos(double),cos(float)
-
cos
protected double cos(double _d)
Delegates to eitherMath.cos(double)(Java) orcos(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.cos(double)/cos(double)- Returns:
Math.cos(double)/cos(double)- See Also:
Math.cos(double),cos(double)
-
exp
protected float exp(float _f)
Delegates to eitherMath.exp(double)(Java) orexp(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.exp(double)/exp(float)- Returns:
Math.exp(double)casted to float/exp(float)- See Also:
Math.exp(double),exp(float)
-
exp
protected double exp(double _d)
Delegates to eitherMath.exp(double)(Java) orexp(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.exp(double)/exp(double)- Returns:
Math.exp(double)/exp(double)- See Also:
Math.exp(double),exp(double)
-
abs
protected float abs(float _f)
Delegates to eitherMath.abs(float)(Java) orfabs(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.abs(float)/fabs(float)- Returns:
Math.abs(float)/fabs(float)- See Also:
Math.abs(float),fabs(float)
-
popcount
protected int popcount(int _i)
Delegates to eitherInteger.bitCount(int)(Java) orpopcount(int)(OpenCL).- Parameters:
_i- value to delegate toInteger.bitCount(int)/popcount(int)- Returns:
Integer.bitCount(int)/popcount(int)- See Also:
Integer.bitCount(int),popcount(int)
-
popcount
protected long popcount(long _i)
Delegates to eitherLong.bitCount(long)(Java) orpopcount(long)(OpenCL).- Parameters:
_i- value to delegate toLong.bitCount(long)/popcount(long)- Returns:
Long.bitCount(long)/popcount(long)- See Also:
Long.bitCount(long),popcount(long)
-
clz
protected int clz(int _i)
Delegates to either Integer.numberOfLeadingZeros(int) (Java) or clz(int) (OpenCL).
-
clz
protected long clz(long _l)
Delegates to either Long.numberOfLeadingZeros(long) (Java) or clz(long) (OpenCL).
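For reference, the Java-side methods named for popcount and clz behave as follows. This sketch calls only the JDK methods; the wrapper class and its method names are hypothetical.

```java
// JDK methods named as the Java-side delegates for popcount and clz.
final class BitCounts {

    static int popcount(int i)  { return Integer.bitCount(i); }
    static int popcount(long l) { return Long.bitCount(l); }
    static int clz(int i)       { return Integer.numberOfLeadingZeros(i); }
    static int clz(long l)      { return Long.numberOfLeadingZeros(l); }

    public static void main(String[] args) {
        System.out.println(popcount(0b1011)); // prints 3
        System.out.println(popcount(-1L));    // prints 64
        System.out.println(clz(1));           // prints 31
        System.out.println(clz(1L));          // prints 63
        System.out.println(clz(0));           // prints 32
    }
}
```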
-
abs
protected double abs(double _d)
Delegates to eitherMath.abs(double)(Java) orfabs(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.abs(double)/fabs(double)- Returns:
Math.abs(double)/fabs(double)- See Also:
Math.abs(double),fabs(double)
-
abs
protected int abs(int n)
Delegates to eitherMath.abs(int)(Java) orabs(int)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
-
abs
protected long abs(long n)
Delegates to eitherMath.abs(long)(Java) orabs(long)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
-
floor
protected float floor(float _f)
Delegates to eitherMath.floor(double)(Java) orfloor(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.floor(double)/floor(float)- Returns:
Math.floor(double)casted to float/floor(float)- See Also:
Math.floor(double),floor(float)
-
floor
protected double floor(double _d)
Delegates to eitherMath.floor(double)(Java) orfloor(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.floor(double)/floor(double)- Returns:
Math.floor(double)/floor(double)- See Also:
Math.floor(double),floor(double)
-
max
protected float max(float _f1, float _f2)Delegates to eitherMath.max(float, float)(Java) orfmax(float, float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f1- value to delegate to first argument ofMath.max(float, float)/fmax(float, float)_f2- value to delegate to second argument ofMath.max(float, float)/fmax(float, float)- Returns:
Math.max(float, float)/fmax(float, float)- See Also:
Math.max(float, float),fmax(float, float)
-
max
protected double max(double _d1, double _d2)Delegates to eitherMath.max(double, double)(Java) orfmax(double, double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d1- value to delegate to first argument ofMath.max(double, double)/fmax(double, double)_d2- value to delegate to second argument ofMath.max(double, double)/fmax(double, double)- Returns:
Math.max(double, double)/fmax(double, double)- See Also:
Math.max(double, double),fmax(double, double)
-
max
protected int max(int n1, int n2)Delegates to eitherMath.max(int, int)(Java) ormax(int, int)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
n1- value to delegate toMath.max(int, int)/max(int, int)n2- value to delegate toMath.max(int, int)/max(int, int)- Returns:
Math.max(int, int)/max(int, int)- See Also:
Math.max(int, int),max(int, int)
-
max
protected long max(long n1, long n2)Delegates to eitherMath.max(long, long)(Java) ormax(long, long)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
n1- value to delegate to first argument ofMath.max(long, long)/max(long, long)n2- value to delegate to second argument ofMath.max(long, long)/max(long, long)- Returns:
Math.max(long, long)/max(long, long)- See Also:
Math.max(long, long),max(long, long)
-
min
protected float min(float _f1, float _f2)Delegates to eitherMath.min(float, float)(Java) orfmin(float, float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f1- value to delegate to first argument ofMath.min(float, float)/fmin(float, float)_f2- value to delegate to second argument ofMath.min(float, float)/fmin(float, float)- Returns:
Math.min(float, float)/fmin(float, float)- See Also:
Math.min(float, float),fmin(float, float)
-
min
protected double min(double _d1, double _d2)Delegates to eitherMath.min(double, double)(Java) orfmin(double, double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d1- value to delegate to first argument ofMath.min(double, double)/fmin(double, double)_d2- value to delegate to second argument ofMath.min(double, double)/fmin(double, double)- Returns:
Math.min(double, double)/fmin(double, double)- See Also:
Math.min(double, double),fmin(double, double)
-
min
protected int min(int n1, int n2)Delegates to eitherMath.min(int, int)(Java) ormin(int, int)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
n1- value to delegate to first argument ofMath.min(int, int)/min(int, int)n2- value to delegate to second argument ofMath.min(int, int)/min(int, int)- Returns:
Math.min(int, int)/min(int, int)- See Also:
Math.min(int, int),min(int, int)
-
min
protected long min(long n1, long n2)Delegates to eitherMath.min(long, long)(Java) ormin(long, long)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
n1- value to delegate to first argument ofMath.min(long, long)/min(long, long)n2- value to delegate to second argument ofMath.min(long, long)/min(long, long)- Returns:
Math.min(long, long)/min(long, long)- See Also:
Math.min(long, long),min(long, long)
-
log
protected float log(float _f)
Delegates to eitherMath.log(double)(Java) orlog(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.log(double)/log(float)- Returns:
Math.log(double)casted to float/log(float)- See Also:
Math.log(double),log(float)
-
log
protected double log(double _d)
Delegates to eitherMath.log(double)(Java) orlog(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.log(double)/log(double)- Returns:
Math.log(double)/log(double)- See Also:
Math.log(double),log(double)
-
pow
protected float pow(float _f1, float _f2)Delegates to eitherMath.pow(double, double)(Java) orpow(float, float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f1- value to delegate to first argument ofMath.pow(double, double)/pow(float, float)_f2- value to delegate to second argument ofMath.pow(double, double)/pow(float, float)- Returns:
Math.pow(double, double)casted to float/pow(float, float)- See Also:
Math.pow(double, double),pow(float, float)
-
pow
protected double pow(double _d1, double _d2)Delegates to eitherMath.pow(double, double)(Java) orpow(double, double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d1- value to delegate to first argument ofMath.pow(double, double)/pow(double, double)_d2- value to delegate to second argument ofMath.pow(double, double)/pow(double, double)- Returns:
Math.pow(double, double)/pow(double, double)- See Also:
Math.pow(double, double),pow(double, double)
-
IEEEremainder
protected float IEEEremainder(float _f1, float _f2)Delegates to eitherMath.IEEEremainder(double, double)(Java) orremainder(float, float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f1- value to delegate to first argument ofMath.IEEEremainder(double, double)/remainder(float, float)_f2- value to delegate to second argument ofMath.IEEEremainder(double, double)/remainder(float, float)- Returns:
Math.IEEEremainder(double, double)casted to float/remainder(float, float)- See Also:
Math.IEEEremainder(double, double),remainder(float, float)
-
IEEEremainder
protected double IEEEremainder(double _d1, double _d2)Delegates to eitherMath.IEEEremainder(double, double)(Java) orremainder(double, double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d1- value to delegate to first argument ofMath.IEEEremainder(double, double)/remainder(double, double)_d2- value to delegate to second argument ofMath.IEEEremainder(double, double)/remainder(double, double)- Returns:
Math.IEEEremainder(double, double)/remainder(double, double)- See Also:
Math.IEEEremainder(double, double),remainder(double, double)
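IEEEremainder is easy to confuse with the % operator. The following JDK-only sketch shows the difference on the Java side; the class and method names are hypothetical.

```java
// Contrast between Math.IEEEremainder and the % operator in plain Java.
final class RemainderSketch {

    static double ieee(double a, double b) {
        // Rounds the quotient to the nearest integer (ties to even) before subtracting.
        return Math.IEEEremainder(a, b);
    }

    static double truncating(double a, double b) {
        // The % operator truncates the quotient toward zero instead.
        return a % b;
    }

    public static void main(String[] args) {
        System.out.println(ieee(5.0, 3.0));       // prints -1.0 (5 - 2*3)
        System.out.println(truncating(5.0, 3.0)); // prints 2.0  (5 - 1*3)
    }
}
```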
-
toRadians
protected float toRadians(float _f)
Delegates to eitherMath.toRadians(double)(Java) orradians(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.toRadians(double)/radians(float)- Returns:
Math.toRadians(double)casted to float/radians(float)- See Also:
Math.toRadians(double),radians(float)
-
toRadians
protected double toRadians(double _d)
Delegates to eitherMath.toRadians(double)(Java) orradians(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.toRadians(double)/radians(double)- Returns:
Math.toRadians(double)/radians(double)- See Also:
Math.toRadians(double),radians(double)
-
toDegrees
protected float toDegrees(float _f)
Delegates to eitherMath.toDegrees(double)(Java) ordegrees(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.toDegrees(double)/degrees(float)- Returns:
Math.toDegrees(double)casted to float/degrees(float)- See Also:
Math.toDegrees(double),degrees(float)
-
toDegrees
protected double toDegrees(double _d)
Delegates to eitherMath.toDegrees(double)(Java) ordegrees(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.toDegrees(double)/degrees(double)- Returns:
Math.toDegrees(double)/degrees(double)- See Also:
Math.toDegrees(double),degrees(double)
-
rint
protected float rint(float _f)
Delegates to eitherMath.rint(double)(Java) orrint(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.rint(double)/rint(float)- Returns:
Math.rint(double)casted to float/rint(float)- See Also:
Math.rint(double),rint(float)
-
rint
protected double rint(double _d)
Delegates to eitherMath.rint(double)(Java) orrint(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.rint(double)/rint(double)- Returns:
Math.rint(double)/rint(double)- See Also:
Math.rint(double),rint(double)
-
round
protected int round(float _f)
Delegates to eitherMath.round(float)(Java) orround(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.round(float)/round(float)- Returns:
Math.round(float)/round(float)- See Also:
Math.round(float),round(float)
-
round
protected long round(double _d)
Delegates to eitherMath.round(double)(Java) orround(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.round(double)/round(double)- Returns:
Math.round(double)/round(double)- See Also:
Math.round(double),round(double)
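Halfway cases are where rint and round are most likely to surprise. The JDK-only sketch below shows the Java-side behaviour; the class name is hypothetical. As a point of comparison, the OpenCL C specification defines round as rounding halfway cases away from zero, so a device result for an input such as -2.5 may differ from Math.round, assuming the OpenCL path maps directly onto that built-in.

```java
// Java-side behaviour of rint and round on halfway cases.
final class RoundingSketch {

    public static void main(String[] args) {
        // Math.rint rounds ties to the nearest even integer.
        System.out.println(Math.rint(2.5));   // prints 2.0
        System.out.println(Math.rint(3.5));   // prints 4.0
        // Math.round rounds ties toward positive infinity.
        System.out.println(Math.round(2.5));  // prints 3
        System.out.println(Math.round(-2.5)); // prints -2
    }
}
```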
-
sin
protected float sin(float _f)
Delegates to eitherMath.sin(double)(Java) orsin(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.sin(double)/sin(float)- Returns:
Math.sin(double)casted to float/sin(float)- See Also:
Math.sin(double),sin(float)
-
sin
protected double sin(double _d)
Delegates to eitherMath.sin(double)(Java) orsin(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.sin(double)/sin(double)- Returns:
Math.sin(double)/sin(double)- See Also:
Math.sin(double),sin(double)
-
sqrt
protected float sqrt(float _f)
Delegates to eitherMath.sqrt(double)(Java) orsqrt(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.sqrt(double)/sqrt(float)- Returns:
Math.sqrt(double)casted to float/sqrt(float)- See Also:
Math.sqrt(double),sqrt(float)
-
sqrt
protected double sqrt(double _d)
Delegates to eitherMath.sqrt(double)(Java) orsqrt(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.sqrt(double)/sqrt(double)- Returns:
Math.sqrt(double)/sqrt(double)- See Also:
Math.sqrt(double),sqrt(double)
-
tan
protected float tan(float _f)
Delegates to eitherMath.tan(double)(Java) ortan(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_f- value to delegate toMath.tan(double)/tan(float)- Returns:
Math.tan(double)casted to float/tan(float)- See Also:
Math.tan(double),tan(float)
-
tan
protected double tan(double _d)
Delegates to eitherMath.tan(double)(Java) ortan(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
_d- value to delegate toMath.tan(double)/tan(double)- Returns:
Math.tan(double)/tan(double)- See Also:
Math.tan(double),tan(double)
-
acospi
protected final double acospi(double a)
-
acospi
protected final float acospi(float a)
-
asinpi
protected final double asinpi(double a)
-
asinpi
protected final float asinpi(float a)
-
atanpi
protected final double atanpi(double a)
-
atanpi
protected final float atanpi(float a)
-
atan2pi
protected final double atan2pi(double y, double x)
-
atan2pi
protected final float atan2pi(float y, double x)
-
cbrt
protected final double cbrt(double a)
-
cbrt
protected final float cbrt(float a)
-
cosh
protected final double cosh(double x)
-
cosh
protected final float cosh(float x)
-
cospi
protected final double cospi(double a)
-
cospi
protected final float cospi(float a)
-
exp2
protected final double exp2(double a)
-
exp2
protected final float exp2(float a)
-
exp10
protected final double exp10(double a)
-
exp10
protected final float exp10(float a)
-
expm1
protected final double expm1(double x)
-
expm1
protected final float expm1(float x)
-
log2
protected final double log2(double a)
-
log2
protected final float log2(float a)
-
log10
protected final double log10(double a)
-
log10
protected final float log10(float a)
-
log1p
protected final double log1p(double x)
-
log1p
protected final float log1p(float x)
-
mad
protected final double mad(double a, double b, double c)
-
mad
protected final float mad(float a, float b, float c)
-
fma
protected float fma(float a, float b, float c)
Delegates to either a*b+c (Java) or fma(float, float, float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
- Parameters:
a - value to delegate to first argument of fma(float, float, float)
b - value to delegate to second argument of fma(float, float, float)
c - value to delegate to third argument of fma(float, float, float)
- Returns:
- a * b + c / fma(float, float, float)
- See Also:
fma(float, float, float)
-
fma
protected double fma(double a, double b, double c)
Delegates to either a*b+c (Java) or fma(double, double, double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
- Parameters:
a - value to delegate to first argument of fma(double, double, double)
b - value to delegate to second argument of fma(double, double, double)
c - value to delegate to third argument of fma(double, double, double)
- Returns:
- a * b + c / fma(double, double, double)
- See Also:
fma(double, double, double)
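The precision caveat matters for fma in particular, because a*b+c rounds twice while a fused multiply-add rounds once. The sketch below uses the JDK's Math.fma (available since Java 9) purely as a stand-in for a fused device operation; it is not what the Java path described above does. The class and method names are hypothetical.

```java
// Shows an input where unfused a*b+c and a fused multiply-add disagree.
final class FmaSketch {

    /** Two roundings: the product is rounded before the addition. */
    static double unfused(double a, double b, double c) {
        return a * b + c;
    }

    /** One rounding: the exact product is added to c, then rounded once. */
    static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        double a = 1.0 + 0x1.0p-27;    // exactly representable
        double c = -(1.0 + 0x1.0p-26); // cancels the rounded product
        // The exact product is 1 + 2^-26 + 2^-54; the last term is lost when rounded.
        System.out.println(unfused(a, a, c)); // prints 0.0
        System.out.println(fused(a, a, c) == 0x1.0p-54); // prints true
    }
}
```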
-
nextAfter
protected final double nextAfter(double start, double direction)
-
nextAfter
protected final float nextAfter(float start, float direction)
-
sinh
protected final double sinh(double x)
Delegates to eitherMath.sinh(double)(Java) orsinh(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
x- value to delegate toMath.sinh(double)/sinh(double)- Returns:
Math.sinh(double)/sinh(double)- See Also:
Math.sinh(double),sinh(double)
-
sinh
protected final float sinh(float x)
Delegates to eitherMath.sinh(double)(Java) orsinh(float)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
x- value to delegate toMath.sinh(double)/sinh(float)- Returns:
Math.sinh(double)/sinh(float)- See Also:
Math.sinh(double),sinh(float)
-
sinpi
protected final double sinpi(double a)
Backed by either Math.sin(double) (Java) or sinpi(double) (OpenCL). This method is equivalent to Math.sin(a * Math.PI). Users should note the differences in precision between the Java and OpenCL implementations of arithmetic functions and determine whether that difference is acceptable.
- Parameters:
a - value to delegate to sinpi(double) or its Java equivalent
- Returns:
sinpi(double) or its Java equivalent
- See Also:
Math.sin(double), sinpi(double)
-
sinpi
protected final float sinpi(float a)
Backed by either Math.sin(double) (Java) or sinpi(float) (OpenCL). This method is equivalent to Math.sin(a * Math.PI). Users should note the differences in precision between the Java and OpenCL implementations of arithmetic functions and determine whether that difference is acceptable.
- Parameters:
a - value to delegate to sinpi(float) or its Java equivalent
- Returns:
sinpi(float) or its Java equivalent
- See Also:
Math.sin(double), sinpi(float)
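As a rough illustration of the Java-side equivalence stated above, the sketch below evaluates Math.sin(a * Math.PI) directly. The class name is hypothetical, and the float overload's cast back to float is an assumption about how a float result would be produced, not a statement about Aparapi's source.

```java
// Hypothetical sketch of the Java-side equivalent named in the text.
class SinPiSketch {
    // sin(pi * a), computed as Math.sin(a * Math.PI).
    static double sinpi(double a) {
        return Math.sin(a * Math.PI);
    }

    // Assumed float variant: compute in double, then narrow to float.
    static float sinpi(float a) {
        return (float) Math.sin(a * Math.PI);
    }

    public static void main(String[] args) {
        System.out.println(sinpi(0.5)); // 1.0
        // Math.PI is only an approximation of pi, so sinpi(1.0) is a tiny
        // non-zero value here rather than exactly 0.
        System.out.println(sinpi(1.0));
    }
}
```

Code that tests the result against exactly zero may therefore behave differently from an implementation that handles whole-number arguments exactly.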
-
tanh
protected final double tanh(double x)
Delegates to eitherMath.tanh(double)(Java) ortanh(double)(OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.- Parameters:
x- value to delegate toMath.tanh(double)/tanh(double)- Returns:
Math.tanh(double)/tanh(double)- See Also:
Math.tanh(double),tanh(double)
-
tanh
protected final float tanh(float x)
Delegates to either Math.tanh(double) (Java) or tanh(float) (OpenCL). Users should note the differences in precision between the Java and OpenCL implementations of arithmetic functions and determine whether that difference is acceptable.
- Parameters:
x - value to delegate to Math.tanh(double) / tanh(float)
- Returns:
Math.tanh(double) / tanh(float)
- See Also:
Math.tanh(double), tanh(float)
-
tanpi
protected final double tanpi(double a)
Backed by either Math.tan(double) (Java) or tanpi(double) (OpenCL). This method is equivalent to Math.tan(a * Math.PI). Users should note the differences in precision between the Java and OpenCL implementations of arithmetic functions and determine whether that difference is acceptable.
- Parameters:
a - value to delegate to tanpi(double) or its Java equivalent
- Returns:
tanpi(double) or its Java equivalent
- See Also:
Math.tan(double), tanpi(double)
-
tanpi
protected final float tanpi(float a)
Backed by either Math.tan(double) (Java) or tanpi(float) (OpenCL). This method is equivalent to Math.tan(a * Math.PI). Users should note the differences in precision between the Java and OpenCL implementations of arithmetic functions and determine whether that difference is acceptable.
- Parameters:
a - value to delegate to tanpi(float) or its Java equivalent
- Returns:
tanpi(float) or its Java equivalent
- See Also:
Math.tan(double), tanpi(float)
-
rsqrt
protected float rsqrt(float _f)
Computes the inverse square root using Math.sqrt(double) (Java) or delegates to rsqrt(double) (OpenCL). Users should note the differences in precision between the Java and OpenCL implementations of arithmetic functions and determine whether that difference is acceptable.
- Parameters:
_f - value to delegate to Math.sqrt(double) / rsqrt(double)
- Returns:
(1.0f / Math.sqrt(double), cast to float) / rsqrt(double)
- See Also:
Math.sqrt(double), rsqrt(double)
-
rsqrt
protected double rsqrt(double _d)
Computes the inverse square root using Math.sqrt(double) (Java) or delegates to rsqrt(double) (OpenCL). Users should note the differences in precision between the Java and OpenCL implementations of arithmetic functions and determine whether that difference is acceptable.
- Parameters:
_d - value to delegate to Math.sqrt(double) / rsqrt(double)
- Returns:
(1.0 / Math.sqrt(double)) / rsqrt(double)
- See Also:
Math.sqrt(double), rsqrt(double)
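The Java-side computation described for both rsqrt overloads can be sketched as follows. The class name is hypothetical, and the code only mirrors the documented formula (one divided by Math.sqrt, narrowed to float for the float overload); it is not taken from Aparapi's source.

```java
// Hypothetical sketch of the documented Java-side formula for rsqrt.
class RsqrtSketch {
    // Float overload: compute 1 / sqrt(f) in double, then cast to float.
    static float rsqrt(float f) {
        return (float) (1.0 / Math.sqrt(f));
    }

    // Double overload: 1 / sqrt(d).
    static double rsqrt(double d) {
        return 1.0 / Math.sqrt(d);
    }

    public static void main(String[] args) {
        System.out.println(rsqrt(4.0f)); // 0.5
        System.out.println(rsqrt(16.0)); // 0.25
    }
}
```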
-
native_sqrt
private float native_sqrt(float _f)
-
native_rsqrt
private float native_rsqrt(float _f)
-
atomicAdd
protected int atomicAdd(int[] _arr, int _index, int _delta)
Atomically adds the _delta value to the _index element of array _arr (Java) or delegates to atomic_add(volatile int*, int) (OpenCL).
- Parameters:
_arr - array for which an element value needs to be atomically incremented by _delta
_index - index of the _arr array element that needs to be atomically incremented by _delta
_delta - value by which the _index element of the _arr array needs to be atomically incremented
- Returns:
- previous value of the _index element of the _arr array
- See Also:
atomic_add(volatile int*, int)
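The contract above (add atomically, return the previous value) can be illustrated outside Aparapi with java.util.concurrent.atomic.AtomicIntegerArray, whose getAndAdd has the same shape. This is a sketch of the semantics only: the class and helper names are hypothetical, and it makes no claim about how Aparapi implements the Java path.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical illustration of "add atomically, return the previous value".
class AtomicAddSketch {
    // Adds delta to element index and returns the value held before the addition.
    static int atomicAdd(AtomicIntegerArray arr, int index, int delta) {
        return arr.getAndAdd(index, delta);
    }

    // Starts `threads` workers that each add 1 to element 0 `perThread` times.
    static int countWithThreads(int threads, int perThread) {
        AtomicIntegerArray counters = new AtomicIntegerArray(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    atomicAdd(counters, 0, 1);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters.get(0);
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 1000)); // 4000
    }
}
```

Because every increment is atomic, no updates are lost regardless of how the worker threads interleave.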
-
atomicGet
protected final int atomicGet(java.util.concurrent.atomic.AtomicInteger p)
-
atomicSet
protected final void atomicSet(java.util.concurrent.atomic.AtomicInteger p, int val)
-
atomicAdd
protected final int atomicAdd(java.util.concurrent.atomic.AtomicInteger p, int val)
-
atomicSub
protected final int atomicSub(java.util.concurrent.atomic.AtomicInteger p, int val)
-
atomicXchg
protected final int atomicXchg(java.util.concurrent.atomic.AtomicInteger p, int newVal)
-
atomicInc
protected final int atomicInc(java.util.concurrent.atomic.AtomicInteger p)
-
atomicDec
protected final int atomicDec(java.util.concurrent.atomic.AtomicInteger p)
-
atomicCmpXchg
protected final int atomicCmpXchg(java.util.concurrent.atomic.AtomicInteger p, int expectedVal, int newVal)
-
atomicMin
protected final int atomicMin(java.util.concurrent.atomic.AtomicInteger p, int val)
-
atomicMax
protected final int atomicMax(java.util.concurrent.atomic.AtomicInteger p, int val)
-
atomicAnd
protected final int atomicAnd(java.util.concurrent.atomic.AtomicInteger p, int val)
-
atomicOr
protected final int atomicOr(java.util.concurrent.atomic.AtomicInteger p, int val)
-
atomicXor
protected final int atomicXor(java.util.concurrent.atomic.AtomicInteger p, int val)
-
localBarrier
protected final void localBarrier()
Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
It will also enforce memory ordering, such that modifications made by each thread in the work-group, to the memory, before entering into this barrier call will be visible by all threads leaving the barrier.
Note1: In OpenCL this executes as barrier(CLK_LOCAL_MEM_FENCE), which behaves differently from the Java version: it only guarantees that modifications made to the local memory space are visible to all threads leaving the barrier.
Note2: OpenCL requires that all threads enter the same if blocks and iterate the same number of times in all loops (for, while, ...).
Note3: In Java, localBarrier(), globalBarrier() and localGlobalBarrier() behave identically.
-
globalBarrier
protected final void globalBarrier()
Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
It will also enforce memory ordering, such that modifications made by each thread in the work-group, to the memory, before entering into this barrier call will be visible by all threads leaving the barrier.
Note1: In OpenCL this executes as barrier(CLK_GLOBAL_MEM_FENCE), which behaves differently from the Java version: it only guarantees that modifications made to the global memory space are visible to all threads in the work group leaving the barrier.
Note2: OpenCL requires that all threads enter the same if blocks and iterate the same number of times in all loops (for, while, ...).
Note3: In Java, localBarrier(), globalBarrier() and localGlobalBarrier() behave identically.
-
localGlobalBarrier
protected final void localGlobalBarrier()
Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
It will also enforce memory ordering, such that modifications made by each thread in the work-group, to the memory, before entering into this barrier call will be visible by all threads leaving the barrier.
Note1: When in doubt, use this barrier instead of localBarrier() or globalBarrier(), despite the possible performance loss.
Note2: In OpenCL this executes as barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE), which behaves the same as the Java version: it guarantees that modifications made to any of the memory spaces are visible to all threads in the work group leaving the barrier.
Note3: OpenCL requires that all threads enter the same if blocks and iterate the same number of times in all loops (for, while, ...).
Note4: In Java, localBarrier(), globalBarrier() and localGlobalBarrier() behave identically.
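The rendezvous and visibility guarantee described for the barrier methods can be sketched in plain Java with java.util.concurrent.CyclicBarrier. This is an analogy under stated assumptions: one thread per work item, a shared array playing the role of local memory, and hypothetical class and method names. It does not show how Aparapi implements barriers.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical work-group analogy: write, rendezvous, then read a neighbour's value.
class BarrierSketch {
    static int[] readNeighbours(int groupSize) {
        int[] local = new int[groupSize]; // plays the role of local memory
        int[] out = new int[groupSize];
        CyclicBarrier barrier = new CyclicBarrier(groupSize);
        Thread[] group = new Thread[groupSize];
        for (int i = 0; i < groupSize; i++) {
            final int lid = i;
            group[i] = new Thread(() -> {
                local[lid] = lid * 10; // phase 1: each thread writes its own slot
                try {
                    barrier.await();   // all threads rendezvous; prior writes become visible
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                out[lid] = local[(lid + 1) % groupSize]; // phase 2: read a neighbour's slot
            });
            group[i].start();
        }
        for (Thread t : group) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(readNeighbours(4))); // [10, 20, 30, 0]
    }
}
```

Without the barrier, phase 2 could read a neighbour's slot before it was written. Every thread must also reach the same await() call, which mirrors the requirement that all threads take the same branches and loop counts.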
-
hypot
protected float hypot(float a, float b)
-
hypot
protected double hypot(double a, double b)
-
getKernelState
public Kernel.KernelState getKernelState()
-
prepareKernelRunner
private KernelRunner prepareKernelRunner()
-
registerProfileReportObserver
public void registerProfileReportObserver(IProfileReportObserver observer)
Registers a new profile report observer to receive profile reports as they're produced. This is the recommended method when the client application wants a single observer to receive all the execution profiles for the current kernel instance, on all devices and over all client threads running the kernel.
Note1: A report will be generated by a thread that finishes executing a kernel. In multithreaded execution environments it is up to the observer implementation to handle thread safety.
Note2: To cancel the report subscription, set observer to null.
- Parameters:
observer - the observer instance that will receive the profile reports
-
getProfileReportLastThread
public java.lang.ref.WeakReference<ProfileReport> getProfileReportLastThread(Device device)
Retrieves a profile report for the last thread that executed this kernel on the given device. A report will only be available if at least one thread executed the kernel on the device.
Note1: If the profile report is intended to be kept in memory, the object should be cloned with ProfileReport.clone().
- Parameters:
device - the relevant device where the kernel executed
- Returns:
- the profiling report for the current most recent execution
- null, if no profiling report is available for such thread
- See Also:
getProfileReportCurrentThread(Device),registerProfileReportObserver(IProfileReportObserver),getAccumulatedExecutionTimeAllThreads(Device),#getExecutionTimeLastThread(),#getConversionTimeLastThread()
-
getProfileReportCurrentThread
public java.lang.ref.WeakReference<ProfileReport> getProfileReportCurrentThread(Device device)
Retrieves the most recent complete report available for the current thread calling this method for the current kernel instance and executed on the given device.
Note1: If the profile report is intended to be kept in memory, the object should be cloned with ProfileReport.clone().
Note2: If the thread didn't execute this kernel on the specified device, it will return null.
- Parameters:
device - the relevant device where the kernel executed
- Returns:
- the profiling report for the current most recent execution
- null, if no profiling report is available for such thread
- See Also:
getProfileReportLastThread(Device),registerProfileReportObserver(IProfileReportObserver),#getExecutionTimeCurrentThread(Device),#getConversionTimeCurrentThread(Device),getAccumulatedExecutionTimeAllThreads(Device)
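Because the report comes back wrapped in a java.lang.ref.WeakReference, a caller that wants to keep it has to take a strong reference, check for null, and clone. The sketch below shows that pattern with a hypothetical Report class standing in for ProfileReport; only the WeakReference handling is the point, and none of the names here are Aparapi API.

```java
import java.lang.ref.WeakReference;

// Hypothetical sketch: retaining a report that is handed out via WeakReference.
class ReportRetentionSketch {
    // Stand-in for ProfileReport; only clone() matters for this example.
    static final class Report implements Cloneable {
        final double executionTimeMs;

        Report(double executionTimeMs) {
            this.executionTimeMs = executionTimeMs;
        }

        @Override
        public Report clone() {
            try {
                return (Report) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Report is Cloneable
            }
        }
    }

    // Returns a private copy of the report, or null when none is reachable.
    static Report keep(WeakReference<Report> ref) {
        if (ref == null) {
            return null;
        }
        Report live = ref.get(); // take a single strong reference before using it
        return live == null ? null : live.clone();
    }

    public static void main(String[] args) {
        Report original = new Report(1.5);
        Report copy = keep(new WeakReference<Report>(original));
        System.out.println(copy.executionTimeMs); // 1.5
    }
}
```

Calling get() once and working with the returned local variable avoids the referent being cleared between a null check and its use.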
-
getExecutionTime
public double getExecutionTime()
Determine the execution time of the previous Kernel.execute(range) called from the last thread that ran and executed on the most recently used device.
Note1: This is kept for backwards compatibility only; use of either getProfileReportLastThread(Device) or registerProfileReportObserver(IProfileReportObserver) is encouraged instead.
Note2: Calling this method is not recommended when using more than a single thread to execute the same kernel, or when running kernels on more than one device concurrently.
Note that for the first call this will include the conversion time.
- Returns:
- The time spent executing the kernel (ms)
- NaN, if no profile report is available
- See Also:
getProfileReportCurrentThread(Device),registerProfileReportObserver(IProfileReportObserver),getAccumulatedExecutionTimeAllThreads(Device)
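Since the timing getters report NaN when no profile information is available, callers need Double.isNaN rather than an equality test, because NaN compares unequal to everything including itself. A minimal sketch follows; the helper name and output format are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper for presenting a timing value that may be NaN.
class TimingFormatSketch {
    // Formats a time in milliseconds, or reports "n/a" when the value is NaN.
    static String describe(double ms) {
        // Note: (ms == Double.NaN) is always false, so isNaN is required here.
        if (Double.isNaN(ms)) {
            return "n/a";
        }
        return String.format(Locale.ROOT, "%.3f ms", ms);
    }

    public static void main(String[] args) {
        System.out.println(describe(12.5));       // 12.500 ms
        System.out.println(describe(Double.NaN)); // n/a
    }
}
```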
-
getConversionTime
public double getConversionTime()
Determine the time taken to convert bytecode to OpenCL for first Kernel.execute(range) call.
Note1: This is kept for backwards compatibility only; use of either getProfileReportLastThread(Device) or registerProfileReportObserver(IProfileReportObserver) is encouraged instead.
Note2: Calling this method is not recommended when using more than a single thread to execute the same kernel, or when running kernels on more than one device concurrently.
Note that for the first call this will include the conversion time.
- Returns:
- The time spent preparing the kernel for execution using GPU
- NaN, if no profile report is available
- See Also:
getProfileReportCurrentThread(Device),registerProfileReportObserver(IProfileReportObserver),getAccumulatedExecutionTimeAllThreads(Device)
-
getAccumulatedExecutionTimeCurrentThread
public double getAccumulatedExecutionTimeCurrentThread(Device device)
Determine the total execution time of all previous kernel executions called from the current thread, calling this method, that executed the current kernel on the specified device.
Note1: This is the recommended method to retrieve the accumulated execution time for a single current thread, even when doing multithreading for the same kernel and device.
Note that this will include the initial conversion time.
- Parameters:
device - the device of interest where the kernel executed
- Returns:
- The total time spent executing the kernel (ms)
- NaN, if no profiling information is available
- See Also:
getProfileReportCurrentThread(Device),getProfileReportLastThread(Device),registerProfileReportObserver(IProfileReportObserver),getAccumulatedExecutionTimeAllThreads(Device)
-
getAccumulatedExecutionTimeAllThreads
public double getAccumulatedExecutionTimeAllThreads(Device device)
Determine the total execution time of all produced profile reports from all threads that executed the current kernel on the specified device.
Note1: This is the recommended method to retrieve the accumulated execution time, even when doing multithreading for the same kernel and device.
Note that this will include the initial conversion time.
- Parameters:
device - the device of interest where the kernel executed
- Returns:
- The total time spent executing the kernel (ms)
- NaN, if no profiling information is available
- See Also:
getProfileReportCurrentThread(Device),getProfileReportLastThread(Device),registerProfileReportObserver(IProfileReportObserver),getAccumulatedExecutionTimeCurrentThread(Device)
-
getAccumulatedExecutionTime
public double getAccumulatedExecutionTime()
Determine the total execution time of all previous Kernel.execute(range) calls for all threads that ran this kernel for the device used in the last kernel execution.
Note1: This is kept for backwards compatibility only; use of getAccumulatedExecutionTimeAllThreads(Device) is encouraged instead.
Note2: Calling this method is not recommended when using more than a single thread to execute the same kernel on multiple devices concurrently.
Note that this will include the initial conversion time.
- Returns:
- The total time spent executing the kernel (ms)
- NaN, if no profiling information is available
- See Also:
#getProfileReport(Device),registerProfileReportObserver(IProfileReportObserver)
-
execute
public Kernel execute(Range _range)
Start execution of _range kernels.
When kernel.execute(globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_range - The number of Kernels that we would like to initiate.
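To make the "pool of Java threads" description concrete, here is a plain-Java analogy that runs one task per global id on a fixed thread pool and computes squares, echoing the SquareKernel example. It is a sketch under assumptions: the class name, pool size parameter, and task-per-id scheduling are illustrative and do not describe Aparapi's actual JTP scheduler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical analogy for executing a range of work items on a thread pool.
class ThreadPoolRangeSketch {
    static int[] squares(int[] values, int poolSize) {
        int[] squares = new int[values.length];
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < values.length; i++) {
                final int gid = i; // plays the role of the global id
                pending.add(pool.submit(() -> {
                    squares[gid] = values[gid] * values[gid];
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every work item; also makes its writes visible
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return squares;
    }

    public static void main(String[] args) {
        int[] result = squares(new int[] {1, 2, 3, 4}, 2);
        System.out.println(java.util.Arrays.toString(result)); // [1, 4, 9, 16]
    }
}
```

Each work item writes only its own output slot, which is what lets the items run in any order without synchronisation between them.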
-
toString
public java.lang.String toString()
- Overrides:
toStringin classjava.lang.Object
-
execute
public Kernel execute(int _range)
Start execution of _range kernels.
When kernel.execute(_range) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
Since the Range class was added, this method offers backward compatibility and merely defers to return (execute(Range.create(_range), 1));.
- Parameters:
_range - The number of Kernels that we would like to initiate.
-
createRange
protected Range createRange(int _range)
-
execute
public Kernel execute(Range _range, int _passes)
Start execution of _passes iterations of _range kernels.
When kernel.execute(_range, _passes) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_passes - The number of passes to make
- Returns:
- The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
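One way to picture multiple passes is as an outer loop over the pass number with the whole range executed inside each pass. The sketch below assumes exactly that ordering (each pass covers the full range before the next begins) and runs sequentially for simplicity; the interface and class names are hypothetical and nothing here is Aparapi API.

```java
// Hypothetical sequential model of "passes iterations over a range".
class PassesSketch {
    // Stand-in for a kernel body that can see its global id and the current pass.
    interface PassBody {
        void run(int gid, int passId);
    }

    // Assumed ordering: every gid of pass p runs before any gid of pass p + 1.
    static void execute(int range, int passes, PassBody body) {
        for (int pass = 0; pass < passes; pass++) {
            for (int gid = 0; gid < range; gid++) {
                body.run(gid, pass);
            }
        }
    }

    // Adds (passId + 1) to every element on each pass.
    static int[] accumulate(int range, int passes) {
        int[] data = new int[range];
        execute(range, passes, (gid, passId) -> data[gid] += passId + 1);
        return data;
    }

    public static void main(String[] args) {
        // Three passes add 1, then 2, then 3 to each element.
        System.out.println(java.util.Arrays.toString(accumulate(4, 3))); // [6, 6, 6, 6]
    }
}
```

Iterative algorithms that refine the same arrays repeatedly are the natural fit for this shape, since later passes can build on the results of earlier ones.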
-
execute
public Kernel execute(int _range, int _passes)
Start execution of _passes iterations over the _range of kernels.
When kernel.execute(_range, _passes) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
Since the Range class was added, this method offers backward compatibility and merely defers to return (execute(Range.create(_range), _passes));.
- Parameters:
_range - The number of Kernels that we would like to initiate.
-
execute
public Kernel execute(java.lang.String _entrypoint, Range _range)
Start execution of globalSize kernels for the given entrypoint.
When kernel.execute("entrypoint", globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_entrypoint - is the name of the method we wish to use as the entrypoint to the kernel
- Returns:
- The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
-
execute
public Kernel execute(java.lang.String _entrypoint, Range _range, int _passes)
Start execution of globalSize kernels for the given entrypoint.
When kernel.execute("entrypoint", globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_entrypoint - is the name of the method we wish to use as the entrypoint to the kernel
- Returns:
- The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
-
compile
public Kernel compile(Device _device) throws CompileFailedException
Force pre-compilation of the kernel for a given device, without executing it.- Parameters:
_device- the device for which the kernel is to be compiled- Returns:
- the Kernel instance (this) so we can chain calls
- Throws:
CompileFailedException- if compilation failed for some reason
-
compile
public Kernel compile(java.lang.String _entrypoint, Device _device) throws CompileFailedException
Force pre-compilation of the kernel for a given device, without executing it.- Parameters:
_entrypoint- is the name of the method we wish to use as the entrypoint to the kernel_device- the device for which the kernel is to be compiled- Returns:
- the Kernel instance (this) so we can chain calls
- Throws:
CompileFailedException- if compilation failed for some reason
-
getKernelMinimumPrivateMemSizeInUsePerWorkItem
public long getKernelMinimumPrivateMemSizeInUsePerWorkItem(Device device) throws QueryFailedException
Retrieves the minimum private memory in use per work item for this kernel instance and the specified device.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the number of bytes used per work item
- Throws:
QueryFailedException- if the query couldn't complete
-
getKernelLocalMemSizeInUse
public long getKernelLocalMemSizeInUse(Device device) throws QueryFailedException
Retrieves the amount of local memory used in the specified device by this kernel instance.- Parameters:
device- the device where the kernel is intended to run- Returns:
- the number of bytes of local memory in use for the specified device and current kernel
- Throws:
QueryFailedException- if the query couldn't complete
-
getKernelPreferredWorkGroupSizeMultiple
public int getKernelPreferredWorkGroupSizeMultiple(Device device) throws QueryFailedException
Retrieves the preferred work-group multiple in the specified device for this kernel instance.- Parameters:
device- the device where the kernel is intended to run- Returns:
- the preferred work group multiple
- Throws:
QueryFailedException- if the query couldn't complete
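A common use of a preferred work-group size multiple is to pad the global size up to the next multiple. The helper below is a minimal sketch of that arithmetic only; the class and method names are hypothetical, and the value 64 in the usage is an example rather than something a particular device reports.

```java
// Hypothetical helper: pad a global size to a work-group size multiple.
class WorkGroupSizingSketch {
    // Returns the smallest multiple of `multiple` that is >= globalSize.
    static int roundUpToMultiple(int globalSize, int multiple) {
        if (globalSize < 0 || multiple <= 0) {
            throw new IllegalArgumentException("globalSize must be >= 0 and multiple > 0");
        }
        return ((globalSize + multiple - 1) / multiple) * multiple;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToMultiple(1000, 64)); // 1024
        System.out.println(roundUpToMultiple(1024, 64)); // 1024
    }
}
```

When the range is padded this way, the kernel body needs a bounds check so that the extra work items past the real data length do nothing.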
-
getKernelMaxWorkGroupSize
public int getKernelMaxWorkGroupSize(Device device) throws QueryFailedException
Retrieves the maximum work-group size allowed for this kernel when running on the specified device.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the maximum work-group size
- Throws:
QueryFailedException- if the query couldn't complete
-
getKernelCompileWorkGroupSize
public int[] getKernelCompileWorkGroupSize(Device device) throws QueryFailedException
Retrieves the specified work-group size in the compiled kernel for the specified device or intermediate language for the device.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the work-group size specified in the compiled kernel
- Throws:
QueryFailedException- if the query couldn't complete
-
isAutoCleanUpArrays
public boolean isAutoCleanUpArrays()
-
setAutoCleanUpArrays
public void setAutoCleanUpArrays(boolean autoCleanUpArrays)
Property which, if true, enables automatic calling of cleanUpArrays() following each execution.
-
cleanUpArrays
public void cleanUpArrays()
Frees the bulk of the resources used by this kernel, by setting array sizes in non-primitive KernelArgs to 1 (0 size is prohibited) and invoking kernel execution on a zero size range. Unlike dispose(), this does not prohibit further invocations of this kernel, as sundry resources such as OpenCL queues are not freed by this method.
This allows a "dormant" Kernel to remain in existence without undue strain on GPU resources, which may be strongly preferable to disposing a Kernel and recreating another one later, as creation/use of a new Kernel (specifically creation of its associated OpenCL context) is expensive.
Note that where the underlying array field is declared final, it cannot be reassigned and is therefore not resized.
-
dispose
public void dispose()
Release any resources associated with this Kernel.
When the execution mode is CPU or GPU, Aparapi stores some OpenCL resources in a data structure associated with the kernel instance. The dispose() method must be called to release these resources.
If execute(int _globalSize) is called after dispose() is called, the results are undefined.
-
isRunningCL
public boolean isRunningCL()
-
getTargetDevice
public final Device getTargetDevice()
-
isAllowDevice
public boolean isAllowDevice(Device _device)
- Returns:
- true by default; may be overridden to allow vetoing of a device or devices by a given Kernel instance.
-
getExecutionMode
@Deprecated public Kernel.EXECUTION_MODE getExecutionMode()
Deprecated. See Kernel.EXECUTION_MODE
Return the current execution mode. Before a Kernel executes, this return value will be the execution mode as determined by the setting of the EXECUTION_MODE enumeration. By default, this setting is either GPU if OpenCL is available on the target system, or JTP otherwise. This default setting can be changed by calling setExecutionMode().
After a Kernel executes, the return value will be the mode in which the Kernel actually executed.
- Returns:
- The current execution mode.
- See Also:
setExecutionMode(EXECUTION_MODE)
-
setExecutionMode
@Deprecated public void setExecutionMode(Kernel.EXECUTION_MODE _executionMode)
Deprecated. See Kernel.EXECUTION_MODE
Set the execution mode.
This should be regarded as a request. The real mode will be determined at runtime based on the availability of OpenCL and the characteristics of the workload.
- Parameters:
_executionMode- the requested execution mode.- See Also:
getExecutionMode()
-
setExecutionModeWithoutFallback
public void setExecutionModeWithoutFallback(Kernel.EXECUTION_MODE _executionMode)
-
setFallbackExecutionMode
@Deprecated public void setFallbackExecutionMode()
Deprecated.
-
descriptorToReturnTypeLetter
private static java.lang.String descriptorToReturnTypeLetter(java.lang.String desc)
-
getReturnTypeLetter
private static java.lang.String getReturnTypeLetter(java.lang.reflect.Method meth)
-
toClassShortNameIfAny
private static java.lang.String toClassShortNameIfAny(java.lang.Class<?> retClass)
-
getMappedMethodName
public static java.lang.String getMappedMethodName(ClassModel.ConstantPool.MethodReferenceEntry _methodReferenceEntry)
-
isMappedMethod
public static boolean isMappedMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
-
isOpenCLDelegateMethod
public static boolean isOpenCLDelegateMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
-
usesAtomic32
public static boolean usesAtomic32(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
-
usesAtomic64
public static boolean usesAtomic64(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
-
setExplicit
public void setExplicit(boolean _explicit)
For dev purposes (we should remove this for production), allows us to define that this Kernel uses explicit memory management.
- Parameters:
_explicit - (true if we want explicit memory management)
-
isExplicit
public boolean isExplicit()
For dev purposes (we should remove this for production), determines whether this Kernel uses explicit memory management.
- Returns:
- (true if the kernel is using explicit memory management)
-
put
public Kernel put(long[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(long[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(long[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(double[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(double[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(double[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(float[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(float[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(float[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(int[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(int[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(int[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(byte[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(byte[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(byte[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(char[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(char[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(char[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(boolean[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(boolean[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
put
public Kernel put(boolean[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
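The fluent put(arr).execute(range).get(arr) chain mentioned earlier can be pictured with a small host/device model. The sketch below is an analogy under stated assumptions: a second array stands in for the device-side buffer, put() copies immediately instead of tagging the array for a later enqueue, and the "kernel" simply doubles each element. All names are hypothetical and none of this is Aparapi code.

```java
import java.util.Arrays;

// Hypothetical model of explicit host/device transfers with a fluent API.
class ExplicitTransferSketch {
    private final int[] host;
    private int[] device; // stand-in for the device-side copy of the buffer

    ExplicitTransferSketch(int[] host) {
        this.host = host;
        this.device = new int[host.length];
    }

    // Copy host data to the "device" (simplified: done immediately).
    ExplicitTransferSketch put() {
        device = Arrays.copyOf(host, host.length);
        return this;
    }

    // Run the stand-in kernel on the device copy only; the host array is untouched.
    ExplicitTransferSketch execute() {
        for (int i = 0; i < device.length; i++) {
            device[i] *= 2;
        }
        return this;
    }

    // Copy the device data back into the host array.
    ExplicitTransferSketch get() {
        System.arraycopy(device, 0, host, 0, host.length);
        return this;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        ExplicitTransferSketch sketch = new ExplicitTransferSketch(data);
        sketch.put().execute().execute(); // two runs, no transfer in between
        System.out.println(Arrays.toString(data)); // [1, 2, 3] (host not yet updated)
        sketch.get();
        System.out.println(Arrays.toString(data)); // [4, 8, 12]
    }
}
```

The point of the model is that repeated executions can reuse the device copy, with a transfer back to the host only when the results are actually needed.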
-
get
public Kernel get(long[] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(long[][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(long[][][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.- Parameters:
array-- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(double[] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(double[][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(double[][][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(float[] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(float[][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(float[][][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(int[] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(int[][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(int[][][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(byte[] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(byte[][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(byte[][][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(char[] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(char[][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(char[][][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(boolean[] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(boolean[][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
public Kernel get(boolean[][][] array)
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
array - the array to read back
- Returns:
- This kernel so that we can use the 'fluent' style API
-
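The get overloads are described as blocking until the array is available. The following sketch models that behaviour with plain Java concurrency rather than OpenCL: `ReadBackSketch` is a hypothetical stand-in, and the `CompletableFuture` that plays the role of the device buffer is an assumption made for illustration, not how Aparapi is implemented.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in, NOT the Aparapi Kernel class.
class ReadBackSketch {
    // Simulates a device-side buffer that is filled asynchronously.
    private final CompletableFuture<int[]> deviceBuffer;

    ReadBackSketch(int[] input) {
        // Copy the input so the simulated device works on its own data.
        int[] snapshot = input.clone();
        deviceBuffer = CompletableFuture.supplyAsync(() -> {
            int[] out = new int[snapshot.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = snapshot[i] * snapshot[i];
            }
            return out;
        });
    }

    // Blocks until the simulated device result is ready, copies it into the
    // caller's array, and returns this object so that calls can be chained.
    ReadBackSketch get(int[] array) {
        int[] result = deviceBuffer.join();
        System.arraycopy(result, 0, array, 0, Math.min(result.length, array.length));
        return this;
    }
}
```

Because `get` only returns after the copy has finished, the caller can read the host array immediately afterwards without any further synchronisation.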
getProfileInfo
public java.util.List<ProfileInfo> getProfileInfo()
Get the profiling information from the last successful call to Kernel.execute().
- Returns:
- A list of ProfileInfo records
-
addExecutionModes
@Deprecated public void addExecutionModes(Kernel.EXECUTION_MODE... platforms)
Deprecated. See Kernel.EXECUTION_MODE.
Sets a possible fallback path for execution modes. For example, addExecutionModes(GPU, CPU, JTP) will try the GPU first; if that fails, it will fall back to OpenCL on the CPU, and finally it will try JTP.
-
hasNextExecutionMode
@Deprecated public boolean hasNextExecutionMode()
Deprecated.
- Returns:
- true if there is another execution path we can try
-
tryNextExecutionMode
@Deprecated public void tryNextExecutionMode()
Deprecated. See Kernel.EXECUTION_MODE.
Tries the next execution path in the list; if there are no more, gives up.
-
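The three deprecated methods above describe an ordered fallback list that is walked one mode at a time. A minimal sketch of that idea follows. `FallbackSketch`, its nested `Mode` enum and the `current()` accessor are hypothetical, and treating "give up" as "stay on the current mode" is an assumption, since this page does not say what giving up does.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical stand-in, NOT the Aparapi implementation.
class FallbackSketch {
    // Illustrative subset of modes named in the text above.
    enum Mode { GPU, CPU, JTP }

    // Modes that have not been tried yet, in order of preference.
    private final Deque<Mode> remaining = new ArrayDeque<>();
    private Mode current;

    // Register the modes to try; the first one becomes the current mode.
    void addExecutionModes(Mode... modes) {
        remaining.addAll(Arrays.asList(modes));
        if (current == null) {
            current = remaining.poll();
        }
    }

    // True if there is another mode left to fall back to.
    boolean hasNextExecutionMode() {
        return !remaining.isEmpty();
    }

    // Move to the next mode; if none is left, keep the current one (assumed).
    void tryNextExecutionMode() {
        if (hasNextExecutionMode()) {
            current = remaining.poll();
        }
    }

    Mode current() {
        return current;
    }
}
```

A caller would typically loop: attempt the work in `current()`, and on failure call `tryNextExecutionMode()` while `hasNextExecutionMode()` is true.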
getBoolean
private static boolean getBoolean(ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException> methodNamesCache, ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
-
markedWith
private static <A extends java.lang.annotation.Annotation> ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,java.lang.Boolean>,java.lang.RuntimeException> markedWith(java.lang.Class<A> annotationClass)
-
toSignature
static java.lang.String toSignature(java.lang.reflect.Method method)
-
getArgumentsLetters
private static java.lang.String getArgumentsLetters(java.lang.reflect.Method method)
-
isRelevant
private static boolean isRelevant(java.lang.reflect.Method method)
-
getProperty
private static <V,T extends java.lang.Throwable> V getProperty(ValueCache<java.lang.Class<?>,java.util.Map<java.lang.String,V>,T> cache, ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry, V defaultValue) throws T
- Throws:
T
-
toSignature
private static java.lang.String toSignature(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
-
cacheProperty
private static <K,V,T extends java.lang.Throwable> ValueCache<java.lang.Class<?>,java.util.Map<K,V>,T> cacheProperty(ValueCache.ThrowingValueComputer<java.lang.Class<?>,java.util.Map<K,V>,T> throwingValueComputer)
-
invalidateCaches
public static void invalidateCaches()
-
-