Class BatchNode<T>
Data is stored in sharded files, and data is written/consumed and processed concurrently.
The data is processed in batches. Each batch is processed in a single thread. The number of threads is controlled by parallelism(IntSupplier).
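The batch-per-thread model described above can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions, not the BatchNode implementation: the class BatchPerThreadSketch and the method sumShards are hypothetical, shards are plain lists instead of files, and the thread count is read once from an IntSupplier.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntSupplier;

/** Hypothetical illustration: each shard is one batch, and each batch runs on a single thread. */
final class BatchPerThreadSketch {

    static long sumShards(List<List<Integer>> shards, IntSupplier parallelism) {
        // Assumption of this sketch: the supplier is consulted once, when the pool is created.
        ExecutorService pool = Executors.newFixedThreadPool(parallelism.getAsInt());
        try {
            AtomicLong total = new AtomicLong();
            CompletableFuture<?>[] batches = new CompletableFuture<?>[shards.size()];
            for (int i = 0; i < shards.size(); i++) {
                List<Integer> shard = shards.get(i);
                // One task per shard: the whole batch is handled by one pool thread.
                batches[i] = CompletableFuture.runAsync(() -> {
                    long batchSum = 0L;
                    for (int value : shard) {
                        batchSum += value;
                    }
                    total.addAndGet(batchSum);
                }, pool);
            }
            CompletableFuture.allOf(batches).join(); // wait for every batch to finish
            return total.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = List.of(
                List.of(4, 8), List.of(1, 5, 9), List.of(2, 6, 10), List.of(3, 7));
        System.out.println(sumShards(shards, () -> 2)); // prints 55
    }
}
```

Different batches may run at the same time, so anything they share (here the AtomicLong) has to be thread-safe, while state local to one batch does not.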
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
static final class
private static final class
-
Field Summary
Fields (Modifier and Type, Field, Description):
private final ToIntFunction<T>
private final DataInterpreter<T>
private final IntSupplier
private final ProcessingService
private final int
private Function<File, FromFileReader<T>>
private final Throughput
private final ShardedFile
private final Throughput
-
Constructor Summary
Constructors -
Method Summary
Methods (Modifier and Type, Method, Description):
void dispose()
    Dispose of this node and explicitly delete all files.
private Function<File, FromFileReader<T>>
static <T> BatchNode.Builder<T> newBuilder(File directory, DataInterpreter<T> interpreter)
static <T> BatchNode<T> newInstance(File directory, DataInterpreter<T> interpreter)
(package private) FromFileReader<T>
private void
private <R, A extends TwoStepMapper<T,R>> void processAggregators(File shard, Supplier<A> aggregatorFactory, Consumer<A> processor)
void processAll(Consumer<T> processor)
    Process each and every item individually.
void processAll(Supplier<Consumer<T>> processorFactory)
    Similar to processAll(Consumer) but you provide a consumer constructor/factory rather than a specific consumer.
<R, A extends TwoStepMapper<T,R>> void processCombineable(Supplier<A> aggregatorFactory, Consumer<A> processor)
    Similar to processMergeable(Supplier, Consumer) but the processor is called with the aggregator instance itself rather than its extracted results.
<R> void processMapped(Supplier<? extends TwoStepMapper<T, R>> aggregatorFactory, Consumer<R> processor)
    Deprecated.
<R, A extends TwoStepMapper<T,R>> void processMergeable(Supplier<A> aggregatorFactory, Consumer<R> processor)
    Each shard is processed/aggregated separately by a TwoStepMapper instance.
private <R, A extends TwoStepMapper<T,R>> void processResults(File shard, Supplier<A> aggregatorFactory, Consumer<R> processor)
<R, A extends TwoStepMapper.Combineable<T,R,A>> R reduceByCombining(Supplier<A> aggregatorFactory)
    Calls processCombineable(Supplier, Consumer) with the TwoStepMapper.Combineable#merge(Object) method of a global TwoStepMapper.Combineable instance as the consumer.
<R, A extends TwoStepMapper.Mergeable<T,R>> R reduceByMerging(Supplier<A> aggregatorFactory)
    Calls processMergeable(Supplier, Consumer) with the TwoStepMapper.Mergeable.merge(Object) method of a global TwoStepMapper.Mergeable instance as the consumer.
<R, A extends TwoStepMapper.Mergeable<T,R>> R reduceMapped(Supplier<A> aggregatorFactory)
    Deprecated. v54 Use #reduceByMerging(Supplier<A>) instead.
-
Field Details
-
DUMMY
-
myDistributor
-
myInterpreter
-
myParallelism
-
myProcessor
-
myQueueCapacity
private final int myQueueCapacity -
myReaderFactory
-
myReaderManager
-
myShards
-
myWriterManger
-
-
Constructor Details
-
BatchNode
BatchNode(BatchNode.Builder<T> builder)
-
-
Method Details
-
newBuilder
-
newInstance
-
dispose
public void dispose()
Dispose of this node and explicitly delete all files.
-
newWriter
-
processAll
-
processAll
Similar to processAll(Consumer) but you provide a consumer constructor/factory rather than a specific consumer. Internally there will be 1 consumer per worker thread instantiated. This variant is for when the consumer(s) are stateful.
-
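To illustrate why the factory variant of processAll suits stateful consumers, here is a minimal sketch of the "one consumer per worker" idea. Everything in it is an assumption made for illustration: the class ConsumerPerWorkerSketch, its processAll signature, and the in-memory queue are hypothetical and do not reflect how BatchNode reads its shard files.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical illustration of "one consumer instance per worker". */
final class ConsumerPerWorkerSketch {

    static <T> void processAll(List<T> items, int workers, Supplier<Consumer<T>> processorFactory) {
        Queue<T> queue = new ConcurrentLinkedQueue<>(items);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            CompletableFuture<?>[] tasks = new CompletableFuture<?>[workers];
            for (int w = 0; w < workers; w++) {
                tasks[w] = CompletableFuture.runAsync(() -> {
                    // The factory is called once per worker task, so the consumer's
                    // internal state is never shared between workers.
                    Consumer<T> processor = processorFactory.get();
                    for (T item = queue.poll(); item != null; item = queue.poll()) {
                        processor.accept(item);
                    }
                }, pool);
            }
            CompletableFuture.allOf(tasks).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        AtomicInteger seen = new AtomicInteger();
        processAll(List.of("a", "b", "c", "d", "e"), 3, () -> {
            created.incrementAndGet();
            int[] localCount = {0}; // unsynchronized state, safe because it is private to one worker
            return item -> {
                localCount[0]++;
                seen.incrementAndGet();
            };
        });
        System.out.println(created.get() + " consumers handled " + seen.get() + " items");
    }
}
```

With a single shared consumer, the unsynchronized counter above would be a data race; with a factory, each worker gets its own.
-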
processCombineable
public <R, A extends TwoStepMapper<T,R>> void processCombineable(Supplier<A> aggregatorFactory, Consumer<A> processor)
Similar to processMergeable(Supplier, Consumer) but the processor is called with the aggregator instance itself rather than its extracted results. This corresponds to TwoStepMapper#Combineable rather than TwoStepMapper#Mergeable.
-
processMapped
@Deprecated
public <R> void processMapped(Supplier<? extends TwoStepMapper<T, R>> aggregatorFactory, Consumer<R> processor)
Deprecated. v54 Use #processMergeable(Supplier<? extends TwoStepMapper<T, H>>,Consumer<H>) instead.
Process mapped/derived data in batches.
There will be one TwoStepMapper instance per underlying file/shard; that is a batch. Those instances are likely to contain some sort of Collection or Map that holds mapped/derived data.
You must make sure that all data items that need to be in the same TwoStepMapper instance (in the same batch) are in the same file/shard. You control the number of shards via BatchNode.Builder.fragmentation(int) and which item goes in which shard via BatchNode.Builder.distributor(ToIntFunction).
Type Parameters:
R - The mapped/derived data holding type
Parameters:
aggregatorFactory - Produces the TwoStepMapper mapping instances
processor - Consumes the mapped/derived data, the results of one whole TwoStepMapper instance at a time
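The requirement that related items share a shard comes down to choosing a distributor that returns the same value for every item of a group. The sketch below shows that idea in memory only. The class KeyedDistributorSketch, the distribute helper, and the use of Math.floorMod to turn the distributor value into a shard index are assumptions of this example; how BatchNode itself maps the ToIntFunction result to a shard is not stated here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical illustration: items with the same key always land in the same shard. */
final class KeyedDistributorSketch {

    static <T> List<List<T>> distribute(List<T> items, int shardCount, ToIntFunction<T> distributor) {
        List<List<T>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        for (T item : items) {
            // Assumption: floorMod maps any int (also negative ones) to a valid shard index.
            int index = Math.floorMod(distributor.applyAsInt(item), shardCount);
            shards.get(index).add(item);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana", "apple");
        // Group by first letter: every word starting with 'a' ends up in one shard.
        List<List<String>> shards = distribute(words, 3, w -> w.charAt(0));
        System.out.println(shards); // prints [[], [apple, avocado, apple], [banana]]
    }
}
```

A per-shard aggregator that, for example, counts words by first letter would then see each group exactly once and in full.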
-
processMergeable
public <R, A extends TwoStepMapper<T,R>> void processMergeable(Supplier<A> aggregatorFactory, Consumer<R> processor)
Each shard is processed/aggregated separately by a TwoStepMapper instance. The results are then processed/merged by the provided processor.
There is one TwoStepMapper instance per underlying worker thread. Those instances are reset and reused for each shard.
The processor is called concurrently from multiple threads.
You must make sure that all data items that need to be in the same aggregator instance are in the same file/shard. You control the number of shards via BatchNode.Builder.fragmentation(int) and which item goes in which shard via BatchNode.Builder.distributor(ToIntFunction).
Parameters:
aggregatorFactory - Produces the TwoStepMapper aggregator instances
processor - Consumes the aggregated/derived data, the results of one whole TwoStepMapper instance at a time
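The flow described for processMergeable (one aggregator per worker, reset and reused per shard, with the processor invoked concurrently) can be sketched as follows. This is a simplified stand-in, not the library code: MiniMapper is a hypothetical interface invented for this example rather than the real TwoStepMapper, shards are in-memory lists, and the names MergeableSketch and WordCounter are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical illustration of per-shard aggregation with reused aggregators. */
final class MergeableSketch {

    /** Stand-in for an aggregator type; an assumption of this sketch. */
    interface MiniMapper<T, R> {
        void consume(T item);
        R getResults();
        void reset();
    }

    static final class WordCounter implements MiniMapper<String, Map<String, Integer>> {
        private final Map<String, Integer> counts = new HashMap<>();
        public void consume(String word) { counts.merge(word, 1, Integer::sum); }
        public Map<String, Integer> getResults() { return new HashMap<>(counts); } // copy, since the instance is reused
        public void reset() { counts.clear(); }
    }

    static <T, R, A extends MiniMapper<T, R>> void processMergeable(
            List<List<T>> shards, int workers, Supplier<A> aggregatorFactory, Consumer<R> processor) {
        Queue<List<T>> pending = new ConcurrentLinkedQueue<>(shards);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            CompletableFuture<?>[] tasks = new CompletableFuture<?>[workers];
            for (int w = 0; w < workers; w++) {
                tasks[w] = CompletableFuture.runAsync(() -> {
                    A aggregator = aggregatorFactory.get(); // one instance per worker task
                    for (List<T> shard = pending.poll(); shard != null; shard = pending.poll()) {
                        aggregator.reset(); // reused for each shard
                        for (T item : shard) {
                            aggregator.consume(item);
                        }
                        processor.accept(aggregator.getResults()); // may run concurrently with other workers
                    }
                }, pool);
            }
            CompletableFuture.allOf(tasks).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> shards = List.of(List.of("a", "b", "a"), List.of("c", "c"), List.of("d"));
        Map<String, Integer> totals = new ConcurrentHashMap<>(); // thread-safe: the processor is called concurrently
        Supplier<WordCounter> factory = WordCounter::new;
        Consumer<Map<String, Integer>> processor =
                partial -> partial.forEach((word, n) -> totals.merge(word, n, Integer::sum));
        processMergeable(shards, 2, factory, processor);
        System.out.println(totals);
    }
}
```

Because the processor can be entered from several threads at once, the example collects into a ConcurrentHashMap; a plain HashMap there would not be safe.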
-
reduceByCombining
public <R, A extends TwoStepMapper.Combineable<T,R,A>> R reduceByCombining(Supplier<A> aggregatorFactory)
Calls processCombineable(Supplier, Consumer) with the TwoStepMapper.Combineable#merge(Object) method of a global TwoStepMapper.Combineable instance as the consumer.
-
reduceByMerging
Calls processMergeable(Supplier, Consumer) with the TwoStepMapper.Mergeable.merge(Object) method of a global TwoStepMapper.Mergeable instance as the consumer.
-
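The reduceByMerging pattern, where partial results from each shard are merged into one global aggregator whose results are then returned, might look roughly like the sketch below. It rests on several assumptions: MiniMergeable is a hypothetical interface standing in for TwoStepMapper.Mergeable, a fresh aggregator is created per shard for brevity (rather than one per worker), a parallel stream replaces the real threading, and the synchronized block is this sketch's own way of making concurrent merges safe.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical illustration of reducing shard results into one global aggregator. */
final class ReduceByMergingSketch {

    /** Stand-in for a mergeable aggregator; an assumption of this sketch. */
    interface MiniMergeable<T, R> {
        void consume(T item);
        R getResults();
        void merge(R other);
    }

    static final class Summer implements MiniMergeable<Integer, Long> {
        private long sum = 0L;
        public void consume(Integer item) { sum += item; }
        public Long getResults() { return sum; }
        public void merge(Long other) { sum += other; }
    }

    static <T, R, A extends MiniMergeable<T, R>> R reduceByMerging(
            List<List<T>> shards, Supplier<A> aggregatorFactory) {
        A global = aggregatorFactory.get();
        // The global instance's merge method plays the role of the consumer.
        // Assumption: merges are serialized here because shards finish on different threads.
        Consumer<R> merger = partial -> {
            synchronized (global) {
                global.merge(partial);
            }
        };
        shards.parallelStream().forEach(shard -> {
            A local = aggregatorFactory.get(); // simplification: one aggregator per shard
            for (T item : shard) {
                local.consume(item);
            }
            merger.accept(local.getResults());
        });
        return global.getResults();
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = List.of(List.of(1, 2, 3), List.of(4, 5), List.of(6));
        Supplier<Summer> factory = Summer::new;
        Long total = reduceByMerging(shards, factory);
        System.out.println(total); // prints 21
    }
}
```

This only works when partial results can be merged in any order, as with sums or counts.
-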
reduceMapped
@Deprecated
public <R, A extends TwoStepMapper.Mergeable<T,R>> R reduceMapped(Supplier<A> aggregatorFactory)
Deprecated. v54 Use #reduceByMerging(Supplier<A>) instead.
Same as processMergeable(Supplier, Consumer), but then also reduce/merge the total results using TwoStepMapper#merge(Object).
Create a class that implements TwoStepMapper and make sure to also implement TwoStepMapper#merge(Object); you can only use this if merging partial (sub)results is possible. Use a constructor or factory method that produces instances of that type as the argument to this method.
-
getReaderFactory
-
process
-
processAggregators
private <R, A extends TwoStepMapper<T,R>> void processAggregators(File shard, Supplier<A> aggregatorFactory, Consumer<A> processor) -
processResults
private <R, A extends TwoStepMapper<T,R>> void processResults(File shard, Supplier<A> aggregatorFactory, Consumer<R> processor) -
newReader
-