For the purpose of using Spring Batch in a scalable and distributed manner to process huge amount of data, I am actually developing some components to make integration of Spring Batch with compute/data grid easier.
Different solutions is offered by Spring Batch to provide scalability, the one that best suit my needs is remote chunking.
As I already done some investigation before using GridGain I chose this framework to implement a distributed remote chunking system that can be easily integrated into any existing Spring Batch systems.
Using GridGain is really straightforward, and setting up a grid on a development machine doesn't need so much configuration.
The only issue I faced is due to the fact that GridGain use serialization to deploy tasks on nodes, in order to be able to deploy a remote ChunkProcessor, it must contains serializable ItemProcessor and ItemWriter, which unfortunately is not the case by default.
So instead of creating new interfaces, I made a SerializableChunkProcessor which only accept serializable ItemProcessor and ItemWriter. It's surely not the smarter solution, but since I can't modify default interfaces in Spring Batch and I don't want to create my own interfaces, this workaround will suffice.
Usage
Here is the job application context used for the integration test, as you can see the 'real' ItemProcessor / ItemWriter are injected into the GridGain chunk writer:
Download
You can download the spring-batch-integration-gridgain module here:
http://github.com/downloads/aloiscochard/spring-batch-integration-gridgain/spring-batch-integration-gridgain-0.0.1-SNAPSHOT.jar
If you want to see a full working sample, take a look at the integration test. The full project sources can be downloaded here:
http://github.com/aloiscochard/spring-batch-integration-gridgain