For day 2, I decided to focus on technologies I can see an immediate, direct application for within our practice, and on topics that have a direct impact on my current stack: Spring Batch and Persistence Tuning.
Spring Batch 2 and Scaling Batch Applications in the Enterprise
Batch processing may sound a bit old school, but as more corporations move their aging systems to modern implementations, this problem space appears to be growing. Lucas Ward (Accenture) mentioned that he knows of about 80 projects at Accenture using Batch. In our practice, we also see this type of problem, sometimes in unexpected ways.
For example, importing a file that contains a list of entities is really a prototypical batch process. It requires iterating over a set of items, the operation should be idempotent, and it must be able to handle bad entries and possibly retry failed ones.
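To make these concerns concrete, here is a minimal hand-rolled sketch of such an import in plain Java, with no Spring Batch involved. The `id,name` line format, the in-memory map standing in for a database table, and the class and field names are all hypothetical. Writes are keyed by id, so running the same file twice leaves the same end state. Malformed lines are recorded and skipped instead of aborting the run. Retry is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hand-rolled import: iteration, idempotent writes and
// skipping of bad entries, all coded by hand (no framework).
class EntityImport {

    // Stands in for a database table keyed by entity id (assumption).
    final Map<String, String> store = new LinkedHashMap<>();
    // Lines rejected during the most recent run.
    final List<String> skipped = new ArrayList<>();

    // Imports lines of the hypothetical form "id,name".
    void run(List<String> lines) {
        skipped.clear();
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            if (fields.length != 2 || fields[0].isBlank()) {
                skipped.add(line); // bad entry: record it and keep going
                continue;
            }
            // Upsert keyed by id: re-running the same file is idempotent.
            store.put(fields[0].trim(), fields[1].trim());
        }
    }

    public static void main(String[] args) {
        EntityImport job = new EntityImport();
        List<String> file = List.of("1,Alice", "garbage", "2,Bob");
        job.run(file);
        job.run(file); // second run leaves the store unchanged
        System.out.println(job.store + " skipped=" + job.skipped);
    }
}
```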
Spring Batch abstracts these batching concerns so that they become part of the framework and can be configured declaratively. For example, a batch configuration can specify whether a job is restartable, whether items that fail can be skipped, and whether failed items should be retried. As an application developer, you are only concerned with the business logic for processing each batch item. However, you still need to understand the framework well, especially if multiple transactional resources participate in the batch job.
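The sketch below shows the idea behind that split of responsibilities. It is plain Java and deliberately not the Spring Batch API: the skip and retry policy is just configuration data, and the developer supplies only a business-logic function. `PolicyDrivenStep`, `StepPolicy`, `maxAttempts` and `skipLimit` are hypothetical names. In Spring Batch itself these choices are declared in configuration, for example through chunk attributes such as `skip-limit` and `retry-limit` and a job-level `restartable` flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch (not Spring Batch code): retry/skip behaviour lives in
// a policy object, and the caller supplies only the per-item business logic.
class PolicyDrivenStep {

    static final class StepPolicy {
        final int maxAttempts; // attempts per item before it counts as failed
        final int skipLimit;   // failed items tolerated before the step fails

        StepPolicy(int maxAttempts, int skipLimit) {
            this.maxAttempts = maxAttempts;
            this.skipLimit = skipLimit;
        }
    }

    static <I, O> List<O> run(List<I> items, Function<I, O> businessLogic, StepPolicy policy) {
        List<O> out = new ArrayList<>();
        int skips = 0;
        for (I item : items) {
            RuntimeException last = null;
            boolean done = false;
            // Retry: call the business logic up to maxAttempts times.
            for (int attempt = 1; attempt <= policy.maxAttempts && !done; attempt++) {
                try {
                    out.add(businessLogic.apply(item));
                    done = true;
                } catch (RuntimeException e) {
                    last = e;
                }
            }
            // Skip: tolerate a failed item unless the skip limit is exceeded.
            if (!done && ++skips > policy.skipLimit) {
                throw new IllegalStateException("skip limit exceeded", last);
            }
        }
        return out;
    }
}
```

Because the policy is data, changing how tolerant a job is means editing configuration, not the business logic. That is the point the framework makes declarative.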
Spring Batch 2 will contain a number of new features and enhancements including:
- XML namespace for configuration
- Simplification of the Batch API
- Support for concurrent flows
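Concurrent flows let independent parts of a job run in parallel, with the job continuing only once all of them finish. Spring Batch 2 expresses this in its XML namespace with a `<split>` of `<flow>` elements. The sketch below is only a plain-Java analogy using `CompletableFuture`. The flow bodies and the `ConcurrentFlows` name are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Plain-Java analogy (not Spring Batch code) for a split of two flows.
class ConcurrentFlows {

    static List<String> runSplit(Supplier<String> flowA, Supplier<String> flowB) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> a = CompletableFuture.supplyAsync(flowA, pool);
            CompletableFuture<String> b = CompletableFuture.supplyAsync(flowB, pool);
            // join() waits for each flow: the end of the split is a barrier.
            return List.of(a.join(), b.join());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> results = runSplit(() -> "customers loaded", () -> "products loaded");
        // A following step would start only here, after both flows completed.
        System.out.println(results);
    }
}
```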
The second session, on scaling, was interesting. Version 2.0 introduces a couple of new strategies driven by existing clients who require highly scalable solutions.
Chunking: to distribute the load, batch items are grouped into chunks that are processed by remote VMs. This strategy has the advantage of not requiring the chunking processor to know anything about the structure of the batch items. However, it does require a durable messaging mechanism to deliver the chunks, and a JMS queue would be a natural choice.
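Here is a single-JVM analogy of the chunking strategy, under loudly stated assumptions. An in-memory `BlockingQueue` stands in for the JMS queue, although unlike JMS it is not durable. Worker threads stand in for the remote VMs. The master reads items sequentially and sends out whole chunks, and each worker processes every item in whatever chunk it receives. `RemoteChunkingSketch` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Analogy only: a non-durable in-memory queue replaces JMS, threads replace VMs.
class RemoteChunkingSketch {

    // Sentinel chunk telling a worker to stop; compared by identity.
    static final List<String> POISON = new ArrayList<>();

    static List<String> run(List<String> items, int chunkSize, int workers,
                            Function<String, String> processor) {
        BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();
        List<String> results = Collections.synchronizedList(new ArrayList<>());

        // Workers: take whole chunks off the queue and process each item.
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        List<String> chunk = queue.take();
                        if (chunk == POISON) return;
                        for (String item : chunk) results.add(processor.apply(item));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        try {
            // Master: read items in order and send them out in chunks.
            for (int i = 0; i < items.size(); i += chunkSize) {
                int end = Math.min(i + chunkSize, items.size());
                queue.put(new ArrayList<>(items.subList(i, end)));
            }
            for (int w = 0; w < workers; w++) queue.put(POISON);
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while chunking", e);
        }
        return new ArrayList<>(results); // order depends on worker scheduling
    }
}
```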
Master-Slave Step: a step can be broken down into slave steps that are executed remotely. For example, a set of ids can be divided into segments, and each segment is executed by a slave machine. In this case, the slaves obviously must have intimate knowledge of how to process the segments.
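Spring Batch's own term for this approach is partitioning. The plain-Java sketch below uses an executor's threads as stand-ins for slave machines. The master only computes id ranges and hands them out. Each slave must know how to handle the ids in its range, and here the hypothetical `processSegment` simply sums them so the result is easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Analogy for a master-slave (partitioned) step; threads stand in for slaves.
class PartitionedStep {

    // Split the inclusive id range [minId, maxId] into at most `parts` contiguous segments.
    static List<long[]> partition(long minId, long maxId, int parts) {
        List<long[]> segments = new ArrayList<>();
        long total = maxId - minId + 1;
        long size = (total + parts - 1) / parts; // ceiling division
        for (long start = minId; start <= maxId; start += size) {
            segments.add(new long[] {start, Math.min(start + size - 1, maxId)});
        }
        return segments;
    }

    // Hypothetical slave work: here it just sums the ids in its segment.
    static long processSegment(long[] segment) {
        long sum = 0;
        for (long id = segment[0]; id <= segment[1]; id++) sum += id;
        return sum;
    }

    static long run(long minId, long maxId, int slaves) {
        ExecutorService pool = Executors.newFixedThreadPool(slaves);
        try {
            List<CompletableFuture<Long>> futures = new ArrayList<>();
            for (long[] segment : partition(minId, maxId, slaves)) {
                futures.add(CompletableFuture.supplyAsync(() -> processSegment(segment), pool));
            }
            long total = 0;
            for (CompletableFuture<Long> f : futures) total += f.join(); // master aggregates
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

With ids 1 to 10 and three slaves, for instance, the segments are 1 to 4, 5 to 8 and 9 to 10.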
One key takeaway from the session (and something we are very careful about in our practice) is the importance of understanding the transactional boundaries in item processing, especially when multiple transactional resources are involved.
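Spring Batch's chunk-oriented steps commit once per chunk, which is the commit interval. The simulation below uses no real transaction manager. It illustrates why the boundaries matter. One list stands in for a transactional database whose writes are buffered until the chunk commits. A second list stands in for a resource that is not enlisted in the same transaction, such as a message sent directly. The names and the fail-on-item trigger are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation of chunk commit boundaries with two resources (no real transactions).
class ChunkTransactions {

    final List<Integer> committed = new ArrayList<>();    // transactional resource
    final List<Integer> sentMessages = new ArrayList<>(); // NOT in the same transaction

    // Processes items in chunks of commitInterval; item == failOn throws.
    void run(List<Integer> items, int commitInterval, int failOn) {
        for (int start = 0; start < items.size(); start += commitInterval) {
            List<Integer> txBuffer = new ArrayList<>(); // writes of the open transaction
            try {
                for (int i = start; i < Math.min(start + commitInterval, items.size()); i++) {
                    int item = items.get(i);
                    sentMessages.add(item); // takes effect immediately, outside the tx
                    if (item == failOn) throw new IllegalStateException("item " + item);
                    txBuffer.add(item);
                }
                committed.addAll(txBuffer); // commit at the chunk boundary
            } catch (IllegalStateException e) {
                // Rollback: the chunk's buffered writes are discarded, and the
                // step stops. Messages already sent are not rolled back.
                return;
            }
        }
    }
}
```

With a commit interval of 3 and a failure on item 5, items 1 to 3 stay committed and item 4 is rolled back. Items 4 and 5 have nevertheless already been "sent", which is exactly the kind of inconsistency that multiple transactional resources introduce.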
Persistence Tuning for your Spring Application
I have to say at the outset that much of what was covered in this presentation is already quite well known in our practice.
Basically, the philosophy is:
- Understand the features of your database well
- Don't be afraid to make full use of these features to tune your application
There is no silver bullet when it comes to tuning the ORM and SQL layers. For any performance issue, the key is to identify where the bottleneck is, whether in the ORM layer or in the SQL layer. Naïve traversal of entity relationships at the ORM level is usually the culprit.
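A typical form of that naïve traversal is the "N+1 selects" pattern. One query loads a list of entities, and then touching a lazy relationship on each entity issues one more query per row. The sketch below uses a hypothetical in-memory data access class that counts round trips instead of a real ORM. With a real ORM, the usual remedies are a fetch join or batch fetching, which collapse the per-row lookups into a single query.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical data access layer that counts round trips to the database.
class NPlusOne {

    int queries = 0;
    final Map<Integer, Integer> orderToCustomer = Map.of(1, 10, 2, 20, 3, 10, 4, 30);
    final Map<Integer, String> customers = Map.of(10, "Acme", 20, "Globex", 30, "Initech");

    // Like SELECT id FROM orders
    List<Integer> findOrderIds() {
        queries++;
        return List.of(1, 2, 3, 4);
    }

    // Like a lazy load of one order's customer: one query per call
    String findCustomerForOrder(int orderId) {
        queries++;
        return customers.get(orderToCustomer.get(orderId));
    }

    // Like a single query with WHERE id IN (...) or a fetch join
    Map<Integer, String> findCustomersForOrders(List<Integer> orderIds) {
        queries++;
        Map<Integer, String> result = new HashMap<>();
        for (int id : orderIds) result.put(id, customers.get(orderToCustomer.get(id)));
        return result;
    }

    int naive() {
        queries = 0;
        for (int id : findOrderIds()) findCustomerForOrder(id); // 1 + N round trips
        return queries;
    }

    int batched() {
        queries = 0;
        findCustomersForOrders(findOrderIds()); // 2 round trips regardless of N
        return queries;
    }
}
```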
The process at the SQL level is iterative: capture the SQL, analyze the problematic statements using database tools, correct the problem, and repeat.
A few interesting tools were mentioned:
- p6spy (http://www.p6spy.com)
- Elvyx (http://www.elvyx.com)
- JDBCSpy (http://code.google.com/p/jordens-jdbcspy/)
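Tools such as p6spy capture SQL by sitting between the application and the real JDBC driver. The sketch below illustrates only that principle and is not how any of these tools is actually implemented. It uses a JDK dynamic proxy around any `java.sql.Connection` and records the SQL passed to `prepareStatement` or `prepareCall`. A real tool would also wrap statements and record parameters and timings. `SqlCapture` is a hypothetical name.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.List;

// Minimal SQL-capturing wrapper around a Connection (illustration only).
class SqlCapture {

    static Connection wrap(Connection target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if ((name.equals("prepareStatement") || name.equals("prepareCall"))
                    && args != null && args.length > 0 && args[0] instanceof String) {
                log.add((String) args[0]); // capture the SQL text
            }
            try {
                return method.invoke(target, args); // delegate to the real connection
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the driver's original exception
            }
        };
        return (Connection) Proxy.newProxyInstance(
                SqlCapture.class.getClassLoader(), new Class<?>[] {Connection.class}, handler);
    }
}
```

The captured statements are what you would then feed into the database's own analysis tools in the iterative tuning loop.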
Another interesting technology mentioned for Oracle is Streams Advanced Queuing. It allows JMS and JDBC access to be done within the same local transaction, and configuration support for it is provided in the SpringSource Advanced Pack.
Here's a picture of the sunrise over the Atlantic Ocean. Having coffee on the balcony at 6:30 in the morning with the warm ocean breeze, watching the sunrise... sigh.