QCon San Francisco 2008 Report – Day 2. Scalability and Two Minutes of Video Fame

Lots of good sessions – many talks about scalability, DSLs and REST, making it hard to choose (the attendee’s fear that ‘the other talk was better’). Unforeseen consequence: quite a few attendees felt obliged to make weak jokes about how unparallelized the food serving was at times. Watch for a nation-wide revolution in food serving once people fly back.

Tim Bray’s opening session, Application Design in the Context of the Shifting Storage Spectrum, is in retrospect interesting to look at with caching in mind (I think a term frequency analysis of the scalability presentations would rank ‘caching’ #1, #2 and #3).

Panel discussion about Architecting for Performance and Scalability. Most participants were vendor CTOs (Terracotta’s Ari Zilka is a good talker), with Brian Goetz brought in just minutes before. It started with Bruce Eckel posing questions to Brian, and a good conversation followed about partitioning data to allow scaling. I wish the panel had been longer: an hour is too short, and many interesting conversations don’t get a chance to develop. There is something about really high scale and scalability that piques the interest of the audience, perhaps because pushing the technical limits brings computer science back into the less glamorous world of business software.

Straight after, Brian Goetz got hard-core with the fork-join concurrency framework planned for JDK7. The reason awaits us on the horizon: multi-core machines with lots of processors. Intel’s CPU speed hasn’t increased since 2003; what has kept Moore’s Law valid is the proliferation of cores. We are still in a mild phase (we enjoy 4- and 8-way machines), and as long as the number of cores is lower than the size of an app server’s thread pool, spreading the load at that level is still fine. With the rumoured 256-core CPU to be made by Intel by 2010, the game changes.

Making efficient use of those processors requires parallelization of finer granularity, inside the processing of a single request. As Mr Goetz put it, “many programmers will become concurrent programmers, perhaps reluctantly”. Machines with hundreds or thousands of processors have been out there for many years; it’s just that they will now become mainstream. The good thing is that the techniques have already been developed.

Although such techniques can already be implemented with the concurrency package included in JDK5, doing so requires a fair bit of boilerplate code. Ideally, higher-level abstractions would make most of that go away, the fork-join framework planned for JDK7 being one of them. Mr Goetz went on to show how it can be used for the immediate candidates for parallelization inside a request: searching and sorting. The work-stealing algorithm is just clever. Interestingly, the ParallelArray class from this framework is used in Scala to implement its Actors; for a great read, see this comparison between Erlang and Scala.
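To make the fork-join idea concrete, here is a minimal sketch of a divide-and-conquer search for the largest element of an array. It assumes the API as it eventually shipped in java.util.concurrent (ForkJoinPool, RecursiveTask), which may differ from the pre-release version shown in the talk; the class names, the threshold value and the choice of "find the maximum" as the example are mine, not Mr Goetz's.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinMaxSketch {

    // Below this size a task scans sequentially. The value is an arbitrary assumption.
    static final int THRESHOLD = 1000;

    // Hypothetical task: finds the maximum of data[from, to).
    static class MaxTask extends RecursiveTask<Integer> {
        private final int[] data;
        private final int from; // inclusive
        private final int to;   // exclusive

        MaxTask(int[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Integer compute() {
            if (to - from <= THRESHOLD) {
                // Small enough: plain sequential scan.
                int max = data[from];
                for (int i = from + 1; i < to; i++) {
                    max = Math.max(max, data[i]);
                }
                return max;
            }
            int mid = (from + to) >>> 1;
            MaxTask left = new MaxTask(data, from, mid);
            MaxTask right = new MaxTask(data, mid, to);
            left.fork();                    // queue the left half; an idle worker may steal it
            int rightMax = right.compute(); // process the right half in the current thread
            int leftMax = left.join();      // wait for (or run) the left half
            return Math.max(leftMax, rightMax);
        }
    }

    static int parallelMax(int[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("empty array");
        }
        ForkJoinPool pool = new ForkJoinPool(); // defaults to one worker per available core
        try {
            return pool.invoke(new MaxTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[100000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (i * 31) % 9973;
        }
        System.out.println("max = " + parallelMax(data));
    }
}
```

The forked halves sit in the queue of the worker that created them, and workers that run out of tasks take work from the queues of busy ones, which is the work-stealing behaviour mentioned above.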

Next, a sampling of Erlang & Yaws (famous for this graph) to build REST services and a talk on Domain Specific Languages in Groovy. Scott Davis is a very engaging speaker; unfortunately, halfway into the presentation he was still introducing the language itself instead of focusing on the topic, so I left and went to check the interview being done with Joseph Yoder on Adaptive Object Modelling. The door to the room seemed stuck; I pushed harder and ended up in view of a video camera filming the interviewer. Oh well. Mr Yoder was talking about how to architect for changing business rules by making the model more generic. At the end the interviewer unexpectedly turned and asked if I had any questions myself. I did, so there go my two minutes of fame. Thirteen remaining…

Back on the scalability track: Emmanuel Bernard (JBoss) & Max Ross (Google) had a good talk on scaling Hibernate. The beginning covered strategies to partition data across databases (with or without a security separation). Max Ross is the main guy behind Hibernate Shards so part of the talk was about it and what remains to be implemented before it’s declared GA.
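As a rough illustration of what partitioning data across databases means in practice, the sketch below routes a record to one of several databases based on its key. This is not the Hibernate Shards API; the class, the method names and the JDBC URLs are hypothetical, and simple modulo routing is just one possible strategy.

```java
import java.util.Arrays;
import java.util.List;

class ShardRouterSketch {

    // Hypothetical list of databases, one per shard.
    private final List<String> shardUrls;

    ShardRouterSketch(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardUrls = shardUrls;
    }

    // Maps a key to a shard index; floorMod keeps the result non-negative for negative ids.
    int shardIndexFor(long customerId) {
        return (int) Math.floorMod(customerId, (long) shardUrls.size());
    }

    String shardUrlFor(long customerId) {
        return shardUrls.get(shardIndexFor(customerId));
    }

    public static void main(String[] args) {
        ShardRouterSketch router = new ShardRouterSketch(Arrays.asList(
                "jdbc:postgresql://db0.example.com/app",   // made-up URLs
                "jdbc:postgresql://db1.example.com/app",
                "jdbc:postgresql://db2.example.com/app"));
        System.out.println(router.shardUrlFor(42L));
    }
}
```

The weakness of plain modulo routing is that adding a database changes where most existing keys map, so real deployments tend to need a more careful scheme once the number of shards has to grow.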

The other main … shard was about Hibernate Search as a better way to implement free-text searches than the classic SQL queries with LIKE '%term%'. Having worked with a similar solution in the past (i.e. persistence layer + Lucene), it’s good to see it integrated with Hibernate. Main advantages: Lucene can handle word variations, has built-in relevancy ranking and, the reason it’s included in a scalability presentation, using it moves the text-matching load from the database to the app server, where it’s easier to scale. Several slides covered synchronous vs asynchronous updates to the Lucene index; Emmanuel also responded to a question from the audience that Lucene/Hibernate Search is more flexible and cheaper than the full-text search capabilities of some databases – and again, the point is to relieve the database server of this work.
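To show why an index-based search behaves differently from a LIKE scan, here is a toy inverted index in plain Java. It is not Lucene or Hibernate Search code: the class is hypothetical, the "stemming" is a crude strip-the-trailing-s rule standing in for real analysis, and ranking is by raw term count only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class InvertedIndexSketch {

    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Returns matching document ids, most occurrences first; ties are broken by id.
    List<Integer> search(String word) {
        Map<Integer, Integer> hits =
                postings.getOrDefault(normalize(word), Collections.<Integer, Integer>emptyMap());
        List<Integer> ids = new ArrayList<>(hits.keySet());
        ids.sort((a, b) -> {
            int byCount = Integer.compare(hits.get(b), hits.get(a));
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        return ids;
    }

    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty()) {
                terms.add(normalize(raw));
            }
        }
        return terms;
    }

    // Crude stand-in for stemming: lower-cases and strips one trailing "s" from longer words.
    private static String normalize(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Scaling shards with Hibernate");
        index.add(2, "A shard is a shard");
        System.out.println(index.search("shards"));
    }
}
```

A lookup touches only the entry for the normalized term instead of scanning every row, and the index lives in the application tier, so it can be scaled independently of the database.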

Unfortunately the presentation on GridGain was at the same time (cool stuff – I’ve been thinking for a while of trying its support for distributed JUnit execution). By the way, one of Kent Beck’s latest projects is JUnitMax, intended to reduce test execution time (apart from his passing remark in a presentation, the only reference I found was here). Now that tests are easy to write (the main goal of JUnit), the focus is shifting towards speeding up their execution.

