QLMSourcing – Performance and Load

As promised I will elaborate on some of the technical challenges that we were presented with while loading the QLMSourcing DataStore.

The QLMSourcing Server

The server that we used for load testing was a dual Xeon 2.8GHz machine with 3GB of RAM. The application was deployed to a single instance of JBoss with 1.5GB of RAM allocated. As the RFQ base grew, we added auxiliary machines to handle the memory and load requirements.

Loading the DataStore

To load the data we used the integration test framework that was built for the application. This framework consists of Java classes called “Page Harnesses” which interface with individual pages in the application. These Java classes provide methods that perform page actions and scrape data from the resulting web pages.

The load tests are Java applications that take advantage of the page harnesses built for QLMSourcing. They follow scripts and walk the pages of the application, creating RFQs. For our testing purposes we use three separate physical machines, each running 10 threads (users), to create the RFQs. On average we can create and issue around 150 RFQs per hour, or 3,600 per day.
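
For illustration, a minimal sketch of such a load driver might look like the following. The RfqScript interface and all names here are hypothetical stand-ins for the real page harness scripts; only the threading pattern (a fixed number of user threads per machine, each walking the create-and-issue flow) comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of a load driver: N user threads, each repeatedly
 * running a page-harness style script that creates and issues one RFQ.
 */
public class RfqLoadDriver {

    /** Hypothetical stand-in for a real page harness script. */
    public interface RfqScript {
        void createAndIssueRfq();
    }

    public static void run(final RfqScript script, int users, final int rfqsPerUser)
            throws InterruptedException {
        List<Thread> threads = new ArrayList<Thread>();
        for (int i = 0; i < users; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    // Each "user" walks the application pages rfqsPerUser times.
                    for (int n = 0; n < rfqsPerUser; n++) {
                        script.createAndIssueRfq();
                    }
                }
            });
            threads.add(t);
            t.start();
        }
        // Wait for all simulated users to finish.
        for (int i = 0; i < threads.size(); i++) {
            threads.get(i).join();
        }
    }
}
```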

Tweaks Along the Road to 100K

One thing about load tests is that before you start loading your system, you have a list of things that you suspect will break. My experience tells me that this list is generally correct but never complete, and things never break when you expect them to.

We initially used JProbe to analyze memory consumption. To get more fine-grained statistics per individual object, we eventually added our own DataStore statistics page. To gather performance numbers we have a Spring interceptor that tracks service calls, times them, and records the timings with the transaction. Any performance bottleneck quickly becomes obvious by looking at a transaction's XML.
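
For those curious, a timing interceptor of this kind takes only a few lines. The sketch below is illustrative rather than our actual code: it uses the standard AOP Alliance MethodInterceptor interface that Spring proxies support, and simply logs the elapsed time instead of recording it with the transaction.

```java
import org.aopalliance.intercept.MethodInterceptor;
import org.aopalliance.intercept.MethodInvocation;

// Illustrative sketch only: times each intercepted service call.
// The real interceptor records the timing with the transaction; this one just logs it.
public class ServiceTimingInterceptor implements MethodInterceptor {

    public Object invoke(MethodInvocation invocation) throws Throwable {
        long start = System.currentTimeMillis();
        try {
            return invocation.proceed();
        } finally {
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(invocation.getMethod().getName() + " took " + elapsed + " ms");
        }
    }
}
```

The interceptor is then applied to the service beans through Spring's normal proxy configuration, for example a ProxyFactoryBean or an auto-proxy creator.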

Well, here is our timeline from 0 to 100,000 RFQs, with our “breaks” and our fixes.

2,000 RFQs
Problem: Memory consumption was excessively high.
Resolution: Reviewed all objects stored in the DataStore for redundant information. Some objects were trimmed of “audit” information (create and update user ids and timestamps), and duplicate information was removed from Vendor Quotes.

2,500 RFQs
Problem: RFQ-specific pages were slow.
Resolution: Added indexing to the DataStore, which allowed RFQ-specific information to be accessed quickly without “full table scans”.

3,000 RFQs
Problem: We needed more analysis tools for the number of entities and memory consumption.
Resolution: Added the DataStore Statistics page, which displays the current indexes and the number of instances. We also added a “Serialized Size” figure for each stored object (its total serialized size), which lets us see which objects consume the most memory.

3,000 RFQs
Problem: PermGen memory was too high.
Resolution: The PermGen memory requirements seemed excessive and required some investigation; we were allocating 256MB and it was not enough. We discovered that String.intern() was being called for every String read in by XStream (our XML marshaller). We decided that only Strings of 20 bytes or fewer should be interned, as these would be ids and other common Strings. We also noticed that XStream remembered node information well past when it was required. We corrected both issues by extending the XStream classes (a sketch of the interning idea follows this list).

3,500 RFQs
Problem: Job Queue memory consumption was too high.
Resolution: With the new “Serialized Size” statistic we noticed that a small number of Mail Message Jobs consumed a large percentage of the memory, due to the size of the text and HTML messages they stored. We added a new class called “StringCompressedField”, essentially a String that is stored compressed, and used it in place of any potentially large String in our objects (see the sketch after this list).

3,500 RFQs
Problem: The number of Jobs in the Queue was too high.
Resolution: Added a special automatically scheduled job called “DataStoreCleanupJob” which scans the Job Queue and removes old jobs. The maximum age and the maximum number of jobs retained in the queue are configurable.

4,500 RFQs
Problem: Snapshotting the DataStore was too slow and blocked other work.
Resolution: Added an online snapshot capability by introducing another transaction level. While a snapshot is taking place, the primary DataStore Map is locked and a global transaction is set up that records all subsequent transaction commits. Transaction files are written as normal, but the “delta” from each transaction is also recorded in the global transaction. When a client asks for an entity, the primary DataStore Map is read, the global transaction deltas are applied, and then the current transaction deltas, before the object is returned to the client service. This slows down the DataStore, but only slightly.

19,000 RFQs
Problem: Out of heap.
Resolution: We ran out of heap memory in our single JVM, so we started a Horizontal Proof of Concept project to see if we could distribute the data across several JVMs and physical machines. This required changes to the snapshot files and the online snapshot mechanism, and took several weeks to complete. At the end of the Proof of Concept we had a distributed DataStore configured through a DataStore Algorithm XML document specific to each installation.

22,000 RFQs
Problem: We could not properly manage the distributed DataStores.
Resolution: Added the ability to start the application without the full DataStore loaded. This lets the application come up quickly and gives administrators information about the current status of the DataStore load. Access to the system is restricted and no data can change until the DataStore is fully initialized; we lock down the system using Spring interceptors.

25,000 RFQs
Problem: Inbox scans were too slow.
Resolution: Added the ability to distribute the work of scanning the RFQs to each individual distributed DataStore. This lets us send the request to all the distributed DataStores simultaneously and aggregate the results for the client (a sketch of this scatter-gather pattern follows the list). The logic is completely hidden from the DataStore clients.

40,000 RFQs
Problem: Startup of the application starved the main JBoss server of resources, making the application too slow while the system was loading.
Resolution: Changed the distribution algorithm to give no distributed data to the “main” DataStore. This allows the application to come up quickly and forces the main data loads into separate JVMs.

45,000 RFQs
Problem: The Inbox was slow.
Resolution: Refactored the distributed RFQ filtering mechanism to make it more efficient.

60,000 RFQs
Problem: We could not see how each distributed DataStore JVM or physical machine was performing.
Resolution: Added a performance statistic to the DataStore Statistics page giving the average milliseconds for the last 5,000 DataStore calls per distributed DataStore. This showed us which machines were overloaded, and we adjusted the machines and the distribution accordingly.

100,000 RFQs
Made it! Joy in the streets.
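
A quick aside on the PermGen fix at the 3,000 RFQ mark: one way to limit interning to short strings is to register a custom XStream string converter. The class below is only a sketch of that idea, not our actual code; the 20-byte cutoff is the one mentioned above.

```java
import com.thoughtworks.xstream.converters.SingleValueConverter;

// Illustrative sketch: intern only short strings (ids, status codes, etc.)
// so large text values never end up in the interned string pool.
public class SelectiveInternStringConverter implements SingleValueConverter {

    private static final int INTERN_LIMIT = 20; // cutoff from the timeline above

    public boolean canConvert(Class type) {
        return type == String.class;
    }

    public String toString(Object obj) {
        return (String) obj;
    }

    public Object fromString(String str) {
        return str.length() <= INTERN_LIMIT ? str.intern() : str;
    }
}
```

Registering the converter through XStream's registerConverter method replaces the default String handling during unmarshalling.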
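
Likewise, for the Job Queue fix at 3,500 RFQs, the idea behind “StringCompressedField” can be sketched roughly as follows. This is an illustration of the concept using GZIP from the standard library, not our production class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative sketch: a String field that is stored in GZIP-compressed form. */
public class StringCompressedField implements Serializable {

    private final byte[] compressed;

    public StringCompressedField(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            GZIPOutputStream gzip = new GZIPOutputStream(bytes);
            gzip.write(value.getBytes("UTF-8"));
            gzip.close();
            this.compressed = bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("Failed to compress field", e);
        }
    }

    public String getValue() {
        try {
            GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toString("UTF-8");
        } catch (IOException e) {
            throw new IllegalStateException("Failed to decompress field", e);
        }
    }
}
```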
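
Finally, the distributed Inbox scan at 25,000 RFQs follows a simple scatter-gather pattern. The sketch below shows the general shape of that pattern using java.util.concurrent; the DataStoreNode interface and the use of RFQ ids as the result type are hypothetical stand-ins, not our real classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative scatter-gather: query every distributed DataStore in parallel
// and aggregate the matching RFQs for the client. Types here are hypothetical.
public class DistributedInboxScan {

    /** Hypothetical handle to one distributed DataStore. */
    public interface DataStoreNode {
        List<String> scanInbox(String userId); // returns matching RFQ ids
    }

    public List<String> scan(List<DataStoreNode> nodes, final String userId)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        try {
            // Fan the request out to every node simultaneously.
            List<Future<List<String>>> futures = new ArrayList<Future<List<String>>>();
            for (final DataStoreNode node : nodes) {
                futures.add(pool.submit(new Callable<List<String>>() {
                    public List<String> call() {
                        return node.scanInbox(userId);
                    }
                }));
            }
            // Aggregate the results back for the client.
            List<String> aggregated = new ArrayList<String>();
            for (Future<List<String>> future : futures) {
                aggregated.addAll(future.get()); // blocks until that node responds
            }
            return aggregated;
        } finally {
            pool.shutdown();
        }
    }
}
```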

My Performance Advice To You

Make sure you have analysis tools available while you are in development. Start your load testing early, giving yourself time to adjust your architecture during development. Don't wait until the end of your project. Do not be afraid to build custom analysis tools to help you do your job better. A system built using the Spring framework makes performance tracking extremely easy.
