Yesterday, my pair and I did a big refactoring in the codebase. You know how applications often have those “code/description” concepts? Like, there’s a code, “ON”, meaning “Ontario”, and we expand it into the description before we show it on screens and the like. Well, we made a major change of direction in how we were going to handle that.
Fortunately, we have unit test coverage. Not quite as much as I’d like, but we have it. Currently, there are around 900 unit tests. We made our change and ran the tests. We had about 30 failures. Around 20 of the failures had to do with one harder-than-average component of the change. We had missed some of its implications, but we corrected that. Another two came from even more complicated implications of that same change. Those took us longer to figure out, but in the end we did. A handful more were minor little items.
Then there was one test failure that made no sense to us. It was tangentially related to our problem — the test related to sorting a list of descriptions, like province names — but the classes involved weren’t ones that we’d changed. We stared at the test, looked into the related classes, and scratched our heads.
After what seemed like an embarrassingly long period of time, we did a basic thing: we ran the one failing test in isolation. It passed. We re-ran the suite. It failed.
Aha! Classic symptom of a test that interacts badly with other tests. Once we had that figured out, we realized what had happened. There was a cache of all the descriptions, and our change in code/description expansion caused the cache to be loaded much earlier in the test suite than previously. The one failing test case was set up to load a few values into the cache from a mock source. When we ran the whole suite, the cache was already populated by the time this test ran, so it never went out to the mock source for values. Previously, the test had passed in the suite only by coincidence, because nothing before it happened to load the cache, not because it was properly isolated.
Our solution, then, was to flush the cache in the setUp() and tearDown() methods of the test, and all was great. I was a little unhappy that the test in question seemed to have been created with some bad assumptions about the state of the cache when the test was running.
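To make that concrete, here is a minimal sketch in plain Java. It is not our actual code: `DescriptionCache`, `SortTestSketch`, and the two maps are made-up names, and the `setUp()` and `tearDown()` methods only imitate the JUnit hooks so that the example runs without JUnit on the classpath. I am also assuming a cache that loads lazily on first lookup, which is roughly how ours behaved.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for the shared, lazily loaded description cache. */
final class DescriptionCache {
    private static Map<String, String> entries; // null means "not loaded yet"

    private DescriptionCache() {}

    /** Consults the source only when the cache has not been loaded yet. */
    static synchronized String describe(String code, Supplier<Map<String, String>> source) {
        if (entries == null) {
            entries = new HashMap<>(source.get());
        }
        return entries.get(code);
    }

    /** Discards all entries so the next lookup reloads from its source. */
    static synchronized void flush() {
        entries = null;
    }
}

/** Plain-Java imitation of the failing test (hypothetical names throughout). */
final class SortTestSketch {
    static final Map<String, String> MOCK = Map.of("ON", "Mock Ontario");
    static final Map<String, String> REAL = Map.of("ON", "Ontario");

    /** Mirrors the JUnit hook: start from an empty cache. */
    void setUp() {
        DescriptionCache.flush();
    }

    /** Mirrors the JUnit hook: do not leak mock values into later tests. */
    void tearDown() {
        DescriptionCache.flush();
    }

    /** The test body: it expects to see the value served by the mock source. */
    boolean seesMockValue() {
        return "Mock Ontario".equals(DescriptionCache.describe("ON", () -> MOCK));
    }

    /** Stands in for any earlier test that happens to populate the cache. */
    static void earlierTest() {
        DescriptionCache.describe("ON", () -> REAL);
    }
}
```

If `earlierTest()` runs first and `setUp()` is skipped, `seesMockValue()` returns false because the cache is already loaded and the mock source is never consulted. Calling `setUp()` first makes the result independent of whatever ran before, and flushing again in `tearDown()` keeps the mock values from leaking into whatever runs next.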
JUnit tests helpfully narrow your concern to a specific problem. But in this case, it was important to have thought about how that test would operate in a larger context, and to understand that other activities might have populated the cache before the test ran.
(Some of the standard JUnit UIs create different class loaders for each test so that tests always start with a freshly loaded version of the class, but in my experience, this behaviour just leads to a world of pain. The JUnit plug-in for Eclipse doesn’t do that.)
But here’s the thing I find myself wondering: is there a meaningful way of analyzing a test for “good assumptions”, other than just sitting down and reading all the code? Writing good tests is a skill, and some developers do it better than others. There are so many design principles that can be measured; how about good test design principles?