A Strategy for Building Stable Applications
Early in my career, I had the good fortune to work under a management team that valued producing high-quality applications: ones that rarely, if ever, crashed under continual load. The department director believed, not only in words but in action, that a defect discovered in the field is significantly more costly to fix than one discovered in development. We lived by that rule, and we all believed in it.
For this product, we did not have a sustaining engineering team to fix defects and deliver patches to the field, because one simply wasn’t needed. Over the seven years I worked on this team, through multiple product deliveries, I recall only two or three field issues that required an engineering fix. Now think of all the features you could deliver by investing the money saved on a sustaining engineering team into new feature development. That’s what we did.
I’d like to describe the six practices required to produce and release stable applications. The stability you achieve in your products will correlate with how fully you commit to these practices. The strategy by itself isn’t enough to achieve great results. Great results require discipline, persistence, and a strong will, and, most importantly, a leader and a team that value quality above all other criteria.
Quality is Everyone’s Responsibility
Although the quality assurance team has “quality” in its name, quality is everyone’s responsibility. On this team, we didn’t have a separate quality engineering team: developers wrote and executed the performance tests, regression tests, integration tests, and acceptance test plans. Developers don’t have to do double duty as the quality team, but they do need to be involved in the test activities.
Unit testing must stay with the developers, and it should not be shortchanged to secure a delivery date. Developers should review the test designs produced by the quality team, and they should help design and execute the performance tests. When the need arises, developers should pitch in to support test execution, and they should always be available and open to questions from the testers. If the engineering culture is that developers only write code, the team isn’t prepared to achieve higher levels of quality.
Practice Defensive Programming
This approach was popular when I started working in the field, but it seems to have lost its luster, I suspect because it appears to take more time. Defensive programming means writing code that proactively detects error conditions and responds to them sensibly, either preventing the program from failing or letting it fail gracefully. It has two primary benefits: developers become more focused on preventing errors, and when errors do surface, they can be fixed quickly because each error is detected close to where it originated.
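As a minimal illustration, here is a string-copy helper written defensively. It checks each bad input at the point where it enters the function and responds sensibly: it writes an empty string for a missing source, truncates an input that is too long, and returns an error code for an unusable destination. None of these cases crashes. The function name and status codes are hypothetical, invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
typedef enum {
    COPY_OK = 0,
    COPY_TRUNCATED,   /* input too long: a shortened copy was stored */
    COPY_BAD_ARGS     /* destination unusable: nothing was written */
} copy_status;

/* Copies src into dst (capacity dst_size bytes). Whenever dst is usable,
 * it is left NUL-terminated, whatever the input looked like. */
copy_status copy_name(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || dst_size == 0)
        return COPY_BAD_ARGS;            /* cannot recover: report it */

    if (src == NULL) {                   /* recover: treat as an empty name */
        dst[0] = '\0';
        return COPY_OK;
    }

    size_t len = strlen(src);
    if (len >= dst_size) {               /* recover: truncate safely */
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return COPY_TRUNCATED;
    }

    memcpy(dst, src, len + 1);           /* copy includes the terminator */
    return COPY_OK;
}
```

Callers still learn about the truncation through the return value, so they can decide whether to warn the user. The program keeps running in every case.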
Instrument the Code
Build a debug version of your code with instrumentation that makes debugging easier when error conditions are detected. This shortens the turnaround time for fixing defects when they do occur. One of my favorite practices is liberal use of the ASSERT macro in C, a practice popularized in the 1990s by Steve Maguire in his book “Writing Solid Code.” Use the ASSERT macro to validate the parameters passed into procedures and functions, and use it to verify the expected conditions upon exiting a procedure or function. If you detect error conditions at the earliest point at which they materialize, you will dramatically improve your team’s ability to find the root cause quickly.
ASSERTs differ from defensive programming in two ways: they are present only in debug versions of the application, and they do not take corrective action; instead, they break into the debugger when the error condition is detected.
Do not limit instrumentation to ASSERT macros. Other good practices include execution tracing, initializing memory when it is allocated, and object validation logic, to name a few.
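The sketch below shows an ASSERT macro in that spirit, assuming a C99 compiler. It is not Maguire’s exact definition. It keys off NDEBUG, the same switch the standard `assert` uses, and that choice is an assumption. It calls `abort()`, which stops an attached debugger at the failure. The `copy_bytes` function is a hypothetical example. It checks its preconditions on entry and a postcondition on exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef NDEBUG
/* Debug build: report the failed condition and where it failed, then
 * abort(); a debugger attached to the process stops at the failure. */
#define ASSERT(cond)                                                  \
    do {                                                              \
        if (!(cond)) {                                                \
            fprintf(stderr, "ASSERT failed: %s (%s:%d)\n",            \
                    #cond, __FILE__, __LINE__);                       \
            abort();                                                  \
        }                                                             \
    } while (0)
#else
/* Release build: the check disappears entirely. */
#define ASSERT(cond) ((void)0)
#endif

/* Hypothetical example: copies n bytes between non-overlapping buffers. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    /* Preconditions, checked on entry. */
    ASSERT(dst != NULL && src != NULL);
    ASSERT((uintptr_t)dst + n <= (uintptr_t)src ||
           (uintptr_t)src + n <= (uintptr_t)dst);   /* no overlap */

    memcpy(dst, src, n);

    /* Postcondition, checked on exit: the copy matches its source. */
    ASSERT(memcmp(dst, src, n) == 0);
    return dst;
}
```

A caller that passes overlapping buffers is stopped at the faulty call, not somewhere downstream where the corrupted data finally causes trouble.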
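The hypothetical function below uses both techniques side by side to make the distinction concrete. It relies on the standard C `assert` macro from `<assert.h>`, which is compiled out when NDEBUG is defined. Whether your release builds define NDEBUG depends on your build setup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns items[i], or `fallback` when the request
 * is invalid. */
int element_or_default(const int *items, size_t count, size_t i, int fallback)
{
    /* Assertion: flags the caller's bug during development by aborting
     * (so a debugger stops here). Compiled out when NDEBUG is defined. */
    assert(items != NULL && i < count);

    /* Defensive check: present in every build, and takes corrective
     * action instead of reading outside the array. */
    if (items == NULL || i >= count)
        return fallback;

    return items[i];
}
```

In a debug build, a bad index stops the program at the assertion. In a release build, the same call quietly returns `fallback` and the program keeps running.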
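Here is a minimal sketch of those three kinds of instrumentation. The fill byte, the `TRACE` macro, and the `buffer` type are all hypothetical choices made for illustration. `debug_malloc` fills new memory with a recognizable pattern, so code that reads memory before writing it fails consistently. `TRACE` logs execution in debug builds only. `buffer_is_valid` checks an object’s invariants on entry to and exit from the functions that use it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary byte pattern for fresh allocations, so code that reads memory
 * before writing it misbehaves the same way on every run. */
#define ALLOC_FILL_BYTE 0xA3

#ifndef NDEBUG
/* Execution tracing: compiled into debug builds only. */
#define TRACE(...) fprintf(stderr, __VA_ARGS__)
#else
#define TRACE(...) ((void)0)
#endif

/* Allocation wrapper: fills new memory with the pattern in debug builds. */
void *debug_malloc(size_t size)
{
    void *p = malloc(size);
#ifndef NDEBUG
    if (p != NULL)
        memset(p, ALLOC_FILL_BYTE, size);
#endif
    TRACE("debug_malloc(%zu) -> %p\n", size, p);
    return p;
}

/* Hypothetical object with invariants worth validating. */
typedef struct {
    size_t length;
    size_t capacity;
    char  *data;
} buffer;

/* Object validation: returns nonzero when the invariants hold. */
int buffer_is_valid(const buffer *b)
{
    return b != NULL && b->data != NULL && b->length <= b->capacity;
}

/* Validates the object on entry and on exit. */
void buffer_clear(buffer *b)
{
    assert(buffer_is_valid(b));
    b->length = 0;
    assert(buffer_is_valid(b));
}
```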
Soak Test the Application
Soak testing is the key practice. It means exercising the application under automated user control. The goal is to put the application under continuous user load to surface problems that appear only after repeated use. Randomly generated user input works best, but it isn’t required to get good results.
If the application processes input other than user input, you need simulators that generate that input and pump it into the application continuously while the application runs under automated script control. Survey the applications for error conditions at least every morning. These are some of the candidate error conditions to monitor:
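Here is a minimal sketch of the random-input side of a soak harness. The event types and the `event_handler` entry point are hypothetical. A real harness would drive the application through whatever UI automation your platform provides. `rand()` keeps the sketch short. Logging the seed lets you replay a failing run.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical kinds of user input the application accepts. */
typedef enum { EV_KEY, EV_CLICK, EV_SCROLL, EV_KIND_COUNT } event_kind;

typedef struct {
    event_kind kind;
    int x, y;   /* pointer position for clicks and scrolls */
    int code;   /* key code or scroll amount */
} ui_event;

/* Entry point into the application under test; returns 0 on success and
 * nonzero when the application reports a failure. */
typedef int (*event_handler)(const ui_event *ev);

/* Builds one random event from the seeded generator. */
static ui_event random_event(void)
{
    ui_event ev;
    ev.kind = (event_kind)(rand() % EV_KIND_COUNT);
    ev.x    = rand() % 1920;
    ev.y    = rand() % 1080;
    ev.code = rand() % 256;
    return ev;
}

/* Feeds random events to the application. Pass iterations == 0 to run
 * until a failure (the normal soak-test mode). Returns how many events
 * were handled successfully. */
unsigned long soak_run(event_handler handle, unsigned seed,
                       unsigned long iterations)
{
    assert(handle != NULL);
    srand(seed);
    unsigned long handled = 0;
    while (iterations == 0 || handled < iterations) {
        ui_event ev = random_event();
        if (handle(&ev) != 0)
            break;          /* stop and leave the state for analysis */
        ++handled;
    }
    return handled;
}

/* Stand-in handler for trying the harness: counts events and rejects any
 * event kind it does not recognize. */
static unsigned long demo_seen;
static int demo_handler(const ui_event *ev)
{
    ++demo_seen;
    return (ev->kind < EV_KIND_COUNT) ? 0 : 1;
}
```

In the lab, you would call `soak_run(app_entry_point, seed, 0)` with the real handler so the run continues until something breaks. Here `app_entry_point` is a hypothetical name for your application’s input hook.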
- Memory leaks: is the application’s memory usage continually growing?
- Secondary storage leaks: is the application’s consumption of secondary storage continually growing?
- File handle leaks: is the number of open file handles continually growing?
- Lock-ups: is the application still responding to user input, or does it appear to be locked or hung?
- Crashes: has the application been terminated for an operating system protection violation?
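On Linux, a monitor process can collect two of these measurements from procfs. Memory usage comes from the `VmRSS` line of `/proc/<pid>/status`. Open file handles are the entries in `/proc/<pid>/fd`, and reading that directory for another process requires appropriate permissions. The sketch assumes Linux. Its growth test is a simple heuristic, offered as an assumption rather than an established rule: a resource that never went down across the sampling window and ended higher than it started gets flagged.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Resident memory of process `pid` in kB, parsed from the VmRSS line of
 * /proc/<pid>/status. Returns -1 if it cannot be read. */
long rss_kb(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "VmRSS: %ld", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}

/* Number of open file descriptors of process `pid`: one entry per
 * descriptor in /proc/<pid>/fd. Returns -1 if it cannot be read. */
long open_fd_count(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/fd", (long)pid);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')   /* skip "." and ".." */
            ++count;
    }
    closedir(dir);
    return count;
}

/* Leak heuristic over periodic samples (e.g. one per hour): flags a
 * resource that never decreased during the window and ended higher than
 * it started. */
int is_growing(const long *samples, size_t n)
{
    assert(samples != NULL);
    if (n < 2)
        return 0;
    for (size_t i = 1; i < n; ++i) {
        if (samples[i] < samples[i - 1])
            return 0;
    }
    return samples[n - 1] > samples[0];
}
```

A morning survey could sample `rss_kb` and `open_fd_count` every hour overnight, then pass each series to `is_growing`. Lock-ups and crashes need separate checks, such as a heartbeat the script expects the application to answer.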
If any of these conditions is detected, analyze and fix it immediately. Keeping the application running continuously under automated script control takes precedence over all other issues. Once the issue is fixed, release a new build to the lab for soak testing.
Performance Test to Failure
Many teams execute performance tests to validate that the system meets its performance objectives and stop there, but that isn’t enough to release a stable application. There is no telling how your customers will ultimately use your application. Performance test the application beyond the performance requirements until it fails. You may be surprised to find that, by fixing a few minor issues revealed when the application is stressed beyond the requirements, it can exceed its performance objectives significantly.
The final objective is that when the user exceeds the performance specification, the system handles the condition gracefully. It should not crash, and it should inform the user that performance has degraded and that calculations are unreliable while it operates in this overload state. When the load settles back within the bounds of the specification, the application should recover and resume normal operation.
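A common way to implement this behavior is to watch a load indicator against two thresholds, adding hysteresis so the state doesn’t flicker at the boundary. The sketch assumes the application buffers incoming work in a queue and uses the queue depth as that indicator. The struct, the watermark values, and the choice of indicator are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical overload monitor driven by input-queue depth. */
typedef struct {
    size_t high_water;   /* enter overload at or above this depth */
    size_t low_water;    /* leave overload at or below this depth */
    bool   overloaded;
} overload_monitor;

/* Updates the state from the current queue depth. Returns true when the
 * state changed, so the caller can notify the user once per transition. */
bool overload_update(overload_monitor *m, size_t queue_depth)
{
    assert(m != NULL && m->low_water < m->high_water);
    if (!m->overloaded && queue_depth >= m->high_water) {
        m->overloaded = true;    /* warn: results may be unreliable */
        return true;
    }
    if (m->overloaded && queue_depth <= m->low_water) {
        m->overloaded = false;   /* load back within spec: resume normally */
        return true;
    }
    return false;
}
```

While `overloaded` is true, the application would show the degraded-performance warning and shed or coalesce input rather than let the queue grow without bound. When the flag clears, normal operation resumes.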
Have Stability Release Criteria
The release decision has to be objective, has to reflect the test results, and has to support the goal of releasing a stable product. Here are some recommended stability release criteria.
- Decide how long the application must run uninterrupted under soak testing. I like to set the target at a multiple of the expected usage period. For example, if I’m testing a professional trading application, the work week is 5 business days. At a minimum, the application should run uninterrupted for 5 straight days, but I would aim to have it run successfully for 30 days, or about a month, under soak testing. The release build should have achieved at least the minimum.
- Have no known resource leaks in the application at release time: dynamic memory, secondary storage, file handles, or any other limited resource used by the application.
- Have no known crashes at release time.
Summary
It is my belief that releasing a high-quality product is the easiest way to differentiate your product, and low or subpar quality is the surest way to lose a customer forever. It’s difficult to unglue a customer from a reliable product, even if it lacks some of the bells and whistles of competing products, although, of course, it must have the essential features to be desirable. We need only look at the American auto industry for a great example of how low quality can destroy a once-great industry.
A number of years back, I developed an interest in video editing on the PC. After evaluating several products, I settled on Adobe Premiere. While the quality of Adobe Premiere has improved over time, the version I first worked with could not complete the job it was advertised to do, and the same was true of every competing product I evaluated at the time. I only settled on Adobe Premiere because it came bundled with the Matrox RT.X10 video capture card that I purchased.
My video editing needs aren’t demanding, and I didn’t need many bells and whistles. All I needed was the ability to capture video, splice multiple clips and stills, overlay audio, edit some titles, and insert transitions. Yet the software couldn’t reliably satisfy even the limited demands of an amateur editor like me. If I could have found a less feature-laden product with higher quality, even at a premium price, I would have purchased it without hesitation. I have no doubt that a competitor offering a comparable product with a reputation for high quality could have captured significant market share.
Quality is a differentiator. Grab the low-hanging fruit: put these practices to use in your own projects, build stable applications, and you will differentiate your products from the competition.

March 24th, 2008 at 12:30 pm
Thanks for an excellent article! I do testing of logic ASICs before they go to production. Much of this applies!
March 24th, 2008 at 12:39 pm
Excellent post.
March 25th, 2008 at 1:22 pm
I will try the “Soak Test”
thanks
March 25th, 2008 at 4:07 pm
Engin,
Good luck, and let us know how it goes. Also, be sure to have good release criteria; otherwise, you’ll keep finding problems, and so will your customers.
May 31st, 2008 at 5:37 am
thank you for your thoughts,