Challenging the Myths of Lines of Code
When I was a child, I had a strong aversion to pineapples. Just the look of them made me ill. It didn’t matter whether it was cut or uncut; there was just something about the look that made me believe I would not like the taste. Maybe it was the color yellow, but there was something terribly unappealing about the fruit, and no matter how much my mom tried to entice me with declarations of how sweet it tastes, I would not try it.
I felt the same way about cranberry sauce too. It was an emotional aversion; there was nothing logical about it, even though I was convinced my reasons were all logical. Cranberry sauce looks slimy; slimy is disgusting; therefore, it must taste like it looks: disgusting. It’s a logical inference, though it has little relevance to how cranberry sauce actually tastes.
As I got older, I became more open to trying foods (and other things, of course) that were unappealing to me. When I finally gave pineapples and cranberry sauce a try, I discovered that I’d been missing out for years on foods I find thoroughly enjoyable.
Much of the software community has a similar aversion to lines of code (LOC). Many of their arguments against LOC are logical, but they aren’t relevant to the science and practice of LOC as advocated by its adherents. Sure, one can write a single line of code with more defects than ten other lines of code, but the Law of Large Numbers says that, over a large enough body of code, the defect density observed in practice will converge toward its expected value.
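To make the Law of Large Numbers point concrete, here is a minimal simulation sketch. It rests on a simplifying assumption that is mine and not a measured fact: every line independently has the same chance of containing a defect. The function name and the 2% probability are hypothetical, chosen only for illustration.

```python
import random


def observed_defect_density(lines, defect_probability, rng):
    """Simulate `lines` lines of code and return defects per line.

    Assumption (illustrative only): each line independently contains a
    defect with probability `defect_probability`.
    """
    defects = sum(1 for _ in range(lines) if rng.random() < defect_probability)
    return defects / lines


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs print the same numbers
    expected = 0.02          # hypothetical expected density: 2 defects per 100 lines
    for lines in (10, 1_000, 100_000):
        density = observed_defect_density(lines, expected, rng)
        print(f"{lines:>7} lines: observed density {density:.4f} (expected {expected})")
```

With only ten lines, the observed density can land far from the expected value; as the line count grows, it tends to settle close to it. Real defects are not independent in this way, so treat this as an illustration of the statistical argument rather than a model of any actual code base.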
Scenarios to Consider
Let’s put aside defect densities and individual productivity rates. Instead, I’d like you to consider some hypothetical but possible relationships and evaluate whether LOC would be a valuable piece of information when managing your project. For each of the scenarios described, all references to lines of code are to lines of code that the team writes themselves: they are not from a library, a wizard, or copied from another source. Also assume the team has integrity and isn’t looking to sabotage the measurements.
- If you are in the middle of your software development cycle, where you are releasing daily builds to the test labs and there is still significant functionality to complete, would it be interesting and useful to know that the size in LOC has flatlined or grown only insignificantly over the past 3 weeks?
- If you are managing a project where the code base is growing at an average of 100 LOC/week with 20 software developers writing code, would you feel comfortable that the team is producing well at 1 LOC per day per developer of “C” code? If this team is offshore, would it be even more interesting to know?
- If you are interviewing for a job and the team was supporting applications totaling 50 million LOC of custom “C” code with 10 developers, would that tell you something interesting about their capacity to effectively support their customer base? Would it tell you something interesting about your learning curve?
- If you manage a sustaining engineering team and the size of the applications that you are supporting is growing by 1 million LOC per year, would that tell you something about your staffing requirements to support your customers effectively?
- If you are managing the release of a product, would you feel comfortable about the quality if your code base grew by 20,000 new LOC in the week that you plan to release?
- If you are managing a release of a product to be delivered in five days, would you feel comfortable about the quality of the release if your code size had been flat for the last 3 weeks, no defects had been uncovered in the last 3 weeks, and all the functionality had been verified as complete?
- If you are comparing two completed applications with application A having a size of 50,000 LOC and application B having a size of 500,000, would you expect the number of engineers working on team B to be larger than the number working on team A? What if application B is 2 million LOC in size?
- If you are deciding which source code to review because you don’t have time to review everything, would you choose to review the procedures with more lines of code over the procedures with fewer lines of code if that is the only input available to you?
- If you are comparing two applications, with application A having a size of 20,000 LOC and application B having a size of 200,000 LOC, would you expect application B to be more complex than application A?
- If you estimated that you would grow your code base by 50,000 LOC for a release (assuming you had a way to estimate this accurately) and you are 20 weeks into a 24-week schedule, would you feel comfortable that you are on schedule if the code base had grown by only 10,000 LOC? Would it tell you something interesting about the schedule if you had measured 60,000 LOC at week 20?
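The arithmetic behind two of these scenarios can be sketched in a few lines. This is only an illustration: the function names are hypothetical, a five-day work week is assumed, and the projection assumes growth simply continues at the average rate seen so far, which real projects rarely do.

```python
def loc_per_developer_per_day(loc_per_week, developers, workdays_per_week=5):
    """Average output per developer per working day (five-day week assumed)."""
    return loc_per_week / developers / workdays_per_week


def projected_growth_at_end(loc_so_far, weeks_elapsed, total_weeks):
    """Linear extrapolation: assume the average weekly growth so far continues."""
    return loc_so_far / weeks_elapsed * total_weeks


if __name__ == "__main__":
    # 100 LOC/week spread across 20 developers
    print(loc_per_developer_per_day(100, 20))         # 1.0

    # Estimated 50,000 LOC of growth; 20 weeks into a 24-week schedule
    print(projected_growth_at_end(10_000, 20, 24))    # 12000.0, far short of the estimate
    print(projected_growth_at_end(60_000, 20, 24))    # 72000.0, well past the estimate
```

Neither projection says what went wrong. A 12,000 LOC projection against a 50,000 LOC estimate suggests the schedule or the estimate is off; a 72,000 LOC projection suggests the scope or the estimate was understated. Either way, the number prompts the question.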
Insight Gained
I’m sure it isn’t apparent to some, maybe even to many, that knowing something about the size of an application has value. LOC gives insight into the following aspects of the product:
- The complexity of the application - Size is one measure of software complexity. Software complexity increases as size increases. One person can easily understand the implementation details of a 10,000 LOC application. One person would be challenged to understand the implementation details of a 2 million LOC application. At a minimum, it would require more time to learn.
- The pace of development - All completed software projects deliver LOC. After all, requirements are ultimately manifested in completed LOC. If zero LOC are being produced each week, then zero requirements are being implemented. If ten LOC are being produced each week, then progress is being made towards the completion of the project. Further, if 100 LOC are being produced each week, then more progress is being made towards the completion of the project.
- The support requirements - The number of features in an application correlates with the number of LOC in the application. The number of defects in an application also correlates well with the number of LOC in the application. Every line of code written is one more opportunity for a defect to be introduced. It requires a larger support team to support more features and more defects.
- The staffing requirements - Larger software deliveries require larger teams to deliver them on time.
- The quality of the application - As noted earlier, every line of code represents one more opportunity to introduce a defect. More lines of code have more defects. The Law of Large Numbers tells us that the defect density of code entering test will tend toward its expected value.
- The likelihood the team will deliver to schedule - If there is more quantifiable work to deliver than there is time left in the schedule to deliver it, you are unlikely to deliver to schedule unless something changes. Now some will naively believe that the initial output estimate needs to be accurate to provide value, but that would be incorrect. Of course, accuracy is better, but project tracking is about testing the estimating assumptions and taking corrective action when the estimating assumptions are wrong.
- The readiness for release - High quality releases are made when the code size stabilizes a number of weeks before the release date. If one of your objectives is to release with high quality, you will want to observe that the growth in LOC has attenuated and flattened for a number of weeks before the release date. It takes time to find and fix defects. New code always introduces defects into the application.
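The release-readiness check in the last item can be sketched as a simple test on weekly size measurements. Everything specific here is an assumption for illustration: the function names, the three-week window, the 0.5% tolerance, and the sample totals are hypothetical, and a real team would pick thresholds from its own history.

```python
def weekly_growth(totals):
    """Week-over-week change in total LOC, given one size measurement per week."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


def has_stabilized(totals, weeks=3, tolerance=0.005):
    """True if each of the last `weeks` weekly changes stays within
    `tolerance` (as a fraction) of the prior week's size.

    The window and tolerance are illustrative defaults, not recommendations.
    """
    if len(totals) < weeks + 1:
        return False  # not enough history to judge
    recent = totals[-(weeks + 1):]
    return all(
        abs(later - earlier) <= tolerance * earlier
        for earlier, later in zip(recent, recent[1:])
    )


if __name__ == "__main__":
    # Hypothetical weekly totals for a release that settles down
    calm = [180_000, 190_000, 198_000, 200_000, 200_300, 200_400, 200_450]
    print(weekly_growth(calm))      # [10000, 8000, 2000, 300, 100, 50]
    print(has_stabilized(calm))     # True

    # Same history, but 20,000 LOC arrive in the final week
    surge = [180_000, 190_000, 198_000, 200_000, 200_300, 200_400, 220_400]
    print(has_stabilized(surge))    # False
```

A `False` result is not a verdict on quality by itself; it is the trigger for asking where the new code came from and whether there is still time to find and fix the defects it brings.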
Summary
Having insight into the measure of LOC permits the manager and the team to ask good questions, questions that you wouldn’t know to ask without the information. You can ask questions like: How can we expect to release this product on Friday when we just added 20,000 LOC in the past five days? Where did they come from? Why are they there? Are they needed? Something similar to this happened to me once on a project. There was a good reason for it, although unexpected, and finding this out when I did allowed corrective actions to be taken while still delivering to schedule.
LOC isn’t a perfect measure, and it’s not the only software measure to use, but it is a key one, because software development is essentially the process of producing LOC to achieve an objective. For some relationships, I’ve found the measure to be extremely precise, and for others less precise, but nevertheless still valuable. It’s the starting point for asking good questions and drawing good inferences about quality and progress to schedule.
It’s likely that most, if not all, of you reading this are still skeptical of the benefits of measuring LOC. I was skeptical too when I first put this into practice for a CMM process improvement initiative at one of my employers. The unexpected happened when I began measuring. Reliable patterns began to emerge, patterns that helped me gain insight into the pace of development, the level of quality, and the status of the schedule. If more practitioners in the industry begin to use this measure, I believe we can come to understand the patterns and relationships better and develop even better techniques for measuring our projects, but this promise can only be realized if we start somewhere. From my experience, I believe that measuring LOC is the best place to start.
