
Interpreting the “Doneness” of Software and the Role of Code Reviews August 15, 2008

Posted by Allen Manning in: Process, Refactoring, Unit Testing, 2 comments

Apologies in advance if this post heads too far into philosophy. That being said, I feel it is directly relevant to the work that we do. Throughout my career, I have found that one particular question comes up, in many different guises, in almost every project I have worked on: When is Software done?

Here are a few examples of the disguises this question can wear. It may show up as:

“When am I done refactoring this object?”

“How many bugs are acceptable to have in the product before we let customers use it?”

“How many Unit Tests do I need to write?”

“Is it done, without this enhancement to usability?”

“Is the software done if there is no automated testing?”

“Can this software really be considered done if the source is formatted like that?”

“The customer likes the demo; they think it is done, so it must be done, right?”

We can rephrase all of these questions in terms of quality: How do you judge the quality of software? The main argument of this post is that the way we usually answer that question has gotten us into trouble. There is a tendency to assume that there are mostly objective criteria for assessing the quality, or “Doneness”, of software. In making that assumption, I think we miss the sociological dimensions of our work and the qualities of the developers who are playing the software game.

Objective vs. Subjective

By objective, we mean that something can be observed by everyone based on some shared rules or laws. By subjective, we mean that at least one individual has to interpret it. Objective criteria are desirable in many ways because we can apply them to all software, and they can serve as a standard of comparison. Subjective criteria, on the other hand, feel too wishy-washy, since different people can interpret them in many different ways.

I think that we want most of the criteria for Software being Done to be just that: done with a capital ‘D’, a proper name, shared by all, objective. In fact, though, there are very few truly objective criteria that can be applied, and we are really playing a game of shared interpretation based on the context of the project and the community that supports the development effort. I’d like to support this by strolling through some examples of different ‘Done’ criteria.

Bug Count

Almost all of the criteria that I have worked with lie somewhere on a continuum. For example, if an application had so many bugs that a user could not complete any use case, then the software clearly isn’t Done. At that point, it can only be considered under construction, and no one should accept it as Done.

At the other end, you can have software with no defects that impede its main use cases, but with a long list of enhancements and design improvements that would make those primary use cases easier to use, faster, and more enjoyable overall. Looking through such a list of suggested improvements, it becomes much harder for anyone to say objectively that the software isn’t Done.

So our projects, and the software we use, sit somewhere in the middle. A bug count is objective, but deciding how many bugs are acceptable, together with the subjective work of classifying items as bugs or enhancements, makes defect assessment an interpretive affair. No help here.

Passes A Customer Acceptance Test

It is a necessary but not sufficient condition that the business owner of the software project says that it is Done. It is hard to argue otherwise. If they aren’t happy and feel that the software is fundamentally lacking in quality, that interpretation matters most: they are funding the project, and funding is the air that the project breathes.

This assumes a commercial context for the project. I say the condition isn’t sufficient because the client may not understand the underlying aspects of quality well enough to judge whether the system can be supported over the long term.

For example, a developer could build an application that demos very well over a few use cases, while the code itself has quality problems serious enough to make maintenance a nightmare. The code could need major refactoring, with duplication everywhere, or it could be hard to read because it follows no standard style conventions.

The customer may not see these technical shortcomings, but the next developer who comes along to support the code will. The extra maintenance costs will eventually be brought to the attention of the business owner, who may then realize that the software actually wasn’t Done in the first place.

Therefore we can’t rely on the business owner to recognize truly Done software just by using it.

Is Refactored

There is no objective criterion that can tell us whether software has been properly refactored. Many automated tools can report the level of duplication and similar metrics for source code, but none of them has enough semantic context to say that this software, in this project, has been properly refactored.
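To make the point concrete, here is a minimal sketch of the kind of number such a tool produces. It assumes a deliberately naive definition of duplication (identical non-blank lines after stripping whitespace); `duplicate_line_ratio` is a hypothetical helper written for this post, not the API of any real tool, and real clone detectors work on tokens or syntax trees instead.

```python
from collections import Counter


def duplicate_line_ratio(source: str) -> float:
    """Fraction of non-blank lines whose stripped text occurs more than once.

    Naive assumption: two lines are duplicates if their stripped text is identical.
    """
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    if not lines:
        return 0.0
    counts = Counter(lines)
    # Count every occurrence of any line that appears at least twice.
    duplicated = sum(n for n in counts.values() if n > 1)
    return duplicated / len(lines)


sample = """
total = price * qty
total = total + tax
print(total)
total = price * qty
total = total + tax
"""
print(duplicate_line_ratio(sample))  # 0.8
```

Anyone can rerun this and get the same 0.8, which is the objective part. Whether those two repeated lines should be pulled into a function, or whether doing so now would be refactoring too early, is a judgment the number cannot make.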

Coming back to our continuum: at one end you have software that has never been refactored at all. Duplication runs rampant, confusing and overly deep type hierarchies abound, huge nested switch statements are everywhere, and no routine is under 1000 lines. We would probably all agree that this code could do with some refactoring.

At the other end, it is much harder to reach an objective consensus on when code has been refactored enough, or when we are refactoring too early. This trade-off takes a lot of experience to get right, and it sometimes rests on mere predictions of which new features are yet to come.

Follows a style guide

This is one of the few criteria that I think can be objective. A well-published coding standard and style guide can be checked against all of the code. Either the code uses tabs or it doesn’t. Either a tab is four spaces or it isn’t. Constants are named in all caps or they are not. This criterion can be applied objectively to all of the source in a particular project.
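Because each of these rules is a yes-or-no question, a machine can check them. The sketch below assumes a hypothetical house style that mirrors the examples above: no tabs in indentation, indentation in multiples of four spaces, and constants in all caps, where a “constant” is assumed (crudely) to be any unindented assignment of a plain number or string literal. `style_violations` is an illustrative helper, not an existing linter.

```python
import re

# Hypothetical house rules, chosen to mirror the examples in the text.
INDENT_WIDTH = 4
# Assumption: an unindented assignment of a number or string literal is a "constant".
CONSTANT_ASSIGNMENT = re.compile(r'^([A-Za-z_]\w*)\s*=\s*(\d+(\.\d+)?|"[^"]*")\s*$')


def style_violations(source: str) -> list[str]:
    """Return one message per rule violation; an empty list means the code complies."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace of the line (tabs and/or spaces).
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append(f"line {lineno}: tab used for indentation")
        elif len(indent) % INDENT_WIDTH != 0:
            problems.append(f"line {lineno}: indentation is not a multiple of {INDENT_WIDTH}")
        match = CONSTANT_ASSIGNMENT.match(line)
        if match and not match.group(1).isupper():
            problems.append(f"line {lineno}: constant '{match.group(1)}' is not in all caps")
    return problems


code = 'max_retries = 3\nTIMEOUT = 30\ndef f():\n\treturn TIMEOUT\n'
for message in style_violations(code):
    print(message)
# line 1: constant 'max_retries' is not in all caps
# line 4: tab used for indentation
```

Two people running this on the same file will always get the same answer, which is exactly what makes style the odd one out among the criteria in this post.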

As a criterion, though, it clearly isn’t sufficient on its own.

Code Coverage

Code coverage tools can objectively measure how much of a project’s code is exercised by automated tests. The question is, how much is enough?

Back to our continuum: at one end you have a very large enterprise project with 0% code coverage. Could we really consider this project Done? In what context would the manual testing required for every future enhancement be considered acceptable?

At what point is it Done? At 100%, surely, but what about 70%? We find ourselves on a slippery slope: we can produce clear, objective data on the level of coverage, but we can’t give a clear definition of how much coverage is needed for the software to be Done.
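The split between the objective measurement and the subjective cut-off can be shown in a few lines. This is a minimal sketch assuming simple line coverage (real tools such as coverage.py can also measure branches); the line numbers and thresholds are made up for illustration, and `line_coverage` and `is_done` are hypothetical helpers.

```python
def line_coverage(executable: set[int], executed: set[int]) -> float:
    """Percentage of executable lines that the test suite actually ran."""
    if not executable:
        # Assumption: code with no executable lines counts as fully covered.
        return 100.0
    return 100.0 * len(executable & executed) / len(executable)


def is_done(coverage: float, threshold: float) -> bool:
    # The threshold is the subjective part: someone has to choose it.
    return coverage >= threshold


executable_lines = set(range(1, 101))  # hypothetical: 100 executable lines
executed_lines = set(range(1, 71))     # hypothetical: the tests ran lines 1-70
coverage = line_coverage(executable_lines, executed_lines)
print(coverage)                 # 70.0
print(is_done(coverage, 70.0))  # True
print(is_done(coverage, 80.0))  # False
```

The 70.0 is the same for everyone; the answer to “is it Done?” flips depending only on which threshold the team agreed to pass in.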

Context and the importance of Code Reviews

We have explored a few common criteria in the definition of Done. All of them shed some light on the quality and completeness of a particular software project, and almost all of them turn out to be largely subjective. I think we will find a similar subjective continuum for most other criteria.

So where does this leave us? Well, it leaves us with no hope of an objective definition of Done. Our position resembles the one Thomas Kuhn describes in The Structure of Scientific Revolutions: instead of studying science as objective Truth, we should study the sociology of scientists, because they essentially create the current truth.

In other words, the community interprets, for each given context, whether or not the software is actually done (notice the lowercase ‘d’: we are moving away from our absolutes). This makes it more challenging to come up with a good universal definition across multiple teams and projects.

Without claiming to solve the problem, I’d like to finish by offering some help: Code Reviews.

I don’t think that we should be policing each other’s code; rather, we should ask our colleagues to review it, much as a tax accountant reviews our taxes. We should think of our colleagues as consultants we have hired to keep us honest and true to the community’s form.

They may, and in many cases probably should, ask us questions like, “Do you have a unit test for this?” And we may answer, “No.” But if that question gets asked again and again, we can start to see that, in this community, our work isn’t quite done.

Likewise, if comments about coding style or suggestions for refactoring keep coming up, we can see that our colleagues feel the code could use a bit more polish. These same reviewers may also be the ones who fix our bugs when we are on vacation.

We may interpret things differently and argue for the reasons we thoughtfully chose a particular path, and that is how the whole flow of communication happens.

Given the subjective nature of software completeness and quality criteria, I advocate a looser, checklist-style definition of done, with ample room for subjective judgment exercised among responsible and experienced colleagues.