The art of building software: October 2010

Wednesday, October 27, 2010

Sustainable Software Engineering

Something bad happens to many software development teams somewhere around the 4th or 5th year of life of their code base: quality and productivity take a nosedive as problems steadily mount.  I call this the "Developer Doldrums" (with apologies to Norton Juster) because it's a sad place to work.  It's insidious because the team is usually building software the same way it always has, yet they've lost their mojo.  Why?  The answer is that the code base can no longer sustain development.  Enter sustainable software engineering practices.

Sustainable software engineering consists of:
  1. Automated functional testing.  I like to bias most of my QA team toward writing automated functional tests.
  2. True unit tests (not functional tests masquerading as unit tests)
  3. Keeping your code bug free right from the beginning
  4. Relentless refactoring
  5. A solid and always-evolving architecture that is well internalized by the team
  6. Proper componentization (with tests on the interfaces) and isolation of subsystems in your architecture, so that you can rewrite sub-components when needed without touching other parts of your code base
  7. Little overtime
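
Point #2 deserves a concrete illustration.  Here's a minimal sketch of the distinction (all names - PriceCalculator, TaxService - are hypothetical, invented for this example): a true unit test replaces the unit's collaborators with stand-ins it controls, whereas a functional test masquerading as a unit test would wire up the real tax service, database, and all.

```python
# Hypothetical example: a true unit test isolates the unit under test
# from its collaborators.  All class names are illustrative.

class TaxService:
    """The real collaborator might hit a network service or database."""
    def rate_for(self, region):
        raise NotImplementedError("would hit the network")

class PriceCalculator:
    """The unit under test: pure logic, collaborator injected."""
    def __init__(self, tax_service):
        self.tax_service = tax_service

    def total(self, subtotal, region):
        return round(subtotal * (1 + self.tax_service.rate_for(region)), 2)

class StubTaxService:
    """Stand-in collaborator: the test controls its behavior exactly."""
    def rate_for(self, region):
        return 0.10

def test_total_applies_tax():
    # Fast, deterministic, no environment required - a true unit test.
    calc = PriceCalculator(StubTaxService())
    assert calc.total(100.0, "CA") == 110.0
```

A test that constructed the real TaxService here would still pass or fail, but it would be a functional test: slow, environment-dependent, and blaming the wrong layer when it breaks.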
Sustainable software engineering is usually not practiced because there is a cost associated with it, a cost which has a return that may be a year or more down the road.  However, if you haven't paid your dues and you get yourself a few years down the road, the chickens will come home to roost.  At that point, you're looking at rewriting your software - that's something you want to avoid because of the large cost and risks involved.

Counterpoint: In my experience, technology trends move fast enough that roughly every five years, re-architecting your code on top of newer technologies may well deliver enough benefit to justify the investment required to do so.

So it may happen, by luck, that the Developer Doldrums curve drops off close to the time you'd want to rewrite your product anyway because of advances in technology, so sometimes that's your way out of the Doldrums.  You kill two birds with one stone.  However, I wouldn't want to rely on such a coincidence.

Shorter projects benefit less from sustainable engineering, simply because there's no long-lived software to sustain and thus no return on the investment.  So the amount of sustainable engineering you practice should be proportional to the project length or the expected lifetime of the code base.

Sustainable software engineering isn't always possible to practice.  I would guess that it carries approximately a 25% cost overhead, due to the time it takes to write quality unit tests, refactor your code, and fix the important bugs as they occur.  (Counterpoint: Using automated unit test generator tools such as Agitator from the beginning of the product development life cycle may reduce this tax.)  Oftentimes, your primary objective is to reach your customers as quickly as possible to validate your market and product, and you could get to market faster by skimping on the sustainable engineering stuff.  That's certainly a valid argument.  But if you go down that route, you should be prepared for the chickens to come home to roost at some point.
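
To make the trade-off concrete, here's a toy back-of-the-envelope model.  Every number in it is an illustrative assumption, not data from real projects: a flat 25% overhead for the sustainable team versus a small compounding productivity loss for the team that skips it.

```python
# Toy payback model for the ~25% sustainable-engineering overhead.
# All numbers are illustrative assumptions, not measurements.

def cumulative_cost(months, base_cost=1.0, overhead=0.0, decay=0.0):
    """Total cost over `months`.  `overhead` is a flat monthly tax
    (sustainable practices); `decay` is a compounding monthly
    productivity loss (mounting cruft in an unsustainable code base)."""
    total = 0.0
    monthly = base_cost
    for _ in range(months):
        total += monthly * (1 + overhead)
        monthly *= (1 + decay)
    return total

# Sustainable team: 25% overhead, no productivity decay.
# Unsustainable team: no overhead, 3%/month compounding loss.
for m in (12, 24, 36):
    print(m, "months:",
          round(cumulative_cost(m, overhead=0.25), 1), "vs",
          round(cumulative_cost(m, decay=0.03), 1))
```

Under these made-up parameters the sustainable team costs more through the first year and pulls ahead somewhere in the second - which is consistent with a return that's "a year or more down the road."
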

Since I wrote this blog posting, I discovered the book Sustainable Software Development, by Kevin Tate.  I asked Kevin what he thought of my blog posting.  In his reply, which you can read below, he mentions something really interesting about proper componentization (which resulted in my point #6 above - thanks Kevin!):

I like the post and agree with the main points.  One thought is that you mention a 25% overhead, but in my experience people need to recognize that you typically get that 25% back (and more) through the knock-on effects of, for example, having reduced QA / manual testing and decreased effort behind release cycles.  
Another thought I had is that you touched on the need to periodically rewrite based on the latest technology.  That's where you might want to add one other "critical element" to your list: the need to componentize (with tests on the interfaces).  I talk about it indirectly in my book, but since I've written the book it's become increasingly obvious to me that this is another element of "secret sauce" because it allows teams to selectively rewrite their product as they go without having to throw out the entire system.  The rewrite from scratch scenario doesn't succeed very often, if indeed you even get the business support to do it!
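
Kevin's "tests on the interfaces" idea can be sketched as a contract test: a suite written against the component boundary, which any implementation must pass.  Everything below (UserStore, the method names) is hypothetical, invented for illustration.

```python
# Hypothetical sketch of "tests on the interfaces": a contract test
# suite written once against a component boundary.  A rewritten
# subsystem only has to pass the same contract - the rest of the
# code base is untouched.  All names are illustrative.

class UserStoreContract:
    """Behavior every UserStore implementation must satisfy."""
    def make_store(self):
        raise NotImplementedError("subclass supplies the implementation")

    def test_roundtrip(self):
        store = self.make_store()
        store.save("alice", {"email": "a@example.com"})
        assert store.load("alice") == {"email": "a@example.com"}

    def test_missing_user(self):
        store = self.make_store()
        assert store.load("nobody") is None

class InMemoryUserStore:
    """One implementation; a rewrite (say, on a new storage
    technology) would be a second class passing the same contract."""
    def __init__(self):
        self._rows = {}
    def save(self, user_id, record):
        self._rows[user_id] = dict(record)
    def load(self, user_id):
        return self._rows.get(user_id)

class InMemoryUserStoreTest(UserStoreContract):
    def make_store(self):
        return InMemoryUserStore()
```

Because the contract pins down the boundary's behavior rather than its implementation, you can selectively rewrite one component against newer technology without a risky whole-system rewrite.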

Building high quality software fast and cheap

James Currier at Ooga Labs posted a blog entry that starts as follows:

"I’ve heard people tell me “We can build product fast, good, or cheap.  You can’t have all three.  Pick two.”  I believe this is a corrosive mindset, used by bureaucrats to justify mediocrity, or used by people who are afraid of failure to set the bar low enough so they feel comfortable in their daily lives."
I disagree with James: on a large, complex software development project, I think you can only lock down two.  While it's possible to build good software fast and cheap, that happens only under certain circumstances where the engineering risk is inherently low.  Also, sometimes you get lucky and nail it - I've seen that happen a few times.  Projects having any of these characteristics generally can be delivered fast, good, and cheap:

1.  Young code base (generally a few months of development or less)
2.  Small team size (generally 1 or 2 senior developers, sometimes up to 3 or 4 if things are going well)
3.  Relatively simple feature set

As you get to bigger teams with more complex software that takes longer to create, your risk goes up.

The underlying principle at work here is "shit happens" - sometimes an engineer quits or gets sick, or you discover a fatal flaw in a library you're relying on and you need to retool, or you discover that a feature doesn't work as designed and you need to redesign it, or the market changes and you have to react.  When such a thing happens, the bottom line is that it's usually going to take you more time to do what you wanted to do.

So let's say such a thing happens AND you hold all three values fixed.  You can deal with unexpected events by building a "slop factor" into your schedule, so you eat into your slop time.  But a slop factor is really a slip negotiated in advance: you pad the release date by saying "we expect this much shit could happen."  So you really are slipping, and the "fast" part is actually varying.  Or you could ask your team for overtime - that's one option.  But that's really tweaking the "cheap" part because you are putting more time in - if you paid for that hourly, your cheap just went up.  You could shave your feature set and simplify things and still ship, but then you're tweaking the "good" part.

Sometimes bigger shit happens than you budgeted for with your slop factor - we all get constipated from time to time - and then what?  It's naive to believe that big shit won't ever happen.  If you hold all three fixed, what do you do?

I call these three values by different names that more closely connote the values being managed:

1.  Release date
2.  Features & Quality
3.  Resources (people and hardware/software)

My basic form of project risk management is to look at these three values as dials that you can twiddle or tweak at the beginning of each sprint.  So I've folded this form of risk management into the way I practice Agile project management, by constantly reassessing the feature set, team velocity and composition, how the product is shaping up, and the importance of the release date relative to other business priorities, which may be more fluid than we'd like.