The art of building software: March 2010

Tuesday, March 23, 2010

Unit Testing

I've noticed that the phrase "unit testing" means two pretty different things.  It can mean:
  1. A developer tests their own code as best they can by itself.  Others might call this "white box testing" or even "white box system testing", but my current client calls this unit testing.  Since the testing is generally not automated, there are no automated test results.  Test results must be typed in by hand, if indeed test results are even reported with this method (they are usually not).
  2. A developer writes automated unit test code, typically using a unit test framework and using mocks as needed, to test their interface and achieve maximum code coverage, with objective pass/fail results.
I'm currently working with a client that uses the definition in #1, and I've had to broaden my definition of unit testing to incorporate it.  There is no sense asking my client to use what I think is the correct definition.


I'm a great fan of the second kind of unit testing.  It gives me the confidence I need to refactor relentlessly, constantly throwing out old code and improving what's there.  That's one of the secrets to keeping productivity up over the long run.
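To make the second definition concrete, here is a minimal sketch in Python using the standard unittest framework and a mock.  The function `rental_total`, the `price_for` method, and the prices are all made up for illustration; the point is only the shape: code under test, a mocked dependency, and objective pass/fail results.

```python
import unittest
from unittest.mock import Mock


def rental_total(price_lookup, title_ids):
    """Sum the rental price of each title, using an injected price lookup."""
    return sum(price_lookup.price_for(title_id) for title_id in title_ids)


class RentalTotalTest(unittest.TestCase):
    def test_sums_prices_from_lookup(self):
        # The mock stands in for a real pricing service (hypothetical).
        lookup = Mock()
        lookup.price_for.side_effect = lambda title_id: {1: 3.0, 2: 4.5}[title_id]
        self.assertEqual(rental_total(lookup, [1, 2]), 7.5)
        self.assertEqual(lookup.price_for.call_count, 2)

    def test_empty_order_is_free(self):
        lookup = Mock()
        self.assertEqual(rental_total(lookup, []), 0)
        lookup.price_for.assert_not_called()


def run_suite():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RentalTotalTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` (or running the file with `python -m unittest`) reports each test as passed or failed, with nothing typed in by hand.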

But sometimes there is no long run.  Sometimes the projects are six-week tactical exercises, after which the code is done.  In those cases, unit tests would be a waste of time.

Sometimes it's simply not possible to invest the extra time to write unit tests.  There is an up-front cost with a payback period, and sometimes you can't afford that cost.

But sometimes those six-week tactical exercises get cloned, or turn into six-month exercises, and you never wrote unit tests from the beginning.  And the thing about unit tests is that if your code base doesn't already have good unit test coverage, adding it after the fact can be quite time consuming.  So time consuming that you won't do it, and you'll end up with a project that you wish had good unit test coverage but doesn't.

Often, unit testing is introduced during a "re-architecting" phase, which generally seems to happen about once every five years.  That's a topic I may blog about some time - the re-architecting phase of a product life cycle.

Monday, March 22, 2010

Making a software shit knife

Attachment

When I use the word attachment here, the closest meaning comes from the Sanskrit word Upādāna, which is generally translated as attachment.  I have my own related view of attachment.  Although it is alleged to be one of the primary causes of human suffering in some Eastern traditions, in this context the only suffering that could come from attachment to software is making the wrong choice about a technology decision.  I guess you could suffer from that.

Making a shit knife

Geeks like staying abreast of the latest cool shit.  Usually a geek will start their career by hitching their wagon to some piece of cool technology shit that they develop some real depth with.  New pieces of hot shit come by, and the geek is happy to add those to the rest of his cool shit.

When a new problem comes up, the geek looks in his bag of cool shit and pulls out a piece and says "I've got just the cool shit for this."  And he applies his piece of cool shit to the problem and all is good.  In fact, sometimes a geek can get so good at fashioning anything out of his cool shit, he can create something like the infamous Inuit shit knife.  Now that was some serious shit.

The problem of attachment to software

Lucky that Inuit was in some cold climate, because otherwise he probably wouldn't have managed his escape.  And like him, sometimes I am lucky.  But sometimes I just don't have the right cool shit in my bag, so I'll find something that looks close, but it's not the right shit because the right shit isn't in my bag.  When I'm attached to a particular language, I'll choose that language for projects over other languages, even though another language might be a better business choice.  Sometimes I get attached to new technologies I haven't tried yet, because they just seem so damn cool, and then once I learn them I just want to use them everywhere.  Sometimes I've learned a particular pattern for how to solve a problem, and I'll just apply that pattern even though it's not the best choice.  Sometimes a bug will elude me because I'm attached to how I think the code is supposed to work.  In other words, I can let my own attachments interfere with making the best business decision.

Software attachment goes beyond what we have as software developers though.  We're attached to the latest version of our favorite programs, that cool screen saver, the cool web thingy at work, and on and on.  If we have to use something different, we're moving into new territory, the unknown - and that can be scary or stressful for some people.  It is an opportunity for others.

I'm still trying to figure out how best to address this problem in myself, by trying to identify when I am so attached to a particular technology that it's getting in the way of making the best business decision.

Gut instincts

This is a story about how I knowingly voted with a group to make the wrong decision.

As a group, we were evaluating several alternative technology stacks for building a healthcare application.  We listed out all the qualities that were important to us in the stack - things like quality of development environment, availability of third party tools, and so on.  We gave each of these qualities a numerical ranking of importance.  Then we evaluated each technology stack for each quality we'd listed, and ranked how well that stack supported that particular quality.  We had a handy spreadsheet that calculated the total for each technology stack.  As a group, we all agreed to all the numbers on the spreadsheet.  The spreadsheet told us which stack won - and that's the one we went with.  Bam, done.  We could even explain to others why we'd chosen this stack.
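The spreadsheet arithmetic is just a weighted sum per stack.  Here is a minimal sketch of it in Python; the stack names, qualities, weights, and ratings are invented for illustration and are not the numbers from the real evaluation.

```python
def weighted_totals(weights, scores):
    """Return {stack: sum(weight * rating)} for every stack in `scores`."""
    return {
        stack: sum(weights[quality] * rating for quality, rating in ratings.items())
        for stack, ratings in scores.items()
    }


# Importance of each quality (hypothetical values).
weights = {"dev environment": 5, "third-party tools": 3}

# How well each stack supports each quality (hypothetical values).
scores = {
    "Stack A": {"dev environment": 4, "third-party tools": 2},
    "Stack B": {"dev environment": 3, "third-party tools": 5},
}

totals = weighted_totals(weights, scores)
winner = max(totals, key=totals.get)
print(totals, winner)
```

With these made-up numbers Stack A totals 26 and Stack B totals 30, so Stack B "wins".  Notice that the answer depends entirely on which qualities made it onto the list and on the weights the group agreed to.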

Yet the spreadsheet gave us the wrong answer, and the group didn't figure that out until several months later.  My gut feeling at the time also told me we were picking the wrong technology stack.  I tried to explain why the choice was wrong, but because I couldn't express my gut feeling in this spreadsheet, my explanations were not effective.  I even tried manipulating the weights of each quality, and even added new qualities to the list to try to get my choice to come out as #1, but I couldn't do it in a credible way.  I think a few other team members were in the same situation too.  But we ignored our gut feelings and went with the #1 on the spreadsheet.

Several months later the group realized they'd picked the wrong horse and started over again with another one.  How is it we chose the wrong technology stack?  We consciously tried to pool our knowledge in this spreadsheet.

Maybe the problem was that this spreadsheet approach simply excludes gut feeling, right-brain, non-rational thinking from the process.  So we even added a row to the spreadsheet for "gut feeling".  Really, we did that.  That didn't help the spreadsheet come up with the right answer though.

Sometimes it's a mistake to over-formalize the decision making process of a group.

Should a group leader/manager/architect have veto power over whatever decision a group comes up with?  Would that have helped this group by making the right decision to begin with?

Should you always trust your gut instinct?  Probably not.  But you should seriously pay attention to it.

Thursday, March 18, 2010

Knowing when to fold

When I was at Oracle during the 1990s, I got to work on this huge framework called Sedona. After Oracle invested several years and hundreds of people working on it, I joined the project and was building the entire web front end for the tool (the web was brand new back then, remember).  Then I read one day in the press that the project I was working on had been canceled.  I left Oracle that day, because I felt like I should have heard that my project was canceled from inside Oracle before I read about it in the press.  I didn't disagree with the decision, but I did feel like the management culture at Oracle was not healthy for me spiritually at the time.

The architect for Sedona was a very bright guy named Chris, who went on to Microsoft to help create the COM+ distributed transaction framework, and then the wildly successful Common Language Runtime that's part of Microsoft's .NET framework.  I had lunch with him a few years ago when I was working at Microsoft as a Principal Development Manager, and I'm really glad he found an outlet for his professional creativity.

Chris and I have never talked about this so I don't know what he'd say, but I think some of his Microsoft work bears the marks of "doing Sedona right the second time at Microsoft", and I really see his Oracle and Microsoft work as part of a continuum.

Oddly enough, in my current consulting work, I've had an opportunity to learn this enterprise computer system called Pivotal.  It's pretty close to what Oracle was trying to build with Sedona, and I wonder now whether there were any former Sedona people on the Pivotal team.

UPDATE 4/14/10: I talked with Martin Stiby (a really bright and very nice geek) at Pivotal and he told me that the design did not come from Oracle or Sedona, just to set the record straight.

I think one of the hallmarks of a great CEO is that they have a good track record of knowing when to fold and when to double down.  Maybe they're not always right, but they're right enough to win.  Sailing metaphors were really popular for a while in management consulting.  When is the wind going to shift a certain way?  How does the team on the boat adapt as a group to a constantly changing wind and then win a race?  Given Larry Ellison's track record, I would give him the benefit of the doubt on canceling Sedona.  How did he make this decision?

I think knowing when to persevere and when to call it quits is a key skill that is transferable to many parts of my life.  In software development, I see this in development, debugging, testing, management, user interface design, and more.  Often, such decisions are made by individuals and no one may even know that a decision was made. But these decisions must also be made by groups, and it's here where culture can play a critical role in helping the group make the best decision.  Whether it's the board or the executive team or your daily engineering stand-up meeting, culture helps groups pool their collective knowledge and make wise decisions about when to fold.

My favorite bug

This is the story of my favorite bug. I created this bug on an early version of the NetFlix site, over a decade ago when they were a relatively small, unknown startup. Now I stream NetFlix on my XBox, and DVDs in the mail seem... so... last century.

Way back when NetFlix was a tiny start up in Scotts Valley, CA, I walked down to their offices from my house up the street, and walked out with a job as a senior web developer. They soon moved just over the hill to Los Gatos, CA, where I got to implement this fabulous bug I want to tell you about.

When you browse the NetFlix web site and look at a movie, you see all this information about the movie such as the description, actors, director, and so on. NetFlix used to get this data from the Internet Movie Database. I don't know if they still do.

Anyway, my task was to set up a process for automatically getting the latest content from IMDB and bringing that into the Oracle database so it could be displayed to the user on the site. My basic approach was a standard "Extract, Transform, and Load" or ETL process. I set up a job to retrieve the latest data from IMDB via FTP, then I wrote a program to do some validation and pre-processing on that data, and loaded it into a few new tables I'd created in the Oracle database. Finally, I kicked off a big PL/SQL script I wrote to process the newly inserted rows by updating the actual movie reviews, actors, and so on - where they really lived in the database.

When I ran a test of the PL/SQL script against a test copy of the production database, I noticed that it took several hours to run, during which time the database was so over-taxed that actual end user response times would have been unacceptable. So I came up with an idea to process one row and then sleep for a few seconds, then do another row and sleep, and so on. That way the content would still be imported, it's just that it would take a few days for the process to finish. And more importantly, database performance would remain acceptable during the process. Sounded good.
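The original was a PL/SQL script, which is not reproduced here.  As a rough illustration of the pacing idea only, here is a minimal Python sketch: handle one row, pause, repeat.  The names `process_throttled`, `process_row`, and `pause_seconds` are hypothetical, and the sleep function is passed in so the loop can be exercised without actually waiting.

```python
import time


def process_throttled(rows, process_row, pause_seconds=2.0, sleep=time.sleep):
    """Process rows one at a time, pausing after each to limit database load.

    Returns the number of rows processed.
    """
    processed = 0
    for row in rows:
        process_row(row)      # do the real work for a single row
        processed += 1
        sleep(pause_seconds)  # give the database room to serve other requests
    return processed


# Example run with a fake sleep that records pauses instead of waiting.
pauses = []
handled = []
count = process_throttled(["row1", "row2", "row3"], handled.append,
                          pause_seconds=2.0, sleep=pauses.append)
print(count, handled, pauses)
```

The trade-off is the one described above: total run time grows by roughly the pause multiplied by the number of rows, in exchange for keeping load low while the job runs.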

So this passed QA and was put into production. Over the next day, people gradually began seeing the new site content, one new title every few seconds, and they were pleased to see this fresh content. What they didn't realize until some time later, was that each time my PL/SQL script updated a movie, it set the available inventory level on that movie to 0. This effectively took it out of stock as far as the web site was concerned, so that movie was no longer able to be rented through the web site. Over time, the entire inventory was being taken off line, unavailable to be rented. That was their sole revenue stream, mind you.

At some point before the entire inventory was destroyed, we figured out what was going on and ultimately ended up restoring an Oracle database backup and deleting the PL/SQL script I'd written.

Over the next few days, others and I worked to understand the root cause of what happened. How could this have passed through QA? Well, it turns out that NetFlix used Oracle Financials, and that was running on the same Oracle database server. Oracle Financials was not present in the QA test setup. Oracle Financials saw this movie content update as essentially meaning a new movie was in the inventory, so its available inventory starts off at zero until you tell it how many you've got. So Oracle Financials was taking the titles out of inventory.

I had no idea Oracle Financials was even in the picture, and I guess our QA team didn't either. The bug fix for this was really simple once we knew how to get Oracle Financials not to view this as a new title. And eventually the new content got out on the site and all was good.
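One possible guard against this kind of hidden side effect, sketched here purely as an illustration and not as what was actually done at the time, is to check after each update that data you did not mean to touch is still unchanged, and to stop at the first violation.  The function names and the in-memory inventory below are hypothetical stand-ins for real database reads and writes.

```python
class InventoryChanged(Exception):
    """Raised when a content update unexpectedly alters available inventory."""


def update_with_guard(title_id, apply_update, read_inventory):
    """Apply a content update, then verify the title's inventory is unchanged."""
    before = read_inventory(title_id)
    apply_update(title_id)
    after = read_inventory(title_id)
    if after != before:
        raise InventoryChanged(
            f"title {title_id}: inventory went from {before} to {after}"
        )


# Hypothetical in-memory inventory standing in for the real database.
inventory = {7: 12}


def harmless_update(title_id):
    pass  # touches only descriptive content


def buggy_update(title_id):
    inventory[title_id] = 0  # simulates the unexpected side effect


update_with_guard(7, harmless_update, inventory.get)  # passes silently
try:
    update_with_guard(7, buggy_update, inventory.get)
    guard_tripped = False
except InventoryChanged:
    guard_tripped = True
```

A check like this only helps in an environment where the side effect can actually occur, which in this story meant production.  Combined with processing one row at a time, though, it would have halted the job after the first affected title.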

Over the next few weeks we talked about how we could prevent something like that from happening in the future. I'll never forget this really bright programmer there named Kho telling me that really good programmers just don't write bugs to begin with. Then he proceeded to show me all this bug free software he'd written. Once every few years, I seem to somehow write a huge block of code, and it just compiles and runs, bug free. And I am amazed. It can happen. I don't know if it's just luck or whether this can be cultivated. Maybe Kho is right.

Tuesday, March 16, 2010

What this blog is about

This blog contains my insights about the art of building software. This is an all-inclusive look at everything it takes to build truly great software. Sure, I'll go over languages, technologies, platforms, APIs, movements, battles and wars, winners and losers, and so on. But just as importantly, I'll be discussing how groups of people can organize themselves and create a software development culture that consistently produces award-winning, best-in-class, highly profitable and successful software. This builds a sustainable business model because you create a barrier to entry and competitive advantage in the form of your own unique software development culture. This is very hard to steal, and it's also amazingly hard to emulate. And it shines right through your brand equity and directly touches your customers and your bottom line.

I'll also be touching on important industry trends and governmental issues from a global perspective. Europe, Asia, Africa, Israel, Russia and Australia (just to name a few) are all important places to monitor. I'll cover the search engine wars too, along with the massive cat and mouse games played by hackers, script kiddies, and software developers worldwide.