Although the title may sound provocative, software indeed doesn’t need “maintenance” in the traditional sense. Unlike a house or a car, it doesn’t require oil changes, repainting, or rust protection. If the hardware is still functioning, software will operate the same way it did on the day it was created, even a hundred years later. So, why do we treat software systems as if they were physical infrastructure, and then act surprised when that approach fails?
Lousy analogies
Talking about software development with people outside the industry is hard. That’s why we lean on comparisons and analogies. Unfortunately, these analogies are often misleading and sometimes plain harmful. I am no exception and have made this mistake many times in the past.
It begins with calling ourselves software engineers, which makes other people think we are something like civil engineers, builders of bridges and nuclear power plants. It continues when we use project management methods invented for the construction industry. And it ends with splitting a project into a design and planning phase, a development phase, and a maintenance phase. It’s this last part I’d like to talk about today.
Before and after
When you decide to implement a new system, whether it’s a new e-shop, an ERP system, or warehouse management, everything is aimed at D-day, when all the glory gets released into production. The launch date, together with the implementation cost, is naturally the main concern. The launch marks a perceived boundary between “development” and “maintenance”. Everyone expects trouble after launch, so support from the supplier is expected to be heavy at the beginning, but it’s also expected that once things “quiet down”, the system will go into maintenance mode at just a fraction of the cost of the initial development.
It does not work that way. As we said, software does not need maintenance. What it needs is changing. All software contains bugs, and they need fixing. The company’s needs change, customers change, competition changes, legislation changes. All of that requires you to change the software. The need for change is usually much greater than anyone anticipated during the initial development.
It’s not unusual for the yearly volume of development effort on an already deployed system to be comparable to its initial development. Did you implement a new e-commerce website, and it took you a year and cost $500k? Expect to spend a further $250k–$500k per year for the next three years on necessary changes. If you don’t, the system will age quickly and soon become a hindrance to your company. If you realize this only after launch, it’s too late and disillusionment sets in.
What does this imply? That the time after is more important than the time before. But you wouldn’t know it from the way we select vendors and manage implementation projects. When I look at the specifications for tenders for various systems and the bids suppliers submit for them, and I’ve seen a few, 95% of the pages deal with the development contract: guarantees of meeting the launch date, what the project management bodies are, who signs the acceptance, how many man-days it will take, and what the discount or penalty will be for this or that misconduct.
The after period usually boils down to whether the contractor will answer emails from 9 to 5 on weekdays or also on weekends, and how long it will take before someone starts to address the fact that the servers ran out of disk space. There are honorable exceptions, but not many.
Running a new non-trivial system is challenging for any business. Often it entails a change in your company’s processes and the workload of your people. It’s a stressful time. The last thing you want to do at this point is fundamentally change the way you develop and deploy that system. But that’s what usually happens.
Optimizing for after
The first step towards correction is to accept that there is a problem. At that point, we can begin to consider what to do about it. With everything we do in the before period, we need to keep in mind whether and how it will affect the after period. And when choosing a solution, remember that after is more important than before. I’ll illustrate what I mean with a few examples.
Example One: The Implementation Team
If your system is going to be developed by an external contractor, ask from the start what the after period will look like. Almost as a rule, the contractor will have an implementation team for you, staffed with their best people who, I’m sure, have many successful projects under their belt. It’s just that this team will probably move on to another project once the implementation is over, and you’ll be shuffled off to the “maintenance” team. That brings a whole host of problems with it.
There will never be a perfect transfer of knowledge between the implementation and maintenance teams. If there is good communication between them, a third of the know-how will be lost. If it is bad, half of it is lost. The maintenance team will usually have more than one of these customers to deal with, so you will be competing for their resources.
Even if the supplier pretends otherwise, there will be a conflict of motivation between these teams. The implementation team’s primary goals are usually to push the launch date back as little as possible, not to overrun the planned budget too badly, and to get the customer to sign the acceptance. This will logically be reflected in architectural decisions made at the expense of long-term development. If the implementation team knew they would have the system on their hands for the next three years, things would look different.
Therefore, prefer a supplier who guarantees in advance that the subsequent development of the system will be handled by the same team that initially develops it, and that the team will have the capacity to do so, even at the cost of not having those development superstars who are used to doing a project and moving on.
If you plan to take over the development of the system internally, or you want another vendor to do it, you need to plan for this from the beginning and pull this new team into the development process as early and as intensively as possible. This is even though the supplier will tell you that it will delay them and not be as efficient.
If your priority is after, accept a delayed launch date or a scaled-down first release rather than accepting that after will be handled by a completely different team than before.
Example Two: Deploy vs. Release
The terms release and deployment of a new system, or of individual changes to an existing system, are often treated as synonyms. In reality they are two different activities, and the fact that they usually happen at the same time is rather detrimental. By deployment we mean the process of getting the software into the production environment and ready to run. By release we mean the moment the software, or a piece of functionality, starts to be used by users.
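One common way to separate the two is a feature flag: the new code is deployed to production but stays dark until the flag is switched on. Here is a minimal sketch, assuming a hypothetical flags.json file read at call time; the flag name and the two checkout functions are illustrative stand-ins, not anyone’s real API:

```python
import json
from pathlib import Path

# Hypothetical flag store: a JSON file such as {"new_checkout": false},
# shipped with the deployment and flipped later without a new deploy.
FLAG_FILE = Path("flags.json")

def is_enabled(flag_name: str) -> bool:
    """Return True if the given feature flag is switched on."""
    try:
        flags = json.loads(FLAG_FILE.read_text())
    except FileNotFoundError:
        return False  # fail closed: unknown or missing flags stay off
    return bool(flags.get(flag_name, False))

def legacy_checkout_flow(cart):
    return {"flow": "legacy", "items": len(cart)}

def new_checkout_flow(cart):
    return {"flow": "new", "items": len(cart)}

def checkout(cart):
    # The new code path is deployed; it is released only when the flag flips.
    if is_enabled("new_checkout"):
        return new_checkout_flow(cart)
    return legacy_checkout_flow(cart)
```

In a real system you would likely reach for a feature-flag library or service, but the principle is the same: deployment puts the code on the servers, release is just a configuration change.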
In most projects I’ve had the opportunity to see under the hood, deployment to the production environment doesn’t happen until very late in the development cycle, often only weeks, or even days or hours, before the planned launch.
There are two main reasons for this:
- Infrastructure for the production environment is missing. Servers cost money, of course, and it may seem wasteful to acquire virtual, or God forbid physical, servers before the system is live. This is one of the many reasons to use the cloud. No one is saying the infrastructure has to be scaled as if it were already running at full capacity, but it should contain all the components, consistent with the development and testing environments. Will those components change during development because the architecture isn’t final yet? Fine. It won’t be final after launch either. At least you get to practice what adding, say, a new database looks like, and it forces you to automate and maintain the infrastructure as code.
- Developers are more efficient developing on a simplified or local environment, without having to “push” every change through a complex landscape to production. But if you develop on an environment that behaves and looks different from production, you’re setting yourself up for problems that will manifest at the most inopportune time. And if the process of deploying a change to production is clunky and uncomfortable for developers, it will stay that way after. As it happens, right after launch you usually need to deploy to production frequently and, more importantly, quickly. There’s no better time to automate and debug this process than while a problem with deploying to production doesn’t yet have a major impact. A minimal sketch of such an automated deployment step follows this list.
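To make the second point concrete, here is a minimal sketch of a deployment script that takes the environment name as its only parameter, so developers exercise the same automated path to dev, test, and production from day one. The helper scripts it calls (build.sh, push_artifact.sh, and so on) are hypothetical placeholders for whatever your project actually uses:

```python
#!/usr/bin/env python3
"""Deploy the application to a named environment.

Sketch only: the commands below stand in for your real build and deploy
tooling (docker, kubectl, a cloud CLI, ...). The point is that dev, test,
and production all go through this same script.
"""
import subprocess
import sys

ENVIRONMENTS = {"dev", "test", "prod"}

def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)  # stop on the first failure

def deploy(env: str) -> None:
    if env not in ENVIRONMENTS:
        sys.exit(f"unknown environment: {env}")
    run(["./build.sh"])                      # build the same artifact for every env
    run(["./run_tests.sh"])                  # never deploy an untested artifact
    run(["./push_artifact.sh", env])         # hypothetical: upload the build output
    run(["./apply_infrastructure.sh", env])  # infrastructure as code, applied per env
    run(["./restart_services.sh", env])      # roll out the new version

if __name__ == "__main__":
    deploy(sys.argv[1] if len(sys.argv) > 1 else "dev")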
Example Three: Performance Tuning
The software must not only do what it is supposed to do, it must do it fast enough and handle production traffic with a margin. Some of this can be addressed by beefing up the infrastructure, but order-of-magnitude improvements usually come from modifying the software: changing an SQL query, changing the row-locking order, changing a calculation algorithm. To find those changes without trial and error, you need debugging tools. I have witnessed more than once that quality debugging tools were available mainly in the development environment. Sometimes because of the cost of those tools (I’m looking at you, Oracle, with your Tuning Pack, for example), sometimes because of security concerns and fear of affecting production, and sometimes because development ran on a local database while production was supposed to run on a managed cloud instance.
It was always waved away with the idea that performance would be debugged and properly tested before going into production, after which the tools wouldn’t be needed. You can probably guess where this is going. It always turned out that the need to tune performance after was orders of magnitude greater than before. Many problems cannot be found before launch even with the best of efforts. And suddenly we’re blind: we can’t reproduce the problem on the test environment, and we have no instrumentation in production. On top of that, we’ll be making a lot of changes to the system, and we need to be sure each one doesn’t hurt performance.
A better way is to use only debugging tools that work as well in production as in development. And if you don’t have such tools, get them as early as possible during the initial development, so that developers can learn how to use them.
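One kind of tooling that by definition works the same in development and in production is instrumentation you build in yourself. Below is a minimal sketch, using only the Python standard library, of a decorator that logs a warning whenever a call exceeds a time budget; the logger name, the threshold, and the load_open_orders function are illustrative assumptions, not a prescription:

```python
import functools
import logging
import time

logger = logging.getLogger("perf")  # arbitrary logger name, pick your own

def timed(threshold_ms: float = 100.0):
    """Log a warning whenever the wrapped call exceeds the threshold.

    Works the same on a developer's laptop and in production, so slow calls
    show up in both places with the same tooling.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                if elapsed_ms > threshold_ms:
                    logger.warning("%s took %.1f ms", func.__name__, elapsed_ms)
        return wrapper
    return decorator

@timed(threshold_ms=50)
def load_open_orders(customer_id: int):
    # Hypothetical data access call; in a real system this would run the SQL query.
    ...
```

Because the output goes to ordinary logs, the same signal is available everywhere the code runs, which is exactly the property we are after.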
Ideally, programmers write automated performance tests covering the area they are developing, in addition to unit tests, right during development, and these tests run in the continuous integration pipeline on every commit. They serve not only during initial development to catch problems early, but especially after, when we want to change the system safely.
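As an illustration, here is a minimal sketch of such a test written for pytest; the price_basket stand-in, the basket fixture, and the 0.2-second budget are all assumptions in place of your own code and requirements. A dedicated benchmarking plugin would give more stable numbers, but even a crude timing assertion run on every commit catches the worst regressions:

```python
import time

def price_basket(basket):
    # Stand-in for the real pricing code under test (hypothetical).
    return sum(item["quantity"] for item in basket)

def make_large_basket(n_items: int = 1_000):
    # Test fixture: a basket big enough to expose slow code paths.
    return [{"sku": f"SKU-{i}", "quantity": 1} for i in range(n_items)]

def test_price_basket_stays_within_budget():
    basket = make_large_basket()
    start = time.perf_counter()
    price_basket(basket)
    elapsed = time.perf_counter() - start
    # The 0.2 s budget is an assumption; derive it from your own requirements.
    assert elapsed < 0.2, f"pricing took {elapsed:.3f}s, budget is 0.200s"
```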
It’s up to you
The three examples I have outlined here are by no means a complete list of what needs to be addressed over a system’s life cycle. You have to make dozens of similar decisions. But I hope I’ve managed to change the lens through which you’ll look at your next new system. If you’d like help with this, get in touch.
So maybe I’ll add to the headline of this article after all:
Software doesn’t need maintenance, it needs to change.