Four flavors of operational work

Feb 29, 2024

Those who have read my newsletter for a while know how much I stress the need to plan operational work. While product people only care about features, technical leaders must make sure the machine keeps running.

Too many teams plan as if all they will build are features. The cold reality of building products is that a considerable amount of our time needs to go to the other stuff. Operational work is an umbrella term for all the unsexy tasks that don't deliver new features.

Depending on the team's size and the product's complexity, teams should aim to spend 25% to 50% of their time on operational work. It's a shocking number to those who don't build software, but it makes sense to those who do.

Back when I worked on projects, I often saw companies estimate the price of building but neglect the cost of running. They would spend a million to build it but would believe the operational costs to be negligible.

Software developers know this isn't true. The first months of greenfield development feel like a honeymoon, with 100% of the time spent building new features. But quickly, the team starts to run into limitations.

Maybe the architecture needs to be adapted to accommodate new features. New insights and early feedback highlight flaws and opportunities in the original design.

All of a sudden, the team is spending 3 hours per week maintaining the features that were "Done" already.

And when users get involved, that number of hours rises quickly.

So, rather than looking at the 25-50% as a hard rule, look at it as a spectrum. On one end, the Greenfield Project is spending 0% on operational work. On the other end, we have the dumpster fire, where no new features can be built because of excessive technical debt. Those teams spend 100% of their time just keeping the lights on.

Teams should aim for somewhere in between, and the plan should reflect that.

25%-50% is a good guideline.
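To make the guideline concrete, here is a minimal sketch of the arithmetic. The function name `split_capacity` and the example team (5 people at 40 hours a week) are assumptions for illustration, not part of any particular planning tool.

```python
def split_capacity(team_hours: float, operational_share: float) -> dict[str, float]:
    """Split a team's weekly hours between operational work and features.

    operational_share is a fraction between 0.0 (pure greenfield)
    and 1.0 (the dumpster fire).
    """
    if not 0.0 <= operational_share <= 1.0:
        raise ValueError("operational_share must be between 0.0 and 1.0")
    operational = team_hours * operational_share
    return {"operational": operational, "features": team_hours - operational}


# Hypothetical team: 5 people at 40 hours per week.
team_hours = 5 * 40

low = split_capacity(team_hours, 0.25)   # lower end of the guideline
high = split_capacity(team_hours, 0.50)  # upper end of the guideline
```

For this hypothetical team, the guideline reserves between 50 and 100 of the 200 weekly hours for operational work, which leaves 100 to 150 hours for features.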

The four flavors of operational work

Operational work comes in four distinct flavors, and we need to be able to identify them so we can plan them correctly.

First, there's the maintenance work of upgrading dependencies, archiving data, and right-sizing infrastructure. That tedious, boring work that never gets top priority on the backlog. Yet when we fail to maintain our product, the system starts to rot. Maintenance is the work we need to do to keep existing features up and running. If we don't maintain our product, we get outages that hurt our users.

Then there's system redesign and improvements. This is the technical work that developers love and managers dread: changing the API design, splitting off a microservice, and so on. We all know what happens when teams can't get their technical debt under control. Yet, given the choice between a new feature and finally getting rid of that last AngularJS component, shiny new toys always seem to win. System redesign is the work we need to do to keep our system flexible. If we don't redesign, building new features gradually becomes more expensive.

Bug reports are another category of operational work. When we receive feedback from a customer that the system isn't doing what it's supposed to do, we need to fix that. Whether we drop everything to fix the bug or throw it on the To Do pile, we need to plan time to do the work. It's crazy how many real-life plans exist that act as if teams only deliver bug-free software. Bug fixing is the work we do to keep perceived quality high. By consistently fixing these mistakes, we build a better product and increase trust with our users. By pushing low-priority bug tickets down the backlog, we get the opposite effect.

And finally, there's customer support. Some user requests aren't bug reports. They are questions or one-off interventions that still need our attention. Figuring out why that user who can't remember their e-mail address has trouble logging in isn't a developer's dream job. Not much cerebral enjoyment can be found in handling a GDPR complaint. But these one-off interventions are part of doing business. Customer support is the work we do to keep our users happy. It's incredible how much credit a team can build by being responsive to their customers' needs.

Maintenance, redesign, and bug fixing are pretty straightforward to plan. These are best handled with Recovery Blocks. Rather than dumping them into a backlog where they'll never see the light of day, carve out dedicated time to handle these work items.

But support has always been more difficult. Whenever a client has a demand, it poses a Catch-22. We can't do what we said we would do if we always drop everything to help the customer. When every Intercom ticket can mess up our plan, we can't plan at all. Teams in such an environment need to pray for a quiet day to build what matters.

On the other hand, if we ignore customer work, we're setting ourselves up for failure. A support ticket that's not urgent today will catch fire one of these days. Most of them can't wait until the next Recovery Block.

Customer support is best handled in Support Swarming Sessions. These are all-hands mob programming moments sprinkled throughout the week. I'll describe those in one of the next editions of the Planned Attention newsletter.

While bug reports and customer support requests might both be gathered in Freshdesk, the way we should handle them is wildly different. Redesign can be batched in larger initiatives, whereas maintenance work should be planned in small increments.
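The planning advice so far can be summarized as a small lookup from flavor to planning approach. This is only a sketch: the names `Flavor`, `PLANNING`, and `plan_for` are made up for illustration, and the wording of each approach paraphrases the text above.

```python
from enum import Enum


class Flavor(Enum):
    """The four flavors of operational work described in this article."""
    MAINTENANCE = "maintenance"
    REDESIGN = "redesign"
    BUG = "bug"
    SUPPORT = "support"


# Hypothetical mapping from flavor to how the article suggests planning it.
PLANNING = {
    Flavor.MAINTENANCE: "Recovery Block, planned in small increments",
    Flavor.REDESIGN: "Recovery Block, batched into a larger initiative",
    Flavor.BUG: "Recovery Block",
    Flavor.SUPPORT: "Support Swarming Session",
}


def plan_for(flavor: Flavor) -> str:
    """Return the suggested planning approach for a work item's flavor."""
    return PLANNING[flavor]
```

Two tickets that arrive through the same helpdesk tool can still end up in different places: `plan_for(Flavor.BUG)` points to a Recovery Block, while `plan_for(Flavor.SUPPORT)` points to a Support Swarming Session.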

There are four distinct kinds of operational work, and identifying which kind we're dealing with is the first step toward a reliable plan.