#engineering #product - 11 mins read

The tech debt elephant: A product perspective

As a product manager, you will find that balancing tech debt and architectural improvements with more conventional product work can be challenging. In some places I’ve worked, it has been one of the most difficult parts of the job. On the one hand, it is important to maintain a healthy technical foundation for your product to ensure its long-term success. On the other hand, constantly focusing on technical improvements can take time and resources away from work that could deliver significant value to your users, move you closer to your desired outcomes, and keep you ahead of the pack.

Tech debt, also known as technical debt, is the accumulation of technical challenges and issues that arise when building a product. These challenges can come in the form of outdated technologies, poor code quality, or inadequate infrastructure. Over time, tech debt can lead to slower development, higher maintenance costs, reduced reliability, and a poorer user experience. And it is inevitable. Even if you clear it down entirely (I have never seen this happen), it will return over time like the many heads of the Hydra.

Martin Fowler visualised this really well in his 2019 article "Is High Quality Software Worth the Cost?", although he uses the term "cruft". Here's an adaptation of his illustration that really hammers home how it can suffocate you over time:

Illustration of cruft from Martin Fowler

And just like financial debt, it compounds. The more tech debt you accrue, the more it slows you down. The more you build on top of it, the more difficult it is to pay it back.

"With borrowed money you can do something sooner than you might otherwise, but then until you pay back that money you'll be paying interest. I thought borrowing money was a good idea, I thought that rushing software out the door to get some experience with it was a good idea, but that of course, you would eventually go back and as you learned things about that software you would repay that loan by refactoring the program to reflect your experience as you acquired it." Ward Cunningham (2010)

Architectural improvements, on the other hand, are changes to the underlying technical architecture of a product that aim to improve its performance, scalability, and maintainability. These improvements can be critical for the long-term success of a product, but they can also take significant time and resources to implement. They can often prevent tech debt from accumulating by keeping your stack ahead of the curve.

As a product manager, it is your responsibility to work with your teammates to collectively prioritise tech debt and architectural improvements in the context of your overall business goals and strategies. This means carefully balancing the need for technical improvements with the need to deliver on more strategic goals.

To do this effectively, it is important to work closely with your tech lead to maintain a healthy balance between tech debt, architectural improvements, and conventional product work. Here are some steps you can take to help your team to accomplish this.

Identify and prioritise tech debt and architectural improvements

The first step in balancing tech debt and architectural improvements is to identify the specific challenges and opportunities facing your product. This can be done through a variety of methods, such as code reviews, user research, or analysis of performance metrics.

This warrants dedicated time. It is one of the first things I’d recommend doing to hit the ground running in a new team: talk to your tech lead about the state of your product. Map the problem areas out on a Miro board. Give each one a sentence of context. Provisionally rank them. If any have a hard deadline, jot it down.

Once you have identified the specific technical challenges and opportunities facing your product, you can prioritise them in the context of your objectives. For example, if a specific technical challenge is causing significant performance issues for your users, it may be worth prioritising over other improvements that would have a less immediate impact.

Communicate openly and regularly with your tech lead

Working closely with your tech lead is essential for maintaining a healthy balance. This means regularly discussing the technical challenges and opportunities facing your product, and aligning on a plan to address them.

Your tech lead can provide valuable insights into the technical complexities and trade-offs involved in addressing tech debt and implementing architectural improvements. By working closely together, you can develop a shared understanding of the technical challenges and opportunities facing your product, and make informed decisions about how to prioritise them.

A word of warning though: human nature often compels us to optimise everything. I’ve worked in many teams where the tech lead (or another eager engineering colleague) has tipped the balance the other way. One thing you have to be prepared to do - and potentially one of the most impactful things you can do - is to challenge. To ask “why?”. To ask “but what if we don’t do this until next quarter? Or next year? Or not at all?”. You owe it to yourself, your team, and your users to constructively challenge technical work with the same rigour you would apply to more conventional product work. Do not let perfect become the enemy of good. Plus, this is an excellent way to learn about the intricacies of your tech stack.

A few mechanisms that can help shine a light on tech debt and architectural improvements and bump their profile in your team and wider organisation:

A fortnightly cadence where the two of you (or the team as a whole) review tech debt and upcoming architectural work.
A performance dashboard that the team regularly reviews - at least weekly.
Dedicating a rough percentage of the team’s efforts to tech debt / architectural improvements.
Treating tech improvements with the same weight as more conventional product work.
A version dashboard and automated tooling that assesses vulnerabilities.
Tidy Friday - dedicating a day each week to picking off tech improvements.

Create a technical roadmap

To effectively balance tech debt and architectural improvements, it can help to have a long-term technical roadmap in place. This roadmap could outline the specific technical challenges and opportunities facing your product, as well as the steps needed to address them. This doesn’t need to be comprehensive or polished. Keep it simple and accessible, or it’ll quickly become stale and unusable.

It’s a good habit to regularly review and update it to reflect changes in your business goals and strategies, as well as changes in the technical landscape.

Be transparent with stakeholders about technical challenges and trade-offs

Balancing tech debt and architectural improvements is not always a straightforward task, and it can involve making difficult trade-offs. For example, you may need to prioritise addressing tech debt over other work to meet a critical deadline.

It is important to be transparent with stakeholders about these trade-offs, and to explain the reasons behind your decisions. This can help to build trust and understanding among your stakeholders, and ensure that they are supportive of the technical choices.

I’ve often fallen into the trap of not surfacing tech improvements in planning conversations and progress updates, which can damage team morale and erode stakeholder trust. If we think this is the most valuable thing to be working on, why would we not communicate this? I now try to give the same prominence to all work, and encourage my teams to keep me honest on this.

Establish SLIs and SLOs

Setting Service Level Indicators (SLIs) and Service Level Objectives (SLOs), along with error budgets, can be incredibly useful in helping you find the balance between product development and technical debt. SLIs measure real customer-impacting metrics and give a clear picture of how technical debt is affecting your end users. If the team is consistently exceeding the error budget, it's a sign that they need to focus on stability and technical debt rather than pursuing new product development. This helps ensure that the product remains reliable for users and protects their experience.

On the other hand, if the team is not touching the error budget, it means that the product is running smoothly and the team can take more risks and move faster. In this scenario, the focus on technical debt can be reduced, allowing the team to concentrate more on product development. This way, the team can maintain a healthy balance between developing new features and ensuring that the product remains stable and reliable.

Here's an example of each:

SLI: Request latency
SLO: 99th percentile request latency should not exceed 500ms
Error budget: 1% of total requests can have a latency of more than 500ms over a rolling 7-day window.
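
To make the arithmetic concrete, here is a minimal Python sketch of how the error budget above could be tracked. It assumes you can export per-request latencies (in milliseconds) for the rolling 7-day window. The names `error_budget_report` and `ErrorBudgetReport` are hypothetical and not taken from any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ErrorBudgetReport:
    total_requests: int
    slow_requests: int
    allowed_slow_requests: float
    budget_consumed: float  # 0.5 = half the budget spent, 1.0 = all of it

    @property
    def exhausted(self) -> bool:
        return self.budget_consumed >= 1.0


def error_budget_report(
    latencies_ms: Sequence[float],
    threshold_ms: float = 500.0,     # SLO threshold from the example above
    budget_fraction: float = 0.01,   # 1% of requests may breach the threshold
) -> ErrorBudgetReport:
    """Summarise error budget usage for one window of request latencies."""
    total = len(latencies_ms)
    if total == 0:
        raise ValueError("no requests in the window")
    if budget_fraction <= 0:
        raise ValueError("budget_fraction must be positive")
    # A request spends budget only when it is slower than the threshold.
    slow = sum(1 for latency in latencies_ms if latency > threshold_ms)
    allowed = total * budget_fraction
    return ErrorBudgetReport(total, slow, allowed, slow / allowed)


# Made-up window: 1,000 requests, 5 of them slower than 500ms.
window = [120.0] * 995 + [800.0] * 5
report = error_budget_report(window)
print(f"{report.budget_consumed:.0%} of the error budget used")
```

A team could review a figure like this at its regular cadence: well below 100% suggests room to take more product risk, while at or above 100% suggests shifting effort towards stability.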

Act before it is too late

Make no mistake: you have to take the health of your tech stack seriously. I’ve seen many product managers burned here - particularly those not from a technical background. Particularly in organisations taking the leap from start-up to scale-up.

If you do not give it the love and respect it warrants, you will shrivel and die. Death by a thousand cuts. The people-related inertia in a scale-up is difficult enough to negotiate without throwing mounting tech debt into the mix. John Cutler visualises this perfectly in his "Your Calendar = Your Priorities" article:

Illustration of make-up of work

In my experience, the areas for improvement can be grouped into six pots:

Inadequate infrastructure / architecture
Poor code quality
Continuous integration / continuous deployment (CI/CD)
Automated testing
Manual interventions
Poor user experience

...and here's a crude summary of how often I've seen each, how much it has impacted the team, and how difficult it is to resolve, all scored out of five:

Area for improvement	How common is it?	How much does it hold you back?	How difficult is it to address?
Inadequate infrastructure / architecture	3	2	1
Poor code quality	2	3	5
Continuous integration / continuous deployment (CI/CD)	4	3	4
Automated testing	4	4	3
Manual interventions	4	2	3
Poor user experience	2	2	3
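
If you want to turn scores like these into a provisional ranking, one crude option is to weigh how common and how painful an area is against how hard it is to fix. The Python sketch below does that using the scores from the table. The formula (commonness multiplied by impact, divided by difficulty) is purely an illustrative assumption, not an established method, and the `Area` and `rank` names are hypothetical. Treat the output as a conversation starter with your tech lead rather than an answer.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Area:
    name: str
    commonness: int  # How common is it? (1-5)
    impact: int      # How much does it hold you back? (1-5)
    difficulty: int  # How difficult is it to address? (1-5)

    @property
    def priority(self) -> float:
        # Assumed heuristic: frequent, painful, easy-to-fix areas rise to the top.
        return self.commonness * self.impact / self.difficulty


# Scores copied from the table above.
AREAS = [
    Area("Inadequate infrastructure / architecture", 3, 2, 1),
    Area("Poor code quality", 2, 3, 5),
    Area("Continuous integration / continuous deployment (CI/CD)", 4, 3, 4),
    Area("Automated testing", 4, 4, 3),
    Area("Manual interventions", 4, 2, 3),
    Area("Poor user experience", 2, 2, 3),
]


def rank(areas: Iterable[Area]) -> List[Area]:
    """Order areas from highest to lowest assumed priority."""
    return sorted(areas, key=lambda area: area.priority, reverse=True)


for area in rank(AREAS):
    print(f"{area.priority:4.1f}  {area.name}")
```

A different weighting would give a different order, which is rather the point: the ranking only becomes useful once you and your team agree on what the scores should reward.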

Inadequate infrastructure / architecture

A combination of good tooling, a well-disciplined team, and responsible servant leaders can easily keep this one in check.

Broadly speaking, unless you have a very bespoke product, your infrastructure should be a commodity. As long as your leaders are prepared to pay the required amount and you keep an eye on volumes and performance, cloud-based services should prevent this sapping too much of your time.

Architecture is much more likely to rear its head in conversations with your tech lead - it can prove challenging to stay on top of your dependencies, vulnerabilities, and also to identify impending constraints that may necessitate change.

Poor code quality

This covers a myriad of different things:

Corners consciously cut in an attempt to reduce time-to-market
A product that has evolved in such a way that the code base is now sub-optimal
A quality bar that has been low due to a lack of skills or experience

…and all of these can lead to the same symptoms:

An unnecessarily bloated code base
A complex, difficult-to-navigate code base
A lack of unit and integration tests
An unbalanced app split out over too few or too many repositories
Problematic dependencies between teams and components

Tech debt in this category is usually the most challenging in my experience. It is also one of the most subjective: one person’s simple can be another person’s complex.

Continuous integration / continuous deployment (CI/CD)

I would go as far as to say that CI/CD is essential if you are looking to operate at scale. You can live without it initially whilst you seek product-market fit, but sooner or later you’ll begin to burn in environment and regression hell if you don’t give this area the requisite attention.

I’ve typically seen five steps to this:

First, get a decent set of environments established that you can deploy to with relative ease. Here’s my go-to:

Illustration of environments

Second, establish continuous integration, whereby any new code changes are automatically deployed to your dev environment, and your engineers merge little and often.

Third, work your automated test suite into your continuous integration process. There is an argument for holding off on CI until you have a basic level of testing, particularly if you already have an established product.

Fourth, automate your release to production and rollback processes.

Last, ensure that you apply a product mindset to your CI/CD, regularly measuring, assessing, and improving as you go.
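
Steps three and four can be sketched as a single gate: tests must pass before anything is released, and a failed health check triggers an automatic rollback. The Python below is a minimal illustration under stated assumptions, not a real pipeline. `run_tests`, `deploy`, and `is_healthy` are hypothetical hooks that you would wire up to your own tooling.

```python
from typing import Callable, List


def release_with_rollback(
    new_version: str,
    current_version: str,
    run_tests: Callable[[], bool],
    deploy: Callable[[str], None],
    is_healthy: Callable[[], bool],
) -> str:
    """Attempt a release and return the version left running in production."""
    # Step three: the automated test suite gates the release.
    if not run_tests():
        return current_version
    # Step four: automated release to production...
    deploy(new_version)
    if is_healthy():
        return new_version
    # ...and automated rollback when the health check fails.
    deploy(current_version)
    return current_version


# Example with stand-in hooks: tests pass, but the new version is unhealthy.
deployed: List[str] = []
running = release_with_rollback(
    new_version="1.1.0",
    current_version="1.0.0",
    run_tests=lambda: True,
    deploy=deployed.append,
    is_healthy=lambda: False,
)
print(f"running {running}; deployment history: {deployed}")
```

Keeping the hooks as plain callables means the same logic can be exercised in a test before it is trusted with production.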

The good news here is that this can be introduced at any stage, although the further along your product is, the more convoluted it will be. Dedicated DevOps / automation specialists can help to move things along here without disrupting the whole team, although I would encourage fostering shared ownership and knowledge transfer amongst your team to lower the dependency on any one team member.

Automated testing

If this isn’t considered from the start, it becomes more and more challenging to work in retrospectively. My steer would always be to have some core automated testing around your product from day one(ish). It can be lightweight, just covering a very basic happy path - or even semi-automated, such as a suite of Postman requests. The amount of time my teams have lost by deploying broken software to dev, test, and even prod environments completely justifies the initial outlay. It may be stating the obvious, but a lack of good automated testing significantly lowers the quality bar and eats up the team’s time with unnecessary manual QA. Just as importantly, it erodes user and stakeholder trust.
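
To show how small a starting point can be, here is a sketch of a single happy-path test using Python's built-in unittest module. `create_order` is a hypothetical stand-in for whatever your product's core workflow is; in a real code base the test would call your own code rather than a function defined alongside it.

```python
import unittest


def create_order(items):
    """Hypothetical stand-in for the product's core workflow."""
    if not items:
        raise ValueError("an order needs at least one item")
    # Each item is a (price_in_pence, quantity) pair.
    total = sum(price * quantity for price, quantity in items)
    return {"status": "confirmed", "total": total}


class HappyPathTest(unittest.TestCase):
    def test_customer_can_place_an_order(self):
        # One basic journey, checked on every change.
        order = create_order([(250, 2), (100, 1)])
        self.assertEqual(order["status"], "confirmed")
        self.assertEqual(order["total"], 600)
```

Saved as a test file, this can be run with `python -m unittest` locally or as part of your CI process. It catches only the most basic breakage, which is exactly the kind that wastes time when it reaches a shared environment.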

If you are playing catch-up and looking to introduce or improve automated testing, bringing in dedicated test automation specialists can enable you to do this without slowing down your progress on other fronts too significantly.

Be mindful that this inevitably will slow you down, particularly if your engineers are not well-versed in automated tests. It’s a good idea to involve all of your engineers in the process of building out automated tests, and much like with CI/CD, I’d encourage collective ownership of them across the team, with tests becoming a given for each new product improvement.

Manual interventions

Whilst CI/CD and testing are the obvious and well documented candidates for automation, there are lots of other areas where teams can adopt the same mindset. Maybe this is more process debt than tech debt, but think about onboarding customers, investigating support issues, patching databases, updating permissions - all of the things that sap the team’s time and headspace. Every team I’ve worked in has had some of this. This GDS blog post is a good example.

Poor user experience

It’s also a stretch to call this technical debt - perhaps product debt would be a better label - but I’ve seen this come back to bite teams many, many times. Where UX hasn’t been considered from the off, this can result in months of dedicated effort to raise the bar. This can easily gobble up all of your front-end engineers and make it difficult to progress any other user-facing improvements in the meantime. Error wording is a common example, so often the afterthought.

As with all of the above, baking this in from the kick-off can really help - or at least just being consistent in your design so you have a baseline to work from. Having more of a full-stack mentality can also help, as more of the team may be able to blast through the work at once.

In summary

Tech debt is here to stay. It necessitates dedicated thought, visualisation, and development effort. Don’t bury your head in the sand. Don’t hide it from your stakeholders. Take it seriously. Continually strive to set and reset the balance.

Written by Dave Baines

Co-founder, Product Lead

12 Feb 2023