As software engineers/architects, we all understand that technical debt must not be allowed to build up over time; otherwise, it can grind your team’s velocity to a halt and hurt the evolvability of your systems.
There are many posts and articles that expound the virtues of paying back the tech debt and philosophise at great lengths about the various design patterns that can improve the design of the code. In this post I’d like to tackle the more pragmatic and real life aspects around tech debt from my perspective:
Product Owners will often have questions and concerns when you bring up the matter of technical debt with them.
Why should we focus on something that no user will ever really see or benefit from, as opposed to building these highly valuable features that the users do want?
It’s a very valid question, to an extent, because POs might not inherently understand why some out-of-band technical work is being done that wasn’t requested by a user or wasn’t on the roadmap. But as engineers/architects, we know the real cost of technical debt, and it’s as much our responsibility to communicate it in a way that stakeholders can understand as it is to actually pay it off and keep our systems delivering value today and tomorrow.
It’s critical to understand that software (architecture) has two main stakeholders: the business users and the developers.
This might come as a bit of a shock to the uninitiated, but it’s true! Whilst it’s the business users who derive the business value from the software that we, the developers, have built for them, it’s us, the developers, who are responsible for maintaining and evolving it.
We live in it all day every day! How can we possibly only ever meet the needs of the users without sparing a thought for the developers who have to maintain that codebase? And if that codebase is a pile of mess, even though the users might not see it, it will make it incredibly frustrating for developers to work on it and keep evolving it. Eventually, this will wear their motivation down and make for an unhappy team! No PO (should) want that!
This Dilbert strip, for me, covers this succinctly 😀
It’s important to understand that not all tech debt is equal or bad; the following factors play a huge role in determining the value and severity of the debt:
The SEI book Managing Technical Debt (the inspiration for this post’s title) introduces this pseudo-scientific graph called the Tech Debt Timeline:
The idea is that, up until a certain point, tech debt can be an asset because it lets you go faster: you are not engineering for perfection or stability, just getting the product out the door to gather user feedback.
But beyond that point (Tipping Point in the graph above), the team will start feeling the pain of that initial debt in their everyday work, to the point that it becomes harder and harder for them to keep evolving the system to meet future business needs. Then you reach a point where you MUST remediate the debt before you do anything else, either by undoing the past sins or by just re-writing from scratch (depending on how far gone you are). This is certainly not a place you want to be in for too long, because it takes time away from business-critical work and could jeopardise the roadmap.
For the last few months, our team has been experimenting with a tech debt management process of our own devising to see if we can make this techno-mumbo-jumbo-debteroo more visible for all team members and more accessible for our Product Owner.
First off, we never dedicate x% of our sprint to tech debt; we believe in continuously improving our systems incrementally. A dedicated tech debt allocation is, for me, an anti-pattern: during that time you are not delivering business-critical updates, and it also seems like an excuse to let the debt build up and pin your hopes on that x% of the sprint to pay it all off. Maybe there is a way that can work; I’d be keen to know!
For planning and tracking purposes, we divided the somewhat umbrella term “technical debt” into 2 distinct groups:
This debt is not due to the team’s decisions, but the team’s decisions could make it easier or harder to make these kinds of changes to their architecture. Usually, org tech debt has a fairly clear timeline and cost incentives, which makes it easy for stakeholders like POs to understand it from a time and cost perspective and prioritise it on the product backlog.
Debt is not always in code either; sometimes it’s in outdated documentation or architectural diagrams, or even in the development process itself!
The problem is that this kind of debt isn’t exactly quantifiable in monetary, time or user value terms, and that’s what makes it really hard for POs to understand and help prioritise it. If they cannot understand it then chances are that this “technical noise” might just end up getting squeezed out of the backlog eventually (true story).
We actively try to educate non-tech stakeholders to steer clear of artificial metrics like “number of errors/bugs saved per month” or “money saved by refactoring some code”, because these “metrics” imply a false correlation with debt, which makes them fairly meaningless.
So how do we communicate this well enough to justify putting it on the backlog and pursuing it with a bit more buy-in?
First, when refining the business epics/stories on our backlog, we always look at the code that could be affected by each story. We try to identify any potential technical debt that could impede the business-relevant changes, either directly or indirectly, now or in the near future.
Once we have identified the problem child, we might do one of 2 things:
Additionally, to track any ad-hoc tech debt items, we have what we call a Wall of Tech Debt, which is essentially a digital canvas with e-sticky notes placed along value and effort axes: value on the x-axis and effort on the y-axis.
The team adds whatever ad-hoc tech debt items we think might be worth discussing and planning for. Then, once every week, we review these items and place them a bit more “accurately” along the effort and value axes.
We determine value roughly by looking at what exactly the item aims to resolve and just how much of a pain it currently is. For example, simplifying the test suite, in which we work a lot every day, could yield a lot of readability, maintainability and quality benefits, so we would class that as high value. Whereas something like moving a hard-coded list of e-mail addresses from code into a database could be classed as low value if the list doesn’t change very frequently and working with the hard-coded list is easy enough, if not perfect.
It doesn’t stop there though, value can also be determined in terms of the following architectural quality attributes:
Because it’s difficult to associate tech debt with quantifiable benefits/metrics, we make sure to state these value benefits as clearly as possible in the story on our backlog.
Effort is usually determined by timeboxing the work. If we cannot reasonably address the item in a couple of hours, it gets marked as high effort and a story is created for it on the backlog, where it will be prioritised based on the value of the item and its relevance to any roadmap item.
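To make the timebox rule a bit more concrete, here is a minimal Python sketch of how it could be written down. It is only an illustration of the rule described above, not a tool we actually run: the `DebtItem` fields, the two-hour cut-off (my reading of “a couple of hours”) and the hour estimates in the usage example are all assumptions.

```python
from dataclasses import dataclass

# Assumption: "a couple of hours" is read as a two-hour timebox.
TIMEBOX_HOURS = 2.0


@dataclass
class DebtItem:
    """A sticky note on the wall (hypothetical structure)."""
    title: str
    estimated_hours: float  # rough gut-feel estimate from the team
    value: str              # "high" or "low", as judged in the weekly review


def effort_bucket(item: DebtItem) -> str:
    """Return "low" if the item fits in the timebox, otherwise "high"."""
    return "low" if item.estimated_hours <= TIMEBOX_HOURS else "high"


def needs_backlog_story(item: DebtItem) -> bool:
    """High-effort items get their own story on the backlog."""
    return effort_bucket(item) == "high"


# Usage with the two examples from this post (hour estimates are made up).
test_suite = DebtItem("Simplify the test suite", estimated_hours=16.0, value="high")
email_list = DebtItem("Move e-mail list into a database", estimated_hours=1.0, value="low")

for item in (test_suite, email_list):
    print(item.title, effort_bucket(item), needs_backlog_story(item))
```

The point of keeping the rule this blunt is that nobody has to produce a real estimate: the only question asked is whether the item fits in the timebox.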
If there is a strict deadline on a piece of tech debt (this usually happens with org-level tech debt items), then we prioritise it accordingly. If the effort is low and the deadline is distant, we might schedule it for later. If the effort is high and the deadline is close, we tend to prioritise it sooner.
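The deadline rule can be sketched the same way. Treat this as a rough illustration under stated assumptions: the 30-day threshold for a “close” deadline is invented for the example, and the post only covers two of the four effort/deadline combinations, so the remaining two simply fall through to a team discussion here.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: a deadline within 30 days counts as "close".
NEAR_DEADLINE = timedelta(days=30)


def schedule(effort: str, deadline: Optional[date], today: date) -> str:
    """Suggest when to pick up a tech debt item.

    effort is "low" or "high"; deadline is None when no strict deadline exists.
    """
    if deadline is None:
        # No strict deadline: prioritise on value and roadmap relevance instead.
        return "by value"
    near = (deadline - today) <= NEAR_DEADLINE
    if effort == "high" and near:
        return "sooner"
    if effort == "low" and not near:
        return "later"
    # Assumption: the remaining combinations are decided case by case.
    return "discuss"


today = date(2024, 1, 1)
print(schedule("high", date(2024, 1, 15), today))  # high effort, close deadline
print(schedule("low", date(2024, 6, 1), today))    # low effort, distant deadline
```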
The process in its entirety looks like this. The red, green and blue post-its describe the kind of changes that go in each lane. Green ones are quick fixes and red ones are longer-term changes that we will do over several sprints.
A majority of the tech debt items fall in either the blue or the green lane; only large, cross-service tech debt items tend to get addressed in the red lane and span multiple sprints.
Since adopting this process, we’ve observed that we’ve become a lot more diligent in identifying our technical debt, more articulate in assessing its value and clearer in our communication with our PO. Of course, just like the debt itself, there is no quantifiable metric for the effectiveness of this process, but my teammates seem to think that it’s helped, so I am inclined to believe them!
I will be keen to know how others handle tech debt in their teams so please feel free to leave a comment!