The key to unwinding technical debt expeditiously is identifying the few pieces of code that have the most significant negative impact on the business — a task typically more challenging than it may seem.
The state of affairs
The Wall Street Journal ("The Invisible $1.52 Trillion Problem") recently highlighted a big burden businesses carry on their balance sheets from an often overlooked form of debt: technical debt. The urgency to address this problem has been laid bare by the widespread challenges of optimizing custom-built applications for the cloud. Now, the GenAI phenomenon further exacerbates the issue, as it could lead to a tenfold increase in the amount of code being produced every year.
Not surprisingly, the topic of "technical debt eradication" is on the mind of every IT executive and is becoming a board-level imperative. I have encountered several CIOs spearheading initiatives to unwind tech debt, dedicating time, energy, and resources to "eradicate it all," often with little noticeable positive impact.
Good debt vs bad debt
According to Wikipedia, in software development, technical debt is the "implied cost of future rework that arises when a solution prioritizes expedience over long-term design." Like financial debt, technical debt left unattended accrues 'interest', making it increasingly difficult to implement changes, which in turn affects adaptability, competitiveness, and overall business performance.
Good tech debt can be described as the debt incurred to foster growth and innovation without significantly affecting your business. Conversely, bad software debt hinders business performance as it impedes the innovation, agility, and growth your software underpins, whether developed recently or years ago. Balancing quality, innovation, and time to market has always been a challenge. I’ll leave it to you and your management strategy to best strike that balance.
However, I believe I can assist in distinguishing between debt that has minimal impact and debt that has high impact on the business. That is the crux of the 8% approach.
Mapping the software landscape
Putting myself in the shoes of a CIO at one of those Fortune 500 companies, I am managing a vast environment of off-the-shelf packages supporting back-office functions, plus over 300 custom applications that together amount to 50 million lines of code (LoC) for core business operations. I will first set aside the commercial off-the-shelf software, as I don’t control that code, and look at my own software code.
First, I would identify the custom applications with high concentration of technical debt and cross-reference that against business criticality. This can be fully automated and done within a week using software mapping and intelligence technology (fig 1). That involves combining code-level technical debt insights (X-axis) with a rapid survey to evaluate business criticality (Y-axis). The size of each bubble (application) reflects the number of LoC.
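To make the cross-referencing concrete, here is a minimal, hypothetical sketch of how such a prioritization could be scored. The application names, the weights, and the idea of multiplying criticality by debt density by size are my own illustrative assumptions, not the actual scoring used by CAST Highlight.

```python
# Hypothetical sketch: ranking applications by combining technical-debt
# density (X-axis) with business criticality (Y-axis), scaled by LoC
# (bubble size). Names, numbers, and weights are invented for illustration.

from dataclasses import dataclass

@dataclass
class App:
    name: str
    loc: int                 # lines of code (bubble size)
    debt_density: float      # 0..1, normalized tech-debt density (X-axis)
    criticality: float       # 0..1, from the business survey (Y-axis)

def priority(app: App) -> float:
    # Apps that are both business-critical and debt-heavy rank highest;
    # size scales the absolute remediation stakes.
    return app.criticality * app.debt_density * app.loc

portfolio = [
    App("billing-core",  2_000_000, 0.35, 0.95),
    App("crm-legacy",      300_000, 0.80, 0.90),
    App("intranet-wiki",   120_000, 0.70, 0.15),
]

for app in sorted(portfolio, key=priority, reverse=True):
    print(f"{app.name:15s} priority={priority(app):>12,.0f}")
```

Note how the large, mission-critical application outranks a smaller one with denser debt, mirroring the "top left corner" reasoning in the map discussion.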
Figure 1. A 30,000-foot view of a portfolio of 300 custom applications, generated by CAST Highlight.
Then, I would focus on the most important business applications that also exhibit high density of technical debt, while ignoring those with minimal impact, viewing them as 'good enough' for now. The map above indicates that the area for deeper investigation, in the top right corner, includes only 5 or 6 applications. I may also explore several other mission-critical applications in the top left corner, where four sizable bubbles representing 2 million LoC each likely hide a fair amount of technical debt, even if the density per line of code is lower. Mapping the software landscape allows me to drastically scope down further investigation. Still, in this example, I am facing a total of 12 million LoC to assess and a lot of technical debt to contend with.
Picking the targets
According to Gartner (Measure and Monitor Technical Debt With 5 Types of Tools), poor software structures have an unbounded negative impact on quality and delivery speed. This observation rests on the phenomenon succinctly articulated by MIT’s Dr. Richard Soley in his seminal research with the Consortium for Information and Software Quality, Carnegie Mellon University, and the Object Management Group.
Dr. Soley finds that “the same piece of code can be of excellent quality or highly dangerous” depending on the context: on how the individual units are structured together. He goes on to state that “basic unit-level errors constitute 92% of all errors in source code. However, these numerous code-level issues account for only 10% of the defects encountered in production. Conversely, poor software engineering practices at the technology and system [architecture] levels make up just 8% of total defects, yet they result in 90% of the significant reliability, security, and efficiency issues in production.”
This raises the question: how do we identify the 8% that matter most within an extensive codebase, 12 million LoC in this example, and resolve those defects most efficiently?
Gathering the right intelligence
Traditional code analyzers, which analyze syntax, could generate a list of issues with each unit inside an application, and even prioritize what they see as the most critical issues, at the unit level. This means that I would still need to manage thousands of tasks for my teams and suppliers to resolve, resulting in thousands of person-days of effort. Moreover, the most critical issues at the unit level will not necessarily match the 8% that I am after, which typically stems from contextual and structural issues. Per the above Gartner research, traditional tools “cannot provide high abstraction-level visibility to identify technical debt in the architecture”.
For example, a traditional code analyzer may flag a piece of code for optimization because it performs a table scan instead of using an index. While technically the code isn’t perfect, understanding the context may tell us that it operates on a reference table with only a few rows. Optimizing the code would shave only a fraction of a millisecond off the response time. Why address such seemingly 'severe' code issues if, in context, they are not severe at all?
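The table-scan example can be sketched as a tiny context-aware triage rule. Everything here, the rule name, the table names, and the 100-row threshold, is a hypothetical illustration of the idea, not any real analyzer's API.

```python
# Hypothetical sketch: the same "full table scan" finding can be severe or
# benign depending on context. Rule names, tables, and the threshold are
# invented for illustration.

def contextual_severity(finding: dict, table_rows: dict) -> str:
    """Downgrade a table-scan finding when it targets a tiny reference table."""
    if finding["rule"] == "full-table-scan":
        rows = table_rows.get(finding["table"], float("inf"))
        if rows < 100:            # a few-row reference table: the scan is harmless
            return "info"
    return finding["severity"]

table_rows = {"country_codes": 42, "orders": 25_000_000}

print(contextual_severity(
    {"rule": "full-table-scan", "table": "country_codes", "severity": "critical"},
    table_rows))                  # downgraded: context makes it a non-issue
print(contextual_severity(
    {"rule": "full-table-scan", "table": "orders", "severity": "critical"},
    table_rows))                  # stays critical: scanning 25M rows hurts
```

The point is not the rule itself but where the row counts come from: a context-free analyzer does not have them, which is why it over-reports.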
Conversely, a traditional code analyzer may greenlight a piece of code, yet its immediate context renders it potentially dangerous in terms of user experience. A common example is a small piece of code, nested within a loop, that makes a call to a distant REST service or a remote database (fig 2). While a single remote call may be acceptable, making 1,000 consecutive calls, with time-consuming roundtrips, especially when far from the data source, could severely hinder the system. There are numerous situations where the individual unit's quality will be judged "good", yet it contains technical debt that negatively impacts application behavior. Thus, I need to analyze the unit of code not in isolation but in the context of the whole system, considering the flows from the user input layer to the business logic, APIs, frameworks, data access layers, down to the database. So, how do I do that? Couldn’t my team understand the context by looking at the code and talking to architects?
Figure 2. A real-life scenario where the code unit is beyond reproach, yet system performance suffers.
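The pattern behind fig 2 can be sketched in a few lines. The remote service is simulated by a counter of round trips, and the class and method names are invented for illustration; the batched variant assumes the service offers a batch endpoint.

```python
# Hypothetical sketch of the fig 2 pattern: a unit that looks clean in
# isolation becomes a bottleneck inside a loop. The "remote" service is
# simulated; each call increments a round-trip counter instead of hitting
# the network.

class RemoteService:
    def __init__(self):
        self.round_trips = 0

    def get(self, sku):
        self.round_trips += 1      # one network round trip per call
        return {"sku": sku, "price": 9.99}

    def get_batch(self, skus):
        self.round_trips += 1      # one round trip for the whole batch
        return [{"sku": s, "price": 9.99} for s in skus]

svc = RemoteService()
skus = [f"SKU{i}" for i in range(1000)]

# Unit-level view: the call inside the loop looks perfectly fine...
prices = [svc.get(s) for s in skus]
print(svc.round_trips)             # 1000 round trips

# Context-aware fix: hoist the call out of the loop and batch it.
svc.round_trips = 0
prices = svc.get_batch(skus)
print(svc.round_trips)             # 1 round trip
```

With a 50 ms round trip, the first version spends 50 seconds on the wire and the second 50 milliseconds, which is exactly the kind of defect only visible with system context.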
Finding the needle in the haystack
The challenge is that even a small application with 30,000 LoC harbors thousands of interdependencies between its internal code units. See the lines in fig 3, where each dot is a code unit.
Figure 3. Application ‘internals’ map of a rather small and simple application. Generated by CAST Imaging.
It gets exponentially worse for a typical mid-sized application with merely 300,000 LoC (fig 4). In this case it comprises 41,000 units with numerous interdependencies. See (if you can) the lines in fig 4.
Figure 4. Map of the inner workings of a mid-size application with 300,000 LoC. Generated by CAST Imaging.
Now, imagine the staggering complexity of a core system ten times the size, with 3,000,000 LoC or much more. No wonder the adage “don’t touch it if it works” has become so popular in the world of software engineering.
Clearly, grasping the entire application context requires automation. Currently, the fastest way to automate that knowledge extraction is to rely on technology that performs semantic analysis of each code unit and its interactions with all other units — technology that understands what the unit does in the context of the surrounding architecture. It maps the thousands of units illustrated above and creates a ‘digital blueprint’ of the application’s inner workings. It’s like using an ‘MRI’ to see the connections between the ‘brain, nerves, and muscles’ inside each of the critical applications.
Upon completing such an analysis, a knowledge base is created capturing all interconnections. Then, specific structural quality rules can be applied (see ISO 5055) to spotlight problematic patterns in context. That is the 8% of tech debt I am looking for. Semantic and contextual analysis is what CAST Imaging does, for example, among other tools such as Coverity for C++ code. Now that I have identified the few issues that, once fixed, will offer the greatest improvement, I can set aside the rest of the technical debt that had been identified.
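To make "structural rules applied in context" tangible, here is a toy sketch of one such check over a call-graph knowledge base: flag any loop that can reach a remote call. The graph, the node names, and the rule encoding are all invented; real tools operate on far richer models.

```python
# Hypothetical sketch: once interconnections are captured in a knowledge
# base (here a toy call graph), a structural rule such as "remote call
# reachable from inside a loop" can be checked in context. All names are
# invented for illustration.

calls = {                                  # caller -> callees
    "OrderController.list": ["OrderService.enrich"],
    "OrderService.enrich":  ["PriceClient.fetch"],   # invoked inside a loop
    "PriceClient.fetch":    [],
}
loops = {"OrderService.enrich"}            # units that call their callees in a loop
remote = {"PriceClient.fetch"}             # units that cross a network boundary

def violations():
    """Flag loop units from which a remote call is transitively reachable."""
    found = []
    for unit in loops:
        stack, seen = list(calls.get(unit, [])), set()
        while stack:
            callee = stack.pop()
            if callee in seen:
                continue
            seen.add(callee)
            if callee in remote:
                found.append((unit, callee))
                break
            stack.extend(calls.get(callee, []))
    return found

print(violations())   # [('OrderService.enrich', 'PriceClient.fetch')]
```

Note that neither `OrderService.enrich` nor `PriceClient.fetch` is defective on its own; the finding only exists because the graph connects a loop to a network boundary, which is the essence of the 8%.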
The order of magnitude of effort is now in the hundreds, not thousands of issues to fix. A handful of developers and architects can explore the context surrounding each issue and fix them.
Stepping into the future
To further improve the level of automation, we can leverage the generated call graphs associated with each issue to ‘feed’ prompt engineering for GenAI tools. The AI will then generate a new piece of code that implements the necessary fix while minimizing the risk of breaking the system. Indeed, when generating code to fix an issue in unit A, the AI must be aware of whether and how unit B interacts with A. Failing to consider this could damage A’s interactions, leading to system failure when B calls A, and simply creating new technical debt.
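A minimal sketch of that prompt-engineering step might look as follows. The call-graph entries, the issue description, and the prompt template are illustrative assumptions; no real GenAI API or product prompt format is implied.

```python
# Hypothetical sketch: using the call graph around an issue to build a
# context-rich prompt, so the AI fixing unit A also sees how unit B
# depends on it. Graph contents and template are invented.

callers = {"OrderService.enrich": ["OrderController.list"]}
callees = {"OrderService.enrich": ["PriceClient.fetch"]}

def build_fix_prompt(unit: str, issue: str, source: str) -> str:
    """Assemble a remediation prompt that includes the unit's dependencies."""
    return (
        f"Fix the following issue in `{unit}` without changing its signature "
        f"or its observable behavior toward its callers.\n"
        f"Issue: {issue}\n"
        f"Called by: {', '.join(callers.get(unit, [])) or 'none'}\n"
        f"Calls: {', '.join(callees.get(unit, [])) or 'none'}\n"
        f"Source:\n{source}\n"
    )

prompt = build_fix_prompt(
    "OrderService.enrich",
    "remote call executed inside a loop; batch it instead",
    "def enrich(orders): ...",
)
print(prompt)
```

The "Called by" line is what keeps the fix from breaking unit B: the model is explicitly told which contracts must survive the rewrite.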
You might assume that AI could process millions of lines of code by breaking them into thousands of small parts and submitting them separately, but unfortunately, that approach wouldn't be effective. ChatGPT, for instance, is limited to approximately 4,096 characters per input, which is roughly equivalent to 70 LoC. Additionally, its ability to retain information is confined to the duration of the interaction. For larger amounts of information, the AI will prompt you for context and summaries with each part. This is ChatGPT’s own view on the matter: “If you have a lengthy query, it's best to divide it into smaller sections and submit them one at a time. Generally, handling around 5 to 10 distinct queries or pieces of information in one interaction is optimal. To achieve the best outcomes, it’s recommended to break complex information into smaller, coherent segments and provide context with each section”. In short, the AI is only as good as the information it is fed. For more details, see Grounding AI with Software Intelligence.
Since publishing that paper a few months ago, several companies have built prompt-engineering prototypes that feed AI as outlined above and demonstrated tangible results, fully automating up to 20-30% of the necessary remediation work. This approach does not yet work for complex issues or situations requiring extensive changes, such as refactoring, but 20-30% is indeed significant. Feel free to reach out to me if you are interested in exploring this further.
The biggest bang for the buck
Trying to eliminate technical debt from A to Z, or to address all code quality issues at the individual unit level without the context of their immediate surroundings, is not an effective strategy. The key to unwinding technical debt is to map out the critical applications ripe with technical debt, then identify the few pieces of code inside them that have the most significant negative impact on the business. That all sounds like common sense.
What’s crucial for identifying the 8% of flaws that matter most is understanding the internal application context. Semantic analysis technologies can automate the discovery. As MIT Fellow Dr. Soley pointed out, the same piece of code can be perceived as either good or bad depending on its surrounding context. Taking that into account will address your technical debt most efficiently and bring the desired business outcomes within the least amount of time. And of course, applying the 8% approach proactively, before any code changes make it into production, can reduce the expense of having to deal with bad tech debt to begin with.