Advanced Computing in the Age of AI | Saturday, January 22, 2022

Technical Debt and AI: What It Is and How to Fight It 

Even as the COVID-19 pandemic created worldwide uncertainty in markets including semiconductors, food, vehicles and building materials, financial investments in artificial intelligence (AI) companies continued to grow. New research from CB Insights shows that the sector is shattering funding records in 2021, bringing in $38 billion in new investment funding to AI companies in just the first half of the year. That surpasses the $36 billion raised in all of 2020.

But an emerging impediment to cost-effective AI deployment at scale – the problem of technical debt – threatens the continuing growth and adoption of AI application development.

Technical debt in application development is the idea that some development work on a software application is deferred or left out to keep a project on track for its promised delivery date – even if that means shipping without all of the desired features.

For AI, technical debt is even more complex, and its burgeoning role has gone unnoticed, but it is a critical catalyst for rising project costs and delays.

Building and deploying conventional applications and software systems is a deterministic, unidirectional process of forward iteration, with fixes and changes folded into subsequent releases. Technical debt in this process is predicted, planned and reduced over time with each release. Because this form of technical debt is predictable, it can be factored into the budget like any other expenditure.

But this is not the case with AI technical debt, which has a different makeup.

What is AI Technical Debt?

Decision-makers in enterprises and AI startups that are pursuing new business capabilities through AI development – like chatbots, facial recognition, intelligent voice assistants and automated text creation – must be aware of the differences involving AI technical debt and take steps to eliminate and prevent it.

The goal of AI development is to discover, train and deploy predictive models that are accurate and dependable. AI technical debt, however, involves the cost of the complex mix of processes and procedures needed to make it all happen. In AI, technical debt is not the consequence of human decisions but the result of the arbitrary demands of the software needed to achieve the required level of intelligence.

Especially with deep learning (DL), using neural nets and transformer algorithms for things like natural language processing (NLP), machine vision, voice recognition and synthesis, the complexity of the models makes effective management of technical debt far more difficult than for application development.

Those functional and procedural requirements are typically met by additive, ad hoc coding and manual human-in-the-loop tasks for managing and assuring the processes of developing and deploying AI models. The problem is that for the emerging generation of deep learning models with billions of parameters and potentially millions of dollars in training compute costs, managing AI technical debt for deep learning in this way is unsustainable. The stakes are just too high.
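As an illustration, the ad hoc pattern described above often looks like hand-rolled tracking code. The following is a hypothetical sketch (the function, file names and fields are invented for illustration, not taken from any real platform) of the kind of glue code that accumulates as AI technical debt: every team writes its own run logging, with hard-coded paths and no shared schema or real lineage guarantees.

```python
import hashlib
import json
from datetime import datetime, timezone

def log_training_run(model_name, dataset_path, hyperparams, metrics):
    """Append a training-run record to a flat JSON-lines log.

    Hypothetical ad hoc bookkeeping: each team reinvents this, and
    nothing enforces a common schema across projects.
    """
    record = {
        "model": model_name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # The dataset "lineage" here is just a hash of the path string --
        # it says nothing about the actual data contents or version,
        # which is exactly the kind of gap that becomes debt.
        "dataset_id": hashlib.sha256(dataset_path.encode()).hexdigest()[:12],
        "hyperparams": hyperparams,
        "metrics": metrics,
    }
    with open("runs.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

run = log_training_run(
    "sentiment-bert", "/data/reviews_v3.csv",
    {"lr": 3e-5, "epochs": 4}, {"accuracy": 0.91},
)
```

Multiplied across dozens of models and teams, scripts like this become a maintenance burden of their own – the manual, additive work the article describes.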

Avoiding AI Technical Debt

To battle the rapidly increasing complexity of deep learning that has led to an explosion of AI technical debt, enterprises need assistance. That is where dynamic software infrastructures such as AI orchestration and automation platforms can help.

Research firm Gartner defines such platforms as giving enterprises the ability to enable orchestration, automation and scaling of production-ready AI pipelines, while also delivering enterprise-grade governance including reusability, reproducibility, release management, lineage, risk and compliance management, and security. Gartner said such platforms can also unify development; hybrid, multi-cloud and IoT delivery; and operational streaming and batch contexts.
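The reproducibility and lineage capabilities named above can be sketched in miniature. The following is a minimal, hypothetical illustration (the `Step` class and its fields are invented for this sketch, not the API of any real platform) of the bookkeeping an orchestration platform performs: each pipeline step is declared once, and the platform derives a reproducible fingerprint from the step's code identity plus its full upstream lineage.

```python
from dataclasses import dataclass, field
import hashlib

@dataclass
class Step:
    """One declared step in a hypothetical orchestrated pipeline."""
    name: str
    code_version: str  # e.g. a git commit hash identifying the step's code
    inputs: list = field(default_factory=list)  # upstream Step objects

    def fingerprint(self) -> str:
        """Hash this step's identity together with its upstream lineage."""
        h = hashlib.sha256()
        h.update(f"{self.name}:{self.code_version}".encode())
        for upstream in self.inputs:
            h.update(upstream.fingerprint().encode())
        return h.hexdigest()[:16]

ingest = Step("ingest", code_version="a1b2c3")
train = Step("train", code_version="d4e5f6", inputs=[ingest])

# Any upstream change propagates into the downstream fingerprint, so a
# cached model can never silently be reused against stale data.
before = train.fingerprint()
ingest.code_version = "changed"
after = train.fingerprint()
```

Real platforms track far more (data versions, environments, parameters), but the principle is the same: lineage is computed by the infrastructure rather than maintained by hand.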

Numerous AI orchestration and automation platforms (AI OAPs) are available for traditional machine learning, but only a few support the unique needs of deep learning, and even fewer offer multi-cloud transparency.

The big three hyperscale cloud providers – Amazon Web Services, Google Cloud Platform and Microsoft Azure – each offer their own proprietary orchestration and automation services. For enterprises using a single cloud for all deep learning workloads, these offerings can bring tremendous technical debt relief. But for many companies using multi-cloud and hybrid cloud for economic and regulatory reasons, adopting multiple OAPs introduces operational complexity that can significantly counter any expected reductions in AI technical debt.

Fortunately, there is now an emerging cohort of AI OAP providers addressing the deep learning needs of multi-cloud and hybrid cloud users with cloud-agnostic products and services that provide single-user interfaces and common capabilities across all environments.

Prospective users of these platforms will find that they vary in how they balance the needs of AI practitioners, managers and stakeholders for ease of use, accountability and time to value. These are all key contexts for AI technical debt, and they will vary across user organizations, so selecting the best AI OAP solution means taking a collaborative approach to assure the best fit between the services and the entire organization. Fighting AI technical debt is a team sport.

As deep learning continues to become an important vehicle for innovation across industries, there is an increasing urgency to keep AI technical debt in check because it can easily bury a promising initiative in unexpected costs. AI OAPs help eliminate AI technical debt, improve ROI, accelerate time to value and assure regulatory compliance for a wide range of deep learning requirements. Given the benefits, this type of infrastructure should be an essential element of every company’s AI strategy going forward.

About the Author

Serkan Piantino is the CEO and co-founder of Spell, a cloud-agnostic MLOps platform vendor. Before founding Spell, Piantino was the founder and site director of Facebook New York and the co-founder of Facebook AI Research. In his nine years at Facebook, he designed and led the development of several Facebook products and infrastructures, including News Feed, Edge Rank, Timeline and Messenger. He served on former Mayor Michael Bloomberg's Council on Technology and Innovation, and currently serves on the boards of Tech:NYC and the Academy For Software Engineering. He earned a bachelor’s degree in computer science from Carnegie Mellon University.
