Advanced Computing in the Age of AI | Sunday, May 19, 2024

Machine Learning: Scaling Past the Pilot Phase 


In a world of cars that drive themselves and robots that do back flips, the fully automated industrial operation remains an aspiration. Limitations in real-time computing, AI and robotics slow progress, regardless of how advanced a company is or how large its data science team is. Even Elon Musk, who has some of the biggest ambitions out there, can only automate Tesla factories piece by piece.

Bottom line: Even though some level of automation at scale is attainable, many industrials find themselves stuck at the pilot project phase. A combination of short-staffed data-science teams, limited data and limited ability to quickly affect business processes stymies expansion. But with a new, open-source way of thinking, industrials can drive digital operations at a pace similar to the large tech companies.

The Stiff Demands of Scaling

On the surface, it sounds relatively straightforward to sensor up thousands of components, wire them into a machine learning (ML)-driven infrastructure and enjoy the benefits that digitalization has to offer.

To that end, most industrials have ML pilot projects up and running in areas such as labor-saving predictive maintenance (PdM), safety improvements, logistics and asset utilization. Initial results look promising, so the team agrees to scale the pilot across the organization.

That means expanding a small environment by orders of magnitude, to thousands or even millions of sensors. Let’s say the pilot program of five ML models worked well in production, and now it’s time to test and run 50. The scaling process breaks down at this point.

Why? One data scientist can only program and babysit so many models, so the data science team must grow — usually beyond the budget of the organization. Moreover, the pilot does not have diversified enough data to work on a grand scale. PdM, one of the most important industrial functions of ML, cannot accurately predict machine failure from a small sample size. ML can only learn to predict failure when it can model failure, and that requires data from machines that actually break.
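The small-sample problem can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the 2 percent annual failure rate, the fleet sizes and the target of 100 failure examples are hypothetical numbers chosen for the example, not figures from any real deployment, and the function names are invented here.

```python
import math


def expected_failures(machines: int, annual_failure_rate: float, years: float) -> float:
    """Expected number of failure events observed across a fleet."""
    return machines * annual_failure_rate * years


def machines_needed(target_failures: int, annual_failure_rate: float, years: float) -> int:
    """Smallest fleet size expected to yield at least target_failures events."""
    return math.ceil(target_failures / (annual_failure_rate * years))


if __name__ == "__main__":
    RATE = 0.02   # hypothetical: 2% of machines fail in a given year
    YEARS = 1.0   # hypothetical: one year of sensor history

    # A pilot-sized fleet versus a scaled-out fleet (both sizes are assumptions).
    for fleet in (20, 5000):
        events = expected_failures(fleet, RATE, YEARS)
        print(f"{fleet} machines: about {events:.1f} expected failures")

    # How many machines would it take to expect 100 failure examples?
    print("machines needed:", machines_needed(100, RATE, YEARS))
```

Under these assumed numbers, a pilot-sized fleet is expected to produce less than one failure in a year of operation, which leaves a model almost nothing to learn from. Only a much larger, pooled data set produces enough breakdowns to train on.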

Stuck at the pilot phase, the industry cannot implement large-scale programs such as PdM, nor does it have access to the large, diversified data pools it needs in order to innovate new forms of automation and deeper business insight. The obstacle to becoming data-driven feels insurmountable—unless we rethink how to achieve machine-learning gains.

A Democratized Technology

To solve the scalability problem, we first need to revisit the fundamentally open nature of AI and ML.

The seemingly magical results of ML were first delivered by the tech industry to consumers, and this provided inspiration for other sectors. Search recommendations (as found on Netflix, Spotify, Amazon and others), voice-command apps such as Siri, and ridesharing services such as Lyft and Uber put the power of advanced computing at our fingertips.

The back-end processing power behind such applications was made freely available in the form of open-source software. Other sectors, including industrials, innovated on top of open-source frameworks, such as those from the Apache Software Foundation, to drive solutions for their own use cases. Advanced neural networks and deep learning algorithms are now cheaper and more accessible, enabling anyone with technical know-how to grab powerful ML programs off the shelf. Finally, the big-data advancements that drive ML spill over into small data, such as the sensor data collected by an industrial business.

That level of extensibility was unimaginable in the early days of AI, when software was built on mainframe computers and programs cost millions of dollars to put into production. Today, democratized AI means customized solutions are available to everyone, and savvy data science teams do not have to be hired in-house. Industrials seeking to scale have a new kind of option available: Data-Science-as-a-Service (DSaaS), in which a savvy team uses open-source tools and anonymized benchmark data to ensure that pilot programs scale successfully and sustainably.

An Open Source Approach to Expansion

Choosing the right tools and building the proper infrastructure will become ever more important as the ML tidal wave continues across industries. Pilot programs are an important first step, but expanding and unifying them is the next phase in the adoption cycle. Open-source tools have led to the open sourcing of entire data science teams in the form of DSaaS, making even tech company-grade automation systems available to industrials for the first time, at the scale of thousands to millions of sensors. Any enterprise stuck in the pilot phase has a new opportunity to rethink scalability.

Tor Jakob Ramsøy is founder and CEO of Arundo Analytics.