Advanced Computing in the Age of AI | Saturday, February 24, 2024

Beware AI Winter: Another on the Way?

Hot as AI is right now, there are growing concerns that technology vendors may cause AI climate change – a new AI winter brought on by hype and unmet expectations.

AI is in a classic phase of the technology life cycle in which vendors, media and industry analysts are out ahead – possibly way out ahead – of the vast majority of potential AI end user organizations. There’s danger in that gap. CIOs and IT planners hear and read about the wonders of AI, dip their toes into PoC projects that not only flop but spotlight AI’s complexities, and they walk away with the perception that AI is either oversold or it’s for organizations richer in compute, data and data science resources.

Granted, concern about an imminent AI freeze is a minority view. At Tabor Communications’ recent Advanced Scale Forum, Dell EMC AI & HPC Strategist Jay Boisseau said, “AI will have no more winters. People talk about whether there’s going to be another AI winter because it’s come in spikes over the last few decades. The only reason there were spikes is because there were spikes in academic, productive research, there was always a lack of data and computing power to fulfill it until now. There’s no more lack of data and computing power – we see lots of successes… there won’t be a lack of opportunities to use AI.”

Countering this is a reaction mostly among some academics and technology writers who point out the limitations of AI and the improbability of sentient, “general” or super AI. Our own view is that even if AI systems remain “high-speed idiots,” to quote a computer scientist of this writer’s acquaintance, and never become actual “thinking” machines (as writer Thomas Nield stated in Medium’s Toward Data Science, “When your smartphone ‘recognizes’ a picture of a dog, does it really recognize a dog? Or does it see a grid of numbers it saw before?”), this does not necessarily mean AI is about to go dark. AI already can be trained to infer and solve useful business problems using real-time data that are beyond the scope of the human mind; it will continue to do this in the future, only more so – that’s the main point.

If AI pessimism resides mainly among academics and commentators, rather than technology companies, at least one tech vendor has raised the specter of AI winter – Lenovo’s Madhu Matta, VP/GM, HPC & AI, who warns of a winter brought on by customer frustration and failed projects, the result of a lack of software that abstracts away AI complexity and of inadequate vendor support services.

Citing previous AI boom-bust cycles starting in the 1950s (after which AI was so discredited that when AI-based innovations, such as search algorithms and expert systems, came to market they were pointedly not marketed as AI), Matta said, “at the end of the day there was no adoption. We are heading towards that right now, in my opinion, if we as an industry don’t fix the fundamentals of how people can adopt AI and use AI. We’re not doing that at the moment, or we’re doing it selectively.”

Here at Lenovo’s Accelerate conference in Orlando, Matta has been sounding the AI winter alarm – which he then turns into a marketing message touting Lenovo’s emphasis on software and services that ease adoption for AI neophytes. Amid discussion at the conference that there are now billions of data gathering/generating IoT devices, Matta noted that AI is still mostly confined to an elite vanguard of hyperscalers and major players in the financial services, manufacturing, retail and healthcare verticals.

“Our focus for AI for last two-and-a-half years is not necessarily on hardware,” he said, “our focus has been on building an entire software stack to make it simple for customers to start their journeys – forget about adoption, we’re talking about starting their journey into AI. As these billions of IoT devices come in, they will need AI to make sense of the data. We’ve got to make it easy.”

Lenovo is among several vendors, including IBM, Nvidia, C3 and others, building out platforms and services designed to ease AI adoption. On the software side, the company has been developing its Lenovo Intelligent Computing Orchestration (LiCO) software platform, a web-based portal designed to simplify use of distributed clusters for HPC workloads and AI model development. LiCO leverages an open source cluster management software stack, consolidating monitoring and scheduling functions. The intent is to ease interaction with the underlying compute resources and broaden access to open source cluster tools. It also provides AI training workflows for workloads such as image classification, object detection and instance segmentation, and allows users to copy existing jobs into the original template, with existing parameters pre-filled and modifiable.

On the hand-holding, consulting side of the AI adoption equation, Lenovo has developed four Innovation Centers, one in the U.S., one in Europe and two in Asia, in which it conducts workshops and training sessions at no charge for customers seeking guidance on AI (and other data center challenges). As with Matta’s AI winter warning cum marketing message, the Innovation Centers are also intended to build relationships with organizations and to sell Lenovo products – but the idea is to do more than deliver a shipment of CPU-GPU integrated servers and wish the customer good AI luck.

The Innovation Center program starts with a questionnaire and discussion that lets Lenovo understand where, if anywhere, the customer is on its AI journey.

“When we’re engaging with customers on AI projects, each one is going to look a little different,” Robert Daigle, Lenovo’s AI Innovation and business leader, told us. “Some may already have data scientists, they may already have use cases identified, so where they need our help is for infrastructure and platforms, whether it’s our LiCO AI platform or if it’s specific hardware they need to run some of their toolsets, either on training or on inferencing. So we do a level set on where the customer’s at to see what they need.”

And then there are the AI rookies.

“On the other end of the spectrum, we have customers that know they want to do AI but they don’t know where to get started, and that’s where we see a real opportunity to guide them in the right direction and bring in the right partners to engage with them,” Daigle said. “We start with the Launch AI Workshop, for an initial engagement with our data scientists and AI architects to understand what they need. Then we can start unpacking use cases... We want them to walk away with the resources to start a PoC, so we have deliverables we give them, next steps, we can help with business justifications. We want them to be bought in; what we don’t want is for customers to view it (a PoC) as a science project, we want them to buy into the value AI can bring to their business.”

Lenovo ThinkSystem SR670

A case in point is byteLAKE, a Poland-based technology consultant that builds AI and HPC systems for its clients. On an as-needed basis, the company turns to Lenovo Innovation Centers for specialized AI support, such as when a forestry client engaged byteLAKE to develop an image recognition capability to identify diseased older trees or recently planted trees that did not survive their first winter. The goal of the project, byteLAKE Co-founder Mariusz Kolanko told us, is to use AI-based computer vision to analyze drone footage and satellite image data to track tree survival rates and mitigate deforestation – and avoid manual image examination.

Using the DarkNet AI framework, byteLAKE built a model in which healthy trees could be distinguished from unhealthy – Kolanko said training required about 500 examples of each tree species. Everything was in order, except that training runs were taking two weeks.

“They were using a workstation,” said Daigle, “and it was taking them two weeks to run one training dataset. When they moved their data into the Innovation Center and ran it (on a Lenovo ThinkSystem SR670 server, using an Nvidia Tesla T4 GPU) they reduced runs to about six hours per run. That really allows them to accelerate the project, especially when they have to go back and do things like hyper-parameter tuning to get their model really refined and get the accuracy they want.”

He added that for byteLAKE's PoC, the ThinkSystem server used has a single GPU, but if byteLAKE moves to the production stage, a multi-GPU server could be used that would reduce run times further.
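
To put the quoted run times in perspective, here is a rough back-of-the-envelope calculation in Python. It uses only the two approximate figures Daigle cited (two weeks per run on the workstation, about six hours per run on the server) and assumes, purely for illustration, that runs are executed back to back with no idle time. The variable names are ours, not byteLAKE's or Lenovo's.

```python
# Back-of-the-envelope arithmetic only. Both durations are the approximate
# figures quoted in the article, not measured benchmarks.
HOURS_PER_DAY = 24

workstation_hours = 14 * HOURS_PER_DAY  # "two weeks" per training run
server_hours = 6                        # "about six hours per run"

# Ratio of old run time to new run time.
speedup = workstation_hours / server_hours

# Assumption: runs are scheduled back to back, around the clock.
runs_per_two_weeks = workstation_hours // server_hours

print(f"Approximate speedup: {speedup:.0f}x")
print(f"Training runs that fit in two weeks: {runs_per_two_weeks}")
```

On those assumptions, the time that once bought a single training run now buys dozens, which is what makes iterative work such as hyper-parameter tuning practical.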