Advanced Computing in the Age of AI | Thursday, March 28, 2024

Cisco Servers, Oracle Cloud, Boost AI Offerings 

Cisco and Oracle made strategic AI announcements this week, boosting their machine/deep learning capabilities based, in part, on Nvidia GPU integrations.

Building on the foundation of its nine-year-old UCS (Unified Computing System) server line, Cisco announced the UCS C480 ML M5 Rack Server, which the company said is an optimized machine learning system designed to help IT organizations scale AI at “any location by capitalizing on the adaptability, programmability, and manageability of the Cisco UCS portfolio.” It’s a 4U server with Intel Xeon Scalable processors and eight Nvidia Tesla V100-32G GPUs with NVLink interconnect.

As we’ve seen from several other systems vendors (Dell EMC, HPE, IBM) in recent months, Cisco’s baseline strategy is to provide an array of hardware-, software- and cloud-enabled capabilities, along with services, designed to give organizations a strategic AI jump-start. And like the others, Cisco has teamed with Nvidia for GPU-accelerated compute power.

“We want to bring our customers technology and innovation in a way that’s easy to consume,” Todd Brannon, senior director of data center marketing, Cisco, said in a phone interview. “So with our C480 system, loaded up with Nvidia technology, we bring it to our customers as a UCS system, it self-integrates into their existing platform, they can apply all their existing policies and process to it, so they don’t need to stand up an island to adopt this new cutting edge technology. For the IT team, eliminating those siloes of policy and security is absolutely critical, …as is helping them take computing and put it at the right scale and the right locations.”

Cisco emphasized the value of the UCS C480’s integrated stack of AI capabilities, with a selection of options for CPU, network, storage, memory and OS, along with GPU acceleration, intended to “right-size…each element of the AI/ML life cycle.” This includes data collection and analysis near the edge, data preparation and training in the data center core, and real-time AI inferencing. The offering also includes Cisco Intersight for cloud-based systems management “for consistent and unified operations across the entire AI landscape,” on- and off-premises.

The new UCS server’s multi-cloud support allows data scientists to build and experiment with their models in the cloud, but train them in the on-prem data center, to meet data privacy and regulatory needs, without the need to re-architect an AI-ML project, according to Cisco.

Cisco UCS C480

“So if you were running those applications at the edge and doing the inferencing,” Brannon said, “we’ve got our smaller rack servers like the (UCS) C220 with GPUs in there. We’ve also created GPU nodes for our hyperconverged system, called HyperFlex, so customers can build clusters of hyperconverged machines and bring in GPU-specific nodes to tackle test-dev in a cloud environment. So you can have developers checking in and out of virtual machines that are GPU-accelerated to do their testing. And now the (UCS) C480 ML is where we can apply a lot of GPU backbone against machine learning and the training, and the C480 embraces the containerization of applications and multi-cloud computing models to facilitate deployment of open source software at scale.”

Cisco said it has been building an ecosystem of big data software partners, including Cloudera, Hortonworks and MapR, to develop a data pipeline running on UCS.

“We’ve expanded our ecosystem with folks...who come at this from the big data angle, and are now creating the interfaces into the machine learning stacks,” said Brannon. “We’ve also done quite a bit with Google, with their Kubeflow solution, which combines Kubernetes with TensorFlow, so it’s a containerized approach to the TensorFlow framework. And we’re supporting that initiative with contributions to the open source code, and then also giving our customers the ability to run it on-premises or in the cloud, so that’s a really unique thing. And for developers it means an ability to ‘develop once-deploy anywhere.’”

Brannon said that while data scientists and developers experiment with machine learning on a laptop, machine learning at scale requires a data pipeline capable of absorbing changing data sets, tools to collect, clean, and correlate data, and eventual use of the trained model on new data.

To aid in data pipeline development, Cisco recently published a Cisco Validated Design (CVD) with Cloudera Data Science Workbench, which integrates an existing big data CVD for Cloudera with deep learning frameworks, such as TensorFlow and PyTorch.

Brannon said the aim underlying Cisco’s AI stack is to ease the AI journeys of companies whose IT organizations are still maturing their AI expertise.

“The data scientist shortage is probably the gating element for most customers,” he said. “They don’t have a deep bench of talent that can build these kinds of applications. So with our solutions partners (Presidio, World Wide Technology, SkyMind, e+), they come in and through their practices around AI and machine learning, they can help our customers with these projects from end-to-end and help them deploy the technology we’ve developed here.”

Oracle, meanwhile, announced yesterday the general availability of virtual machine instances with Nvidia Tesla Volta V100 GPUs on the Oracle Cloud Infrastructure, a capability previously available only on a preview basis.

“These virtual machines join the bare-metal compute instance we launched earlier in the year and provide the entire server for very computationally intensive and accelerated workloads such as DNN training or run traditional HPC applications such as GROMACS or NAMD,” said Karan Batta of Oracle product management in a blog post.

He said Oracle is making its Pascal-generation GPU instances available on virtual machines in the company’s U.S. and Germany regions.

EnterpriseAI