Cerebras’ AI Model Studio Promises to Ease AI Model Training Headaches
Training generative AI models is often expensive and fraught with challenges. These multi-billion-parameter models demand months of training time on huge GPU clusters, directed by teams of highly skilled engineers, with variable costs that can run into the millions of dollars. Latency is another struggle: cloud providers cannot always guarantee fast connections between large numbers of GPUs, creating complex distributed computing issues and large swings in time to train.
Cerebras Systems is looking to upend this unsustainable approach with new model training solutions that promise to democratize AI through better performance and predictable pricing. The company, along with deep learning infrastructure firm Cirrascale Cloud Services, has announced the availability of the Cerebras AI Model Studio. Cerebras says the new offering will allow customers to train generative pre-trained transformer (GPT)-class models on the company’s Wafer-Scale Clusters, including the newly unveiled Andromeda AI supercomputer.
The company claims its AI Model Studio’s “Pay by the Model” approach lets users train GPT-class models faster and at half the cost of traditional cloud providers (8x faster and half the price of AWS, for example, says Cirrascale), with only a few lines of code to get started. Customers can train models from scratch on their own data, or they can fine-tune open source GPT-class models ranging from 1.3 billion to 175 billion parameters that come pre-loaded and ready to go.
“The new Cerebras AI Model Studio expands our partnership with Cirrascale and further democratizes AI by providing customers with access to multi-billion parameter NLP models on our powerful CS-2 clusters, with predictable, competitive model-as-a-service pricing,” said Andrew Feldman, CEO and co-founder of Cerebras Systems. “Our mission at Cerebras is to broaden access to deep learning and rapidly accelerate the performance of AI workloads. The Cerebras AI Model Studio makes this easy and dead simple – just load your dataset and run a script.”
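Cerebras has not detailed the scripting interface here, but the “load your dataset and run a script” workflow Feldman describes follows a familiar shape. The sketch below is purely illustrative: every name in it (the `FineTuneJob` type, `submit`, the model and dataset identifiers) is invented for this example and is not Cerebras’ actual API.

```python
# Hypothetical sketch only -- every name below is invented to illustrate the
# described workflow, not Cerebras' real interface.
from dataclasses import dataclass

@dataclass
class FineTuneJob:
    base_model: str      # e.g. a pre-loaded GPT-class checkpoint
    dataset_path: str    # the customer's own training data
    max_seq_len: int     # the studio supports sequences of up to 50,000 tokens

def submit(job: FineTuneJob) -> str:
    # A real service would upload the dataset and queue the job on a
    # CS-2 cluster; here we just echo a description of the job.
    return f"fine-tuning {job.base_model} on {job.dataset_path} (seq_len={job.max_seq_len})"

print(submit(FineTuneJob("gpt-13b", "s3://my-corpus", 50_000)))
```

The point of the “Pay by the Model” pitch is that this small declaration is the entire user-facing surface; cluster provisioning and distributed scheduling stay on the provider’s side.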
Cerebras’ answer to bottlenecks in the current model training approach is more compute per device. The company’s Wafer Scale Engine AI chip, now in its second generation with the WSE-2, has 2.6 trillion transistors and 850,000 AI-optimized cores and is housed in the CS-2 AI accelerator system. By contrast, the largest current GPU, Nvidia’s H100, contains 80 billion transistors and up to 16,896 FP32 CUDA cores.
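To put that spec comparison in perspective, the ratios implied by the figures above work out as follows (simple arithmetic on the quoted numbers; transistor and core counts are, of course, not directly comparable measures of delivered performance):

```python
# Ratios implied by the article's spec comparison (illustrative arithmetic only).
wse2_transistors = 2.6e12   # Cerebras WSE-2: 2.6 trillion transistors
wse2_cores = 850_000        # AI-optimized cores
h100_transistors = 80e9     # Nvidia H100: 80 billion transistors
h100_cores = 16_896         # FP32 CUDA cores

print(f"transistor ratio: {wse2_transistors / h100_transistors:.1f}x")  # 32.5x
print(f"core ratio:       {wse2_cores / h100_cores:.1f}x")              # ~50.3x
```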
The Cerebras AI Model Studio gives users cloud access to the Cerebras Wafer-Scale Cluster, where they can leverage up to a 16-node cluster of CS-2s, with their WSE-2 chips, to train models with sequence lengths of up to 50,000 tokens, according to the company. Cerebras also notes that the Cerebras Cloud @ Cirrascale is designed to enable flexible training and low-latency data center inference. The company credits its greater compute density, faster memory, and what it claims is a higher-bandwidth interconnect than any other data center AI solution, which it says delivers hundreds or thousands of times more performance than legacy options.
In a press briefing, PJ Go, CEO and co-founder of Cirrascale, noted that his customers, ranging from AI research labs to financial institutions, have all expressed a desire to train their own models using their own data to improve the accuracy of those models. “They want to do this without sharing their data, at speed, at a reasonable price, and most importantly, they want a predictable price. They don’t want to have to write a blank check to a cloud service provider to be able to train a model.”
The new Cerebras and Cirrascale offerings promise to help companies accomplish this goal. To that end, Cerebras has also announced a partnership with AI content platform Jasper to train its AI models using Cerebras resources, including the newly announced Andromeda AI supercomputer.
Jasper’s content creation products, based on generative AI models, are used by nearly 100,000 global paying customers for writing marketing and advertising copy, books, and other content. Jasper says it expects to dramatically advance its AI work with the training power of Andromeda, including training GPT networks to fit AI outputs to all levels of end-user complexity and granularity, which it says will improve contextual accuracy of these models and allow it to easily personalize content for multiple classes of customers.
“Our platform provides an AI co-pilot for creators and businesses to focus on the key elements of their story, not the mundane. The most important thing to us is the quality of outputs our users receive. We are hyper-focused on continuously adapting our AI models to meet our customers’ needs,” said Dave Rogenmoser, CEO of Jasper. “Partnering with Cerebras enables us to invent the future of generative AI, by doing things that are impractical or simply impossible with traditional infrastructure. Our collaboration with Cerebras accelerates the potential of generative AI, bringing its benefits to our rapidly growing customer base around the globe.”
Andromeda seems to be an exciting addition among these new resources. Unveiled at SC22, the AI supercomputer is a 13.5-million-core system composed of 16 CS-2 AI engines fed by 284 64-core AMD Epyc Gen 3 x86 processors. The system also has a SwarmX switch fabric that links MemoryX parameter storage with the 16 CS-2s to provide 96.8 terabits of bandwidth. Andromeda is deployed in Santa Clara, Calif., in 16 racks at Colovore, an HPC data center. Cerebras says Andromeda can support all batch sizes through gradient accumulation, a deep learning technique that splits a batch of training samples into smaller micro-batches run sequentially, with their gradients summed before each weight update.
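Gradient accumulation works because summing the gradients of sequential micro-batches reproduces the gradient of the full batch. A minimal NumPy sketch, assuming a simple least-squares loss (the data, loss, and batch split here are illustrative, not anything Cerebras has published):

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 8)), rng.normal(size=64)
w = np.zeros(8)

def grad(Xb, yb, w):
    # Gradient of the squared-error loss 0.5*||Xb @ w - yb||^2, summed over the micro-batch
    return Xb.T @ (Xb @ w - yb)

# Full-batch gradient computed in one shot
full = grad(X, y, w)

# The same gradient accumulated over 4 micro-batches of 16 samples each
acc = np.zeros_like(w)
for i in range(0, 64, 16):
    acc += grad(X[i:i+16], y[i:i+16], w)

print(np.allclose(full, acc))  # True
```

Because only one micro-batch needs to be resident at a time, arbitrarily large effective batch sizes fit in fixed memory, at the cost of running the micro-batches sequentially.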
Jasper is only beginning to explore training these “GPU impossible” workloads in less time and with less energy on Cerebras resources, including Andromeda, but the system has already garnered recognition. A multi-institutional team including researchers at Argonne National Laboratory recently won the ACM Gordon Bell Special Prize for HPC-Based COVID-19 Research for work conducted on Andromeda. While training an LLM to track genetic mutations and predict variants of concern in SARS-CoV-2, the team found that training small models on Andromeda was faster than using 800 GPUs, and for larger workloads, the system completed work that a 2,000-node Nvidia A100 cluster was incapable of doing. In the publication of their award-winning work, the authors wrote: “We note that for the larger model sizes (2.5B and 25B), training on the 10,240 sequence length data was infeasible on GPU-clusters due to out-of-memory errors during attention computation. To enable training of the larger models on the full sequence length (10,240 tokens), we leveraged AI-hardware accelerators such as Cerebras CS-2, both in a stand-alone mode and as an inter-connected cluster.”
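The out-of-memory errors the authors describe follow from the quadratic memory cost of naive attention: the score matrix alone grows with the square of the sequence length. A rough, illustrative estimate (the head count and fp16 precision below are our assumptions for the sketch, not figures from the paper):

```python
# Back-of-the-envelope estimate of the memory needed just to hold naive
# attention scores at the sequence length used in the Gordon Bell work.
seq_len = 10_240        # full sequence length from the paper
heads = 32              # assumed number of attention heads (illustrative)
bytes_per_value = 2     # assumed fp16 storage

# Naive attention materializes one seq_len x seq_len score matrix per head
score_bytes = heads * seq_len**2 * bytes_per_value
print(f"{score_bytes / 2**30:.2f} GiB per layer, per sequence")  # 6.25 GiB
```

Doubling the sequence length quadruples this figure, and multiplying by batch size and the number of layers in flight quickly exhausts the tens of gigabytes of memory on a single GPU.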
Andromeda is now available for commercial customers as well as for academics and graduate students. The Cerebras AI Model Studio is also available now.