Advanced Computing in the Age of AI | Wednesday, July 24, 2024

CLIMB-ing higher in the UK with an HPC cloud for microbiologists 
Sponsored Content by Dell EMC | Intel

In the United Kingdom, microbiologists share world-class HPC resources via the groundbreaking CLIMB project — Cloud Infrastructure for Microbial Bioinformatics.

Thanks to dramatic advances in science and technology and gains in high performance computing, genome sequencing has moved into the mainstream of healthcare and scientific research. People around the world can now use genome sequencing to diagnose and treat illnesses and to develop new therapies for cancer, Alzheimer’s and other diseases.

Genome sequencing is particularly critical to researchers working in medical microbiology. That creates a problem, because many microbiologists don’t have access to the computational infrastructure they need to carry out their data-intensive research, which often involves enormous genomics datasets.

This is where the UK’s Cloud Infrastructure for Microbial Bioinformatics (CLIMB) enters the picture. The CLIMB project, which is funded by the national Medical Research Council (MRC), is a collaboration between Warwick, Birmingham, Cardiff, Swansea, Bath and Leicester universities and the Quadram Institute Bioscience. CLIMB is dedicated to developing and deploying world-class cyber-infrastructure for microbial bioinformatics, including cloud-based compute, storage and analysis tools for academic microbiologists across the UK.

CLIMB has become an essential national capability for microbiologists in the UK. A recent count found that it serves more than 1,000 users and over 300 research groups from 89 research institutions, including universities, public health agencies and governmental organizations. In addition, CLIMB has provided training in bioinformatics to thousands of academics, students and clinical microbiologists across the UK and as far afield as Palestine, Gambia and Vietnam.

The impact of CLIMB hasn’t gone unnoticed outside the UK. The project has won international recognition, including the 2017 HPCwire Readers’ Choice Awards for Best Use of HPC in Life Sciences and Best HPC Collaboration in Academia, Government or Industry.[1]

Let’s take a step back and look at the broader picture. The CLIMB project is part of a trend toward making HPC resources available via cloud interfaces. Systems that once might have been locked up in university and industry research labs, accessible to only a select few, are now being opened up to many users.

This is absolutely the case with CLIMB. As explained in a CLIMB paper in the journal Microbial Genomics, the CLIMB system was designed from the ground up to serve as a cloud-based computing infrastructure that provides an environment where microbiologists can share and reuse methods and data, and do it all without thinking much about the underlying HPC system.

“The cloud-computing approach incorporates a shared online computational infrastructure, which spares the end user from worrying about technical issues such as the installation, maintenance and, even, the location of physical computing resources, together with other potentially troubling issues such as systems administration, data sharing, scalability, security and backup,” the paper notes.

A look under the hood

The core infrastructure for CLIMB is a cloud system running the open source OpenStack cloud operating system. To enhance resiliency, CLIMB is spread over four sites, each with 500 TB of local scratch storage.

At the heart of the CLIMB environment is a large shared object storage system that provides about 2.5 petabytes of HPC data storage, which can be replicated between sites. This storage system is based on Red Hat Ceph Storage running on Dell EMC PowerEdge servers with Intel® Xeon® processors. This community system provides a place where researchers can store and share very large microbial datasets.

In addition, the CLIMB cloud environment offers access to a huge amount of memory: more than 78 terabytes of RAM. With all this muscle under the hood, CLIMB can run more than 1,000 virtual machines simultaneously. Each of these VMs can be preloaded with software, customized by end users and saved as a snapshot for reuse by others on the infrastructure.
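To put the figures quoted above in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers given in this article (four sites, 500 TB of scratch per site, 78 TB of RAM, 1,000 VMs) and assumes decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB). The function names are illustrative. The per-VM figure is a simple average, not a description of how CLIMB actually sizes or allocates its virtual machines, and since the article gives the RAM and VM counts as "more than" values, the results are ballpark numbers only.

```python
# Back-of-the-envelope arithmetic using figures quoted in the article.
# Assumption: decimal units (1 PB = 1000 TB, 1 TB = 1000 GB).

SITES = 4                   # CLIMB is spread over four sites
SCRATCH_TB_PER_SITE = 500   # local scratch storage per site
TOTAL_RAM_TB = 78           # "more than 78 terabytes of RAM"
CONCURRENT_VMS = 1000       # "more than 1,000 virtual machines"


def total_scratch_pb(sites: int, tb_per_site: float) -> float:
    """Combined local scratch storage across all sites, in petabytes."""
    return sites * tb_per_site / 1000


def avg_ram_gb_per_vm(total_ram_tb: float, vms: int) -> float:
    """Average RAM per VM in gigabytes if memory were split evenly."""
    return total_ram_tb * 1000 / vms


if __name__ == "__main__":
    scratch = total_scratch_pb(SITES, SCRATCH_TB_PER_SITE)
    ram = avg_ram_gb_per_vm(TOTAL_RAM_TB, CONCURRENT_VMS)
    print(f"Total local scratch: {scratch} PB")
    print(f"Average RAM per VM:  {ram} GB")
```

Under these assumptions, the four sites together hold about 2 PB of local scratch storage alongside the roughly 2.5 PB shared object store, and an even split of memory would give each of 1,000 VMs about 78 GB of RAM.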

Key takeaways

The CLIMB project is a great example of the future of high performance computing, in which resources will be virtualized and made available to many users via cloud services.

In this new world, users who need access to HPC resources for compute- and data-intensive work will look to HPC and AI as a service for everything they need. HPC shops, in turn, will act as multi-cloud service providers that offer centralized compute resources with multiple storage systems and access to multiple internal and external clouds.

This new era will continue the ongoing democratization of HPC by making high-end processing power and scalable storage available to businesses of all sizes, including startup companies, as well as traditional HPC power users in university environments.

To learn more

For a closer look at the Red Hat Ceph Storage environment running on Dell EMC PowerEdge servers in the CLIMB environment, see the case study “CLIMB Project Supports Research Collaboration With Red Hat Ceph Storage.”

[1] HPCwire, “HPCwire Readers’ and Editors’ Choice Awards,” November 14, 2017.