Advanced Computing in the Age of AI | Saturday, May 11, 2024

Bright Cluster Manager Enhances Support for NVIDIA GPUs 

Bright Computing announced today the general availability of enhanced support for NVIDIA GPU accelerators in Bright Cluster Manager. The enhancements include support for NVIDIA CUDA Toolkit version 5.5, the most recently released version of the NVIDIA CUDA parallel programming platform.

Bright Computing customers can easily provision, monitor and manage systems with NVIDIA GPU accelerators within cluster-management hierarchies. The fully integrated and comprehensive support in Bright Cluster Manager for NVIDIA GPU accelerators includes:

  • Packages for NVIDIA GPU accelerators, including NVIDIA CUDA and graphics drivers, the NVIDIA CUDA Toolkit 5.5 (CUDA-GDB, NVIDIA Visual Profiler, math libraries, and utilities), and the Tesla Deployment Kit. Bright matches the NVIDIA drivers to the running Linux kernel by recompiling them at system boot time;
  • Metrics specific to NVIDIA GPU accelerators (fan and clock speeds, temperature, utilization, memory usage and errors, power limit, and compute, operation and persistence modes), which can be monitored visually in Bright and used as triggers for rule-based actions (e.g., alerting a sysadmin when a temperature threshold is exceeded). Monitoring in Bright covers both NVIDIA's legacy GPUs and GPU accelerators based on the current NVIDIA Kepler architecture;
  • Quick runtime health checks, as well as exhaustive and invasive ones, based on nvidia-healthmon;
  • Configuration of GPU accelerator settings through both the Bright command-line and graphical user interfaces;
  • Programming environments for applications that use NVIDIA CUDA, OpenCL or directives-based OpenACC. These environments support hybrid applications that use MPI for distributed-memory parallel computing on CPUs and offload certain computations to GPU accelerators;
  • Recognition of NVIDIA GPU accelerators as computational resources available through the workload management (WLM) system. Bright supports all popular open source and commercial WLMs;
  • Bright-mediated verification of the development and runtime NVIDIA CUDA and OpenCL environments based on scripts provided with the NVIDIA CUDA Toolkit;
  • Run-time access to multiple configurations for NVIDIA GPU accelerators (e.g., different versions of the CUDA Toolkit) through use of the environment modules tool in Bright;
  • Recognition of physical collections of NVIDIA GPU accelerators (e.g., multiple NVIDIA GPUs housed in a single chassis like the Dell C410x) as GPU Units. In Bright, these GPU Units are provisioned, monitored and managed as first-class, composite devices in the cluster-management hierarchy;
  • Use of NVIDIA GPU accelerators offered by cloud service providers (e.g., Amazon Web Services). With Bright, a cluster with NVIDIA GPU accelerators can be set up entirely in the cloud, or the same accelerators can be used to extend on-premises IT infrastructure into the cloud; and
  • Instructions from Bright for exploratory use of the Multi-Process Service (MPS). New in NVIDIA CUDA 5.5, MPS allows multiple MPI processes that use NVIDIA CUDA to run concurrently on a single GPU accelerator, transparently to the MPI program.
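
The rule-based monitoring described above (for example, alerting a sysadmin when a GPU temperature threshold is exceeded) can be illustrated with a short sketch. It does not use Bright's own interfaces, which the announcement does not describe. Instead, it assumes GPU readings come from `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`. The `Rule` class, the function names, the selected fields and the 85 °C threshold are all hypothetical choices made for illustration.

```python
"""Hypothetical sketch of threshold-based GPU alerting (not Bright's API).

Assumes metrics are read from nvidia-smi CSV output, one row per GPU.
"""
from __future__ import annotations

import csv
import io
import subprocess
from dataclasses import dataclass
from typing import Optional

# nvidia-smi query fields, in the column order they will appear (illustrative choice).
FIELDS = ["index", "temperature.gpu", "utilization.gpu", "memory.used", "fan.speed"]


@dataclass
class Rule:
    metric: str       # one of FIELDS
    threshold: float  # the rule fires when a reading is strictly above this value
    action: str       # text describing the action, e.g. who to notify


def read_nvidia_smi() -> str:
    """Run nvidia-smi on a GPU node and return its raw CSV output."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_samples(text: str) -> list[dict[str, Optional[float]]]:
    """Turn CSV rows into dicts; non-numeric readings become None."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        sample: dict[str, Optional[float]] = {}
        for name, raw in zip(FIELDS, row):
            try:
                sample[name] = float(raw.strip())
            except ValueError:
                # e.g. "[Not Supported]" for fan speed on passively cooled boards
                sample[name] = None
        samples.append(sample)
    return samples


def evaluate(samples: list[dict[str, Optional[float]]], rules: list[Rule]) -> list[str]:
    """Return one alert message per (GPU, rule) pair whose threshold is exceeded."""
    alerts = []
    for sample in samples:
        for rule in rules:
            value = sample.get(rule.metric)
            if value is not None and value > rule.threshold:
                alerts.append(
                    f"GPU {int(sample['index'])}: {rule.metric}={value:g} "
                    f"> {rule.threshold:g} ({rule.action})"
                )
    return alerts


if __name__ == "__main__":
    # Canned output so the sketch runs without a GPU; on a GPU node,
    # read_nvidia_smi() could supply this text instead.
    demo = "0, 62, 97, 4021, [Not Supported]\n1, 88, 100, 5110, [Not Supported]\n"
    rules = [Rule("temperature.gpu", 85, "notify sysadmin")]
    for alert in evaluate(parse_samples(demo), rules):
        print(alert)
```

Keeping the rules as plain data, separate from metric collection, mirrors the idea of metrics serving as triggers: the same readings can drive several independent rules without changing how they are gathered.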

 

EnterpriseAI