Consider a ‘Genetically Diverse’ Security Strategy 

Worldwide, the HPC market is booming as the industry enters the era of exascale-size data sets, and supercomputing applications expand beyond the academic and scientific world into financial services, life sciences, manufacturing, oil and gas, aerospace, defense, and more. Analysts predict the HPC market will reach more than $44 billion by 2020.

For many more industries, HPC – and supercomputing – are the future. While some industries, such as aerospace, have embraced the complexity of working with huge amounts of data to solve problems, organizations that have the data but not the HPC savvy are quickly falling behind.

Data rules HPC applications and commands special attention in protection and security. The large volumes of data in HPC applications make them ideal candidates for enterprise-class storage that protects data for decades, as HPC organizations track and analyze data sets for years after their inception.

Matt Starr of Spectra Logic

Of the options for storing and protecting large-scale data – public, private and hybrid cloud storage – hybrid cloud is probably the most efficient. It works well for small to mid-size HPC organizations, as well as for organizations with small data sets and large compute cycles.

HPC-ready organizations examining cloud storage should consider:

  1. Security – Public cloud storage faces the greatest security threats, as larger data sets on a shared, multi-tenant system present a broader opportunity for unauthorized access. HPC organizations should encrypt all data before storing it in the cloud (see the sketch after this list).
  2. Cost – The general rule is: cheap to get in, moderate to stay and very expensive to leave. HPC organizations with exponential data growth should consider a private or hybrid model, keeping one copy of their data locally and one in the cloud.
  3. Data Value – HPC organizations should regularly ask: what is the value of our data? Can it be replaced if need be? Data that cannot be replicated or recreated has much higher value, and is better stored in a "genetically diverse" storage system (more on this in a moment) than entirely in the cloud.
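As a concrete illustration of point 1, here is a minimal sketch of client-side encryption before anything reaches a public cloud, using Python's third-party cryptography package. The file names and the upload step are hypothetical placeholders, not any particular provider's API:

```python
# Minimal sketch: encrypt data on-premises before uploading the ciphertext.
# Requires the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # keep this key on-premises, never in the cloud
cipher = Fernet(key)

with open("results.dat", "rb") as f:        # hypothetical data set
    ciphertext = cipher.encrypt(f.read())

with open("results.dat.enc", "wb") as f:    # only the ciphertext leaves the site
    f.write(ciphertext)
# upload_to_cloud("results.dat.enc")        # placeholder for a provider's upload call
```

The point of the sketch is key custody: if encryption happens before the upload and the key never leaves the organization, a breach of the shared public system exposes only ciphertext.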

In the event of a hack, organizations need to analyze the situation as they would a data set: access and search the data for signs of tampering, and determine whether the hackers modified or deleted anything. Data deletion hurts the most. With petabytes of data on hand, lost data means significant time and resources lost.

One way to mitigate risk is to employ genetic diversity in HPC data storage. The essence of genetic diversity comes from our grandmother's wisdom: don't keep all your eggs in one basket. It's simple, yet many organizations miss this important step.

Organizations should implement a hybrid cloud storage model, dispersing data across a mix of on-premises, private and public cloud storage to create a seamless and secure solution, customized to each organization.

For HPC, we recommend a three-part system, with one copy of data in the cloud, one on-site and one off-site. Combining private cloud storage, the industry-accepted SHA-256 hashing algorithm, and disk and tape technologies provides both security and data preservation.
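To make the hashing step concrete, here is a minimal Python sketch of computing a SHA-256 fixity manifest at write time, before the copies are dispersed. The dataset/ directory and manifest.json file name are hypothetical, and this illustrates the general technique rather than any vendor's implementation:

```python
# Sketch: compute a SHA-256 fixity manifest before dispersing copies to
# cloud, on-site and off-site targets. Paths are hypothetical examples.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so very large files never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

manifest = {str(p): sha256_of(p) for p in Path("dataset").rglob("*") if p.is_file()}
Path("manifest.json").write_text(json.dumps(manifest, indent=2))
# Store manifest.json alongside each of the three copies, so any copy
# can later be verified independently of the others.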

The next step is data validation over time to ensure continuity. A robust storage system can confirm that data written five years ago has not changed, so it can still be analyzed with confidence.
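A periodic audit can then re-hash every file and compare it against the stored manifest, flagging both tampering and deletion. A minimal sketch, reusing the hypothetical sha256_of() helper and manifest.json from the example above:

```python
# Sketch: periodic fixity audit. Re-hash each file recorded in the
# manifest and flag anything modified or missing.
import json
from pathlib import Path

manifest = json.loads(Path("manifest.json").read_text())
for name, recorded in manifest.items():
    path = Path(name)
    if not path.exists():
        print(f"MISSING:  {name}")      # deletion hurts the most
    elif sha256_of(path) != recorded:
        print(f"MODIFIED: {name}")      # possible tampering
```

Run against each of the three copies in turn, an audit like this pinpoints which copy was altered and which still holds good data to restore from.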

For those industries without HPC or proper data storage management, we recommend getting a move on. HPC is vital for growth, and great, positive impact awaits organizations that manage their data effectively.

Matt Starr is the chief technology officer of deep storage technology vendor Spectra Logic. Follow him @StarrFiles.
