Advanced Computing in the Age of AI | Sunday, September 24, 2023

Public Cloud-based Compliance Storage: How to Do It Right 

Electronic disk storage became popular in the early 2000s for archive and compliance storage. SEC Rule 17a-4 is the federal regulation that requires broker-dealers and other regulated companies to retain business-related communications, and it requires that records be preserved in a non-rewriteable, non-erasable format. EMC Centera was the first electronic disk storage solution to satisfy Rule 17a-4, and it went on to become a highly successful archive storage product for EMC. Competing products emerged from NetApp, IBM and Hitachi, but Centera remains the market leader.

On-premises compliance storage carries a substantial cost. To satisfy the rules for immutable storage, it requires a specialized data-locking feature that is sold at a premium. A further cost comes from the need for multiple compliance storage arrays in separate locations to satisfy SEC rules for duplication of data. The financial services industry is the largest purchaser of compliance storage.

As a replacement for on-premises compliance storage, new public cloud storage offerings from Amazon, Microsoft, Google and others promise a lower-cost alternative: “cold” storage services designed to hold large amounts of “low-touch” archive data, priced as low as $10 to $20 per TB per month.

On-premises compliance storage vs. cloud storage is not an apples-to-apples comparison. But if you compare $150K (the cost of a 4 TB on-premises compliance storage array) with the cost of keeping the same 4 TB in the cloud for 36 months at $10 per TB per month (about $1,440), the on-premises option costs roughly 100 times more. To determine your own potential savings, catalog your archive and compliance data and compare the cost of on-premises vs. cloud storage.
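The comparison is simple arithmetic: 4 TB × $10 per TB per month × 36 months comes to $1,440, and $150,000 ÷ $1,440 ≈ 104. A small helper such as the hypothetical `on_prem_vs_cloud` function below makes it easy to rerun the numbers with your own capacity, price and term. It is a sketch that counts only the monthly storage charge; it ignores retrieval, egress and request fees as well as the cost of the compliance application layer, all of which a real estimate would need to include.

```python
def on_prem_vs_cloud(on_prem_cost: float, capacity_tb: float,
                     cloud_price_per_tb_month: float, months: int) -> tuple[float, float]:
    """Return (cloud_cost, ratio), where ratio = on_prem_cost / cloud_cost.

    Hypothetical helper: only the per-TB monthly storage charge is modeled.
    """
    cloud_cost = capacity_tb * cloud_price_per_tb_month * months
    return cloud_cost, on_prem_cost / cloud_cost


# Figures from the article: a $150K, 4 TB array vs. 36 months at $10 per TB per month.
cloud_cost, ratio = on_prem_vs_cloud(150_000, 4, 10, 36)
print(f"Cloud cost over 36 months: ${cloud_cost:,.0f}")
print(f"On-premises costs about {ratio:.0f}x more")
```

At the upper end of the quoted price range ($20 per TB per month) the cloud cost doubles to $2,880 and the ratio drops to about 52x, still well over an order of magnitude.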

Beyond the price advantages of a cloud-based storage solution, you also eliminate the support, maintenance, power and cooling costs of on-premises compliance storage. Additional advantages include:

  • Compliance storage based in the cloud can be replicated easily to multiple locations to satisfy SEC rules for duplication of data. On-premises compliance storage requires the purchase of additional storage arrays and secondary data centers to house the arrays.
  • Cloud-based compliance storage can be scaled up and down on demand to meet capacity requirements, and it provides easy access to powerful compute resources for indexing and eDiscovery.

So what should you move to the cloud first? Low-touch, unstructured documents are a perfect use case for public cloud archiving. The cloud can store documents from inactive employees, financial records, HR records, legal data sets, log files, media files and just about any other form of data that requires long-term preservation. Unstructured documents typically live on file shares and user desktops; removing them from desktops and file shares also reduces the burden on your company backup solution.
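One way to begin cataloging archive candidates is to scan a file share for files that have not changed in a long time. The sketch below rests on stated assumptions: it treats a file's last-modified time as a proxy for “low-touch” (access times are often disabled or unreliable on file shares), the one-year default threshold is arbitrary, and the function names are hypothetical rather than part of any product.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400
BYTES_PER_TB = 10**12  # decimal terabyte; adjust if your provider bills in binary units


def find_low_touch_files(root: str, min_age_days: int = 365):
    """Yield (path, size_bytes) for files not modified in at least min_age_days.

    Modification time stands in for "low-touch"; this is an assumption, not a rule.
    """
    cutoff = time.time() - min_age_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime < cutoff:
                yield path, st.st_size


def summarize(root: str, min_age_days: int = 365) -> tuple[int, float]:
    """Return (file_count, total_terabytes) of archive candidates under root."""
    count, total_bytes = 0, 0
    for _path, size in find_low_touch_files(root, min_age_days):
        count += 1
        total_bytes += size
    return count, total_bytes / BYTES_PER_TB
```

For example, `summarize("/mnt/fileshare", min_age_days=730)` (the mount point is a placeholder) would report how many files on that share have gone untouched for two years and how many terabytes they occupy, which is the capacity figure to plug into a cost comparison.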

SEC Rule 17a-4

For the financial services industry, compliance data can be stored in a public cloud provided that certain SEC Rule 17a-4 requirements are met, including:

  • Records are preserved exclusively in a non-rewriteable, non-erasable format
  • Automatic verification of the quality and accuracy of the storage media recording process
  • Serialization of the original and, if applicable, duplicate units of storage media, and time-dating of the information placed on such electronic storage media for the required period of retention
  • The ability to readily download indexes and records preserved on the electronic storage media to any medium
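These requirements can be illustrated with a toy, in-memory model of a write-once store. This is a conceptual sketch only, not a compliant implementation and not any vendor's API: the names `WormStore`, `StoredRecord` and `WormViolation` are hypothetical, and a real system must enforce immutability at the storage layer rather than in application code. The model rejects rewrites, refuses deletion before the retention date, stamps each record with a serial number and a UTC time-date, verifies each write by digest, and exports an index.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


class WormViolation(Exception):
    """Raised when an operation would rewrite or prematurely erase a record."""


@dataclass(frozen=True)
class StoredRecord:
    serial: int             # sequential serial number assigned at write time
    stored_at: datetime     # UTC time-date stamp of the write
    retain_until: datetime  # earliest moment at which deletion is allowed
    sha256: str             # digest used to verify the recording
    data: bytes


class WormStore:
    """Conceptual write-once, read-many model; the backing dict is not truly immutable."""

    def __init__(self) -> None:
        self._records: dict[str, StoredRecord] = {}
        self._next_serial = 1

    def write(self, key: str, data: bytes, retention: timedelta) -> StoredRecord:
        if key in self._records:
            raise WormViolation(f"{key!r} already exists and cannot be rewritten")
        now = datetime.now(timezone.utc)
        digest = hashlib.sha256(data).hexdigest()
        record = StoredRecord(self._next_serial, now, now + retention, digest, bytes(data))
        self._records[key] = record
        self._next_serial += 1
        # Verify the recording by re-reading it; against real media this would
        # read back from the device rather than from memory.
        if hashlib.sha256(self._records[key].data).hexdigest() != digest:
            raise IOError(f"verification failed for {key!r}")
        return record

    def read(self, key: str) -> bytes:
        return self._records[key].data

    def delete(self, key: str, now: Optional[datetime] = None) -> None:
        now = now or datetime.now(timezone.utc)
        retain_until = self._records[key].retain_until
        if now < retain_until:
            raise WormViolation(f"{key!r} is retained until {retain_until.isoformat()}")
        del self._records[key]

    def export_index(self) -> list[tuple[str, int, str, str]]:
        """Rows of (key, serial, stored_at, sha256) in serial order, ready to copy to any medium."""
        rows = sorted(self._records.items(), key=lambda kv: kv[1].serial)
        return [(k, r.serial, r.stored_at.isoformat(), r.sha256) for k, r in rows]
```

The design choice worth noting is that deletion is not forbidden outright but gated on `retain_until`, which mirrors the idea that records must be non-erasable only for the required retention period.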

Compliance Storage Solution

To meet SEC requirements, a Compliance Storage Solution (CSS) is necessary. This is an application layer that works in conjunction with the cloud storage vendor and provides a platform for financial services customers to preserve email, journal data, voicemail and IM data.

The CSS:

  • Ingests the data and verifies its quality and accuracy using unique record IDs.
  • Catalogs record items based on record ID and timestamps to ensure accuracy and speedy retrieval.
  • Manages multiple copies of the data across geographically separate datacenters.
  • Provides a native eDiscovery application that makes any recorded data retrievable through eDiscovery search.
  • Provides automated retention management to store data for a defined period, in a non-rewriteable, non-erasable format.
  • Ensures full data fidelity and defensible migration with item-level audit reporting.
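The ingest-side responsibilities in this list can be modeled as a small pipeline. The sketch below is hypothetical and is not Archive360's implementation or any vendor's API: it assigns each item a random UUID as its record ID, stores a SHA-256 digest, writes a copy to each replica (plain dictionaries standing in for geographically separate datacenters), reads every copy back to verify it, catalogs the item by record ID and timestamp, and appends an audit entry. Retention enforcement is left out, and the keyword search is a naive stand-in for a real eDiscovery index.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CatalogEntry:
    record_id: str
    timestamp: datetime
    source: str            # e.g. "email", "voicemail", "im"
    sha256: str            # digest recorded at ingest, used to verify every copy
    locations: list[str]   # replicas holding a verified copy


@dataclass
class ComplianceIngest:
    """Toy model of a CSS ingest path; each replica is a dict standing in for a datacenter."""
    replicas: dict[str, dict[str, bytes]]
    catalog: dict[str, CatalogEntry] = field(default_factory=dict)
    audit_log: list[str] = field(default_factory=list)

    def ingest(self, data: bytes, source: str, timestamp: datetime) -> str:
        record_id = str(uuid.uuid4())  # unique ID for this item
        digest = hashlib.sha256(data).hexdigest()
        verified = []
        for name, store in self.replicas.items():
            store[record_id] = bytes(data)
            # Read the copy back and re-hash it to confirm it matches the original.
            if hashlib.sha256(store[record_id]).hexdigest() == digest:
                verified.append(name)
        if len(verified) != len(self.replicas):
            raise IOError(f"{record_id}: only {len(verified)} of "
                          f"{len(self.replicas)} copies verified")
        self.catalog[record_id] = CatalogEntry(record_id, timestamp, source, digest, verified)
        self.audit_log.append(f"{timestamp.isoformat()} ingested {record_id} ({source}) -> {verified}")
        return record_id

    def search(self, term: bytes, start: datetime, end: datetime) -> list[str]:
        """Naive eDiscovery: IDs of items timestamped in [start, end) containing term, oldest first."""
        primary = next(iter(self.replicas.values()))
        hits = [e for e in self.catalog.values()
                if start <= e.timestamp < end and term in primary[e.record_id]]
        return [e.record_id for e in sorted(hits, key=lambda e: e.timestamp)]
```

A random UUID is used instead of a content hash as the record ID so that two identical messages sent at different times remain separate records; the digest is kept alongside it purely for verification.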

Keep in mind that this data is meant for compliance and eDiscovery, which generally means low-frequency access rather than day-to-day end-user access. End-user data is kept in cloud-based email applications and file shares, where frequent access is the norm.

Bob Spurzem is director of product management at Archive360.