Advanced Computing in the Age of AI | Thursday, March 28, 2024

Object-based Storage and Cloud in the Life Sciences 
Sponsored Content by EMC

Life sciences organizations must routinely manage and provide access to very large datasets. The problem is that data volumes keep growing, which places new demands on storage solutions.

To put the issues into perspective, consider the work done in most labs and research centers today.

Advances in next-generation sequencing have greatly lowered the cost of producing a sequence, allowing organizations of all sizes to perform many experiments in a month. It is not unusual for an organization to manage workflows with terabytes of data for a single sequencing run and petabytes for a lab.

Growing use of light sheet microscopy produces more than a terabyte (TB) of data per hour for a single experiment. And as organizations move into translational and precision medicine and use genomic analysis in research and clinical settings, there is increased use of whole genome sequencing (WGS) and RNA-seq. Such work can involve petabytes (PB) of data that require fast analysis in order to make timely decisions about customized treatments and therapies.

Further complicating matters, the data generated in these application areas is of value to multiple groups within and between organizations. As such, the data must be easily accessible at different times to a wide array of analysis routines running on a variety of computing and storage platforms.

Simply put, there is a great need to share large volumes of data. Unfortunately, much of the data is stored on multiple clouds. This makes the data hard to manage and costly to administer, and it can put obstacles in the way when researchers try to perform analysis.

How object-based storage can help

To provide access to large datasets, many organizations are turning to object-based storage.

Object storage systems handle data differently from the more common file and block storage approaches. With object storage, data and all of the associated metadata are bundled together as an object. The object is given an ID, and an application retrieves it using that object ID. Unlike files in a file system, objects are stored in a flat structure. Objects may be local or geographically separated.
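To make the object model concrete, here is a minimal sketch in Python of a flat, ID-keyed store in which data and metadata travel together. It is a toy in-memory illustration only, not the interface of ECS or any other product; the names `FlatObjectStore`, `StoredObject`, `put`, `get`, and `find` are hypothetical and chosen for this example.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object bundles the payload with its metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Toy in-memory store (hypothetical): one flat namespace keyed by object ID."""

    def __init__(self):
        self._objects = {}  # object ID -> StoredObject; no directories

    def put(self, data, metadata=None):
        """Store data with its metadata and return the generated object ID."""
        object_id = uuid.uuid4().hex
        meta = dict(metadata or {})
        meta["size_bytes"] = len(data)
        meta["sha256"] = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = StoredObject(data, meta)
        return object_id

    def get(self, object_id):
        """Retrieve an object by ID alone; no path traversal is involved."""
        return self._objects[object_id]

    def find(self, **criteria):
        """Return the IDs of objects whose metadata matches every criterion."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]
```

A caller might store a read file with `store.put(reads, {"sample": "S1", "assay": "WGS"})` and later locate it with `store.find(assay="WGS")`. Because lookups go through the ID and metadata rather than a directory path, the same pattern works whether the object is held locally or at a remote site.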

With object storage, the flat access structure is well suited to serving hugely scalable unstructured data. Object storage solutions offer the ability to aggregate storage into disparate grid storage structures that perform work traditionally done by single subsystems. Solutions typically provide load distribution and resiliency far in excess of what is available in a traditional storage environment.

However, there are some points to consider before using object storage. To start, protocol issues must be addressed. Most applications are written to use CIFS or NFS calls, whereas object storage uses simple calls through an HTTP-based REST application programming interface (API).

As a result, organizations must either convert applications to make calls to object storage or use a storage solution that incorporates a gateway or storage services that make access to cloud and object data transparent to users.
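The difference in calling style can be sketched with Python's standard library alone. The example below runs a tiny local HTTP server that stores objects under their IDs, then writes and reads an object with PUT and GET requests. The URL layout (`/<object-id>`) and the helper names `start_server`, `put_object`, and `get_object` are assumptions made for illustration; real object stores define their own REST APIs, authentication, and metadata headers.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

OBJECTS = {}  # flat map: URL path (the object ID) -> bytes


class ObjectHandler(BaseHTTPRequestHandler):
    """Hypothetical object endpoint: PUT stores a body, GET returns it."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        OBJECTS[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = OBJECTS.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_server():
    """Start the toy server on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put_object(host, port, object_id, data):
    """Write an object with an HTTP PUT; returns the response status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("PUT", f"/{object_id}", body=data)
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()


def get_object(host, port, object_id):
    """Read an object with an HTTP GET; returns (status code, body bytes)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", f"/{object_id}")
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

An application written against NFS or CIFS would instead call something like `open("/mnt/share/run42/sample7.bam", "rb")`. Converting it means replacing such path-based calls with request helpers like the ones above, which is the work a gateway is meant to spare.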

EMC as your technology partner

In today’s research environment, organizations need high-speed access to large datasets stored on file, block, and object-based systems. They also need the flexibility to store the data on disparate, geographically dispersed clouds such as Amazon Web Services (AWS), Microsoft Azure, and others.

These are all areas where EMC can help. The EMC Emerging Technologies Division, home to EMC Isilon, ECS (Elastic Cloud Storage), and DSSD solutions, is a trusted life sciences IT advisor.

Complementing EMC’s widely used scale-out NAS storage solutions, ECS is a third-generation object platform designed for next-gen applications and traditional workloads with unmatched storage efficiency, resiliency, and simplicity. It can be deployed as a turnkey storage appliance or as a software-only solution designed to run on industry-standard hardware.

On the cloud front, EMC Isilon CloudPools enables organizations to tier data to a “cloud pool” with a cloud provider of choice or to an ECS system. Specifically, EMC Isilon CloudPools lets an organization integrate its core storage infrastructure with a choice of public cloud providers, including AWS, Microsoft Azure, and Virtustream, as well as EMC ECS and Isilon private cloud options. The integration is transparent to users and applications.

This allows organizations to gain new efficiencies and optimize resources by embracing the cloud as an archive storage tier for cold or frozen data. They can move old, unused data to the cloud to free up space on their primary systems for higher-value data and applications.
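The idea of selecting cold data for an archive tier can be illustrated with a short Python sketch that lists files not accessed within a given number of days. This is a generic, assumed policy based on file access times, not a description of how EMC Isilon CloudPools selects or moves data; the function name `find_cold_files` and its parameters are hypothetical.

```python
import os
import time
from pathlib import Path


def find_cold_files(root, max_idle_days, now=None):
    """Return files under root whose last access is older than max_idle_days.

    Hypothetical policy for illustration: relies on the file system's
    recorded access time (st_atime), which some mounts do not update.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day
    cold = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            cold.append(path)
    return sorted(cold)
```

For example, `find_cold_files("/data/sequencing", 365)` would list candidates untouched for a year (the directory path here is only an example). A real tiering system would then move those files and leave the primary copy reachable by the same name, so that users and applications do not notice the change.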

This in turn lets organizations optimize storage resources, gain cloud-scale storage capacity, and reduce overall storage costs. It also lets them maintain necessary working space on production performance and active near-line archive tiers, buy time to work through IT procurement cycles, deliver agreed-upon customer SLAs, and create new, cost-effective opportunities to store research data long term.

EMC life sciences customers include some of the world’s leading research organizations, among them Partners Healthcare, The Broad Institute, the National Cancer Institute (NCI), Cancer Research UK, the German Cancer Research Center (DKFZ), and others. Many of these organizations are dealing with ever-growing volumes of data while needing to reduce the time it takes to perform whole genome analysis.

Visit Emerging Technologies for the Life Sciences online to learn more!
http://www.emergingtechsolutions.com/life-science?utm_campaign=ETD%20Life%20Sciences&utm_medium=article&utm_source=CenterStage%20Article%20Tabor
