Dell’s Latest Hat Trick: Making Big Data Small
When it comes to data, you might think that there is a correlation between the size of an organization and the amount of data it has to deal with. Big company, big data.
But this is not necessarily the case. As Dell senior consultant Amy Price points out, it really depends on how an organization is using its accumulated bits and bytes. A very large business may be able to get along just fine without generating mounds of data; on the other hand, there are many small to medium-sized shops that are drowning in the stuff. "It's use case driven," she says.
SMMs (small- to medium-sized manufacturers) are particularly vulnerable to this tidal wave of information. Shop floor tool sensors, RFID technology disseminated throughout the supply chain, the trail left by goods as they move through the factory, web logs, ERP data, and procurement information are just a few of the big data drivers in the mid-market manufacturing community. In addition, merger and acquisition activity has turned smaller companies into larger ones. In manufacturing, which tends to generate huge quantities of information, growth of this kind does bring a closer correlation between company size and the volume of data generated.
In fact, according to Dell Manufacturing Vice President Bill Popp, the manufacturing sector will generate more data than any other – the industry stored two exabytes of data in 2010, and that was two years ago. IT, he says, is moving closer to operations, the fountain from which the torrents of big data are flowing.
For SMM IT organizations, this poses both a problem and an opportunity. On one hand, these relatively small, usually overworked IT departments – sometimes consisting of only a few people – are attempting to cope with the inexorable increase in data being generated by the business and its partners. On the other hand, these growing terabytes of both structured and unstructured data are a gold mine that can be tapped using today's powerful business intelligence (BI) and analytics tools.
For Dell, the situation represents an opportunity as well. The company has introduced a new storage solution designed to help enterprises both large and small get a handle on big data and turn it to their advantage. And, according to Bill Popp, Dell's vice president of Manufacturing Sales, the new Dell Big Data Retention solution is of particular benefit to SMMs. "By providing a more efficient model to store big data, it allows these manufacturing companies to get the most out of their BI analytics by allowing the data to be properly retained and accessed," he says.
The solution's concept is simplicity itself – make the data smaller so it can more easily be stored and retrieved, and reduce costs in the bargain. It combines Dell storage, including the DX Object Storage platform, with RainStor database technology. The cost of retaining big data is reduced through data reduction, simplified data management, and robust scalability.
The solution is integrated with existing analytics platforms, providing a frontend big data repository for large datasets, as well as a backend archive. It can also serve as a standalone repository or work with an analytics platform such as Hadoop, the popular open-source software platform for scalable, distributed computing. Customers can start with simple SQL-style queries, scaling to more complex analytics powered by Dell PowerEdge C- and R-series servers, and leveraging Force10 networking. The solution can work with existing data warehouses, providing the ability to offload data and store it less expensively.
Price says the product is geared toward handling structured or semi-structured data – for example, the information in a data warehouse or the data that has been extracted from a database and retained in XML format. The Dell solution allows the data to be compressed to about three percent of the original footprint, so instead of storing a petabyte of data, you wind up with a very manageable 30 terabytes. Despite the extreme compression, the data maintains its integrity, is searchable, discoverable and accessible, and can be managed more easily using the appropriate product lifecycle management system.
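To make the arithmetic behind that claim concrete, here is a minimal sketch in Python. It assumes decimal storage units (1 petabyte = 1,000 terabytes), which is what the article's own figures imply, and treats the "about three percent" figure as a fixed fraction; real reduction ratios would vary with the data. The function and variable names are illustrative only and are not part of any Dell or RainStor product.

```python
# Assumption: decimal units, 1 PB = 1,000 TB, matching the article's figures.
TB_PER_PB = 1_000


def retained_footprint_tb(raw_tb: float, retained_fraction: float = 0.03) -> float:
    """Return the stored size in TB when data shrinks to `retained_fraction` of its raw size."""
    if not 0 < retained_fraction <= 1:
        raise ValueError("retained_fraction must be in the range (0, 1]")
    return raw_tb * retained_fraction


raw_tb = 1 * TB_PER_PB                     # one petabyte of raw data
stored_tb = retained_footprint_tb(raw_tb)  # footprint at roughly 3 percent
reduction_factor = raw_tb / stored_tb      # how many times smaller the stored data is

print(f"{raw_tb} TB raw -> {stored_tb:.0f} TB stored (about {reduction_factor:.0f}x smaller)")
```

Put another way, a footprint of three percent corresponds to a reduction of roughly 33 to 1.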
The RainStor database provides online data retention at a massive scale, with unlimited scalability and zero administration. It directly addresses the management of machine-generated data (MGD), which accounts for many of the large datasets encountered by manufacturing organizations, including RFID-generated data, network logs, shop floor sensor readings, and data from building management systems and medical devices.
For example, Popp notes, a supplier manufacturing parts for aircraft engines can gather manufacturing data on each part as it moves through the entire product lifecycle, as well as track the customers that are buying these parts. Once this mass of data is made easily accessible for analysis, the company can determine how many parts were produced over time, their costs, defect rates, and any manufacturing inefficiencies. This can provide fine-grained quality control. In addition, the company now has a detailed profile of its customers' buying patterns that can be used to predict future sales and what resources should be deployed to meet the anticipated demand. In general, because the cost per terabyte of data stored is considerably lower, the company can store data for longer periods of time for future analysis and other business requirements.
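As a rough illustration of the kind of simple SQL-style question such a supplier might ask of its retained data, the sketch below summarizes parts produced, average cost, and defect rate per month. Everything in it is an assumption made for illustration: the table name `part_records`, its columns, and the sample values are invented, and Python's built-in `sqlite3` module stands in for the repository, since the article does not describe the actual query interface.

```python
import sqlite3

# Hypothetical per-part records: (part_id, month, unit_cost, defective flag).
# All values are made up for illustration.
rows = [
    ("P-001", "2012-01", 120.0, 0),
    ("P-002", "2012-01", 118.5, 1),
    ("P-003", "2012-02", 121.0, 0),
    ("P-004", "2012-02", 119.5, 0),
]

# An in-memory SQLite database stands in for the real data store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part_records "
    "(part_id TEXT, month TEXT, unit_cost REAL, defective INTEGER)"
)
conn.executemany("INSERT INTO part_records VALUES (?, ?, ?, ?)", rows)

# Parts produced, average unit cost, and defect rate for each month.
summary = conn.execute(
    """
    SELECT month,
           COUNT(*)       AS parts_made,
           AVG(unit_cost) AS avg_cost,
           AVG(defective) AS defect_rate
    FROM part_records
    GROUP BY month
    ORDER BY month
    """
).fetchall()
conn.close()

for month, parts_made, avg_cost, defect_rate in summary:
    print(f"{month}: {parts_made} parts, avg cost {avg_cost:.2f}, defect rate {defect_rate:.0%}")
```

Averaging a 0/1 defect flag gives the defect rate directly, which keeps the query short enough to serve as a starting point before moving to more complex analytics.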
Says Popp, "You can get very creative by looking at these data sources across multiple dimensions and using the structured and semi-structured data generated by RFID chips, web logs or other sources. You can answer questions that you couldn't even consider before. This is really what the big data trend is all about."