Advanced Computing in the Age of AI | Friday, June 9, 2023

CIA Embraces Cloudera Data Hub 

As federal cloud provider Amazon Web Services (AWS) readies its commercial cloud service for the CIA, other vendors are lining up to supply the analytical and other tools needed to harness the enormous amounts of data to be stored and analyzed in the emerging spy and other federal clouds.

Among them is new CIA partner Cloudera, which is promoting an enterprise data hub platform that seeks to add a layer of "pervasive analytics" via a unified framework and data formats along with security and administration. Pervasive analytics would be used by federal agencies like the CIA to drive data from analysts to decision makers, Mike Olson, Cloudera's chief strategy officer, said at a company event in Tysons Corner, Va., this week.

Once the pieces are in place, the data platform is touted as replacing the data silos that have persisted in government agencies while solving security issues that remain a high priority for government cloud users. The result is that "everyone would be working off the same datasets," Olson explained, while a data governance capability would control who can view which subsets.

Along with AWS and its commercial cloud service, CIA Chief Information Officer Doug Wolfe announced a partnership with Cloudera "to extend the innovation and push the envelope on a whole range of different solutions." Added Wolfe, "Having this enterprise data hub up in a month or so on our commercial cloud solution will make [it] a lot more accessible and easily usable by a broader range of the [CIA's analyst] population."

The partnerships with AWS and Cloudera also address the spy agency's requirement for cloud and big data innovation, Wolfe continued, "but that innovation needs to be scalable to serve a range of customers."

Cloudera investor Intel Corp. has also been working with the big data specialist on Hadoop security "from the silicon up," according to Steve Orrin, Intel Federal's chief technologist.

Other Cloudera partners are helping to implement its data hub by providing a range of components designed to securely store, access and analyze structured and, increasingly, mountains of unstructured data. For example, Cloudera partner Digital Reasoning is supplying a "trusted cognitive computing platform" that would leverage techniques such as machine learning to bring structure to human language as an intelligence source.

Digital Reasoning CEO Tim Estes cited a current use case in which the company's platform was used in the banking industry to analyze up to 6 million emails a day to detect hints of insider trading. The company is working with Cloudera and business intelligence software vendor Tableau on a commercial cloud analytics service running on the data hub. Estes said one goal is "changing the aperture of observation" to shift the emphasis from IT administrators to data analysts to decision makers.

Cloudera is also working with storage specialist EMC Corp.'s Isilon scale-out network-attached storage unit to develop a data hub reference architecture. Audie Hittle, EMC Isilon's chief technology officer, said an emerging software-defined storage capability seeks to "redefine" collaboration and big data analytics. The reference architecture also would seek to move compute capability closer to data "in place" to improve data analytics.

Meanwhile, federal agencies are still trying to figure out how to move from an era of limited, structured data storage to relatively cheap and virtually unlimited storage of both structured and unstructured data. Cloudera's Olson noted that this would allow intelligence and other federal agencies to sweep up and store everything. They could then leverage emerging big data analytics tools to sift through sensor data and social media chatter on an as-needed basis.

Hence, pervasive analytics is emerging as a key federal cloud application as financial and other use cases are refined to help the feds shift their operations and infrastructure to the cloud. The data management component of the AWS commercial cloud service will therefore require a data hub to help agencies get their arms around big data, Cloudera insists.

The scale of the big data task continues to grow, requiring potential government vendors to offer platforms that will scale along with the data explosion and whatever comes after Hadoop or Spark. Orrin of Intel said the partners are making Hadoop "enterprise worthy" under a security effort called Project Rhino. Not only will the project optimize Cloudera's Hadoop distribution on Intel's architecture, Orrin said, it will also make the distribution "enterprise scalable" for next-generation architectures.

Intel has invested $740 million in Cloudera's Hadoop distribution, a total that represents the chipmaker's largest big data deal so far, Orrin noted.

Those future architectures will groan under the weight of unstructured data generated by mobile devices and the explosion of open-source data. For example, the Obama administration claims its open data initiative alone has resulted in the release of more than 138,000 government datasets ranging from U.S. agricultural and trade statistics to climate and emergency preparedness information.

Olson cited another use case in which the commercial satellite imagery provider DigitalGlobe adds elevation data to its imagery used by insurers, for example, to gauge flood risks along coastlines. The military used to call this capability "data fusion," something the Pentagon has largely failed to develop on its own.

Now, the CIA, Defense Department and the rest of the federal bureaucracy are looking to commercial cloud and analytics vendors to customize their data platforms for public sector applications ranging from medical research to pinpointing terror cells.