Tom Siebel’s Unified AI-Big Data Front Against COVID-19
If AI is to help turn back the COVID-19 tide in a timeframe that saves lives and livelihoods, data scientists will need to build machine learning models fast and run them on a platform that scales to the global pandemic’s enormous complexities. Even more, vast amounts of Coronavirus-related data will be needed so models can be trained to produce valid epidemiological and treatment research results. And given the time pressures of this crisis, data scientists must be freed from the shackles of data wrangling and cleansing, which can consume 80 percent of their time.
But does such an AI development environment exist? And where can the data, in a cleansed and usable format, be found? Thomas Siebel, chairman and CEO of C3.ai, says his company has both, and he’s giving them away for free.
C3.ai, which boasts some of the world’s largest at-scale enterprise AI implementations (Con Edison, Enel, U.S. Air Force, Royal Dutch Shell), is leading a two-pronged effort to fight COVID-19. On the technology side, the C3.ai Digital Transformation Institute is a consortium of research universities, supercomputing centers and Microsoft Azure that will pursue research projects using the C3.ai AI Suite and its model-driven architecture designed for enterprise-scale AI applications. Siebel told us the results will be released into the public domain. On the data side, the company on April 13 will release for public use the first of two tranches of an aggregated COVID-19 data lake drawn from more than 30 information sources, unified and federated using the AI Suite’s data wrangling capabilities and ready for use by data scientists.
The effort leverages experience gained by Siebel and his team not only at C3.ai but also from their years running Siebel Systems, a creator of CRM software formed in 1993 that became a $2 billion company and merged with Oracle in 2006. The platform that evolved into the C3 AI Suite, funded by Siebel when he formed the company in 2009, was eight years in development before becoming commercially available. C3.ai now has more than 500 employees and grew by roughly 100 percent last year, Siebel said.
“Some of these projects we’re getting involved in, like building these large-scale discrete event simulations, taking a massive amount of data and predicting what it will look like in seven days, OK, that's a hard process,” Siebel said of COVID-19 modeling workloads. “So the amount of data that you need to be able to aggregate, synthesize, and process – the number of CPU cycles that you need to be able to do that with acceptable levels of precision – this is a computationally extraordinarily extensive problem. In terms of scaling, it's mind numbing. When we start mapping these large genome sequence databases, you're going to run into scaling issues on data size and data processing capability, people are going to spool up, they're going to build machine learning models, that will take tens of thousands of virtual machines operating in parallel process.”
Delivering the most extreme of the compute-intensive cycles will be the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign and its Blue Waters system, and the Perlmutter supercomputer, due for completion by spring 2021, at Lawrence Berkeley National Laboratory’s National Energy Research Scientific Computing Center.
The C3.ai-led effort is one of many resource sharing, crowdsourcing efforts formed to combat COVID-19. Data analytics and business intelligence specialist Tibco has released its COVID-19 Visual Analysis Hub, a site for using the company’s Spotfire analytics software to track the pandemic’s spread and impact based on data from the Center for Systems Science and Engineering at Johns Hopkins University (also used in C3.ai’s COVID-19 data lake) and other sources.
Among other efforts that emerged this week, Domino Data Lab, provider of an open data science platform, announced complimentary access to that platform for COVID-19 researchers. WellAI released a software application for COVID-19 researchers based on algorithms that read and summarize large amounts of medical literature, available at https://wellai.health. From China, Huawei Cloud announced, as part of its Anti-COVID-19 Partner Program, free access to cloud and AI services such as EIHealth, which includes viral genome detection, antiviral drug in silico screening and AI-assisted CT patient screening, as well as free cloud resources worth up to $30,000 (US). Trovares is making xGT, its in-memory graph analytics tool capable of ingesting terabytes of data, available at no cost to data scientists working on COVID-19. (For other COVID-19-related analytics and data sources available free of charge, see “COVID-19 Spurs Offers for Free Software, Data, and Training” at sister publication Datanami.)
At C3.ai, Siebel said the Digital Transformation Institute (C3.ai DTI) has issued its first call for COVID-19 research proposals dealing with such challenges as slowing the pandemic’s spread, speeding development of medical treatments and designing and repurposing of drugs or clinical trials. DTI will initially fund more than 26 research projects with more than $57 million from the company, along with $310 million in in-kind contributions in the form of the C3 AI Suite from C3.ai and Microsoft Azure cloud resources. Winning proposals will be selected by June 1, Siebel said.
He’s optimistic that the project work, when released publicly, will quickly be accepted and adopted because “it's been blessed by Berkeley, Princeton, Carnegie Mellon so, I mean, the National Institutes of Health and the CDC are going to like it.”
While compute and machine learning resources are valuable to projects of this type, Siebel said they’re not the most valuable.
“When you're dealing with AI at research institutions,” he said, “the scarcest resource isn’t computing capacity going into bioinformatics and it’s not human capital. It's availability of real data. So these data scientists and researchers, because they do not have access to large public health databases due to HIPAA regulations and what have you, they're forced to synthesize data.”
The first tranche of the COVID-19 Data Lake will be released next week, and the second will follow in May. The open data sets will be accessible at https://c3.ai/covid via utilities that support access through a RESTful API using common tools, such as Python, R, Ex Machina and Microsoft Power BI. C3.ai said researchers and developers are invited to help expand the data lake by enhancing its functionality, developing analytics and predictive models and contributing additional data sets through a crowdsourcing model.
“We started working with NIH, the CDC and all of these research institutions to basically aggregate the largest unified, federated data image that consists of all the data that we're able to find on COVID-19,” said Siebel, adding that C3.ai partnered with Amazon Web Services on this aspect of the Coronavirus project. “And by a unified aggregate image…, it's not simply that all of these data are in one place, they're in one place and fully connected. This is an extremely large dataset where we've connected the articles on the disease to the patient who has the disease to the CT scan that indicates the disease. All of these pointers are there in a unified data image that we can navigate using a knowledge graph…and perform data science.”
Siebel said 50 C3.ai employees have been assigned to COVID-19 project work.
“In many ways, I think this crisis is a test,” Siebel said. “It's a test of us as individuals and how we behave. It's a test of the strength of our social fabric and how well it holds up under crisis, (because) it might get pretty tense out there in the next month. It’s a test of the strength of our government institutions, and at a less significant level, it's going to be a test of the resilience of corporate leaders.”
“And you know, if we have some small impact at the edge of this crisis, I'll be honest with you, if this is all the company ever accomplishes, I'll be happy. If this is all we accomplish in the history of this company, I’ll feel the last 10 years will have been successful.”