SpaceCurve Adds Spatial World to Databases
As many enterprises grapple with the influx of big data they're collecting about customers' spending habits, credit histories, online preferences, and social media posts, a start-up expects its ability to provide even more information about the physical world will increase business insight and streamline processes, not add to the data chaos.
About six months after launching a platform designed to spatially organize and analyze machine-generated data sources in real time at extreme scales, SpaceCurve says its clients are already seeing results, founder J. Andrew Rogers told Enterprise Technology. That's because so many elements from the physical world can affect organizations' data, sales, and processes, but typical databases cannot incorporate traditional geospatial or geographic information system (GIS) data within their formats, he said. Within databases, relationships must be explicit, just as they are in Facebook, for example, said Rogers.
"In the physical world there are no relationships between entities. There are proximity relationships. You can measure what the weather is like because you know where I am spatially. This is not just spatial data in the sense of making maps and GIS use cases or satellite use cases," he said. "When you post to Twitter, you are posting from a physical place in a physical time. In many ways, [SpaceCurve] looks like a database and feels like a database but it allows you to bring in every element – you can fuse it all together to this physically operational model of the physical world."
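To make the idea of a proximity relationship concrete, here is a minimal sketch in Python. It is not SpaceCurve's code or API. The post, the weather stations, and the function names are all hypothetical, and the only technique used is the standard haversine great-circle distance. It shows how a social media post with coordinates can be matched to the nearest weather reading even though no explicit link between the two records exists.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest(point, candidates):
    """Return the candidate record closest to the given point."""
    return min(
        candidates,
        key=lambda c: haversine_km(point["lat"], point["lon"], c["lat"], c["lon"]),
    )

# Hypothetical records: nothing links the post to a station except location.
post = {"text": "stuck in traffic", "lat": 47.61, "lon": -122.33}
stations = [
    {"name": "station_a", "lat": 47.45, "lon": -122.31, "weather": "rain"},
    {"name": "station_b", "lat": 45.59, "lon": -122.60, "weather": "cloudy"},
]

closest = nearest(post, stations)
print(closest["name"], closest["weather"])
```

A linear scan like this is fine for a handful of records; doing the same across billions of continuously arriving records is the scaling problem the article goes on to describe.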
This becomes particularly important now that the Internet of Things (IoT) is about to explode. By 2020, 25 billion things will be in use, up 410 percent from 4.9 billion this year, according to Gartner. As a result, IoT will support services spending of $263 billion in 2020 versus $69.5 billion in 2015, the research firm predicted. Already, organizations receive floods of data from GPS devices, mobile phones, and other sensors; that will become a tsunami once household appliances, vehicles, and thousands of other apps, products, and data sources are equipped with geospatial data. Savvy organizations must incorporate information from the physical world – weather, inter-personal relationships, traffic patterns – to add richer meaning to this data and discover new business opportunities or savings, said Rogers.
"Retailers have a really good idea of what goes on in their store. They know everything that goes on there. They don't know what their customers are doing when they're not at their store. Where do my customers come from before they come to my store? What neighborhoods do they reside in? Where do they go after they leave? What environmental factors change their actions? What changes behavior that I'm not aware of because it happens outside my store?" he asked.
Building a Platform
Unable to find a suitable database to meet its needs, SpaceCurve built its own by developing new algorithms and data structures that could handle the massive distribution of geospatial and sensor data models and parallelization of spatial operations. The database engine continuously indexes, then stores to disk, millions of records per second from a vast array of sources, while performing interactive queries, said Rogers. The computational geometry engine analyzes relationships across time and space and data sources, he said.
"Building a new database engine that was able to handle the loads we expected to see was a Herculean effort, in and of itself. How do you scale out geospatial data models? If you look at Hadoop or any spatial data models, they use hashing tables or they use range partitioning," said Rogers. "The problem is we actually could prove you can't express geospatial data models at scale using either of those techniques so we had to solve this. The last step – and this was about how we were going to analyze the physical world – was if you look at the GIS systems or geospatial databases out there, they were really designed for cartographic systems, for making maps. We were looking for analytic use cases, where you're answering questions, where there may be no maps as the final product."
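One part of the problem Rogers describes can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not SpaceCurve's technique: the points are made up, `hash_partition` and `grid_cell` are hypothetical helpers, and the fixed grid is only a stand-in for contrast (it is itself a simple form of range partitioning, which Rogers says also fails at scale). The sketch shows that hashing coordinates spreads physically adjacent records across partitions, so a query about one small area has to consult many of them.

```python
import hashlib
import math

def hash_partition(lat, lon, n_parts=8):
    """Pick a partition by hashing the coordinates (hypothetical scheme)."""
    key = f"{lat:.6f},{lon:.6f}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % n_parts

def grid_cell(lat, lon, cell_deg=0.1):
    """Pick a fixed-size lat/lon grid cell; shown only for contrast."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

# Hypothetical sensor readings, all within a few kilometers of each other.
points = [(47.62 + i * 0.00005, -122.37 + i * 0.00005) for i in range(1000)]

hash_parts = {hash_partition(lat, lon) for lat, lon in points}
cells = {grid_cell(lat, lon) for lat, lon in points}

print(len(hash_parts), "hash partitions touched")
print(len(cells), "grid cell(s) touched")
```

The hashed keys land in several partitions even though the readings are neighbors, while the grid keeps them together. The grid has its own weaknesses, such as hot spots where data is dense and objects that straddle cell boundaries, which is consistent with Rogers' point that neither off-the-shelf approach was sufficient.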
Since customers' data often resides in multiple silos, SpaceCurve needed to create a solution that would integrate with multiple databases and act as a common connector, he said. But it also had to scale to meet the anticipated terabytes or petabytes of data collected, Rogers added. Ultimately, SpaceCurve chose a SQL front-end in order to "seamlessly integrate" with traditional GIS systems, among others, since they usually use Oracle or SQL, he said. And it includes native support for interoperability standards including REST, JSON, and OGC.
Because most enterprises incorporate SpaceCurve into their big data implementations, the system frequently connects to Hadoop, said Rogers, and it comes with a Hadoop connector.
"A lot of times their data sits in 40 different silos around the company. It's often because their silos won't scale. Our role is to provide the common architecture for all the departments and subgroups so they have that one continuously updated model of the physical world. One department's usually responsible for one issue. If I need that information they'll do a data dump, but it's always a quarter behind. If it's updated in one silo, that's not reflected in another department's silo," he said.
Oil and gas companies have been early adopters of spatial data technology. In fact, one-third of companies in this vertical are investing in sensors, PwC found. These businesses monitor the physical locations of their pipelines, oil field overlays, and oil field telemetry, with data about mobile truck routes and drivers that they want to optimize. Weather plays a role, as do complex tax structures, said Rogers.
"These are all in different databases. To actually bring those all together to run reports or operational plans is a nightmare," he said. "A lot of time they don't do it because they can't get it done in a timeline that matters. To bring it all to one place where they can do it on an ad hoc basis brings a lot of value. That is millions or billions of dollars in lost efficiency for lots of industries."
Horizontally, at least one customer used SpaceCurve to conduct spatial analysis on the distributed edge locations of its Internet infrastructure to improve website content aggregation and page rendering, according to the developer. By fusing traffic, routing, machine, sensor, monitoring, and performance data, querying the fused datasets, and gaining real-time spatial context across the optimization process, this unnamed client fine-tuned its networks to enhance performance and end-users' experiences. Other enterprises save by no longer running many different Oracle databases, said Rogers. After cutting the licensing, maintenance, and server costs, departments can improve business intelligence and pull insight from existing assets, he said. Mobile telecommunications providers – which typically learn about network problems via outraged consumers' tweets – are seeking new business opportunities since they know the whereabouts of most of the population, information that retailers may find insightful, Rogers said.
"Companies are just now trying to operationalize their GIS data. I was in Silicon Valley for 20 years. We're at the web circa 1994, 95, when only people who were really way out there were starting to see the cool things ahead. People are starting to get an inkling something interesting is going to happen here," he added.