Graph Databases Gaining Enterprise-Ready Features
Graph database vendors are broadening their applications by adding enterprise-focused features to help customers who are dealing with the burdens of huge troves of business-critical data.
In a move that highlights the growing maturity of the graph database marketplace, Neo4j recently unveiled its latest product, Neo4j for Graph Data Science, which is designed to make it easier for enterprises to use graph machine learning to expand their capabilities. Another vendor, Katana Graph, recently announced a collaboration with Intel to port and optimize its Katana Graph engine on Intel Xeon Scalable processors, Xeon-based clusters and Intel’s upcoming discrete GPUs.
Meanwhile, TigerGraph recently unveiled the results of a new graph data management benchmark study that uses nearly 5TB of raw data on a cluster of machines to show the performance benefits enterprises can potentially receive using its graph database.
Graph databases are purpose-built to store and navigate what are called data “relationships,” according to documentation from Amazon Web Services. Relationships in graph databases are critical for bringing disparate data together, using “nodes” to store data entities and “edges” to store relationships between entities. Each edge in a graph database has a start node, an end node, a type, and a direction. An edge can describe parent-child relationships, actions, ownership, and more. The number and kinds of relationships a node in a graph database can have are unlimited.
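The node-and-edge model described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any vendor mentioned here stores data; the class names (`Graph`, `Edge`), the method names and the sample identifiers ("alice", "acme", "WORKS_AT") are all hypothetical and chosen only for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A directed, typed relationship between two nodes."""
    start: str     # id of the start node
    end: str       # id of the end node
    rel_type: str  # relationship type, e.g. "PARENT_OF"


class Graph:
    """Hypothetical in-memory graph: nodes hold entities, edges hold relationships."""

    def __init__(self):
        self.nodes = {}                    # node id -> dict of properties
        self.outgoing = defaultdict(list)  # node id -> edges that start at that node

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, start, end, rel_type):
        # Direction is implied by the (start, end) ordering.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = Edge(start, end, rel_type)
        self.outgoing[start].append(edge)
        return edge

    def neighbors(self, node_id, rel_type=None):
        """Follow outgoing edges from node_id, optionally filtered by type."""
        return [
            edge.end
            for edge in self.outgoing.get(node_id, [])
            if rel_type is None or edge.rel_type == rel_type
        ]


graph = Graph()
graph.add_node("alice", kind="person")
graph.add_node("bob", kind="person")
graph.add_node("acme", kind="company")
graph.add_edge("alice", "acme", "WORKS_AT")   # an action or affiliation
graph.add_edge("alice", "bob", "PARENT_OF")   # a parent-child relationship

print(graph.neighbors("alice"))               # ['acme', 'bob']
print(graph.neighbors("alice", "PARENT_OF"))  # ['bob']
```

Because each node keeps its own list of outgoing edges, following a relationship is a direct lookup rather than a table join, which is the property that makes this style of storage attractive for relationship-heavy queries.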
Graph databases continue to gain new interest from enterprises because they connect all of a company’s internal and external datasets and pipelines for analysis, which can then create and deliver broad business insights that might not have been possible in the past.
Neo4j for Graph Data Science, Built for Enterprises
Version 1.4, the latest release of Neo4j for Graph Data Science, specifically targets enterprise customers with graph-native machine learning functionality that’s being made available for business use. These capabilities are important for enterprises, according to Neo4j, because organizations don’t always know how to represent connected data for use in machine learning models. Version 1.4 includes graph embedding algorithms that learn the structure of a user’s graph, rather than relying on predetermined formulas to calculate specific features like centrality scores. Using AI, the updated product calculates the shape of the surrounding network for each piece of data inside a graph, enabling far better machine learning predictions, according to the vendor. It can make predictions for fraud detection, tracking customer or patient journeys, drug discovery research and more.
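For readers unfamiliar with the “predetermined formulas” that embeddings are contrasted with, the sketch below computes one of the simplest centrality scores, degree centrality, in plain Python. It does not use Neo4j or its Graph Data Science library; the function name `degree_centrality` and the sample edge list are assumptions made for illustration, and the graph is treated as undirected.

```python
def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list.

    The formula is fixed in advance: a node's score is the fraction of
    the other nodes it is directly connected to. Self-loops are ignored.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


# Hypothetical four-node graph: "a" touches every other node.
sample_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
scores = degree_centrality(sample_edges)
print(scores["a"])  # 1.0
```

A hand-picked score like this captures only one aspect of a node’s position. An embedding algorithm instead learns a vector for each node from the graph itself, so the features fed to a downstream model are not limited to formulas chosen ahead of time.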
Katana Graph Collaborates with Intel
To help Katana Graph customers bolster their graph database analytics projects, Intel is collaborating with Katana Graph to port and optimize the Katana Graph engine on Intel Xeon Scalable processors, Xeon-based clusters and Intel’s upcoming line of discrete GPUs, including the one code-named “Ponte Vecchio.” The recent announcement aims to help customers exploit high-performance, scale-out parallel computing to solve large-scale problems with unstructured data, according to the companies.
“For deep analytics on large, unstructured data to scale into mainstream usage, it will need to be deployable and performant on both volume CPUs and GPUs,” Wei Li, Intel’s vice president of architecture, graphics and software and general manager of machine learning and performance, said in a statement. “Our collaboration with Katana Graph will accelerate the adoption of graph analytics on market-leading Intel Xeon Scalable processors as well as our upcoming GPUs, enabling more customers to take advantage of graph computing.”
The Katana Graph engine can run on large clusters of x86 CPUs, large memory systems with Intel Optane persistent memory, single or multi-node GPU platforms, or any combination of these technologies, according to the company. It can also scale to hundreds of machines in production clusters.
TigerGraph Touts Benchmark Results of its Scalable Graph Database
Enterprise graph database vendor TigerGraph recently unveiled the results of performance benchmark tests conducted on representative enterprise uses of its scalable application. Touted by the company as a comprehensive graph data management benchmark study, the tests used almost 5TB of raw data on a cluster of machines to show the performance of TigerGraph. The study used the Linked Data Benchmark Council Social Network Benchmark (LDBC SNB), which is a reference standard for evaluating graph technology performance with intensive analytical and transactional workloads.
The results and performance numbers showed that graph databases can scale with real data, in real time, according to the vendor. TigerGraph claims it is the first industry vendor to report LDBC benchmark results at this scale. The data showed that TigerGraph can run deep-link OLAP queries on a graph of almost nine billion vertices (entities) and more than 60 billion edges (relationships), returning results in under a minute, according to the announcement.
TigerGraph’s performance was measured with the LDBC SNB Benchmark scale-factor 10K dataset (4.8TB raw data, 8.86B vertices, 61.77B edges) on a distributed cluster. The implementation used queries written in TigerGraph’s GSQL query language, which were compiled and loaded into the database as stored procedures. TigerGraph’s performance testing included three types of queries: IS Workload (all queries answered in one to three seconds), IC Workload (all queries answered in three to nine seconds) and BI Workload (the majority of OLAP-style iterative and/or deep-link graph queries were answered in under one minute). Each query was run three times, and the median of the elapsed times was reported as the final latency. Each query was performed on clusters of 24, 18 and 12 machines.
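The measurement rule described above (run each query three times and keep the median elapsed time) can be sketched as a small Python timing harness. This is not TigerGraph’s benchmark code; `median_latency` and the stand-in `fake_query` function are hypothetical names used only to show the procedure.

```python
import statistics
import time


def median_latency(run_query, repeats=3):
    """Run a query callable `repeats` times and return the median elapsed seconds."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        elapsed.append(time.perf_counter() - start)
    return statistics.median(elapsed)


def fake_query():
    """Hypothetical stand-in for a real database call."""
    sum(range(100_000))


latency = median_latency(fake_query)
print(f"median latency: {latency:.6f} s")

# With three runs, the median is simply the middle value,
# so a single slow or fast outlier does not affect the result.
print(statistics.median([2.0, 9.0, 3.0]))  # 3.0
```

Reporting the median rather than the mean keeps one unusually slow run, for example a cold cache on the first execution, from skewing the reported latency.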
Where Graph Databases Fit for Enterprises: Analyst
For enterprises that are working to extract value from large stores of critical business data, graph databases can be an important tool, Mike Matchett, principal analyst at research firm Small World Big Data, told EnterpriseAI.
“The recent news from graph database suppliers highlights the emergence of high performance, highly scalable graph solutions,” said Matchett. “This market is not just offering high-quality graph-based databases with native, advanced graph analysis support, but also provides solutions that can tackle very large data sets very quickly.”
These advances are allowing businesses to apply graph-based approaches to network analysis at very large scale, such as deeper healthcare models for testing and tracking the transmission of diseases, and to other very large data-driven challenges, he said. Such queries were formerly limited to highly structured (OLAP) or parallel query (Hadoop) kinds of analyses, he added.
“In particular, AI, machine learning and graph ‘theory’ have a large logical connection, particularly in neural networking, but it hasn’t been practical until now to build integrated technology solutions that are both scalable and performant over real large data sets,” said Matchett.
“Graph databases are fast evolving to scale to larger data sets and process graph-based analytics, faster than ever before,” he added. “Graph representations can provide a far more natural and flexible way to model just about any data set [compared to data structured in tables/records with a fixed key-based schema], while intelligent graph analytics can analyze and extract insights about relationships embedded in that data that are otherwise simply not practical to process.”
Graph Databases Haven’t Yet Gained Traction with Enterprises
Graph-based approaches, including graph databases, remain underappreciated and overlooked by many enterprises, said Matchett. “But with increasing support for high performance and high scale, more barriers are coming down that might impede a much wider adoption. We think the opportunity for enterprises to leverage existing graph technologies to solve many pressing problems, including problems that are otherwise unsolvable, is already huge. The real barrier to adoption is one of awareness and training in graph approaches outside of the greenfield, advanced-thinking teams on the cutting edge.”