Cassandra NoSQL Deployments Convincing Doubters
Apache Cassandra, the Java-based distributed NoSQL database, is making inroads in production despite lingering skepticism about its ability to handle mission-critical deployments.
While Cassandra still plays second fiddle to NoSQL database market leader MongoDB, it has emerged as one of "the best of the rest," according to Matt Aslett, research director at 451 Research.
Doubts about Cassandra's ability to handle production environments may be fading as the list of adopters grows. They range from Netflix and Apple to Credit Suisse and, at least in testing mode, the Dutch financial services firm ING. Still, "there is a learning curve," Aslett noted during a webinar this week.
A key selling point is Cassandra's ability to serve as a fault-tolerant platform for distributed datasets and to enable cross-datacenter replication. Still, Aslett added, "It is known for being operationally complex, and skepticism remains about NoSQL's use for mission-critical environments, partly due to its 'eventual' consistency model."
Developed by Facebook and released to the open source community in 2008, Apache Cassandra was eventually commercialized by DataStax, which was founded in 2010. The basic DataStax version is said to differ little from the original Apache Cassandra.
The DataStax enterprise version includes new features like integrated search, analytics and in-memory processing capabilities, Aslett noted. Support for graph analytics is also in development.
Meanwhile, a growing list of enterprises has begun deploying both Apache Cassandra and the DataStax enterprise version in production. Most notable is Netflix, which uses the NoSQL platform to deliver its subscription video-on-demand services. Elsewhere, Apple has deployed more than 75,000 Cassandra nodes handling an estimated 10 petabytes of data.
New application development projects include the Russian social networking site Ok.ru and an online game developer.
Despite skepticism about whether Cassandra is up to the task, Aslett also noted several "forward-looking projects," including a Cassandra-based risk management platform being deployed by Credit Suisse. The analyst also noted that ING is currently testing a data-processing platform running on Cassandra that would operate across multiple datacenters.
For these and other applications, consistency of operations remains critical. Aslett noted that Cassandra is more compatible with other distributed NoSQL databases like Apache HBase and is also "tunable" as a way to improve consistency in mission-critical deployments.
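The "tunable" part can be illustrated with a little arithmetic. In Cassandra, each read and write can specify how many replicas must respond, and a read is guaranteed to overlap the latest acknowledged write when the read and write replica counts together exceed the replication factor. The following is a minimal sketch of that rule, not driver code; the function names are hypothetical and only a handful of consistency levels are modeled.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge an operation at a given level.

    Only a subset of Cassandra's consistency levels is modeled here.
    """
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(w, r, reads_see_latest_write(w, r, rf))
```

With a replication factor of 3, writing and reading at ONE leaves room for stale reads, while QUORUM on both sides does not. The trade-off is latency and availability: the more replicas an operation waits for, the fewer node failures it can tolerate.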
Cassandra is better suited to "timely business operations like machine learning and fraud detection," Aslett noted.
Cassandra's "master-less ring architecture," in which all nodes communicate with each other and thereby eliminate single points of failure, is also cited as a way to boost continuous availability and uptime.
The downside risks, Aslett noted, stem from Cassandra's distributed architecture, which adds complexity to fielding applications and places a premium on IT management skills. Then there is the "baggage" associated with the Java runtime, he added.
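To make the ring idea concrete, here is a simplified sketch of how a master-less ring can place data: every node and every key hashes to a position on the ring, and a key's replicas are the next few nodes clockwise. This is an illustration under stated assumptions rather than Cassandra's actual implementation: it uses MD5 as a stand-in hash, one token per node, and a hypothetical `Ring` class with made-up node names.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is a stand-in hash here)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy token ring: one token per node, no master node."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("ring needs at least one node")
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, key: str, replication_factor: int = 3):
        """Return the nodes holding `key`: next nodes clockwise from its token."""
        size = len(self._entries)
        count = min(replication_factor, size)
        # First node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self._tokens, token(key)) % size
        return [self._entries[(start + i) % size][1] for i in range(count)]

    def without(self, node: str) -> "Ring":
        """Return a new ring with one node removed, simulating a failure."""
        return Ring([n for _, n in self._entries if n != node])


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    owners = ring.replicas("user:42")
    print("replicas:", owners)
    print("after losing", owners[0], ":", ring.without(owners[0]).replicas("user:42"))
```

Because any node can compute the same placement, losing one node leaves the remaining replicas for a key in place, which is the property behind the "no single point of failure" claim.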
Still, Cassandra's large and growing open source community can be leveraged when enterprises embark on application changes in the datacenter, the analyst said.
Meanwhile, search engine giant Google put the Cassandra NoSQL data store through its paces last year running on virtual machines on its Compute Engine cloud infrastructure. Google ramped up the number of Cassandra clients hitting 300 nodes to show the effect on latency and writes per second with the data store.
Beyond Apache Cassandra and the enterprise version of the DataStax distribution, Azul Systems, a builder of Java virtual machines for Linux and x86 servers, is touting its Zing platform as a way of achieving a "smooth, continuous data platform" for running a sometimes finicky Cassandra data store.