A3Cube Forges Fortissimo Storage From PCI Express Fabric
Switching upstart A3Cube has created the first commercialized application of its PCI Express switching technology and the Ronniee Express fabric that runs atop it, aiming squarely at accelerating the performance of storage systems on a variety of extreme scale workloads.
The new software, called Fortissimo Foundation, is in effect a storage operating system that runs atop the Ronniee Express fabric that debuted earlier this year. The Fortissimo stack can support any POSIX-compliant file system, Emilio Billi, co-founder and CTO at A3Cube, explains to EnterpriseTech. The company has implemented the open source variant of Oracle's Zettabyte File System (ZFS) running atop the Ronniee Express fabric, but it could just as easily use the ext4 file system that comes standard with Linux. The important thing is that the Fortissimo Foundation software can take advantage of the low latency provided by the PCI Express switching fabric that A3Cube has created, which is enabled by layering on Direct Memory Access (DMA) and distributed non-transparent bridging (NTB) technologies and embedding them in the Ronniee server cards, which are a combination of a network interface and PCI Express switch.
The Fortissimo Foundation software aims to address the scale-out problem in storage, explains Billi. While it is easy to scale storage capacity using parallel technologies, it is not so easy to scale performance. You can accelerate the storage performance inside of a single server node by adding flash, in the form of NVRAM, PCI Express cards, or solid state drives, but once you have a scale-out storage system, the latencies between the nodes in the storage cluster negate any benefits of acceleration on any particular node. Parallel and clustered file systems have metadata servers that keep track of where files are located on the cluster. These servers need to be accessed for all file movements and updates, so they quickly become a bottleneck.
So A3Cube has created a distributed metadata layer that runs across the Ronniee Express fabric, which can be stored on SSDs or in main memory in the server nodes. This distributed metadata is implemented in a way that is similar to, but not identical to, the cache coherency that is used on NUMA and SMP systems, and it uses a broadcast engine inside of the Ronniee Express NIC/switch cards to keep the distributed hash tables that store the metadata in sync. Not every node in the storage cluster has to store a portion of this distributed metadata; for many workloads, four nodes can share a single portion of the table that is stored on SSDs or located in the main memory of the node (depending on how fast you want the metadata serving to be).
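To make the idea of a sharded metadata table concrete, here is a minimal Python sketch. It is an illustration only, not A3Cube's implementation: the SHA-256 hashing scheme, the function names, and the contiguous grouping of nodes are all assumptions, and the broadcast-based synchronization done in the Ronniee Express cards is not modeled. The only figure taken from the description above is that four nodes can share one portion of the table.

```python
import hashlib

# From the article: for many workloads, four nodes can share one portion
# of the distributed metadata table.
NODES_PER_SHARD = 4


def shard_count(num_nodes, nodes_per_shard=NODES_PER_SHARD):
    """Number of metadata shards for a cluster, rounding up."""
    return -(-num_nodes // nodes_per_shard)


def shard_for_path(path, num_shards):
    """Map a file path to a shard index with a stable hash.

    Hypothetical scheme: the article does not say how keys are hashed.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def nodes_served_by(shard, num_nodes, nodes_per_shard=NODES_PER_SHARD):
    """Node indices that share the given shard (assumed contiguous groups)."""
    start = shard * nodes_per_shard
    return list(range(start, min(start + nodes_per_shard, num_nodes)))


if __name__ == "__main__":
    shards = shard_count(48)
    target = shard_for_path("/data/logs/2014-09-01.csv", shards)
    print(shards, target, nodes_served_by(target, 48))
```

Under these assumptions, a 48-node cluster would hold 12 portions of the table, and any node can compute which portion owns a given path without asking a central metadata server.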
The idea is to use low-latency DMA access across the PCI Express fabric to allow main memory, flash, and disk storage across the cluster to be accessed as if it were all local on each node in the storage cluster. The Fortissimo Foundation software can also take a portion of the main memory on each node and create a scale-out virtual disk in that main memory that spans all of the nodes. This main memory disk is used to cache the data stored on much slower flash cards and SSDs. If the datasets are small enough, on the order of terabytes, then all of the data can be sucked into memory, with the SSDs just being used as persistent storage for it. In effect, the Ronniee Express fabric and Fortissimo Foundation combination can create an in-memory database (if you are using relational databases) or data store (if you are using NoSQL products) without having to do anything special. This clustered main memory pool looks like a giant, fast disk drive as far as external servers running applications are concerned, and it is mounted in precisely that way. The databases and data stores do not know that they are running on a pool of clustered main memory rather than on disks, says Billi.
In addition to the scale-out capabilities, the Fortissimo Foundation stack includes inline data deduplication and data compression, both of which are done inside the storage node memory.
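For readers unfamiliar with inline deduplication and compression, the following Python sketch shows the general technique in its simplest form. It is not how Fortissimo Foundation does it, which the company has not detailed here: the `DedupStore` class, the fixed 4 KB chunk size, SHA-256 fingerprints, and zlib compression are all assumptions chosen for illustration.

```python
import hashlib
import zlib


class DedupStore:
    """Toy in-memory store that deduplicates and compresses chunks on write."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size  # assumed fixed-size chunking
        self.chunks = {}              # fingerprint -> compressed chunk
        self.files = {}               # file name -> ordered fingerprints

    def write(self, name, data):
        fingerprints = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            # Inline dedup: a chunk already seen is not stored again.
            if fp not in self.chunks:
                self.chunks[fp] = zlib.compress(chunk)
            fingerprints.append(fp)
        self.files[name] = fingerprints

    def read(self, name):
        return b"".join(zlib.decompress(self.chunks[fp])
                        for fp in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    payload = b"A" * 4096 * 3 + b"B" * 4096
    store.write("first", payload)
    store.write("copy", payload)
    print(len(store.chunks), store.stored_bytes(), len(payload) * 2)
```

Because both steps happen on the write path, before data is placed, repeated chunks never consume additional memory in this sketch.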
Here are a few examples of how the Fortissimo Foundation software might be deployed and the benefits it brings.
On a Hadoop data analytics setup, the MapReduce applications remain the same and connect to a virtual Hadoop Distributed File System (HDFS), which is actually that modified version of ZFS outlined above. Data is spread around the cluster and replicated, just like Hadoop expects, but with the main memory and flash acceleration, Hadoop queries can run up to 100X faster without resorting to add-on software for Hadoop. The NameNode bottleneck in Hadoop is also eliminated, which gets rid of a single point of failure. "You can create in-memory Hadoop acceleration without changing the Hadoop software," says Billi.
For NoSQL data stores like CouchDB, Cassandra, Membase, MongoDB, Redis, and Riak, the Fortissimo software allows the same 100X speedup for datasets that fit in main memory. On workloads that need a mix of memory, flash, and disk, you can still get by on a lot less hardware using the Ronniee Express shared fabric. As an example, a traditional disk-based cluster running MongoDB would need around 80 servers to be able to handle 1 million transactions per second. Billi says that a Fortissimo Foundation setup putting data in memory and on SSDs can handle the same 1 million transactions per second on five (that's right, five) nodes.
The acceleration also works with I/O-intensive applications that use the Message Passing Interface (MPI) protocol, as is common on parallel supercomputers, and on various kinds of multimedia streaming and processing applications. In these cases, the compute and the storage are running on the same nodes in a "hyperconverged" fashion, as the current lingo calls it.
For in-memory databases, Billi says that all the same benefits apply and, importantly, that the approach does not have some of the limitations of technologies like SAP HANA. "Normally, in-memory databases are limited in terms of footprint. They use the local memory in a node and then partition the data when they want to scale. Our system aggregates the memory in a file system, so you don't have to use the in-memory versions of the database from Oracle or SAP. You completely bypass the operating system stack to provide the memory aggregation to achieve at least ten times lower latency between the application and the memory."
In all of the use cases outlined above, customers do have to make one important change: They have to build a storage cluster based on the A3Cube hardware and software.
A3Cube has received its first order for the Fortissimo Foundation software, which will be deployed in October on a 48-node storage cluster with its Ronniee fabric cards at an undisclosed customer site. The company has another five proofs of concept underway using its PCI Express switching as a backbone for a storage cluster. The Ronniee Express network interface cards cost under $1,000 apiece, and the pricing on the Fortissimo Foundation software will run somewhere between $5,000 and $6,000 per storage node. There are no additional fees for server clients to access the resulting clustered storage. The Fortissimo Foundation stack and the Ronniee drivers are currently supported on Red Hat Enterprise Linux and its CentOS clone, provided the Linux kernel is version 2.6.30 or higher.
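As a rough back-of-the-envelope check on those prices, the short Python sketch below works out the list-price range for a cluster the size of that first 48-node order. It assumes one Ronniee card per node, which the company has not stated, and it treats the $1,000 card price as an upper bound. Servers, memory, SSDs, and any discounts are not included.

```python
def cluster_price_range(nodes, cards_per_node=1,
                        card_price_ceiling=1000,
                        software_low=5000, software_high=6000):
    """Return list-price bounds in dollars for a Fortissimo cluster.

    cards_per_node is an assumption; the prices come from the article.
    """
    return {
        "software_low": nodes * software_low,
        "software_high": nodes * software_high,
        "cards_ceiling": nodes * cards_per_node * card_price_ceiling,
    }


if __name__ == "__main__":
    for item, dollars in cluster_price_range(48).items():
        print(f"{item}: ${dollars:,}")
```

On those assumptions, the software for 48 nodes would list at between $240,000 and $288,000, with the fabric cards adding less than $48,000.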