Hyperscale Titans Team To Scale MySQL
Four of the titans of hyperscale Web applications – Google, Facebook, LinkedIn, and Twitter – have teamed up to create a set of common extensions aimed specifically at running the open source MySQL relational database at scale.
The effort, called WebScaleSQL, is leveraging the open source MySQL 5.6 database that is controlled by Oracle, which got it from its acquisition of Sun Microsystems a few years back. Steaphan Greene, a software engineer at Facebook, announced the effort in a blog post. Facebook spoke for the group because the social network has one of the largest MySQL installations in the world – and one that is accelerated by PCI-Express flash cards.
"Our goal in launching WebScaleSQL is to enable the scale-oriented members of the MySQL community to work more closely together in order to prioritize the aspects that are most important to us," explained Greene. "We aim to create a more integrated system of knowledge-sharing to help companies leverage the great features already found in MySQL 5.6, while building and adding more features that are specific to deployments in large scale environments. In the last few months, engineers from all four companies have contributed code and provided feedback to each other to develop a new, more unified, and more collaborative branch of MySQL."
The important thing is that this is a branch of the Oracle MySQL 5.6 relational database management system, and all changes that are made to the code are meant to be compatible with it and available as upstream, open source code. Code changes have to be made by one engineer at one company and approved by another to be put into the WebScaleSQL bucket. The four companies are allowed to make any other changes to the MySQL code as they see fit, of course, for their particular installations. Over the past several months, the four companies have come up with an automated test framework that will run code changes against the built-in test system for MySQL. They have also created a new suite of stress tests that are appropriate for large scale-out database clusters and have created a prototype automated performance testing system to see the effect of their code changes.
The code changes that Facebook, Google, Twitter, and LinkedIn have made include features to turn MySQL into a read-only database, a means of flushing buffers and caches at startup, interleaving memory on all of the processors in a NUMA system underneath the database, and having sub-second timeouts on database clients. The tweaks include a slew of query optimizations, too, and you can see all of the elements of WebScaleSQL 5.6 at this link on GitHub.
The four founders of the WebScaleSQL effort are also looking for others who are pushing MySQL to the limits and who have the technical resources to contribute to the effort. Such companies are being encouraged to join with them and help scale out MySQL.
Oracle is not exactly on friendly terms with the open source community, and it may seem puzzling why the WebScaleSQL members chose the MySQL 5.6 Community Edition as the basis for their branch rather than MariaDB, Percona Server, Drizzle, and other alternatives that are actually forks of the MySQL database. In many cases, these forks can run on Linux, Windows, Unix, and other platforms, but the WebScaleSQL branch will only be available on Linux because, for all practical purposes, all hyperscale datacenters except for Microsoft's Azure are underpinned by Linux. (Just as is the case for most supercomputers today, too.)
In any event, the consensus across Google, Facebook, LinkedIn, and Twitter was that MySQL 5.6 was the right choice for now because "it has the production-ready features we need to operate at scale, and the features planned for MySQL-5.7 seem like a fitting path forward for us," as they put it in the WebScaleSQL site FAQ. The organization will not be providing binary code, and not all of the tweaks the companies make for scaling up and scaling out MySQL will necessarily be made available, either. But the MySQL community will have access to the code that the four agree is important and should be contributed. What the MySQL community does after that is up to its members.
Facebook talked a bit about what it is working on at the moment to add into WebScaleSQL, which will be distributed under version 2 of the GNU General Public License. The remaining bits of compression that were not already in MySQL 5.6, which have been used in production at Facebook for some time, are getting woven into WebScaleSQL. So is a non-blocking, asynchronous client, which speeds up performance of the database because queries do not have to wait to connect to the database server or to send or retrieve information. This code is being reviewed by the three engineering teams at Google, Twitter, and LinkedIn and has also been used for months in production at Facebook. The table, user, and compression statistics counters created by the social media giant and used in production are also being swapped into the MySQL stack. Facebook has also created a logical read-ahead mechanism, also tested live on its site, that can speed up full table scans by as much as a factor of 10.
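To get a feel for why a non-blocking client matters, consider a rough sketch in Python. This is not Facebook's client code and it does not touch MySQL at all: the fake_query function is a hypothetical stand-in that merely sleeps for the length of a simulated round trip, and the latencies are made-up numbers. It only illustrates the general idea that when a client does not have to wait on each request in turn, several round trips can overlap.

```python
import asyncio
import time


async def fake_query(name: str, latency_s: float) -> str:
    """Hypothetical stand-in for one database round trip (no real MySQL here)."""
    await asyncio.sleep(latency_s)  # simulated network and server time
    return f"{name}: done"


async def run_blocking_style(queries):
    """Issue queries one at a time, waiting for each before sending the next."""
    results = []
    for name, latency_s in queries:
        results.append(await fake_query(name, latency_s))
    return results


async def run_nonblocking_style(queries):
    """Issue all queries at once and collect the answers as they complete."""
    tasks = [fake_query(name, latency_s) for name, latency_s in queries]
    return list(await asyncio.gather(*tasks))  # gather keeps the input order


def timed(coro_fn, queries):
    """Run one of the two styles and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(queries))
    return results, time.perf_counter() - start


# Made-up workload: three queries with a 50 ms simulated round trip each.
QUERIES = [("q1", 0.05), ("q2", 0.05), ("q3", 0.05)]

if __name__ == "__main__":
    _, blocking_s = timed(run_blocking_style, QUERIES)
    _, nonblocking_s = timed(run_nonblocking_style, QUERIES)
    print(f"one at a time: {blocking_s:.3f}s, overlapped: {nonblocking_s:.3f}s")
```

In the one-at-a-time case the simulated round trips add up, while in the overlapped case the total is closer to the slowest single round trip. How much a real application gains depends on the workload and on the server, which this toy does not model.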
The interesting effect of this WebScaleSQL effort might be to foster a community of experts who understand better how to scale MySQL, which is something, of course, that these titans all would love to have access to. Because MySQL is pervasive already and open source, it is the chosen alternative to proprietary database management systems for small to midrange Web applications. The database is less useful for scaling up on larger systems, which is where MariaDB and Percona have done a better job – albeit, with a fork of the code, which is something that many companies will not feel comfortable with. Until they see the bill for Oracle 12c, IBM DB2 10.5, or Microsoft SQL Server 2014, that is.