Advanced Computing in the Age of AI | Saturday, April 20, 2024

Google Compute Engine Gears Up to Take On AWS 

Search engine juggernaut Google is taking the beta label off its Compute Engine cloud infrastructure services, squaring up against Amazon Web Services, Microsoft Windows Azure, IBM SoftLayer, and other public clouds.

Economies of scale are what make it possible for cloud providers to run a viable business and compete against the millions of datacenters and data closets around the globe. For the longest time, Google took the same attitude toward cloud computing that Microsoft did, essentially saying that what customers really needed were platform-level services such as Web serving, database serving, or application runtimes. Windows Azure and Google App Engine were designed to sit one layer of abstraction above the infrastructure services sold by AWS, such as its EC2 compute and S3 storage. But in the summer of 2012, both Google and Microsoft bowed to the demands of the market and said they would offer infrastructure as well as platform services.

It has taken Google a long time to move Compute Engine from science project to beta program to general availability. Compute Engine made its debut as a limited product back in June 2012, a week after Microsoft added virtual server slices and chunks of storage to its Azure cloud. Amazon had launched the EC2 compute cloud nearly six years earlier, of course, and that is one of the reasons, but certainly not the only one, that AWS is by far the largest provider of both infrastructure and platform cloud services today. While Amazon has first-mover advantage, Google has a near-monopoly in search advertising that generates immense profits, and an equally strong set of skills in building efficient homegrown servers and datacenters. Google may be smaller than AWS by a factor of five, but to be fair, its Compute Engine infrastructure services were still more or less in beta. Even so, Google's combined platform and infrastructure cloud businesses are growing nearly twice as fast as the market and considerably faster than AWS.

Now that Compute Engine is generally available, we get to see what Google can really do when it competes on price, availability, and other metrics against AWS and Microsoft. The latter company also has deep pockets, is spending heavily on infrastructure and datacenters, and understands that the future of its software business lies out on the public cloud as much as in the private datacenter.

Compute Engine uses Google's own homegrown virtualization technology, whose origins the company has never divulged, and it started out providing Ubuntu, Debian, and CentOS slices running on VMs in its datacenters in the United States. Google's limited preview provided Cloud Storage for both ephemeral and persistent storage for these slices. In May of this year, Compute Engine went into beta testing, and Google made a number of changes. First, it boosted the maximum size of persistent storage volumes from 1 TB to 10 TB because customers said they needed more capacity. Second, it shifted to per-minute pricing, bucking the per-hour trend out there in the clouds.
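To see why per-minute pricing matters for short or bursty jobs, here is a minimal Python sketch comparing per-hour and per-minute billing for the same run time. The hourly rate is a hypothetical figure, not a Compute Engine list price, and the `minimum_minutes` parameter is an assumption for illustration; the article does not say whether Google applies a minimum charge.

```python
import math


def per_hour_cost(runtime_minutes: float, hourly_rate: float) -> float:
    """Per-hour billing: any partial hour is rounded up to a full hour."""
    return math.ceil(runtime_minutes / 60) * hourly_rate


def per_minute_cost(runtime_minutes: float, hourly_rate: float,
                    minimum_minutes: int = 0) -> float:
    """Per-minute billing: partial minutes round up to a whole minute.

    minimum_minutes is a hypothetical floor on billed time (an assumption,
    not something the article specifies).
    """
    billed_minutes = max(math.ceil(runtime_minutes), minimum_minutes)
    return billed_minutes * hourly_rate / 60


if __name__ == "__main__":
    rate = 0.10  # hypothetical dollars per hour, for illustration only
    for minutes in (5, 61, 119):
        print(f"{minutes:>3} min: per-hour ${per_hour_cost(minutes, rate):.4f}, "
              f"per-minute ${per_minute_cost(minutes, rate):.4f}")
```

Under these assumptions, a job that runs 61 minutes is billed for two full hours with per-hour pricing but for only 61 minutes with per-minute pricing.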

With the production-grade Compute Engine, there is a service-level agreement and proper tech support in place, which makes it a safer and more predictable place to land applications and data. Specifically, Google is offering 24x7 tech support and a 99.95 percent monthly uptime guarantee for the server slices. If Google falls below this threshold, customers are entitled to cash back on their monthly bills along a sliding scale ranging from 10 to 50 percent of the bill. To help mitigate downtime, Google has done two things.
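The 99.95 percent monthly figure translates into a fairly small downtime budget. The Python sketch below assumes, for simplicity, that uptime is the fraction of minutes in a month during which an instance was available; the actual SLA defines downtime in its own terms. The 10 to 50 percent credit tiers are not modeled because the article does not give their breakpoints. All function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_pct: float, days_in_month: int = 30) -> float:
    """Minutes of downtime a month can absorb while still meeting uptime_pct."""
    return days_in_month * MINUTES_PER_DAY * (1 - uptime_pct / 100)


def monthly_uptime_pct(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Uptime as the share of minutes in the month that were not down (a simplification)."""
    total_minutes = days_in_month * MINUTES_PER_DAY
    return 100 * (1 - downtime_minutes / total_minutes)


def meets_sla(downtime_minutes: float, target_pct: float = 99.95,
              days_in_month: int = 30) -> bool:
    """True if the month's measured uptime is at or above the target percentage."""
    return monthly_uptime_pct(downtime_minutes, days_in_month) >= target_pct


if __name__ == "__main__":
    for days in (28, 30, 31):
        print(f"{days}-day month: {allowed_downtime_minutes(99.95, days):.2f} minutes of downtime allowed")
```

For a 30-day month, 99.95 percent works out to about 21.6 minutes of downtime (43,200 minutes × 0.0005) before the cash-back provisions would kick in.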

First, it has tweaked its homegrown hypervisor to allow live migration of virtual machines across its infrastructure. Live migration has been around for years on x86 servers as well as on RISC and Itanium systems, but it is not yet available on other public clouds. Microsoft could certainly offer it, since the underlying Hyper-V hypervisor it uses supports live migration, but it has not yet enabled the feature on its Azure cloud. AWS could no doubt add live migration to its own homegrown hypervisor, and for all we know it has long since done so for its own internal purposes without exposing it to cloud customers. Any company building a cloud with the KVM or Xen hypervisors can likewise offer live migration. In the event of hardware or maintenance issues, running VMs on Compute Engine can be moved elsewhere within a zone of a datacenter facility. Google has also added automatic restart for virtual machines and their software, so that if the underlying hardware crashes, VMs will automagically come back online. This, says Greg DeMichillie, director of product management for public cloud services at Google, was a requirement of enterprise customers running applications on Compute Engine.

Google also has introduced a practice it calls transparent maintenance, which uses live migration to move running VMs around a zone so Google can take machines offline when it needs to. At any given time, software patching, system testing, and other preventive maintenance tasks take down a small percentage of machines in any datacenter zone. Every so often, a whole zone is taken out of action for around two weeks for serious upgrades, and VMs also need to be moved then. Transparent maintenance is available in Google's datacenters in the United States now and will be rolled out in Europe shortly.

Compute Engine has offered instances with 1, 2, 4, and 8 cores since earlier this year, and with the production launch, Google is adding 16-core instances in standard, high memory, and high compute types. Google is using Intel's "Sandy Bridge" Xeon E5 processors for its Compute Engine instances and has pegged a single core in these machines at 2.75 relative compute units. The server slices can have up to 16 persistent disks with up to 10 TB of capacity. Standard instances have up to 30 GB of virtual memory, high memory instances range up to 104 GB, and high compute (and therefore low memory) instances go as high as 14.4 GB. The 16-core instances are in limited preview; Google has not said when they will be available to anyone who wants them, but anyone can request them starting today. Support for Red Hat Enterprise Linux and SUSE Linux Enterprise Server is in limited preview, and FreeBSD has also been added. You can also fire up your own Linux distributions and run them "out of the box" on Compute Engine. Google has not said when it will offer support for Windows Server.

As part of the general availability launch, Google says it cut the prices on its standard server slices by 10 percent, and added that it is deprecating the server instance types that were available during the beta. These had local disk storage, and the new slices do not.

Google, as noted above, is also getting rid of local scratch disks in its server instances and moving to external persistent storage, which it says is faster, cheaper, and more predictable in terms of both performance and pricing. This persistent disk storage has been tweaked to go into burst mode when operating systems are being loaded or software installed so these operations can be handled quickly. This persistent storage is striped across hundreds or thousands of physical devices, with RAID data protection at several levels to keep data from being lost and to boost performance. Google says it ensures that the performance is consistent, so one 1 TB volume yields the same performance as ten 100 GB volumes.

With the latest persistent disk, random read performance rises from 250 I/O operations per second (IOPS) to 2,000, and random write performance is jacked up from 600 IOPS to 2,400. Streaming writes keep the same sustained data rate of 120 MB/sec, while streaming reads are boosted from 120 MB/sec to 180 MB/sec. Finally, the new persistent disk storage gets a big price chop, from 10 cents per GB per month plus 10 cents per million I/O operations to a flat 4 cents per GB per month with no fees for I/O operations.
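Because the old scheme charged for I/O and the new one does not, the savings depend on how busy a volume is. Here is a minimal Python sketch using the per-GB and per-I/O rates quoted above; the volume size and I/O count in the example are hypothetical, and the sketch ignores any other charges a real bill might carry.

```python
OLD_PER_GB_MONTH = 0.10    # dollars per GB per month (beta pricing, per the article)
OLD_PER_MILLION_IO = 0.10  # dollars per million I/O operations (beta pricing)
NEW_PER_GB_MONTH = 0.04    # dollars per GB per month, with no I/O charge


def old_monthly_cost(size_gb: float, io_ops: int) -> float:
    """Beta-era persistent disk bill: capacity charge plus per-I/O charge."""
    return size_gb * OLD_PER_GB_MONTH + io_ops / 1_000_000 * OLD_PER_MILLION_IO


def new_monthly_cost(size_gb: float) -> float:
    """GA persistent disk bill: flat capacity charge only."""
    return size_gb * NEW_PER_GB_MONTH


if __name__ == "__main__":
    size_gb = 500           # hypothetical volume size
    io_ops = 1_000_000_000  # hypothetical I/O operations in one month
    old = old_monthly_cost(size_gb, io_ops)
    new = new_monthly_cost(size_gb)
    print(f"old ${old:.2f}, new ${new:.2f}, savings {100 * (1 - new / old):.1f}%")
```

With those hypothetical figures, the old scheme comes to $150 a month ($50 for capacity plus $100 for one billion I/O operations), against $20 at the new flat rate. Even an idle volume drops from 10 cents to 4 cents per GB, a 60 percent cut on capacity alone.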

In addition to Compute Engine, Cloud Storage, and App Engine, Google also offers a number of platform-level services, including Cloud SQL, a relational database service; Cloud Datastore, a NoSQL service; BigQuery, an SQL query service based on Dremel, Google's internal system for interactive queries over very large datasets; and Cloud Endpoints, a service that just became available last month that allows developers to create APIs for App Engine and expose them to client devices like smartphones and tablets.

DeMichillie said in a launch video for Compute Engine that Google now had more than 1,000 engineers dedicated to building out the Cloud Platform, as the collection of infrastructure and platform services is called. Just for fun, DeMichillie showed 1,000 virtual machines being fired up on Compute Engine, which took a mere 2 minutes and 41 seconds.

Comparing across the public clouds is difficult at best, and this is of course exactly what customers need to do. Amazon has more services and features than Google, Microsoft, Rackspace Hosting, IBM, Hewlett-Packard, and others are offering. There is no question about that. And there is no question that Amazon intends to keep the pressure on anyone who tries to take it on in the public cloud, much as it has been relentless in its online retail business.

EnterpriseAI