Advanced Computing in the Age of AI | Thursday, March 28, 2024

Google, Mesosphere Make A Google Clone For The Masses 

Creating Google-like hyperscale infrastructure is going to be a whole lot easier thanks to a development partnership between the search engine giant and Mesosphere, an upstart application platform and cluster management vendor whose Mesos tool is inspired by Google's homegrown Borg and Omega platforms and is used at Twitter, Airbnb, and a slew of other Internet startups.

Mesosphere came out of stealth mode back in June. The Mesos cluster management tool has been backed by funds from Twitter and Airbnb, which did a lot of the legwork to take Mesos from an idea incubating at the AMPLab at the University of California at Berkeley (which has funding from Google) to something that could be deployed in production. The company also has venture backers, including $10 million in Series A funding from Andreessen Horowitz and seed money from the same firm as well as from Kleiner Perkins, Foundation Capital, and SV Angel. All told, Mesos has had more than $20 million pumped into it, and it has become an Apache project, too. Years from now, it may end up being the Linux of clustered applications, much as OpenStack could be as it adds platform services.

In July, Google announced that a number of big backers, including Mesosphere as well as Microsoft, IBM, Red Hat, Docker, CoreOS, and SaltStack, had agreed to contribute to the open source Kubernetes project, which Google started earlier this year to let customers of its Google Compute Engine public cloud manage Docker software containers on that cloud. Kubernetes is not a cut-down or open source version of Google's in-house tools, any more than Mesos is truly based on the Borg or Omega systems that allow Google to run workloads on thousands of nodes across its fleet of more than 1 million servers.

The relationship between Mesos and Kubernetes was not made entirely clear back in July. In a guest post on the Google Cloud Platform blog, Mesosphere co-founder Florian Leibert, who worked on Mesos at Twitter and Airbnb, has tried to clear up the situation and to explain how the pieces of software will fit together.

The first thing that Google and Mesosphere are doing is bringing Mesos to the Google Compute Engine public cloud.

The obvious question to ask is: Why on earth would Google need an application framework and job scheduling program for its public cloud when it already has what is probably the most sophisticated software in the world for these tasks running atop the servers that comprise its search engine, ad serving, and other systems?

The answer is that Google doesn't want to expose its Borg or Omega platforms – and they are really platforms as much as they are management tools – to the outside world. Borg and Omega are designed to run at Google's scale and specifically to run Google's workloads, and customers buying capacity on the Compute Engine public cloud are very unlikely to need the kind of scale Google itself does. Moreover, Google knows that to compete against Amazon Web Services in the public cloud, it is going to have to foster tools that can be deployed in private clouds and that are compatible with whatever Google does in the public cloud. Microsoft understands this. Amazon Web Services, by contrast, simply does not believe in the private cloud: if a customer pushes hard enough, it is encouraged to use the Eucalyptus cloud controller (which has lots of AWS APIs built in), or, if the customer is large enough, like the US Central Intelligence Agency, and willing to spend $600 million, Amazon will build a private version of its public cloud for it.

Mesos is inspired by the Borg and Omega systems, but it certainly is not based on them. Ditto for Kubernetes, which is more like an application framework for Docker containers. With these two components, raw infrastructure on Compute Engine can be transformed into something that acts like Google's internal infrastructure, which has resource containers based on Linux cgroups and namespaces and which allows workloads to be fired up, run, and shut down across the global Googleplex. Matt Trifiro, who was chief marketing officer at platform cloud Heroku and is now senior vice president in charge of marketing at Mesosphere, explains the important link between how Google's internal applications work and how Kubernetes and Mesos are architected:

"The best way to think of Kubernetes is as a framework for writing applications that are structured for Google-scale, by being composed of many microservices that can be independently scaled and interconnected with service discovery (how one service finds another), replication (how a service grows its instances), and load balancing (how you route requests among those instances). Once you have a Kubernetes application, you need a platform on which to run it. Google has Omega and Borg to run its containerized applications, but they are inextricably tied to Google's infrastructure and not something that could be open sourced. Therefore, in this world outside of Google's internal infrastructure you need a platform on which to run these applications that – if you intend to use it for critical apps at scale – needs to provide a lot of the same underlying services that Google's own proprietary systems provide. And that is Mesosphere, which provides that underlying software substrate on which you can run Kubernetes applications at scale, across an entire datacenter, in a highly available environment."
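To make the three services in Trifiro's description concrete, here is a minimal Python sketch, assuming a single in-process registry and made-up endpoint names. `ServiceRegistry`, `scale`, `discover`, and `route` are hypothetical names invented for illustration; this is not the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """A named service and the endpoints of its running instances."""
    name: str
    endpoints: list = field(default_factory=list)
    next_index: int = 0  # position of the next endpoint in round-robin order


class ServiceRegistry:
    """Hypothetical toy registry: discovery, replication, and load balancing."""

    def __init__(self):
        self._services = {}

    def scale(self, name, replicas):
        # Replication: set the service to exactly `replicas` instances.
        svc = self._services.setdefault(name, Service(name))
        svc.endpoints = [f"{name}-{i}" for i in range(replicas)]
        svc.next_index = 0

    def discover(self, name):
        # Service discovery: look up a service's current endpoints by name.
        return list(self._services[name].endpoints)

    def route(self, name):
        # Load balancing: hand out the endpoints in round-robin order.
        svc = self._services[name]
        if not svc.endpoints:
            raise LookupError(f"no running instances of {name!r}")
        endpoint = svc.endpoints[svc.next_index % len(svc.endpoints)]
        svc.next_index += 1
        return endpoint


registry = ServiceRegistry()
registry.scale("frontend", 3)
print(registry.discover("frontend"))  # ['frontend-0', 'frontend-1', 'frontend-2']
print([registry.route("frontend") for _ in range(4)])
# ['frontend-0', 'frontend-1', 'frontend-2', 'frontend-0']
```

Each service is scaled independently of every other service, which is the property Trifiro stresses; a production system would also track instance health and real network addresses.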

To make this happen, Google and Mesosphere are enhancing Kubernetes to have APIs that allow it to hook into the job scheduling portions of Mesos, and indeed other scale-out cluster management systems should they come along. The Kubernetes code will remain in the Go programming language and run as a framework on top of Mesos, thus:

[Figure: Kubernetes running as a framework on top of Mesos]
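Giving Kubernetes APIs that can hook into Mesos "and indeed other scale-out cluster management systems" amounts to writing the scheduler against an interface rather than against one specific cluster manager. The Python sketch below assumes a hypothetical `ClusterBackend` interface with `free_cpus` and `launch` methods, plus a toy `InMemoryBackend` standing in for Mesos; none of these names come from Kubernetes or Mesos, and real schedulers weigh far more than free CPUs.

```python
from abc import ABC, abstractmethod


class ClusterBackend(ABC):
    """Hypothetical interface a pod scheduler calls instead of a fixed cluster manager."""

    @abstractmethod
    def free_cpus(self):
        """Return a dict mapping node name -> free CPUs."""

    @abstractmethod
    def launch(self, node, pod_name, cpus):
        """Start a pod on a node, consuming `cpus`."""


class InMemoryBackend(ClusterBackend):
    """Toy stand-in for Mesos or any other cluster manager."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)
        self.launched = []  # (pod_name, node) pairs in launch order

    def free_cpus(self):
        return dict(self.nodes)

    def launch(self, node, pod_name, cpus):
        if self.nodes[node] < cpus:
            raise ValueError(f"{node} lacks {cpus} free CPUs")
        self.nodes[node] -= cpus
        self.launched.append((pod_name, node))


def schedule_pod(backend, pod_name, cpus):
    """Place a pod on the first node (by name) with enough free CPUs; None if none fits."""
    for node, free in sorted(backend.free_cpus().items()):
        if free >= cpus:
            backend.launch(node, pod_name, cpus)
            return node
    return None


backend = InMemoryBackend({"node-a": 2, "node-b": 4})
print(schedule_pod(backend, "web", 3))    # node-b
print(schedule_pod(backend, "db", 2))     # node-a
print(schedule_pod(backend, "cache", 2))  # None
```

Swapping in a different cluster manager would mean writing another `ClusterBackend` subclass; `schedule_pod` itself would not change.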

This is exactly the same framework approach used by Marathon, which Mesosphere calls a framework scheduler and which amounts to a wrapper of sorts that goes around standalone clustered Linux applications and gives them the resilience, fault tolerance, and scalability of the Mesos platform. Mesos supports a smorgasbord of platform, analytics, batch scheduling, and data storage frameworks, as we explained when Mesosphere came out of stealth, and now the Kubernetes Docker framework will be added to the mix. Mesos knows how to run various workloads and their frameworks side by side on the same cluster while meeting service level objectives, and that is the secret to success right there. This kind of mixed-workload scheduling is what lets Google do what it does on 1 million servers instead of 2 million.

[Figure: frameworks that run on Mesos]
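Mesos runs several frameworks side by side through two-level scheduling: the Mesos master offers an agent's free resources to a framework, and that framework's own scheduler decides which of its tasks to launch in the offer. The simplified Python sketch below assumes CPU and memory are the only resources and hands offers to frameworks in a fixed order; real Mesos decides who receives each offer with a fairness policy (by default a variant of Dominant Resource Fairness), and `Offer`, `Framework`, and `allocate` are hypothetical names, not the Mesos API.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Free resources on one agent, offered to a single framework."""
    agent: str
    cpus: int
    mem_mb: int


class Framework:
    """Hypothetical framework scheduler holding (task, cpus, mem_mb) requests."""

    def __init__(self, name, tasks):
        self.name = name
        self.pending = list(tasks)
        self.running = []  # (task, agent) pairs

    def resource_offer(self, offer):
        """Second level: launch whichever pending tasks fit; return what was used."""
        used_cpus, used_mem = 0, 0
        still_pending = []
        for task, cpus, mem in self.pending:
            if used_cpus + cpus <= offer.cpus and used_mem + mem <= offer.mem_mb:
                used_cpus += cpus
                used_mem += mem
                self.running.append((task, offer.agent))
            else:
                still_pending.append((task, cpus, mem))
        self.pending = still_pending
        return used_cpus, used_mem


def allocate(agents, frameworks):
    """First level: offer each agent's remaining resources to frameworks in turn."""
    free = {agent: list(res) for agent, res in agents.items()}
    for agent in sorted(free):
        for fw in frameworks:
            cpus, mem = free[agent]
            if cpus <= 0 or mem <= 0:
                break  # nothing left on this agent to offer
            used_cpus, used_mem = fw.resource_offer(Offer(agent, cpus, mem))
            free[agent] = [cpus - used_cpus, mem - used_mem]
    return free


marathon = Framework("marathon", [("web", 2, 2048), ("api", 2, 2048)])
kubernetes = Framework("kubernetes", [("pod-a", 1, 1024), ("pod-b", 2, 4096)])
left = allocate({"agent-1": (4, 8192), "agent-2": (2, 4096)}, [marathon, kubernetes])
print(marathon.running)    # [('web', 'agent-1'), ('api', 'agent-1')]
print(kubernetes.running)  # [('pod-a', 'agent-2')]
print(kubernetes.pending)  # [('pod-b', 2, 4096)]
print(left)                # {'agent-1': [0, 4096], 'agent-2': [1, 3072]}
```

Both frameworks share the same two agents without knowing about each other, and `pod-b` simply waits for a later offer with enough room.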

Mesos already supports Docker containers, so that part was easy, but with the adoption of Kubernetes it will be able to run them in a better fashion. The Kubernetes concept of pods – groups of tasks – will be launched on top of Mesos, and the pod and label constructs will be available to all frameworks, not just Kubernetes.
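A pod is a group of tasks placed together on one node, and labels are key/value tags that let other components pick out sets of pods. This minimal Python sketch, assuming CPUs are the only resource, shows an equality-based label selector and the all-or-nothing fit check a pod needs. `Pod`, `select`, and `fits` are hypothetical names, and Kubernetes itself also supports set-based selectors, which are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """A group of tasks that are scheduled together, tagged with labels."""
    name: str
    tasks: dict  # task name -> CPUs required
    labels: dict = field(default_factory=dict)

    def total_cpus(self):
        # A pod is placed as a unit, so one node must hold all of its tasks.
        return sum(self.tasks.values())


def select(pods, selector):
    """Equality-based selector: keep pods whose labels contain every key=value pair."""
    return [p for p in pods if all(p.labels.get(k) == v for k, v in selector.items())]


def fits(pod, free_cpus):
    """All-or-nothing: either every task in the pod fits on the node or none launch."""
    return pod.total_cpus() <= free_cpus


pods = [
    Pod("web-1", {"nginx": 1, "log-shipper": 0.5}, {"app": "web", "tier": "frontend"}),
    Pod("web-2", {"nginx": 1, "log-shipper": 0.5}, {"app": "web", "tier": "frontend"}),
    Pod("db-1", {"postgres": 2}, {"app": "db", "tier": "backend"}),
]
print([p.name for p in select(pods, {"tier": "frontend"})])  # ['web-1', 'web-2']
print(fits(pods[0], 1.5), fits(pods[2], 1.5))                 # True False
```

Because selection depends only on labels, any framework, not just Kubernetes, could use the same mechanism to group and address its tasks.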

The combination of Kubernetes and Mesos is not restricted to Google Compute Engine, and will in fact be compatible with other public clouds and also deployable on private clouds. Many of the early adopters of Mesos are running it atop Amazon Web Services to abstract and control the underlying virtual machines, storage, and networking in a manner that is more sophisticated than is possible using Amazon's own tools. Google is getting out in front by offering a starter Mesos/Kubernetes configuration for developers with four virtual servers, which costs 56 cents per hour according to the blog post, as well as a highly available cluster aimed at load testing and production use that has 18 virtual servers and costs $2.52 per hour. These are the base prices for Compute Engine, and the Mesos feature for managing the cluster is available for free, according to Leibert, who adds that Mesos will soon be part of the Google Cloud Platform dashboard to make it easier to click and deploy a Mesos-driven cluster.
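The two quoted prices work out to the same rate per virtual server, so the highly available cluster costs more only because it has more nodes. A small Python check follows; the 730-hour month used for the monthly estimate is an assumption of this sketch, not a figure from the blog post, and it ignores any discounts.

```python
# Prices quoted in the blog post: (virtual servers, US dollars per hour).
CONFIGS = {
    "starter": (4, 0.56),
    "highly available": (18, 2.52),
}

# Assumption: an average month of 730 hours (24 * 365 / 12), no discounts.
HOURS_PER_MONTH = 730


def per_server_rate(servers, dollars_per_hour):
    """Hourly price of one virtual server in a configuration."""
    return dollars_per_hour / servers


def monthly_cost(dollars_per_hour, hours=HOURS_PER_MONTH):
    """Approximate cost of leaving the cluster running for a whole month."""
    return dollars_per_hour * hours


for name, (servers, price) in CONFIGS.items():
    print(f"{name}: ${per_server_rate(servers, price):.2f} per server-hour, "
          f"about ${monthly_cost(price):,.2f} per month")
# starter: $0.14 per server-hour, about $408.80 per month
# highly available: $0.14 per server-hour, about $1,839.60 per month
```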

The question now is this: Will Kubernetes plus Mesos be good enough to foster an upstart that can take on Google itself? Whatever Mesos can do, you can bet that for the workloads Google needs to remain the search engine and advertising giant that it is, Omega can do better and scale further.

EnterpriseAI