Advanced Computing in the Age of AI | Thursday, March 28, 2024

The Beginnings Of Hyperscale At Google 

Fifteen years ago, Google did what every startup on the Internet dreams of: It closed its first round of venture funding, put out its first press release, and took its first step into the hyperscale realm and made its first big order for custom servers to support its rapidly expanding search engine empire.

Like many Silicon Valley startups, Google got its start at Stanford University, and for a while it ran the original search engine, called BackRub, on a hodge-podge collection of machines. The main BackRub crawler and indexer was a Sun Microsystems Ultra II workstation with two 200 MHz UltraSparc-II processors equipped with 256 MB (not GB) of main memory. Given that Sun Microsystems was a Stanford spinoff itself, this seems like a perfectly appropriate place to start in 1996, when Google co-founders Sergey Brin and Larry Page started working on BackRub. Soon, this was not enough iron. Page and Brin got two dual-socket Pentium II machines as a donation from Intel, which had chips running at 300 MHz and 512 MB of main memory, which were used to run the search engine as it expanded. The two then cadged a four-socket RS/6000-F50 AIX machine from IBM to use as a storage array, and hung an outboard disk expansion boxes on the Unix machines. The whole shebang fit on and under an industrial-strength  folding table.

Again, like many startups, in fact.

google-first-servers

In the summer of 1998, the search engine is still growing exponentially, and Sun co-founder, serial entrepreneur, and Stanford PhD graduate Andy Bechtolsheim cuts Brin and Page a blank check for $100,000 as seed money. Google is incorporated in the fall, and word starts spreading that Google’s algorithms are much better than other search engines of the time. Early in 1999, Google moves out of the garage (quite literally) and Urs Hölzle, another Stanford grad, joins the company to start managing Google’s infrastructure. He is the eighth employee of the company and as senior vice president of technical infrastructure and Google Fellow, he has the same job – and probably the coolest job in the world, too.

The following summer, in July, is when Google went from curiosity to something that looked like it was on possibly going to catch fire, and it was then that the company placed its first big order of servers, giving it more than an order of magnitude of computing power. Hölzle reminisced about that big order on his Google+ hangout recently:

“At the time of the order, we had a grand total of 112 servers so 1,680 was a huge step,” he wrote. “But by the summer, these racks were running search for millions of users. In retrospect the design of the racks wasn't optimized for reliability and serviceability, but given that we only had two weeks to design them, and not much money to spend, things worked out fine.”

In that sentence is buried the heart of a company like Google, a company founded by engineers and with an engineer’s practicality in that something can always be improved upon but you need to get something done now to fix a big problem.

Interestingly, Google did not build the servers itself, but farmed the job out to King Star Computer, which was based in Santa Clara, down the road a spell from Google’s Palo Alto office at the time, and it had custom racks and custom servers that are collectively known as the Corkboard system because the motherboards were slapped down raw on corkboard, four to a shelf. The order consisted of 21 racks that each had 20 shelves.

google-corkboard-servers

Google crammed four single-socket Intel Pentium II server nodes onto each shelf, based on a Supermicro motherboard, each equipped with 256 MB of main memory, two 22 GB DeskStar disk drives from IBM, and an Intel 100 Mb/sec Ethernet card. The shelf had four custom power buttons to reboot the machines, and was open to the air for better cooling and because there was no need for vanity. (Google wears its tattoos on the inside, as it were.)

Here is what the Corkboard server rack looked like:

google-corkboard-rack

This Corkboard server rack is what could probably be called one of the first microserver systems in that it was using predominantly desktop components to do distributed server workloads. Google supplied the BIOS settings it wanted for the machines, and wanted the racks completely pre-assembled and ready to power on to accept its workloads. The order was placed by Larry Page on July 23, for delivery no later than September 15 of that year. Google slapped down a $210,000 deposit and agreed to pay $109,200 per rack. Even back then, Google thought in terms of racks. And it shelled out just under $2.3 million on infrastructure – a little less than 10 percent of its venture funding haul – on infrastructure. And very likely, it would probably have to place a similar order as the Google search engine rode up the hockey stick curve.

When asked in the Google chat session about how many servers Google has today, Hölzle was coy, as all of the company’s executives are about such things:

“For reference, we have more than an order of magnitude more now 🙂 What's more amazing though is that a very small number of current servers (3-4, but that's just a guess) have as much compute and storage as these 21 racks, so we could serve 1999's traffic on just these 3-4 servers.”

It is generally believed that Google has a global server fleet of well over 1 million machines, but that little sentence above provides a clue to what the size of the fleet might really be. That guess above by Hölzle was not something he just randomly pulled out – he did an estimate before he wrote it, as any engineer would. So a server of today – and in this sense, I don’t care if the server is used mostly for compute or mostly for storage, because the lines are blurred at most hyperscale companies – has something on the order of 420X to 560X the performance of one of those Pentium nodes. Multiple that by 1,680 machines in the 21 racks, and you get somewhere between 705,600 to 940,800 machines. That seems about right for the number of machines that support Google’s search infrastructure, if the company was well over 1 million machines for all of its workloads.

EnterpriseAI