Univa Takes Over Control of Grid Engine from Oracle 

The debate over the future of the Grid Engine grid scheduler is over. Univa has acquired the source code, copyrights, and trademarks associated with the software from Oracle and is going to start supporting Oracle's customers effective immediately.

Grid Engine is one of the most widely deployed compute grid scheduling tools in the world, and it was a key asset in the portfolio of software products at Sun Microsystems. But ever since Oracle took over Sun three years ago, Grid Engine's future at the software giant has been uncertain.

Oracle had some very large accounts in both enterprises and traditional supercomputer sites using Grid Engine – mostly in the financial services industry – but the company was far more focused on back-end database and application software. Grid Engine got lost in the shuffle. Moreover, Oracle brought development of Grid Engine back inside, which prompted not one but three different forks of Grid Engine.

The Open Grid Scheduler project was launched with the help of Oracle in December 2010, and the University of Liverpool jumped in with its Son of Grid Engine as well. Univa did its fork of Grid Engine in January 2011, and as Gary Tyreman, the company's CEO, explained at the time, he believed that Grid Engine was not a priority for Oracle and that it was being neglected. Univa, which was founded in 2004, had obtained an OEM license for Grid Engine from Sun, and with the source code available, it stepped up to the plate and started doing its own patches. This was possible because Univa had hired some of the key Oracle people associated with Grid Engine, including Fritz Ferstl, who was the founder of the Grid Engine project and also worked on the Codine and GRD schedulers before that.

The financial details of the deal between Oracle and Univa were not disclosed, but Tyreman said Oracle approached Univa, not the other way around. Oracle didn't say much at all about the deal, apart from a notice the company put up this morning.


Tyreman tells EnterpriseTech that Univa will offer technical support for both its own Univa Grid Engine and the Oracle Grid Engine variants.

"Oracle's primary interest has been the care and feeding of Grid Engine customers, and these are some of the biggest brands in the world," says Tyreman. "We have put a transition in place so Oracle customers can continue on the same binary. We have now added the original code to the original developers, and ideally that is what it is really all about."

This transaction did not include the transfer of any Oracle employees to Univa.

When Univa was founded, the idea was to bring together two different worlds: grid scheduling for supercomputer clusters and an emerging set of standards for gridding up Web services on compute and storage utilities (they were not called clouds yet) embodied by the Globus toolkit. The founders of the company included Steve Tuecke and Ian Foster of Argonne National Laboratory and Carl Kesselman of the University of Southern California. The idea was to provide support for the Globus toolkit, but then Univa branched out into supporting and reselling Grid Engine with the deal it inked with Sun.

Figuring out how many Grid Engine customers there are in the world is a bit tricky, as it is with all open source projects, but Tyreman says that at least 10,000 organizations worldwide have downloaded and installed the Sun and Oracle versions of Grid Engine. In production environments, commercially supported Grid Engine is installed in hundreds of Univa sites as well as in hundreds of Oracle sites, and Tyreman says that Univa is on track to have "in the high hundreds" of paying customers by the end of this year.

Many of these customers are traditional supercomputing labs in government and academia, of course, but others are more enterprise tech shops, doing seismic analysis, semiconductor design, and crash test simulation, for instance. The largest Grid Engine customer in the world, as far as Tyreman knows, is in the oil and gas industry, and it has a 150,000-core server cluster devoted to the hunt for oil reserves.

Tyreman says there is an evolving new group of enterprise applications running atop Grid Engine that do business process simulations rather than traditional simulations. For instance, one airline carrier has loaded up an application that merges the pilot scheduling system, the aircraft maintenance schedule, and the flight scheduling system to optimize the scheduling of all three. These new applications start modestly, but they grow fast, and then companies start seeing other uses for the compute grids.

"These users are beginning to climb the curve, and they are beginning to scale and expanding their use cases," says Tyreman. "Everybody understands now that if you can fail earlier in a product development cycle, you can save the business many, many times the cost of the infrastructure to do the modeling."

So, for instance, in the pharmaceuticals industry, you can give a researcher 2,000 or 3,000 cores to run a lot of simulations over the course of two or three months and find the dead end earlier, without wasting time or money. The other trend that Tyreman is seeing is that companies are buying fatter server nodes with more memory for their clusters to speed up some simulations. Big memory is not just for databases, apparently.

All of these trends are driving business for Univa. Tyreman dug into the company's numbers back in May and discovered that 82 percent of its revenues came from enterprise customers, not traditional supercomputing labs. (It is not clear how the addition of Oracle customers will change this mix.) Through the first three quarters of 2013, Univa's revenues have more than doubled compared to the year-ago period, according to Tyreman, and growth will presumably accelerate further with the addition of those Oracle customers to the Univa fold.

Univa is working on its Grid Engine 8.1.7 release right now, which has a number of new features. The first will be support for Linux containers, or LXC for short. This has been one of the most sought-after features among the Grid Engine base, because the Linux scheduler still doesn't know how to throttle applications as well as it could. In certain applications, a job can come onto a system and try to hog all the resources. With containers, you can carve a server into slices, allocate resources to each slice, and keep a job from hogging memory or I/O.
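
To make the idea of capping a greedy job concrete, here is a minimal sketch in Python. It is not LXC and it is not how Grid Engine implements container support; it swaps in the older POSIX `setrlimit` mechanism to show the same principle on a single process. The function name `run_with_memory_cap` and the 512 MiB cap are assumptions chosen for illustration, and the code assumes a Linux host.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd as a child process whose address space is capped at max_bytes.

    Returns the child's exit code. POSIX only (relies on setrlimit).
    Hypothetical helper for illustration; not part of Grid Engine or LXC.
    """
    def apply_cap():
        # Runs in the child just before exec. Keep the existing hard limit
        # and lower only the soft limit, so no extra privileges are needed.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_cap).returncode


if __name__ == "__main__":
    cap = 512 * 1024**2  # 512 MiB, an arbitrary illustrative value
    greedy = [sys.executable, "-c", "buf = bytearray(2 * 1024**3)"]  # wants 2 GiB
    modest = [sys.executable, "-c", "buf = bytearray(1024)"]
    print("greedy job exit code:", run_with_memory_cap(greedy, cap))
    print("modest job exit code:", run_with_memory_cap(modest, cap))
```

The greedy job should fail its allocation and exit with an error while the modest one completes. A per-process soft limit like this can be raised again by the process itself and says nothing about I/O, which is part of why limits enforced from outside the job, as containers do, are the more attractive option for a scheduler.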

Univa is also cooking up native support for Windows Server, so you don't have to deploy the software onto the Services for Unix facility that is an add-on for Windows. This will come with Grid Engine 8.2 next spring, and this code is in alpha testing right now at one of Univa's largest customers.

By the way, Univa's Grid Engine software already supports the dispatching of work to Nvidia Tesla GPU coprocessors as well as Intel Xeon Phi X86 coprocessors, and if you like Calxeda ARM chips or Raspberry Pi ARM-based baby systems, you can run Grid Engine natively on this machinery, too.

A subscription to Univa's Grid Engine costs $99 per core per year for the commercial version, with volume pricing obviously applying.
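
For a rough sense of scale, the list price can be multiplied out against the core counts mentioned earlier in this story. This is only back-of-the-envelope arithmetic: the function name `annual_list_price` is made up for illustration, the discount schedule is not public, and nothing here says what any customer actually pays.

```python
LIST_PRICE_PER_CORE_USD = 99  # per core per year, the figure quoted in the article


def annual_list_price(cores, price_per_core=LIST_PRICE_PER_CORE_USD):
    """Annual subscription cost at list price, before any volume discount."""
    if cores < 0:
        raise ValueError("cores must be non-negative")
    return cores * price_per_core


if __name__ == "__main__":
    # Core counts taken from the article: a pharma research allocation
    # (2,000 to 3,000 cores) and the largest known cluster (150,000 cores).
    for cores in (2_000, 3_000, 150_000):
        print(f"{cores:>7,} cores: ${annual_list_price(cores):,} per year at list price")
```

At list price, 2,000 cores works out to $198,000 a year and 150,000 cores to $14.85 million, which is why volume pricing matters at the high end.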