Advanced Computing in the Age of AI | Friday, March 29, 2024

A Fast Growing Company, an Expanding IT Department – a Recipe for Stress 

Tata Steel Automotive Engineering, a case in point, combines the advantages of an open source workload management solution with commercial support.

When the Holmes and Rahe Stress Scale was issued in 1967, it was no surprise that negative events – death, disease, divorce – earned high marks. But some positive happenings, such as outstanding personal achievement, were also among the top stressors.

Fast-forward 45 years and apply the same idea to the IT staff in a small to medium-sized manufacturing company. If the company has the good fortune to be successful and grow, top management may be overjoyed, but the IT organization will undoubtedly feel the stress.

Gary Tyreman, president and CEO of Univa, has seen the scenario unfold a number of times. Frequently one of the major IT pain points has to do with stretched computer resources, often involving workstations. It makes for a stressful paradox – IT is overburdened but, at the same time, the company's computer resources are being underutilized.

"Today's average workstation, which costs around $12,000 to $15,000, can have as many as eight or more cores," says Tyreman. "But only a fraction of that capacity is being used – the applications that most engineers are running on their workstations are not taking full advantage of the power available in all those cores. In a manufacturing company with a number of engineers running analysis or simulation jobs on their dedicated workstations, there is a lot of extra capacity on these machines that they can't do anything with."

Resource underutilization is stressful enough in and of itself, but when the company starts to move from being a small firm to a medium-sized one, the stress scale needle really starts to quiver. In this case, business is driving the need to continuously ramp up the organization's computational infrastructure. The upshot: IT is heading rapidly toward a crossover point.

"We have customers who are using as few as eight cores and others with as many as 100,000 cores in obviously different configurations," comments Tyreman. "The folks that are running the eight, 10 and 20 core environments are typically using very powerful workstations. But they are not able to share these resources and they don't have the in-house expertise to alleviate the problem."

He notes that there are collaborative solutions available – for example, ANSYS provides a workbench environment that allows engineers to connect to a local individual repository or a shared repository for CAE/simulation collaboration.

But at this point, the small manufacturer's IT group may have reached that crossover point where the underutilization of individual workstations can no longer be tolerated – it's time to either move into the cloud or build a shared environment centered around a cluster.

Tyreman says that if you are a small organization that doesn't need to use cloud resources that often, your data sets are not too big, and you have all your licenses in place, then a cloud solution makes sense. But if you're moving into applications that require a lot of collaboration and some serious number crunching, you may need to make the transition from individual workstations to fewer and smaller workstations and bigger servers – perhaps one or two 12-core servers or a small cluster sitting in the closet to handle the increased workload.

This is when the IT stress scale needle takes another robust jump.

You may be introducing digital manufacturing into your operation to keep up with the demands of growth, new product development, and staying ahead of the always voracious competition. Perhaps you find it necessary to ramp up the company's finite element analysis capabilities or introduce computational fluid dynamics into your development process.

These licenses come with a hefty price tag. You don't want a half dozen engineers, each with their own engine, running separate simulations – you want the team to share this advanced (and expensive) software. And this means that you need a way to manage job scheduling and the computational resources required to support the company's success and ongoing efforts to stay ahead of the competition.

Tata Steel and Success
A case in point is one of Univa's customers – Tata Steel Automotive Engineering (TSAE). Founded in India in 1907, TSAE is a top ten global steel maker. But even though it is part of the huge Tata Group Companies empire, it still operates as an SMM – a small to mid-sized manufacturer.

About five years ago, TSAE's IT organization crossed over the line, significantly expanding its computing environment to meet growing demand for the company's products. In order to manage the added capacity, TSAE implemented the open source Sun Grid Engine (SGE) to match workload to its machines. The IT stress scale needle moved out of the red zone, only to bounce back with a vengeance when, two and a half years later, Sun was acquired by Oracle and SGE support was subsequently discontinued.

Rather than find a new proprietary solution to manage their IT workload, the company decided to stay with the existing tool, repurpose internal resources that had previously provided SGE support, and contract with domain experts – in this case, Univa – to provide critical tool support, integration and professional services.

Univa provided TSAE with a workload manager specific to its requirements. Univa also brought to the table a thorough knowledge of SGE, associated best practices, optimal operational configurations, and the experience to know what to customize and what to leave alone. The two companies have built up a mutually beneficial, long term relationship.

An in-depth case study on the Univa-TSAE relationship states, "After purchasing Grid Engine support from Univa, TSAE observed their support relationship with Univa. Exposed to only good experiences, trust developed over time. Tata observed Univa's timely delivery of requests, with solutions that were well-executed and well-documented. They also observed that consultants from Univa were always available to participate in any special projects with which TSAE required assistance, growing to rely on the Univa support organization and their trustworthy history of reliability and excellence." (The full case study is available here. Filling out a brief registration form is required.)

Comments Tyreman, "The solution for TSAE was to make the (IT) system run on a more predictable, guaranteed basis – to have a reliable dial tone."

A simple enough remedy, but one that can go a long way toward alleviating the stresses that inevitably come with managing a fast-growing company's IT infrastructure and keeping that stress scale needle out of the red zone.

EnterpriseAI