Advanced Computing in the Age of AI | Sunday, May 19, 2024

AIOps: Slaying the Dragon of IT Complexity 


There’s a growing threat in the IT world – the widening gap between IT complexity/change and the human ability to manage it. The digital transformation of the enterprise data center – with its integration of on-premises and multi-cloud environments, big data, containers and microservices, and increasingly high SLA requirements – has created an elaborate virtualized infrastructure that can generate millions of transactions a day and change in just seconds.

IT staff can’t fully comprehend or manage the sheer scale and dynamic nature of these distributed hybrid environments, making it harder to achieve the appropriate IT service levels needed for business success. In fact, one 2018 report showed 76 percent of CIOs surveyed believe escalating IT complexity has made it impossible to adequately manage multi-cloud ecosystems.

Not that organizations haven’t tried to cope with the complexity and get better performance from their systems. Most firms use monitoring tools, often sourced from multiple vendors, to manage system activities, though typically in an independent fashion that doesn’t provide enough information. These siloed standalone tools monitor separate system modules, such as networks, applications, or storage, but don’t provide a consolidated real-time view of the entire interconnected infrastructure. This incomplete visibility leaves IT staff struggling to interpret and react to the hundreds of system alerts, ongoing system changes and interdependencies of individual system resources spread across the globe. System performance is not optimized and issues take too long to resolve, resulting in lengthy MTTR and MTTI and lots of finger pointing.

What’s needed is new disruptive technology that can unify and modernize IT operations. One approach already adopted by many organizations is to use AI to interpret the mountains of data and system alerts generated by today’s IT ecosystems. This is the role of Artificial Intelligence for IT Operations (AIOps): software applications that integrate with other applications and toolsets to enhance IT operational tasks such as performance monitoring, event correlation and data analysis. Fueled by machine learning, AIOps applications can ingest and then analyze an ever-increasing volume and variety of data types across the IT ecosystem, improving the diagnostic and troubleshooting capabilities of IT teams.
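To make "event correlation" concrete, here is a minimal sketch in Python. It is a simple rule-based stand-in, not the machine learning an AIOps product would use: alerts that arrive close together in time are grouped into one incident. The alert fields (`ts`, `source`) and the 60-second window are assumptions chosen for illustration.

```python
def correlate(alerts, window_seconds=60):
    """Group alerts into incidents by arrival time.

    An alert joins the current incident if it arrives within
    window_seconds of the previous alert; otherwise it starts a new one.
    Each alert is a dict with a numeric 'ts' (seconds) and a 'source'.
    """
    incidents = []
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        if incidents and alert["ts"] - incidents[-1][-1]["ts"] <= window_seconds:
            incidents[-1].append(alert)
        else:
            incidents.append([alert])
    return incidents


# Hypothetical alerts from three siloed monitoring tools.
alerts = [
    {"ts": 0, "source": "network"},
    {"ts": 20, "source": "storage"},
    {"ts": 50, "source": "application"},
    {"ts": 300, "source": "network"},
    {"ts": 310, "source": "application"},
]
incidents = correlate(alerts)
```

Five separate alerts collapse into two incidents, which is the kind of noise reduction that lets staff react to a handful of problems rather than hundreds of messages.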

A key AIOps feature is auto-discovery, which automatically collects and correlates data on every IT asset across the domain and records changes made to them. These assets include all the physical, virtual and logical system resources within the network. By automating the IT asset discovery and correlation processes, data about the assets is constantly refreshed in the system Configuration Management Database (CMDB), maintaining an inventory of system resources – a single source of truth updated in real time.
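The refresh cycle described above can be sketched as a reconcile step: each discovery scan is compared with the stored inventory, and every addition, change and removal is logged. This is a toy in-memory model under stated assumptions; the `Cmdb` class, its fields and the sample asset names are hypothetical and do not represent any vendor's CMDB API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Cmdb:
    """Toy in-memory inventory: asset id -> attribute dict (hypothetical)."""
    assets: dict = field(default_factory=dict)
    changelog: list = field(default_factory=list)

    def reconcile(self, discovered: dict) -> list:
        """Merge one discovery scan into the inventory; return the changes."""
        now = datetime.now(timezone.utc).isoformat()
        changes = []
        for asset_id, attrs in discovered.items():
            old = self.assets.get(asset_id)
            if old is None:
                changes.append((now, "added", asset_id, dict(attrs)))
            elif old != attrs:
                # Record (old value, new value) for each attribute that differs.
                keys = sorted(old.keys() | attrs.keys())
                diff = {k: (old.get(k), attrs.get(k))
                        for k in keys if old.get(k) != attrs.get(k)}
                changes.append((now, "changed", asset_id, diff))
        # Anything in the inventory that the scan no longer sees was removed.
        for asset_id in sorted(self.assets.keys() - discovered.keys()):
            changes.append((now, "removed", asset_id, self.assets[asset_id]))
        self.assets = {k: dict(v) for k, v in discovered.items()}
        self.changelog.extend(changes)
        return changes


cmdb = Cmdb()
first = cmdb.reconcile({"vm-1": {"ip": "10.0.0.1"}, "sw-1": {"ports": 48}})
second = cmdb.reconcile({"vm-1": {"ip": "10.0.0.2"}})
```

After the second scan, the changelog shows that `vm-1` changed its address and `sw-1` disappeared, while the inventory itself holds only what the latest scan found.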

Managing Data and Distributed IT Infrastructures

It’s hard to overestimate the value of accounting for hundreds or even thousands of disparate IT assets spread across a siloed distributed environment, and of understanding how those moving parts affect each other. A lack of visibility into IT assets can increase system vulnerabilities and security risks, as well as IT complexity, while squandering IT budgets and personnel. But auto-discovery creates a virtual map of your entire network topology, including the hybrid IT infrastructure, applications and business transactions, letting you see where assets are located across the IT stack and how they are operating. This relieves IT staff of time-consuming manual data analysis and streamlines troubleshooting activities.

AIOps can also help with system planning projects, including more accurate budgeting for new IT assets, better capacity planning and deriving more value from current system assets, improving ROI and system utilization.

Predicting Problems and Impact of System Changes

The “secret sauce” of AIOps is its ability to diagnose system problems early enough that they can be fixed proactively. Root cause analysis helps IT teams rapidly drill down into problems, uncover anomalies and resolve issues quickly. AIOps can also simplify configuration management and technology migrations/upgrades, helping to forecast the potential impact of system changes and avoid disruption of business-critical applications.
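One way to picture forecasting the impact of a change is as a walk over a dependency map: given the asset about to be changed, find everything that relies on it directly or indirectly. The sketch below assumes such a map is already available as a plain dictionary; the function name `impacted_by` and the asset names are hypothetical, and real tools would derive the map from discovery data.

```python
from collections import deque


def impacted_by(change_target, depends_on):
    """Return every asset that directly or transitively depends on change_target.

    depends_on maps each asset to the list of assets it relies on.
    """
    # Invert the edges: for each asset, who depends on it?
    dependents = {}
    for asset, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(asset)

    # Breadth-first walk upward from the asset being changed.
    seen = set()
    queue = deque([change_target])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    seen.discard(change_target)  # guard against cycles in the map
    return seen


# Hypothetical topology: applications on top, infrastructure underneath.
topology = {
    "checkout-app": ["orders-db", "payments-api"],
    "payments-api": ["core-switch"],
    "orders-db": ["san-1"],
    "reporting": ["orders-db"],
}
at_risk = impacted_by("san-1", topology)
```

Here an upgrade to the storage array `san-1` puts `orders-db`, `checkout-app` and `reporting` at risk, so the change can be scheduled around the business-critical checkout application.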

Modern IT architectures are dynamic, supporting fluctuating demand and millions of transactions, and visibility into them has decreased as enterprises have deployed virtualized solutions. But new tools, such as AIOps, can help improve insight into IT infrastructures and help IT staff pinpoint potential issues so they can deliver the performance needed by today’s digital enterprises.

Sameer Padhye is founder and CEO of FixStream.