
Deep Learning Drives Global Financial Institution ‘to Gain Every Little Cent’ 


Oh, to be a data scientist now that AI is here. Money, prestige, deference and little interference, since few understand much of what data scientists do. Still, it isn’t all beer and skittles; data scientists face the daily grind, too. We recently spoke (under condition of anonymity) with a data scientist at a North American financial institution, a resource-rich company implementing AI at enterprise scale, and his comments show the pressure data scientists are under to deliver on their rarefied skills, knowledge and pay.

“There’s a massive drive at all financial institutions, especially here, to drive efficiencies, for us to gain every little cent across the board,” he told us. “…It’s part of our internal KPIs (key performance indicators), to find implementable opportunities for efficiency gains in terms of how we perform. This is part of the master goal of the organization.”

Once an “implementable opportunity” has been identified, he and his team of 20 go about gathering data and preparing it to train deep neural networks for the task at hand. For that they use IBM servers and AI software. On the hardware side, they use Power9 AC922 servers, which combine Nvidia Tesla GPUs with IBM Power9 CPUs over high-bandwidth Nvidia NVLink interconnect.

“The machine itself is very powerful,” he said, “it’s faster than an x86 system because of the way the GPUs connect to the rest of the machine. That connection, the transfer of data between the GPUs and the CPU and the RAM memory, it’s much faster with the NVLink interconnect, so that speeds up the whole thing.”

They also use IBM’s Watson Machine Learning Accelerator, a combination of open source deep learning frameworks and development and management tools.

IBM Power9 AC922 server (source: IBM)

“It comes with all the nicely certified open source packages we’re used to using in machine learning and deep learning – TensorFlow, PyTorch, all that stuff, but it comes highly optimized,” he said. “And what’s best really is inside, the software layer … the whole framework, they do everything inside (Watson ML Accelerator), so even parallelizing or distributing the computation between different GPUs within that machine or a cluster of those machines, it becomes basically seamless for the developer.”

The integration and optimization, he said, eliminate much of the coding that otherwise would be required, coding prone to bugs and consuming “more time spent on IT tasks rather than data science tasks before you can accomplish what you need to do. So there’s that extra-added help from the Watson Accelerator framework that comes already with the Power systems.”
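For a sense of the wiring he says the stack makes seamless, here is a minimal sketch in stock TensorFlow – not the institution’s code and not the Watson ML Accelerator API – of what manually distributing training across the GPUs in a single server looks like:

```python
# Minimal sketch of single-server multi-GPU training in stock TensorFlow;
# an integrated stack would hide this wiring from the developer.
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU and
# synchronizes gradients across the replicas after each batch.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Any model built inside the scope is mirrored across the GPUs.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# model.fit(...) then trains across all GPUs with no further changes;
# scaling out to a multi-node cluster needs a different strategy and
# more setup still.
```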

IBM touts Watson ML Accelerator’s support for more complex models and larger data sets. With NVLink, entire models and data sets of up to nearly 1 terabyte can be loaded into system memory and cached across the four GPUs within a single server, according to IBM.

Over the past several months, he and three of his staff have aimed deep learning at the classification of operational risk incidents – events in routine, day-to-day bank activities that could result in future liabilities. They range from checks incorrectly cleared to a wrong order punched into a trading terminal – any mistake or problem that may lead to later losses in the form of lawsuits or regulatory fines. Each risk incident must be logged into a system and then correctly classified by category.

“This is important because the category that they’re classified under, that will affect the amount of capital we have to hold against potential losses in the future.” Called regulatory capital, it’s cash set aside as a provision to cover potential losses. If the classification isn’t done right, too little cash may be set aside, leaving the institution vulnerable to higher-than-expected penalties – or, if too much cash is reserved for the risk, money is tied up that could otherwise be used to conduct the bank’s business.

“It’s a classification problem in the end, a typical machine learning problem,” he said. “The data is text, so there’s natural language processing of unstructured data, reading the text and trying, depending on the sentences in each incident description, to find the right category, which also is a typical sort of machine learning type of activity that you see a lot of.”

High-powered GPU compute is required to train such a system on historical risk incident data accumulated over decades at the institution – “thousands and thousands of data points,” he said. Using TensorFlow, he and his team built a system whose priority isn’t speed but accuracy. After all, risk incidents don’t arrive in high volume; getting the classification right counts for more than how fast incidents are classified.
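He didn’t describe the architecture, but a text classifier of the kind he outlines might look like the following sketch – a generic TensorFlow/Keras model in which the category count, vocabulary size and layer sizes are all hypothetical, not the institution’s actual system:

```python
# Illustrative sketch only: a generic Keras text classifier of the kind
# described. Category count, vocabulary size and layers are hypothetical.
import tensorflow as tf

NUM_CATEGORIES = 20   # hypothetical number of risk-incident categories
VOCAB_SIZE = 20_000   # hypothetical vocabulary size

# Map raw incident descriptions to fixed-length integer token sequences.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_sequence_length=200)

model = tf.keras.Sequential([
    vectorizer,
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CATEGORIES, activation="softmax"),
])

# Accuracy, not throughput, is the target metric here.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical usage: fit the vocabulary, then train on labeled incidents.
texts = tf.constant(["check cleared against wrong account",
                     "trade order entered with incorrect quantity"])
labels = tf.constant([3, 7])   # made-up category ids
vectorizer.adapt(texts)
model.fit(texts, labels, epochs=1)
```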

Along with AI, the team also tried less powerful “commonly used ad hoc approaches involving word counts and word combinations,” he said, to test against the results of the deep learning system. But the ad hoc approaches delivered only 60 to 70 percent accuracy, which he dismissed as “closer to random,” while the deep learning functionality delivers 90 percent accuracy. “And that’s only using sample training data, test data, so that’s a massive gain.”
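He didn’t specify the baselines, but a standard bag-of-words pipeline is one plausible shape for a “word counts and word combinations” approach; the sketch below assumes scikit-learn and is hypothetical:

```python
# Hypothetical word-count baseline, assuming scikit-learn: unigram and
# bigram counts fed to a linear classifier, for comparison against the
# deep learning model's accuracy.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

baseline = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), min_df=2),  # words and word pairs
    LogisticRegression(max_iter=1000),
)

# baseline.fit(train_texts, train_labels)
# baseline.score(test_texts, test_labels)  # the 60-70 percent tier he cites
```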

The system is coming out of prototype and soon will go through final testing, followed by internal presentations for management approval before going into production.

He said the risk incident classification system shares characteristics with other ML projects the organization is working on, such as rating bonds not yet assessed by the rating agencies. Basically loans in the form of securities, bonds are rated from AAA to junk based on default risk. “Some loans aren’t rated, so … we end up being a bit clueless to what level of risk should be assigned, which also impacts regulatory capital, because every time you take a risk we have to assign regulatory cap to cover potential losses.”

As with risk incident assessment, massive amounts of data from multiple sources are aggregated and processed. “So this functionality would be also classifying, or rating, the bonds not yet rated by rating agencies, using the data we have and deep learning architectures – that’s another one in the making.”

The data science group is also working on fraud detection: “That’s a big data play, like a terabyte of data per quarter that comes out, basically finding financial crimes, money laundering, and so forth, in the billions of transactions from client accounts, that’s another important project as well.”
