Why Hadoop Is The New Backbone Of American Express
American Express has to do a lot more than just offer credit cards and a network so merchants can accept them for payment for goods and services. While several decades ago the credit card usurped traveler's checks and other travel-related services as the main product that defined the company over a large portion of its 160-year history, more and more, American Express has to help consumers and merchants find each other to do the deals they both want to do.
This requires a more scalable and cost-effective back-end data processing infrastructure than American Express has been able to put into the field up until relatively recently. This additional set of services also required moving away from the traditional data warehouses that American Express had come to rely on over the years and to a new Hadoop stack that was a mere proof of concept three years ago and is now the back-end system for a slew of new services that are driving revenues for the company.
Sastry Durvasula, vice president and global head of information management and digital capabilities, and Kevin Murray, vice president of information management and integration, handle the software and hardware sides of the digital transformation at American Express, and they spoke about that transformation at the Strata + Hadoop World conference in New York this week. Large financial services firms are generally pretty secretive about what they are doing, for competitive reasons, but Durvasula and Murray gave enough information to show that not only was American Express taking the Hadoop platform seriously, but that it had utterly transformed its back end so that new services – many of which it could never have run before – are running on a new, modern platform.
American Express is one of the behemoths of finance. The company has 107.2 million cards in force, which accounted for $952.4 billion in worldwide bills for card holders (both consumers and corporations) last year. American Express is not just a credit card provider and a provider of merchant collection services, but it also owns the networks that process the transactions between buyers and sellers. The company operates in 130 countries and has $153 billion in assets under management across its various services; it has nearly 63,000 employees and brought in $33 billion in revenue and $5.4 billion in net income in 2013.
The important thing as far as transaction processing and the related analytics that American Express is using to drive more transactions through its system is that it has many millions of merchant sellers, many more millions of buyers, it processes billions of transactions, and those transactions represent trillions of combinations between buyers and sellers of products. Driving more transactions is key, because American Express gets a cut of the action on every transaction. If you do the math of its revenue divided by the billed business, its take is about 3.5 percent. If American Express wants to grow, it has to get people to use its card more frequently, add merchants to its pool, and get everyone to do more deals or spend more money or both.
To that end, explained Durvasula, American Express has been on the hunt for item-level transaction data – which he called the "holy grail of data" – outside of its own network. The partnership with Payback, a loyalty rewards program operated by Germany's retail giant Metro Group, gets American Express access to data on about two thirds of German households, and its Bluebird partnership with Wal*Mart to providing financial services usually provided by banks and aimed at the parts of the American population that do not have bank accounts. The company just announced a partnership with fast food restaurant chain McDonalds that will allow American Express card holders to automatically pay for food with reward points or credit, and is looking to offer similar services across its spectrum of merchants and card holders. To do so requires something a bit more sophisticated than a paper coupon, which was how merchants tried to entice card holders to spend money only a few years ago.
Now, American Express has created a new set of applications that are both social and mobile and back-ended by Hadoop and many of its extensions. Amex Offers is one such service that has replaced coupons with real-time offers. You log into the application, register your card or connect through social channels such as Facebook, Twitter, Foursquare, or TripAdvisor, and as you are roaming the world doing your thing, deals will pop up that are relevant to your interests and past buying habits and, presumably, coming from merchants that take the American Express card for payment. Such services are, the company hopes, going to help drive further adoption of American Express cards among merchants, creating a positive feedback loop that will expand its business. One such loop links actual transaction data and social media buzz for deals to the mapping functions in TripAdvisor, so consumers can get relevant recommendations for products and services when they are traveling. There is a similar service launched by American Express itself, called Vicinity, for corporate card holders, that provides a similar function.
On the merchant side, American Express has moved from quarterly reports to online business trend analysis and industry peer benchmarking based on anonymized data that is delivered continuously to help merchants figure out how they are doing compared to the competition.
"The goal is to drive relevance for consumers and value for merchants," explained Durvasula. "We can prove to the merchants the loyalty we are bringing them and we can prove to the card members the value we are bringing to them." To date, the Amex Offers program has saved card holders a combined $100 million since it was launched in May 2012.
In the past, many of the services that American Express offered to consumers and merchants had their own data warehouses based on relational databases behind them, explained Murray. But now, there is a single big data system called the Sync Platform internally at the company, and a recommender system that is integrated with it to make all of the deals on the fly that merchants want consumers to take advantage of.
The Sync Platform started out as a proof-of-concept Hadoop cluster with ten server nodes three years ago. Murray could not divulge the current size of the production cluster, which runs both real-time and batch applications. Ultimately, says Durvasula, the goal will be to have ten years' worth of data stored in this production system, which meets with Basel II and other banking regulations as well as providing a treasure trove of transactional, social sentiment, and other data. American Express has another Hadoop cluster for research and development that is separate from the Sync Platform production environment, where people learn how to use the tools, and a large analytical Hadoop cluster for model development and one-time analytical runs.
The current Sync Platform cluster is based on 2U servers with 24 disk bays that can now have 1 TB, 2 TB, or 4 TB disks when only a few years ago, said Murray, American Express was deploying 146 GB SAS drives. The server nodes usually have eight X86 cores or more per socket, for at least 16 cores per node, and each machine has two 10 Gb/sec Ethernet links coming out of it for cluster interconnect. The company puts 17 of these server nodes – it did not specify a brand or make – and has close to 300 cores and almost 1 PB of data per rack. That rack of servers costs about what a big, wonking NUMA server costs, according to Murray – and a NUMA machine that will presumably deliver less big data munching capability for the dollar otherwise American Express would be using big NUMA iron.
Over the past three years, the size of the cluster, the number of different tools on the cluster, and the number of people using the cluster have all grown.
"We will probably refresh it twice a year at the pace we are going," says Murray, referring to the production part of the cluster. "We will continue to advance our platform with more tools as they come to market and new and faster hardware as it comes to market." Some machines have had SSD flash storage added to them for doing real-time search and machine learning, and Murray says that flash drives have come down enough to justify the extra cost compared to SATA drives given the substantially increased speed they offer for I/O and throughput.
As machines are added to the Sync Platform cluster, they are wired up and run through a bunch of system and cluster tests, including the TeraSort and MinuteSort big data benchmarks as well as a suite of Java benchmarks based on applications developed by American Express. Last fall, when a 255-machine chunk was being added to the cluster, Murray and his team ran the TeraSort and MinuteSort tests on this chunk. Here is the result (forgive the blurry photo):
The previous record holder on the TeraSort test was a virtual cluster running the MapR Hadoop distribution on Google Compute Engine from October 2012, which had 1,003 servers with a total of 4,012 cores and one disk per node that could do the 1 TB TeraSort test in 54 seconds. Search engine and Hadoop creator Yahoo had the record before that, which was a 62 second run on the 1 TB test using a cluster with 1,460 nodes, 11,680 cores, and 5,840 disks. Anyway, the chunk of the Sync Platform at American Express with 255 nodes and 4,080 cores (that's 16 cores per node) and presumably with 24 drives per node (but maybe only 16, Murray did not say) was able to do the TeraSort test in 45 seconds. And on the MinuteSort, as you can see, at 1.65 TB of total data sorted, the American Express Hadoop cluster chunk was able to process more than twice as much data in a minute as the former record holder for that test. "Our business is delighted to know that we have one of the best performing Hadoop platforms in the industry," bragged Murray.
The machines tested run the production Hadoop environment at American Express, which uses a mix of open source and vendor products. Murray was not at liberty to be specific about what components outlined in the image above in the Hadoop stack were open source and which ones were commercial.
While the TeraSort and MinuteSort tests are interesting, what really matters is how the applications at American Express run. Murray said that American Express was one of the leaders in fraud detection and credit writeoffs, two important aspects of running a credit card business and bits of code that runs on the Sync Platform Hadoop cluster.
As it turns out, the coders at American Express created an item similarity collaborative filtering algorithms four years ago on its data warehouses before it turned to Hadoop. This collaborative filtering is run in batch mode to do the kind of recommendations that are part of the Amex Offers application today, derived from merchant and consumer transactions and other data sources.
"The runtimes were in multiple days, and we basically put it on the shelf," says Murray. "We can now run those algorithms in less than one hour on this platform. And with all of the other algos I have showed you, we have dropped every single one that we drop onto this platform from hours to minutes. So that, combined with the disparate data sources, has given us a technology capability that we have never had before."
That platform needs people to make it hum, of course. To that end, American Express has established its Big Data Labs to do research on various analytics topics. The company has more than 25 PhDs on its staff and has over 80 patents relating to high performance computing, dynamic pricing, graph technology, social connections, and web analytics and its researchers have published over 140 papers in Hadoop, machine learning, web databases, and real-time search.
All of this focus on Hadoop does not mean that American Express is pulling out all of the data warehouses. Data warehouses, says Murray, have rich schemas and good business intelligence tools, and he doesn't see them going away. However, that does not mean that companies should not be experimenting with the new technologies like the fast-evolving Hadoop stack.
"We need to test," Murray cautions. "It is a new platform, and we don't know if it is going to scale and we don't know if it is going to be available. So we have to have a mindset that some things are going to fail. We have shifted our investment philosophy and we are going to make some investments knowing that we are not going to get a big ROI out of them. But if we don't invest, we will never find the few that have a very high potential."