Advanced Computing in the Age of AI | Friday, June 9, 2023

TCP/IP Outdated for Big Data Transport, Quiet Company Says 


In network technology circles there’s a joke about “never underestimating the bandwidth capacity of a station wagon full of tapes driving down the highway.” In the same vein, at re:Invent last fall, Amazon Web Services announced with fanfare – and some humor – a service for transferring up to 100PB of data to AWS in a 45-foot long shipping container pulled by an 18-wheeler semitruck.
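The arithmetic behind the station wagon joke can be sketched in a few lines. The 100PB figure comes from the announcement above; the 10-day door-to-door time and the 10Gb/s link speed are hypothetical values chosen only for illustration, and the function names are ours.

```python
def effective_gbps(petabytes: float, days: float) -> float:
    """Average throughput, in gigabits per second, of physically moving
    `petabytes` (decimal, 10**15 bytes) in `days`."""
    bits = petabytes * 1e15 * 8
    seconds = days * 24 * 3600
    return bits / seconds / 1e9


def network_years(petabytes: float, gbps: float) -> float:
    """Years needed to push `petabytes` over a link sustaining `gbps`."""
    bits = petabytes * 1e15 * 8
    seconds = bits / (gbps * 1e9)
    return seconds / (365 * 24 * 3600)


if __name__ == "__main__":
    # Hypothetical: the truck takes 10 days to load, drive and unload.
    print(f"truck: {effective_gbps(100, 10):.0f} Gb/s equivalent")
    # Hypothetical: a fully utilized 10Gb/s line.
    print(f"network: {network_years(100, 10):.1f} years")
```

Under those assumed numbers the truck averages roughly 926Gb/s, while the same 100PB would occupy a saturated 10Gb/s line for about two and a half years.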

As data volumes grow, the basic internet communication protocol, TCP/IP – now approaching 45 years old – is frequently overwhelmed when called upon for large-scale data transfers. There are work-arounds and data transfer boosters, but TCP/IP is something of an anachronism, a choke point in many advanced-scale computing infrastructures.

Or, as Eric Hanselman, chief analyst at 451 Research, told EnterpriseTech: “Any time you have AWS rolling out a tractor trailer full of storage, you know we’ve got a data transit problem.”

Seth Noble, founder of little-known Data Expedition, formed in 2000 and with a customer roster of 200 companies (Associated Press, Disney, Motorola, Lockheed Martin), has quietly toiled on the data transfer problem for nearly 25 years. This week the company announced CloudDat, data transport software based on the company’s Multipurpose Transaction Protocol (MTP/IP), which Data Expedition said moves data at up to 900mb/second per instance over commodity internet lines (compared with 100-150mb/second using TCP/IP). The company claims MTP/IP delivers an average 7X performance advantage over TCP/IP, though that number varies depending on customer deployment.
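To put the quoted rates in perspective, the following sketch converts them into transfer times. It assumes "mb/second" means megabits per second, which the article does not specify, and the 1TB payload is a hypothetical example; the function name is ours.

```python
def transfer_hours(size_tb: float, rate_mbps: float) -> float:
    """Hours to move `size_tb` terabytes (decimal, 10**12 bytes)
    at a sustained `rate_mbps` megabits per second (assumed unit)."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (rate_mbps * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Rates quoted in the article: 100-150 for TCP/IP, up to 900 for MTP/IP.
    for label, rate in [("TCP/IP low", 100), ("TCP/IP high", 150), ("MTP/IP peak", 900)]:
        print(f"{label:12s} {rate:4d} -> {transfer_hours(1, rate):5.1f} h per TB")
```

On those assumptions, a terabyte takes roughly 15 to 22 hours at the quoted TCP/IP rates and about two and a half hours at the quoted peak. The ratio of the quoted rates is 6X to 9X, which brackets the 7X average the company claims.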

Data Expedition said CloudDat is now natively integrated into Oracle’s DIVA Cloud Service for managing digital media assets. In addition, CloudDat supports data transfer into and out of AWS, Microsoft Azure and Google Cloud Platform, along with on-prem cloud infrastructures.

“What they’ve done is put together an environment that’s attractive for users with very large volumes of data,” Hanselman said. “Those are folks for whom a performance improvement in transfer times makes a really big difference because of the size of the volumes of data they’re moving.”

Along with its MTP/IP protocol, Data Expedition’s product line has integrated instrumentation that accumulates intelligence about transmit paths and optimizes network performance.

“The protocol is useful in that it’s smarter about the way in which it manages the exchange of data,” Hanselman said. “But they’ve also put in place the necessary instrumentation within the whole protocol stack that dynamically understands the nature of the network in use and responds to it effectively. They’re actually looking at that transmit path as data is being transferred and adapting the best techniques for whatever prevailing conditions there are and to adapt them over time. Networks are very dynamic things.

“They’ve spent the time and invested the effort to be able to build something that’s flexible enough and capable enough to solve what is a significant need.”

One customer is Amagi, a cloud-based managed broadcast services and targeted advertising company based in Bangalore with deployments in 40+ countries managing 80+ feeds. With 12TBs of download and 2TBs of upload across all its customers each month, the company four years ago looked for a new way to move massive amounts of content and settled on ExpeDat, Data Expedition’s flagship product.

“When we first used ExpeDat, it gave us 10X or more transfer speed,” said Srividhya Srinivasan, Amagi co-founder. “Our first reaction was, ‘Are we really seeing the right thing?’… Perceptually and visually, our customers do not know we are using ExpeDat. But they do know, if they are working with Amagi, their data transport is going to be quite fast and we will take care of transferring their content anywhere around the globe… We have never had a case where we had any limitation or issue with scaling or anything else.”

The key to the software, Noble told EnterpriseTech, is its ability to utilize the full potential of data paths, to move data faster by making the network itself more efficient. And because it’s a pure software solution, CloudDat can be deployed alongside traditional applications without changing or disrupting existing networks.

“The basic idea is that when you’re trying to transfer data over the internet or any packet switched networks, it’s a black box,” Noble said. “You throw a packet out there and you don’t know what happens until something comes back and talks to you. TCP/IP looked at the problem through the lens of 1974 networks and usage…. We’ve tried to focus on the way data moves around in modern networks.”

Noble does not provide extensive detail on the network optimization technology; he discusses it primarily in conceptual terms and highlights the promised benefits. This may be to keep the company’s secret sauce a secret – or because he realizes that no one other than a network specialist could understand how the technology works.

“It doesn’t matter if you’re transferring data over a kilobit per second network or a multi-GB per second network, whether it’s around the corner or across the world, it automatically figures out the right thing to do and transfers the data at the correct speed – not overflowing the network and not underutilizing the network,” said Noble. “It’s really important it’s able to do this without human intervention because of course most people don’t know and shouldn’t know the details of how their network connection works.”
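The general idea of a sender finding the "correct speed" on its own can be illustrated with a toy feedback loop. This is not MTP/IP, whose internals the company does not describe; it is a deliberately simple multiplicative-increase, multiplicative-decrease sketch in which every name and parameter is hypothetical. The sender never sees `capacity` directly, only whether its last rate overflowed the path.

```python
def adapt_rate(capacity: float, start: float = 1.0, steps: int = 200,
               increase: float = 1.05, decrease: float = 0.8) -> list[float]:
    """Return the send rate chosen at each step of a toy control loop.

    `capacity` stands in for the unknown path capacity; the loop only
    learns, after the fact, whether the rate it tried was too high.
    """
    rate = start
    history = []
    for _ in range(steps):
        history.append(rate)
        overflowed = rate > capacity      # feedback signal, e.g. observed loss
        if overflowed:
            rate *= decrease              # back off
        else:
            rate *= increase              # probe for more bandwidth
    return history


if __name__ == "__main__":
    # Same code, very different (hypothetical) path capacities.
    for capacity in (0.5, 5000.0):
        tail = adapt_rate(capacity)[-20:]
        print(f"capacity {capacity:8.1f}: settled between "
              f"{min(tail):.2f} and {max(tail):.2f}")
```

With no per-network tuning, the rate ends up oscillating in a band just around the capacity in both cases. A real protocol would need far richer signals than a single overflow flag, but the sketch shows why such a loop needs no human intervention.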

Hanselman said Data Expedition is unique for the general-purpose nature of its technology, which can be implemented in a relatively straightforward way. He cited other high-volume data transport protocols, such as XTP, that he said can be difficult to integrate.

“The thing that these guys have done is really put the time and effort in,” he said. “The question for a lot of this is: how much do the application users, who by default normally use TCP or UDP and overlay their own capabilities on it, how much work do they want to do, how many dependencies do you then have at this intermediate layer with the new protocol, how much do you have to mess around with your customer’s environment to actually adapt them to it… For most typical applications it’s not worth the time and trouble to look into better ways of managing (data transit).”

Hanselman cited the typical example of transferring a complete machine image – “virtual instances that are big, multi-gigabyte images of virtual machines. Now it’s a reasonable thing to be able to push those up into the cloud environment. CloudDat is a gateway function that allows them to make the best optimization of the link between an on-premises data center and whatever the cloud environment is. If you don’t have a dedicated cloud connection, your ability to transfer data to the cloud is going to be catastrophic. When you move a full virtual machine up into the cloud, that can be significantly impacted by networking conditions.”

Hanselman said there also are cloud storage gateway companies working to enhance data transport, but these efforts are usually integrated with their own products. “Data Expedition has a much more general purpose functionality, as opposed to dedicated capabilities.”

Another Data Expedition customer is Children’s Hospital of Philadelphia, which is generating high volumes of data in the course of its ongoing genomics research – databases that are processed, regenerated and transferred from one facility to another.

“They have terabytes of data at a time that need to be distributed to other research institutions,” Noble said. “The end user may not even know we’re there under the hood, but it’s helping move all this data between all those HPC centers.”