Advanced Computing in the Age of AI | Thursday, September 28, 2023

The Rise of AI-Powered Machine Vision Has Implications for Enterprise Data Management 

AI-powered machine vision is becoming much more capable and widespread every day. New applications of machine vision and AI are being developed at a rapid rate, most notably in sectors such as healthcare, autonomous vehicles, manufacturing, agriculture and security.

In healthcare, machine vision is used to quickly analyze thousands of X-rays, CAT scans and other medical images. It is saving lives by prioritizing patient treatment at hospital emergency rooms. In the transportation industry, AI-powered machine vision systems enable autonomous vehicles to spot obstacles and navigate roads safely.

Machine vision is also playing a key role in manufacturing through automatic defect detection, and the fast-expanding field of digital agriculture deploys computer vision systems to limit or even eliminate the use of pesticides while sustainably increasing production.

As useful as machine vision systems are, they are the source of vast amounts of unstructured data. Their increasing popularity is a significant factor driving the explosion in the quantity of data gathered globally, which is predicted to continue rising to 163 zettabytes by 2025, according to research by IDC.

These uses of AI-powered machine vision, and the data they generate, have many data management implications for enterprises. Today, most organizations are facing conflicting data management needs.

Most of the data originates at the edge, but the compute and storage infrastructure are typically centralized at a few large data centers or on the public cloud. Moving the data to a centralized location brings significant delays and costs associated with transferring and storing the data.

A Need for Speed

According to Gartner, by 2025, about 75 percent of enterprise-generated data will be created and processed outside of a traditional data center or cloud. Most data captured at the edge is currently moved to a centralized location for processing, where it is used for AI model development.

This must be considered when implementing machine vision technologies. For any business that is capturing and centralizing petabytes of unstructured data – whether it is video, image or sensor data – the process of training machine learning algorithms is slowed significantly by these loads. This centralized data processing approach delays the AI development pipeline and production model tuning. In an industrial setting, this could lead to missed product defects, potentially costing significant sums to the business or even endangering lives.

To solve this problem, more businesses have started turning to distributed, decentralized architectures. This means most data is kept and processed at the edge to address the delay and latency challenges and tackle issues associated with data processing speeds. Deployment of edge analytics and federated machine learning technologies is bringing significant benefits while addressing a centralized system's inherent security and privacy deficiencies.

For example, a large-scale surveillance network that is constantly capturing video footage compiles large amounts of raw data for later analysis. To train an ML model effectively, the footage must be reviewed to differentiate between specific objects in the video. Only the footage in which something new is detected is needed, and not the tedious hours of unchanging video which might capture an empty building or street. By pre-analyzing the data at the edge and moving only the necessary footage to a centralized point, businesses can save time, bandwidth and costs.
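
One simple way to picture this kind of edge pre-analysis is frame differencing: keep a frame only when it differs enough from the last frame that was kept. The sketch below is illustrative only and is not a description of any particular product. It assumes each frame is a flattened list of grayscale pixel values, and the function names and the threshold of 10.0 are hypothetical choices made for the example.

```python
from typing import Iterable, List, Optional, Sequence

# Assumed representation: one frame = flattened grayscale pixels (0-255).
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_changed_frames(frames: Iterable[Frame],
                          threshold: float = 10.0) -> List[int]:
    """Return indices of frames worth sending to the central site.

    The first frame is always kept. A later frame is kept only if it
    differs from the most recently kept frame by more than `threshold`
    (a hypothetical tuning value, not a recommended setting).
    """
    kept: List[int] = []
    reference: Optional[Frame] = None
    for index, frame in enumerate(frames):
        if reference is None or mean_abs_diff(frame, reference) > threshold:
            kept.append(index)
            reference = frame
    return kept


if __name__ == "__main__":
    empty_street = [0, 0, 0, 0]
    slight_noise = [1, 1, 1, 1]
    car_passing = [100, 100, 100, 100]
    footage = [empty_street, empty_street, slight_noise,
               car_passing, car_passing, empty_street]
    print(select_changed_frames(footage))
```

A production system would more likely run an object detection model at the edge rather than compare raw pixels, but the data flow is the same: unchanged footage stays local, and only the selected frames cross the network.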

While distributed architectures have many advantages, they also introduce additional complexity. Selecting and deploying the appropriate storage and compute infrastructure at the edge together with centralized management is critical, and significantly impacts the overall system efficiency and cost of ownership.

Tiered Storage

Many of the images and videos collected primarily for AI model training should also be stored permanently for other purposes. For example, in advanced driver assistance systems and autonomous vehicles, the AI makes decisions based on the data it collects in real time. However, if a problem emerges – possibly months or years later – businesses need to be able to go back and analyze what happened. Though critical for safety, this storage comes at a significant cost – an average of $3,351 per terabyte per year, according to Gartner. When you consider that the average autonomous test vehicle captures two terabytes of data per hour, it is easy to see how costs can mount up.

Many enterprises storing vast volumes of unstructured data commonly rely on network-attached storage appliances or public cloud storage. However, employing a tiered data storage architecture can present significant cost savings. In a tiered system, content is placed on fast storage during the active period when data is being processed and analyzed, while a backup copy is stored and archived on lower-cost storage – such as tape or object storage. Lower-cost storage can go as low as $50 per terabyte at scale. In many sectors – including autonomous vehicles – most data that is collected needs to be kept indefinitely, but it is rarely used and can be stored at the lowest cost tier.
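
To make the arithmetic concrete, the rough sketch below compares keeping everything on primary storage with a tiered layout, using the figures quoted above ($3,351 per terabyte per year, two terabytes per hour per test vehicle, and $50 per terabyte for the low-cost tier). Several inputs are assumptions made purely for illustration: the 1,000 driving hours per year, the 10 percent share of data kept on fast storage, and the treatment of the $50 figure as an annual cost. The model also simplifies by placing each terabyte on exactly one tier.

```python
# Figures quoted in the article.
PRIMARY_COST_PER_TB_YEAR = 3351.0   # average cost, per Gartner
LOW_COST_PER_TB = 50.0              # assumed here to be per year
CAPTURE_TB_PER_HOUR = 2.0           # per autonomous test vehicle

# Hypothetical inputs chosen only for this example.
ASSUMED_DRIVING_HOURS_PER_YEAR = 1000.0
ASSUMED_HOT_FRACTION = 0.10         # share of data still on fast storage


def annual_storage_cost(total_tb: float,
                        hot_fraction: float,
                        hot_cost_per_tb: float,
                        cold_cost_per_tb: float) -> float:
    """Yearly cost when `hot_fraction` of the data sits on the fast tier
    and the remainder sits on the low-cost tier."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    hot_tb = total_tb * hot_fraction
    cold_tb = total_tb - hot_tb
    return hot_tb * hot_cost_per_tb + cold_tb * cold_cost_per_tb


if __name__ == "__main__":
    total_tb = CAPTURE_TB_PER_HOUR * ASSUMED_DRIVING_HOURS_PER_YEAR

    all_primary = annual_storage_cost(
        total_tb, 1.0, PRIMARY_COST_PER_TB_YEAR, LOW_COST_PER_TB)
    tiered = annual_storage_cost(
        total_tb, ASSUMED_HOT_FRACTION,
        PRIMARY_COST_PER_TB_YEAR, LOW_COST_PER_TB)

    print(f"Data captured per vehicle per year: {total_tb:,.0f} TB")
    print(f"All on primary storage: ${all_primary:,.0f} per year")
    print(f"Tiered storage:         ${tiered:,.0f} per year")
```

Changing the assumed hot fraction or driving hours shifts the totals, but as long as most of the data is rarely accessed, the low-cost tier dominates the result.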

New developments in unstructured data storage solutions and edge analytics are constantly hitting the market. To take advantage of these, enterprises should focus on implementing modular data management from start to finish to enable them to swap out elements for more advanced technologies when they are released.

Finding New Opportunities Using Machine Vision

Even with the best technologies and services in place, successfully transferring, processing and storing enormous quantities of data captured for machine vision use cases will continue to challenge enterprises across a wide range of verticals.

Stored data, however, also presents an emerging opportunity. For example, images and videos could be reused to develop new use cases. As such, stored data will become a new revenue stream for enterprises rather than a cost. Equally, when more advanced analytics technologies come onstream, many businesses could reuse existing archived data to develop their own new products. Some – especially car manufacturers – are already beginning to recognize this potential. These potential new revenue streams and data uses are excellent reasons to start prioritizing the smart and efficient processing and storage of data today.

About the Author

Plamen Minev is the technical director for AI and cloud at Quantum, where he works in the CTO's office. Plamen is a seasoned technology and engineering leader with more than 25 years of experience building technologies and leading teams at startups and global technology companies. He has proven expertise in large-scale cloud, telecom, and data analytics solutions. Prior to Quantum, Plamen was part of the engineering leadership teams at NYNJA, CPLANE.AI, Cisco Systems, and others, focusing on SaaS and Cloud technology. Plamen holds a master of science degree in computer science from the Technical University of Sofia.