Advanced Computing in the Age of AI | Monday, July 22, 2024

ML Proponents Confront Reproducibility Claims 

Machine learning researchers are pushing back on a recent assertion that machine learning is a key contributor to a reproducibility crisis in scientific research.

Rick Stevens, associate laboratory director for computing, environment and life sciences at Argonne National Laboratory, agreed that the scientific community faces a reproducibility problem, particularly in the biological sciences. But, he argued, the inability to reproduce scientific results has little to do with machine learning.

“Throwing machine learning under the bus is just not fair,” Stevens countered.

Stevens questioned assertions made by Genevera Allen, a statistician at Rice University, during last month’s annual meeting of the American Association for the Advancement of Science. Allen warned that machine learning frameworks are often flawed because they are designed to always produce some kind of prediction, frequently failing to account for scientific uncertainty.

“A lot of these techniques are designed to always make a prediction,” said Allen, who is also affiliated with the Baylor College of Medicine. “They never come back with ‘I don’t know,’ or ‘I didn’t discover anything,’ because they aren’t made to.”

Stevens vigorously disputed the role of machine learning in the current scientific reproducibility crisis. If validated properly, he noted, machine learning results are “highly reproducible.”

But there are limits to the use of machine learning in scientific research, he acknowledged. “You can’t learn any better than your training data,” he said in an interview.

Stevens estimates that only about 20 percent of scientific results can currently be reproduced. The reasons include issues like “noisy data” and the inability to duplicate experiments.

As for Allen’s argument that machine learning models should be prevented from making predictions when they lack sufficient data, Stevens noted that ML models often do refrain from making predictions, a behavior experts label “abstention.” Indeed, researchers attempting to use machine learning models to help diagnose illnesses are incorporating abstention into their models.

For example, a team at Cornell University recently reported on a machine learning model for automating the diagnosis of liver disease that can also detect when a prediction is likely to be incorrect. “The proposed model abstains from generating the label of a test example if it is not confident about its prediction,” the researchers reported last November.

Hence, the widely acknowledged reproducibility crisis in key areas like biomedical research appears to be spawning new approaches that incorporate uncertainty into machine learning models. The Cornell researchers, for instance, said their model incorporates an abstention paradigm. That approach would enable an ML model to behave more like a doctor who, lacking sufficient information or faced with an especially vexing case, decides to run additional tests before arriving at a diagnosis.
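To make the abstention idea concrete, here is a minimal Python sketch of confidence-threshold abstention, one common way to let a classifier say “I don’t know.” It is not the Cornell team’s method: the function names, the softmax-over-logits setup and the 0.8 cutoff are illustrative assumptions. The classifier returns its top class only when that class’s probability clears the threshold and otherwise abstains; the report function measures coverage (the fraction of cases answered) and accuracy on the cases it does answer.

```python
import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(logits: Sequence[float], threshold: float = 0.8) -> Optional[int]:
    """Return the most likely class index, or None to abstain.

    `threshold` is a hypothetical cutoff; in practice it would be tuned
    on held-out validation data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None  # "I don't know": defer the case, e.g. to a physician
    return best


def selective_report(
    batch_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.8,
) -> tuple[float, float]:
    """Return (coverage, accuracy on answered cases) for a labeled batch."""
    answered = correct = 0
    for logits, y in zip(batch_logits, labels):
        pred = predict_or_abstain(logits, threshold)
        if pred is not None:
            answered += 1
            correct += int(pred == y)
    coverage = answered / len(labels)
    accuracy = correct / answered if answered else float("nan")
    return coverage, accuracy


# A confident case is labeled; a near-tie is deferred.
print(predict_or_abstain([4.0, 0.5, 0.1]))  # 0
print(predict_or_abstain([1.0, 0.9, 0.8]))  # None
```

Raising the threshold typically lowers coverage and, if the model’s confidence scores are reasonably calibrated, raises accuracy on the cases it answers. In the diagnostic setting described above, the abstained cases are the ones a physician would examine further.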

Other research focuses on the development of neural networks called “deep abstaining classifiers” that are trained to abstain from making predictions when confronted with confusing or hard-to-learn examples.
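A deep abstaining classifier adds an extra “abstain” output to an ordinary k-class network and trains with a loss that lets the network trade accuracy for abstention. The Python sketch below assumes a per-example loss of the form L = (1 − p_abstain) · CE + α · log(1 / (1 − p_abstain)), where CE is cross-entropy over the real classes renormalized to exclude the abstain probability. This is our reading of one published formulation, not a definitive implementation; the function name, the convention that the abstain probability is the last entry, and the default α are hypothetical.

```python
import math
from typing import Sequence


def dac_loss(probs: Sequence[float], true_class: int, alpha: float = 1.0) -> float:
    """Abstention-aware loss for a single example (illustrative sketch).

    probs: softmax output over k real classes plus one abstain output,
           assumed (hypothetically) to be the last entry.
    alpha: hypothetical weight on the penalty for abstaining.
    """
    p_abstain = probs[-1]
    keep = 1.0 - p_abstain  # probability mass committed to real classes
    if keep <= 0.0:
        raise ValueError("loss is undefined when all mass is on abstention")
    # Cross-entropy over the real classes, renormalized to exclude abstention.
    ce = -math.log(probs[true_class] / keep)
    # Abstaining shrinks the cross-entropy term but pays log(1 / keep).
    return keep * ce + alpha * math.log(1.0 / keep)


# With no abstain mass, the loss equals ordinary cross-entropy.
print(dac_loss([0.7, 0.3, 0.0], true_class=0))
# Half the mass on abstention, true class 0.
print(dac_loss([0.25, 0.25, 0.5], true_class=0, alpha=1.0))
```

Shifting probability onto the abstain output reduces the weight on the cross-entropy term but adds the α penalty, so under this formulation a larger α makes the network less willing to abstain.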

Stevens, whose lab is pushing hard on machine learning development, does not quibble with the suggestion that machine learning frameworks should be designed to account for uncertainty, which often stems from a lack of training data or from poor-quality data.

“There is a fundamental limit,” he said. “Sometimes the network should just keep its mouth shut.”

Allen, the machine learning critic, did not respond to requests for comment.

About the author: George Leopold

George Leopold has written about science and technology for more than 30 years, focusing on electronics and aerospace technology. He previously served as executive editor of Electronic Engineering Times. Leopold is the author of "Calculated Risk: The Supersonic Life and Times of Gus Grissom" (Purdue University Press, 2016).