A research team led by Columbia University has developed an open-source framework designed to streamline and accelerate artificial intelligence research using health data, addressing longstanding challenges in data standardization, reproducibility, and collaboration across institutions.

The framework, called the Medical Event Data Standard (MEDS), introduces both a standardized data format and a growing ecosystem of interoperable tools that support the development and evaluation of machine learning models using clinical data. A study describing the framework was published in NEJM AI.

The researchers say the framework could lower the technical barriers that currently slow health AI research and that make it difficult for scientists to reproduce findings or compare models across studies and institutions.

"MEDS is a simple way to make all different sources of electronic health record (EHR) data look the same to your code, regardless of what hospital or clinic or EHR software system the data came from," says Matthew McDermott, Ph.D., assistant professor of biomedical informatics at Columbia University and study leader.

"MEDS lets us share code that we can use to train models on many different sites of care without needing to share sensitive patient data—and often without needing to even do the more challenging step of fully 'harmonizing' the data into a consistent clinical vocabulary. This infrastructure will allow researchers to spend less time rebuilding pipelines and more time answering clinically meaningful questions."
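
As a rough illustration of the idea McDermott describes, the Python sketch below converts two differently shaped record exports into one common list of events, then runs the same analysis function at each site separately. It is not the MEDS tooling itself: the site layouts, the field names (`subject_id`, `time`, `code`, `numeric_value`), and every function name are assumptions made up for this example.

```python
from datetime import datetime

# Hypothetical raw exports: each site stores lab results in its own shape.
site_a_rows = [
    {"mrn": "A-17", "lab_name": "HbA1c", "result": "6.9", "drawn_at": "2023-04-02T08:30:00"},
    {"mrn": "A-17", "lab_name": "HbA1c", "result": "7.2", "drawn_at": "2023-01-15T09:00:00"},
]
site_b_rows = [
    ("B-203", "2023/05/11 14:05", "LAB//HBA1C", 7.4),
]


def site_a_to_events(rows):
    """Site-specific step: map site A's dict rows to the common event shape."""
    return [
        {
            "subject_id": r["mrn"],
            "time": datetime.fromisoformat(r["drawn_at"]),
            "code": "LAB//" + r["lab_name"].upper(),
            "numeric_value": float(r["result"]),
        }
        for r in rows
    ]


def site_b_to_events(rows):
    """Site-specific step: map site B's tuple rows to the common event shape."""
    return [
        {
            "subject_id": pid,
            "time": datetime.strptime(ts, "%Y/%m/%d %H:%M"),
            "code": code,
            "numeric_value": float(value),
        }
        for pid, ts, code, value in rows
    ]


def latest_value(events, code):
    """Shared step: most recent numeric value per subject for one code."""
    latest = {}
    for event in sorted(events, key=lambda e: e["time"]):
        if event["code"] == code:
            latest[event["subject_id"]] = event["numeric_value"]
    return latest


# Each site runs the same shared function on its own data; no records are pooled.
site_a_latest = latest_value(site_a_to_events(site_a_rows), "LAB//HBA1C")
site_b_latest = latest_value(site_b_to_events(site_b_rows), "LAB//HBA1C")
```

Only the two small conversion functions are site-specific in this sketch. The analysis function never sees the original layouts, which is the property that would let the same code travel between institutions while the patient data stays where it is.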