New study details open-source vision-language model for biomedical applications

By:
Alan Flurry

Applications of artificial intelligence are quickly moving into the medical field, with implications for patients and practitioners.

A new study published in Nature Medicine presents an open-source multimodal vision-language foundation model, BiomedGPT, for various biomedical applications.

AI techniques have demonstrated potential in solving a wide range of biomedical tasks, including radiology interpretation, clinical-information summarization and precise disease diagnostics. The specialized capabilities of today's biomedical tools and models have led to growing interest in using AI for precision medicine and patient care, both of which rely on the analysis of highly diverse data sets. But hyper-specialized tasks in narrow disciplines and real-world settings challenge the capacity of traditional biomedical AI models.

Additionally, AI that is hyper-specialized for narrow disciplines often fails to provide the comprehensive insights necessary to assist doctors in real-world settings, where the flow of information can be slow and sporadic. A generalist biomedical AI such as BiomedGPT has the potential to overcome these limitations by using versatile models that can be applied to different tasks and are robust enough to handle the intricacies of medical data effectively.

"BiomedGPT mainly understands and interprets biomedical images such as MRI, CT, pathology, microscopy, and biomedical texts such as clinical notes, electronic health records, and literature publications," said Tianming Liu, Distinguished Research Professor of Computer Science in the University of Georgia School of Computing and co-author on the study. "The BiomedGPT model can perform diverse biomedical tasks such as radiology report generation and visual question answering across modalities using natural language instructions."

Traditional biomedical artificial intelligence (AI) models often exhibit limited flexibility in real-world deployment and struggle to utilize holistic information. Generalist AI holds the potential to address these limitations due to its versatility in interpreting different data types and generating tailored outputs for diverse needs. However, existing biomedical generalist AI solutions are typically closed source to researchers, practitioners and patients.

"BiomedGPT is the first open-source and lightweight vision–language foundation model, designed as a generalist capable of performing various biomedical tasks," said Lichao Sun, assistant professor of Computer Science and Engineering at Lehigh University and co-author on the study.

The research team also conducted human evaluations to assess the capabilities of BiomedGPT in radiology visual question answering, report generation and summarization. "Our method demonstrates that effective training with diverse data can lead to more practical biomedical AI for improving diagnosis and workflow efficiency," Sun said.

"This work demonstrates the great promise of multimodal foundation models (e.g., large vision-language models) in biomedical applications and underscores the challenges that must be addressed before these models can be effectively deployed in clinical settings," Liu said.

The study, "A generalist vision–language foundation model for diverse biomedical tasks," was published August 7 in Nature Medicine.

Image: Figure 1, from the study