Open-Weight Language Models and Retrieval-Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports: Assessment of Approaches and Parameters

Authors: Mohamed Sobhi Jabal, Pranav Warman, Jikai Zhang, Kartikeye Gupta, Ayush Jain, Maciej Mazurowski, Walter Wiggins, Kirti Magudia, Evan Calabrese

Published: 2025-03-12

DOI: 10.1148/ryai.240551

Source: Full article

Abstract

The best-performing open-weight language models had high accuracy for automated extraction of structured clinical data points from unstructured radiology and pathology reports, with retrieval-augmented generation improving performance for complex reports.