Language Models for Clinical Use

Background

The rapid adoption of large language models (LLMs) in healthcare has opened new possibilities for automating complex tasks such as clinical summarization, medical coding, and biomedical question answering. However, despite their strong performance on general NLP benchmarks, LLMs often struggle with the demands of biomedical domains, where structured data, domain-specific terminology, and reasoning over fragmented or implicit information are critical. This gap between general-purpose capabilities and domain-specific requirements motivates a deeper investigation into how LLMs can be adapted, evaluated, and enhanced for high-stakes clinical applications.

Goals

Our research aims to systematically improve the performance and reliability of LLMs on biomedical and clinical NLP tasks. Specific goals include:

Designing architectures and tokenization strategies that better represent structured medical data (e.g., ICD codes).
Exploring the interplay between domain-specific pretraining, prompting strategies, and external knowledge integration.
Building models that generalize across sub-domains and remain interpretable and trustworthy in clinical settings.
Evaluating and enhancing the reasoning capabilities of LLMs, particularly in multi-hop and knowledge-intensive scenarios.
Developing robust benchmarks and evaluation frameworks that reflect real-world complexity and avoid shortcut learning.