Call for Papers

The SAFER workshop tackles key challenges in medical foundation and vision-language models (VLMs): hallucination detection in radiology and surgery, transparent and reliable reasoning for image-based diagnosis, and stable domain adaptation to unseen imaging environments. The workshop is organized around three core themes:

Theme 1: Stable and Efficient Adaptation of Medical VLMs

This theme focuses on approaches that allow medical AI models to adapt to new domains while maintaining performance and faithful reasoning. It also covers human-in-the-loop methods in which clinicians provide feedback to guide learning. A minimal code sketch of one such adaptation strategy follows the topic list below.

Topics of interest include, but are not limited to:

  • Test-time adaptation and continual learning for medical imaging
  • Efficient parameter/prompt-tuning for new clinical domains
  • Unsupervised, semi-supervised, and weakly-supervised adaptation
  • Robustness to domain shifts in imaging and other clinical data
  • Human-in-the-loop reinforcement learning for clinical feedback
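
As a concrete illustration of the first topic above, here is a minimal sketch of test-time adaptation by entropy minimization on unlabeled test batches, in the spirit of TENT: the backbone stays frozen and only normalization parameters are updated. The model, feature dimensions, and hyperparameters are illustrative placeholders, not a reference implementation.

```python
# Minimal sketch: test-time adaptation by entropy minimization on
# unlabeled test batches (in the spirit of TENT). Everything here is
# an illustrative placeholder, not a reference implementation.
import torch
import torch.nn as nn

def mean_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy of the predictive distribution."""
    probs = logits.softmax(dim=-1)
    return -(probs * probs.log()).sum(dim=-1).mean()

# Toy classifier standing in for a pretrained medical imaging model.
model = nn.Sequential(nn.Linear(512, 256), nn.BatchNorm1d(256),
                      nn.ReLU(), nn.Linear(256, 4))

# Freeze the backbone; adapt only normalization parameters so the
# model can track a domain shift without drifting catastrophically.
for p in model.parameters():
    p.requires_grad_(False)
norm_params = []
for m in model.modules():
    if isinstance(m, nn.BatchNorm1d):
        for p in m.parameters():
            p.requires_grad_(True)
            norm_params.append(p)
optimizer = torch.optim.SGD(norm_params, lr=1e-3)

batch = torch.randn(32, 512)  # stand-in features from the new domain
for _ in range(10):  # a few unsupervised adaptation steps at test time
    loss = mean_entropy(model(batch))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Restricting updates to normalization statistics and affine parameters keeps the adaptation cheap and limits how far the model can drift from its pretrained behavior, which matters in safety-critical clinical deployment.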

Theme 2: Faithful Reasoning and Explainability in Medical VLMs

This theme addresses methods for making AI reasoning transparent, interpretable, and aligned with clinical workflows. We highlight approaches that generate step-by-step explanations, such as chain-of-thought reasoning, that are visually grounded in medical images including X-rays, CT scans, MRIs, pathology slides, and surgical videos. By explicitly linking reasoning steps to image regions and clinical knowledge, these methods make AI decisions easier to inspect, understand, and trust. Clear reasoning traces can reveal which diagnoses were considered, why one was selected over others, and where uncertainty remains, supporting quality assurance, error analysis, and safer human–AI collaboration. A sketch of one grounded-reasoning protocol follows the topic list below.

Topics of interest include, but are not limited to:

  • Chain-of-thought and step-by-step reasoning for medical decisions
  • Visual grounding of reasoning on medical images (e.g., CT, MRI, pathology)
  • Causality-informed reasoning for image-based diagnosis and prognosis
  • Retrieval-augmented generation (RAG) for multimodal medical data
  • Generative models for synthetic medical data and report generation
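
To make the notion of a visually grounded reasoning trace concrete, the sketch below shows one possible protocol in which every chain-of-thought step must cite an image region before it is accepted, so a trace can be audited step by step. The prompt format, JSON schema, and example trace are hypothetical, not the interface or output of any particular model.

```python
# Minimal sketch of a visually grounded reasoning protocol: every
# chain-of-thought step must cite an image region, so a trace can be
# audited step by step. The prompt format and the trace string below
# are hypothetical, not the output format of any particular model.
import json

PROMPT = (
    "You are reading a chest X-ray. Reason step by step. Emit each "
    'step as JSON on its own line: {"step": n, "claim": "<finding>", '
    '"region": [x0, y0, x1, y1]}. End with a line: FINAL: <diagnosis>.'
)

def parse_trace(raw: str):
    """Split a generation into grounded steps and a final answer,
    discarding any step that lacks a parseable image region."""
    steps, final = [], None
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("FINAL:"):
            final = line[len("FINAL:"):].strip()
            continue
        try:
            step = json.loads(line)
        except json.JSONDecodeError:
            continue
        region = step.get("region") if isinstance(step, dict) else None
        if isinstance(region, list) and len(region) == 4:
            steps.append(step)
    return steps, final

# A fabricated trace, for illustration only.
raw = "\n".join([
    '{"step": 1, "claim": "patchy opacity, right lower lobe", '
    '"region": [310, 420, 470, 580]}',
    '{"step": 2, "claim": "no pleural effusion", "region": [0, 490, 512, 640]}',
    "FINAL: right lower lobe pneumonia",
])
steps, final = parse_trace(raw)
print(len(steps), "grounded steps ->", final)
```

Because each claim carries explicit coordinates, a reviewer (or an automated checker) can verify individual reasoning steps against the image rather than trusting the final answer alone.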

Theme 3: Evaluation, Safety, and Trustworthiness

This theme examines how to measure whether a model's reasoning is faithful to the data, clinically grounded, and robust under domain shifts. We investigate metrics, benchmarks, and evaluation strategies that go beyond accuracy to assess reliability, uncertainty, and safety in real-world clinical settings. A sketch of one sampling-based uncertainty signal follows the topic list below.

Topics of interest include, but are not limited to:

  • Hallucination detection, mitigation, and quantification
  • Faithfulness metrics and benchmarks for multimodal models
  • Uncertainty quantification in LLMs and VLMs
  • Interpretability and explainability of model reasoning
  • Privacy preservation during adaptation and reasoning
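
As one concrete example of uncertainty quantification beyond accuracy, the sketch below estimates a self-consistency-style uncertainty signal: sample several answers and treat their disagreement as a trigger for deferring to a clinician. The `sample_answer` stub and the threshold are hypothetical placeholders standing in for stochastic decoding from a real medical VLM.

```python
# Minimal sketch of a sampling-based uncertainty signal: sample
# several answers, then treat disagreement (entropy of the empirical
# answer distribution) as a trigger for deferring to a clinician.
# `sample_answer` is a placeholder for stochastic VLM decoding.
import math
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Stand-in for one stochastic decode from a medical VLM."""
    return random.choice(
        ["pneumonia", "pneumonia", "pneumonia", "atelectasis"])

def answer_entropy(question: str, n_samples: int = 20) -> float:
    """Shannon entropy (nats) of the empirical answer distribution."""
    counts = Counter(sample_answer(question) for _ in range(n_samples))
    return -sum(c / n_samples * math.log(c / n_samples)
                for c in counts.values())

THRESHOLD = 0.5  # illustrative; calibrate on held-out data in practice
h = answer_entropy("Most likely diagnosis for this chest X-ray?")
print("defer to clinician" if h > THRESHOLD else "answer accepted")
```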

Overall, SAFER brings together researchers from medical imaging, machine learning, and medical AI to share recent work on faithful reasoning, robust adaptation, verification, and evaluation of medical AI systems, with the goal of advancing responsible, interpretable, and clinically trustworthy models that can reason and adapt reliably for safer healthcare.