Chest X-Ray Pneumothorax Detection and Report Generation System Based on Multi-Model Integration

Automated chest X-ray pneumothorax analysis pipeline with multi-model integration for detection, segmentation, and report generation.

林孟霆 • 陳冠宇 • 謝嘉銘
National Tsing Hua University


About the Project

Pneumothorax is a life-threatening condition often missed on chest X-rays due to subtle visual features. We developed an automated system that integrates YOLO detection, QMedSAM segmentation, and LLaVA-Med report generation to provide rapid, accurate diagnosis with visual and textual outputs.

🎯 The Challenge

Traditional diagnosis relies on expert radiologists and is time-consuming. Subtle pneumothorax features can be easily overlooked, leading to delayed or missed diagnoses.

💡 Our Solution

A modular pipeline combining specialized AI models for detection, segmentation, and natural language report generation to assist clinicians in busy settings.

🚀 Impact

Improved diagnostic speed and accuracy with 91.2% precision, helping reduce radiologist workload and minimize misdiagnosis risk in clinical practice.

System Architecture

Our system uses a three-stage modular pipeline that combines computer vision and natural language processing:


Figure 1. End-to-end pipeline integrating detection, segmentation, and report generation.

1. Detection (YOLOv11m): identifies pneumothorax regions and generates bounding boxes.

2. Segmentation (QMedSAM): creates precise lesion masks from the detected regions.

3. Report Generation (LLaVA-Med): generates comprehensive medical reports with visual context.
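
To make the data flow concrete, below is a minimal orchestration sketch of the three stages. The function and type names (analyze_xray, PipelineOutput) and the stage callables are illustrative placeholders standing in for the YOLOv11m, QMedSAM, and LLaVA-Med wrappers, not the project's actual code.

```python
# Minimal sketch of chaining the three stages; names are illustrative only.
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class PipelineOutput:
    boxes: List[List[float]]  # [x1, y1, x2, y2] per detected pneumothorax region
    masks: List[np.ndarray]   # binary lesion masks, one per box
    report: str               # generated free-text report


def analyze_xray(
    image: np.ndarray,
    detect: Callable[[np.ndarray], List[List[float]]],
    segment: Callable[[np.ndarray, List[List[float]]], List[np.ndarray]],
    describe: Callable[[np.ndarray, List[np.ndarray]], str],
) -> PipelineOutput:
    """Chain detection -> segmentation -> report generation for one chest X-ray."""
    boxes = detect(image)            # Stage 1: YOLOv11m bounding boxes
    masks = segment(image, boxes)    # Stage 2: QMedSAM masks prompted by the boxes
    report = describe(image, masks)  # Stage 3: LLaVA-Med report with mask context
    return PipelineOutput(boxes=boxes, masks=masks, report=report)
```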

YOLO Detection

Fine-tuned YOLOv11m on 12,047 chest X-rays for real-time pneumothorax localization. Achieves mAP@0.5 of 0.399 with robust performance on subtle lesions.
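
As a rough illustration of this stage, the sketch below fine-tunes a pretrained YOLOv11m checkpoint and runs inference with the Ultralytics API; the file names, data YAML, and confidence threshold are placeholders rather than the project's exact configuration.

```python
# Sketch of fine-tuning and running YOLOv11m with the Ultralytics API.
from ultralytics import YOLO

model = YOLO("yolo11m.pt")  # pretrained YOLOv11m checkpoint

# Fine-tune on the prepared SIIM-ACR splits (150 epochs, as noted below).
model.train(data="pneumothorax.yaml", epochs=150, imgsz=1024)

# Inference: each result holds bounding boxes for suspected pneumothorax regions.
results = model.predict("chest_xray.png", conf=0.25)
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"candidate region ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}), "
          f"confidence {float(box.conf):.2f}")
```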

QMedSAM Segmentation

A quantized medical segmentation model trained with knowledge distillation. It is 75% smaller than MedSAM while retaining a mean IoU of 0.491 (vs. 0.522 for the baseline).
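
The sketch below illustrates the general shape of a knowledge-distillation objective for training a compact segmentation student against a frozen MedSAM-style teacher. The BCE/MSE terms and the alpha weighting are assumptions for illustration; the actual QMedSAM training recipe and quantization scheme may differ.

```python
# Illustrative distillation loss: ground-truth supervision + teacher matching.
import torch
import torch.nn.functional as F


def distillation_loss(
    student_logits: torch.Tensor,  # (B, 1, H, W) raw mask logits from the student
    teacher_logits: torch.Tensor,  # (B, 1, H, W) raw mask logits from the frozen teacher
    gt_mask: torch.Tensor,         # (B, 1, H, W) binary ground-truth lesion mask (float)
    alpha: float = 0.5,            # balance between supervision and distillation
) -> torch.Tensor:
    # Supervised term: match the annotated pneumothorax mask.
    supervised = F.binary_cross_entropy_with_logits(student_logits, gt_mask)
    # Distillation term: match the teacher's soft mask predictions.
    distill = F.mse_loss(torch.sigmoid(student_logits),
                         torch.sigmoid(teacher_logits).detach())
    return alpha * supervised + (1.0 - alpha) * distill
```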

LLaVA-Med Reporting

Medical-adapted vision-language model generating clinically relevant reports. Integrates mask coordinates for enhanced spatial awareness.
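
One simple way to pass lesion geometry to the language model is to summarize the segmentation mask as text alongside the image. The sketch below shows that idea; the prompt wording and coordinate summary are assumptions, not the project's actual LLaVA-Med prompt template.

```python
# Sketch: turn a binary lesion mask into a spatial hint for the report prompt.
import numpy as np


def build_report_prompt(mask: np.ndarray) -> str:
    """Summarize a binary lesion mask as text to accompany the chest X-ray."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return "No pneumothorax region was segmented. Describe the chest X-ray."
    h, w = mask.shape
    area_pct = 100.0 * len(xs) / (h * w)
    return (
        f"A pneumothorax was segmented with bounding box "
        f"({xs.min()}, {ys.min()})-({xs.max()}, {ys.max()}) in a {w}x{h} image, "
        f"covering {area_pct:.1f}% of the image. "
        "Write a concise radiology report describing its location and extent."
    )


# Example: a small synthetic mask in the upper part of the image.
mask = np.zeros((256, 256), dtype=np.uint8)
mask[30:90, 160:230] = 1
print(build_report_prompt(mask))
```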

Results


Figure 2. Chat Interface with Bounding Box, Segmentation Mask, and Generated Report.

Performance Metrics

Comprehensive evaluation on the SIIM-ACR Pneumothorax Dataset demonstrates strong performance across detection, segmentation, and report quality:

Precision: 91.2%
Recall: 60.3%
F1-Score: 0.726
Mean IoU: 0.491
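
For reference, the sketch below shows the standard arithmetic behind these numbers: precision, recall, and F1 from detection counts, and IoU from binary masks. The check at the end confirms that the reported precision and recall are consistent with the reported F1.

```python
# Standard detection/segmentation metrics used in the evaluation.
import numpy as np


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0


# Consistency check: precision 0.912 and recall 0.603 give F1 = 2PR/(P+R) ≈ 0.726.
p, r = 0.912, 0.603
print(round(2 * p * r / (p + r), 3))  # -> 0.726
```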

Model Comparison: MedSAM vs QMedSAM

| Metric         | MedSAM (Baseline) | QMedSAM (Our Model) | Improvement                |
|----------------|-------------------|---------------------|----------------------------|
| Mean IoU       | 0.522             | 0.491               | -6% (acceptable trade-off) |
| Inference Time | 7.19 s            | 4.28 s              | 40% faster ⚡              |
| Model Size     | 357.67 MB         | 90.24 MB            | 75% smaller 🎯             |
| Peak Memory    | 765.02 MB         | 758.54 MB           | Similar efficiency         |
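
The comparison above rests on two simple measurements: average wall-clock inference time and on-disk checkpoint size. A minimal sketch of how such measurements can be taken in PyTorch is shown below; the project's exact benchmarking protocol (batch size, warm-up, GPU synchronization) is not specified here, so treat this as illustrative.

```python
# Illustrative measurement of inference time and checkpoint size in PyTorch.
import os
import time

import torch


def avg_inference_seconds(model: torch.nn.Module, example: torch.Tensor,
                          runs: int = 10) -> float:
    """Average wall-clock time per forward pass (simple CPU timing, no GPU sync)."""
    model.eval()
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(runs):
            model(example)
    return (time.perf_counter() - start) / runs


def checkpoint_size_mb(model: torch.nn.Module, path: str = "tmp_weights.pt") -> float:
    """Size of the serialized state_dict in megabytes."""
    torch.save(model.state_dict(), path)
    size_mb = os.path.getsize(path) / (1024 ** 2)
    os.remove(path)
    return size_mb
```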

Key Advantages

Compared with the MedSAM baseline, QMedSAM runs about 40% faster and is 75% smaller on disk with similar peak memory, at the cost of a roughly 6% drop in mean IoU, making it the more practical choice for real-time clinical assistance.

Dataset & Training

All models were trained and evaluated on the SIIM-ACR Pneumothorax Dataset with rigorous preprocessing:

📊 Dataset Details

SIIM-ACR Pneumothorax Dataset

12,047 expert-annotated chest X-rays

Training: 3,416 images
Validation: 854 images
Testing: 2,410 images

Balanced positive/negative ratio (1:1) to reduce prediction bias
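
A minimal sketch of the 1:1 balancing idea is shown below: keep all positive studies and randomly downsample the negatives to match. The DataFrame column name is an assumption for illustration.

```python
# Sketch: downsample negative (no-pneumothorax) studies to a 1:1 ratio.
import pandas as pd


def balance_cases(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Keep all positives and an equal-sized random sample of negatives."""
    pos = df[df["has_pneumothorax"] == 1]
    neg = df[df["has_pneumothorax"] == 0].sample(n=len(pos), random_state=seed)
    return pd.concat([pos, neg]).sample(frac=1, random_state=seed)  # shuffle
```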

🔧 Preprocessing

• DICOM to 8-bit grayscale PNG conversion

• Image resizing for model compatibility (256×256 and 1024×1024)

• Binary mask extraction for segmentation ground truth

• Class balancing through negative sample downsampling
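
The conversion and resizing step can be sketched roughly as below, assuming pydicom for reading and OpenCV for resizing and writing; the exact normalization (e.g., windowing) and interpolation choices used in the project are not specified here.

```python
# Sketch: DICOM -> 8-bit grayscale PNG at the model input resolution.
import cv2
import numpy as np
import pydicom


def dicom_to_png(dicom_path: str, png_path: str, size: int = 1024) -> None:
    ds = pydicom.dcmread(dicom_path)
    pixels = ds.pixel_array.astype(np.float32)
    # Min-max normalize to 8-bit grayscale.
    pixels -= pixels.min()
    if pixels.max() > 0:
        pixels /= pixels.max()
    img = (pixels * 255).astype(np.uint8)
    # Resize to the target resolution (256 or 1024, depending on the model).
    img = cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA)
    cv2.imwrite(png_path, img)
```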

💻 Infrastructure

Hardware: 2× NVIDIA RTX 2080 Ti GPUs

Software: Python 3.10, PyTorch 2.5, CUDA 10.1

Interface: Flask-based web system

Training: 150 epochs (YOLO), 30 epochs (QMedSAM)
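
The web interface wraps the pipeline behind an upload endpoint. A minimal Flask sketch is shown below; the route name, form field, and response fields are illustrative, not the project's actual interface code.

```python
# Minimal Flask endpoint sketch wrapping the analysis pipeline.
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/analyze", methods=["POST"])
def analyze():
    upload = request.files["image"]      # uploaded chest X-ray (PNG)
    image_path = "/tmp/upload.png"
    upload.save(image_path)
    # result = run_pipeline(image_path)  # detection -> segmentation -> report
    return jsonify({
        "boxes": [],                     # bounding boxes from the detector
        "mask_overlay": None,            # rendered segmentation overlay
        "report": "",                    # generated report text
    })


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```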

Future Directions

Building on our current framework, we envision several promising research directions:

🔮 Joint Fine-Tuning

Co-optimize all components with segmentation-aware embeddings for better visual-linguistic alignment

📈 Dataset Expansion

Validate on ChestX-ray14 and MIMIC-CXR datasets to test cross-dataset generalization

🎲 Uncertainty Estimation

Integrate probabilistic inference for confidence scores in visual and textual outputs

💬 Interactive Assistant

Enable conversational queries like "What is the pneumothorax size?" or "Suggest treatment options"

🔬 Higher Resolution

Upgrade to 1024×1024 segmentation to reduce information loss from downscaling and improve boundary precision

🌐 Multi-Modality

Extend the pipeline to CT, MRI, and other imaging modalities for broader clinical applications

Conclusion

This project demonstrates that combining specialized computer vision models with medical-adapted language models creates an effective, explainable AI system for pneumothorax diagnosis. Our multi-model integration approach successfully bridges the gap between visual precision and textual reasoning, offering a practical solution for clinical deployment.

The system achieves 91.2% precision and 60.3% recall while maintaining efficient processing times (~1 minute per image), making it suitable for real-time clinical assistance and telemedicine applications. This work represents a promising direction for explainable, efficient AI-assisted medical imaging diagnosis with adaptability to other modalities and diseases.