Automated chest X-ray pneumothorax analysis pipeline with multi-model integration for detection, segmentation, and report generation.
Pneumothorax is a life-threatening condition often missed on chest X-rays due to subtle visual features. We developed an automated system that integrates YOLO detection, QMedSAM segmentation, and LLaVA-Med report generation to provide rapid, accurate diagnosis with visual and textual outputs.
Traditional diagnosis relies on expert radiologists and is time-consuming. Subtle pneumothorax features can be easily overlooked, leading to delayed or missed diagnoses.
Our solution is a modular pipeline combining specialized AI models for detection, segmentation, and natural language report generation to assist clinicians in busy settings.
The result is improved diagnostic speed and accuracy, with 91.2% precision, helping reduce radiologist workload and minimize misdiagnosis risk in clinical practice.
Our system uses a three-stage modular pipeline that combines computer vision and natural language processing:
Figure 1. End-to-end pipeline integrating detection, segmentation, and report generation.
1. YOLOv11m: identifies pneumothorax regions and generates bounding boxes
2. QMedSAM: creates precise lesion masks from the detected regions
3. LLaVA-Med: generates comprehensive medical reports with visual context
YOLOv11m detection: fine-tuned on 12,047 chest X-rays for real-time pneumothorax localization, achieving mAP@0.5 of 0.399 with robust performance on subtle lesions.
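As a hedged illustration, the sketch below shows how a YOLOv11m detector can be fine-tuned and queried with the Ultralytics API; the dataset YAML name, image size, batch size, and confidence threshold are placeholder assumptions rather than the project's exact configuration.

```python
from ultralytics import YOLO

# Start from the pretrained medium-size YOLO11 checkpoint.
model = YOLO("yolo11m.pt")

# Fine-tune on a pneumothorax detection split (150 epochs, matching the training
# setup reported below). "pneumothorax.yaml", imgsz, and batch are placeholders.
model.train(data="pneumothorax.yaml", epochs=150, imgsz=1024, batch=16)

# Run inference on a preprocessed chest X-ray and collect bounding boxes.
results = model.predict("chest_xray.png", conf=0.25)
boxes = results[0].boxes.xyxy.cpu().numpy()   # [x1, y1, x2, y2] per detection
scores = results[0].boxes.conf.cpu().numpy()  # confidence per detection
```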
QMedSAM segmentation: a quantized medical segmentation model trained with knowledge distillation, 75% smaller than MedSAM while maintaining a mean IoU of 0.49.
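The following is a minimal sketch of the two ideas behind QMedSAM, knowledge distillation from a MedSAM-style teacher and post-training quantization; the placeholder modules, loss weighting, and quantization recipe are illustrative assumptions, not the project's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder modules standing in for the MedSAM teacher and the lighter
# QMedSAM student; the real models return (B, 1, H, W) mask logits.
teacher = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 1))
student = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(), nn.Conv2d(4, 1, 1))

def distillation_loss(image, gt_mask, alpha=0.5):
    """Hypothetical distillation objective: match the frozen teacher's mask
    logits (soft target) while also fitting the ground-truth mask (hard target)."""
    with torch.no_grad():
        teacher_logits = teacher(image)
    student_logits = student(image)
    soft = F.mse_loss(student_logits, teacher_logits)
    hard = F.binary_cross_entropy_with_logits(student_logits, gt_mask)
    return alpha * soft + (1 - alpha) * hard

# Example step on dummy data.
image = torch.randn(1, 1, 256, 256)
gt_mask = torch.randint(0, 2, (1, 1, 256, 256)).float()
loss = distillation_loss(image, gt_mask)

# Post-training dynamic quantization of linear layers to int8 is one standard
# way to shrink a model (the placeholder has no Linear layers; a real
# transformer-based segmenter does). The actual QMedSAM recipe may differ.
quantized_student = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8
)
```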
LLaVA-Med report generation: a medical-adapted vision-language model that produces clinically relevant reports, integrating mask coordinates for enhanced spatial awareness.
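LLaVA-Med's prompting interface is not reproduced here, so the sketch below only illustrates how bounding-box and mask coordinates could be folded into a textual prompt to give the language model spatial context; the function name and prompt wording are assumptions.

```python
import numpy as np

def spatial_context(mask: np.ndarray, box: tuple[float, float, float, float]) -> str:
    """Summarize the segmentation output as text so the vision-language model
    receives explicit spatial cues alongside the image. Assumes a binary {0,1}
    mask; the prompt wording is illustrative only."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return "No pneumothorax region was segmented."
    area_pct = 100.0 * mask.sum() / mask.size   # fraction of image covered
    cx, cy = xs.mean(), ys.mean()               # mask centroid
    x1, y1, x2, y2 = box
    return (
        f"A pneumothorax region was detected with bounding box "
        f"({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}), mask centroid "
        f"({cx:.0f}, {cy:.0f}), covering about {area_pct:.1f}% of the image. "
        f"Describe the finding and its clinical significance."
    )
```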
Figure 2. Chat Interface with Bounding Box, Segmentation Mask, and Generated Report.
Comprehensive evaluation on the SIIM-ACR Pneumothorax Dataset demonstrates strong performance across detection, segmentation, and report quality:
| Metric | MedSAM (Baseline) | QMedSAM (Our Model) | Improvement |
|---|---|---|---|
| Mean IoU | 0.522 | 0.491 | -6% (acceptable trade-off) |
| Inference Time | 7.19 seconds | 4.28 seconds | 40% faster ⚡ |
| Model Size | 357.67 MB | 90.24 MB | 75% smaller 🎯 |
| Peak Memory | 765.02 MB | 758.54 MB | Similar efficiency |
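For reference, here is a minimal sketch of how the mean IoU and per-image inference time in the table could be measured; the actual benchmarking protocol (warm-up runs, GPU synchronization, batching) may differ.

```python
import time
import numpy as np

def mean_iou(preds: list[np.ndarray], targets: list[np.ndarray], eps: float = 1e-6) -> float:
    """Mean IoU over binary masks, matching the metric reported in the table."""
    ious = []
    for p, t in zip(preds, targets):
        p, t = p.astype(bool), t.astype(bool)
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        ious.append((inter + eps) / (union + eps))
    return float(np.mean(ious))

def timed(fn, *args):
    """Wall-clock time for a single inference call, as in the latency column."""
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start
```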
All models were trained and evaluated on the SIIM-ACR Pneumothorax Dataset with rigorous preprocessing:
SIIM-ACR Pneumothorax Dataset
• 12,047 expert-annotated chest X-rays
• Training: 3,416 images
• Validation: 854 images
• Testing: 2,410 images
• Balanced positive/negative ratio (1:1) to reduce prediction bias
• DICOM to 8-bit grayscale PNG conversion (a conversion sketch follows this list)
• Image resizing for model compatibility (256×256 and 1024×1024)
• Binary mask extraction for segmentation ground truth
• Class balancing through negative sample downsampling
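A minimal sketch of the DICOM-to-PNG conversion and resizing steps above, assuming simple min-max normalization to 8-bit grayscale; the project's exact windowing and interpolation choices may differ.

```python
import numpy as np
import pydicom
from PIL import Image

def dicom_to_png(dicom_path: str, png_path: str, size: int = 1024) -> None:
    """Convert a DICOM chest X-ray to an 8-bit grayscale PNG and resize it
    (256 or 1024 depending on the target model). Normalization is simplified."""
    dcm = pydicom.dcmread(dicom_path)
    pixels = dcm.pixel_array.astype(np.float32)
    # Min-max normalize to 0-255 and cast to 8-bit grayscale.
    pixels -= pixels.min()
    pixels /= max(pixels.max(), 1e-6)
    img = Image.fromarray((pixels * 255).astype(np.uint8), mode="L")
    img.resize((size, size), Image.BILINEAR).save(png_path)
```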
Hardware: 2× NVIDIA RTX 2080 Ti GPUs
Software: Python 3.10, PyTorch 2.5, CUDA 10.1
Interface: Flask-based web system (a minimal endpoint sketch follows below)
Training: 150 epochs (YOLO), 30 epochs (QMedSAM)
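A minimal sketch of what the Flask-based interface could look like, with a stub standing in for the model pipeline; the route name, payload format, and helper function are illustrative assumptions rather than the deployed implementation.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_pipeline(image_bytes: bytes):
    """Stub standing in for the YOLO -> QMedSAM -> LLaVA-Med pipeline."""
    return [], None, "No report generated (stub)."

@app.route("/analyze", methods=["POST"])
def analyze():
    """Hypothetical endpoint: accept an uploaded chest X-ray, run detection,
    segmentation, and report generation, and return the combined results."""
    image_file = request.files["image"]
    boxes, mask_png, report = run_pipeline(image_file.read())
    return jsonify({"boxes": boxes, "mask": mask_png, "report": report})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```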
Building on our current framework, we envision several promising research directions:
• Co-optimize all components with segmentation-aware embeddings for better visual-linguistic alignment
• Validate on ChestX-ray14 and MIMIC-CXR datasets to test cross-dataset generalization
• Integrate probabilistic inference for confidence scores in visual and textual outputs
• Enable conversational queries such as "What is the pneumothorax size?" or "Suggest treatment options"
• Upgrade to 1024×1024 segmentation to eliminate information loss and improve boundary precision
• Extend the pipeline to CT, MRI, and other imaging modalities for broader clinical applications
This project demonstrates that combining specialized computer vision models with medical-adapted language models creates an effective, explainable AI system for pneumothorax diagnosis. Our multi-model integration approach successfully bridges the gap between visual precision and textual reasoning, offering a practical solution for clinical deployment.
The system achieves 91.2% precision and 60.3% recall while keeping end-to-end processing to roughly one minute per image, making it practical for clinical assistance and telemedicine applications. This work represents a promising direction for explainable, efficient AI-assisted medical imaging diagnosis, with adaptability to other modalities and diseases.