Deep dive · Multimedia Tools and Applications · 2021

COVID-CXNet: Open Chest X-ray Dataset and Detection Model for COVID-19

One of the earliest large-scale open datasets and detection models for COVID-19 on frontal chest X-rays.

Medical Imaging
COVID-19
Open Dataset

COVID-CXNet pipeline: chest X-rays crowdsourced from GitHub, Twitter, SIRM and Radiopaedia pass through CLAHE and BEASF enhancement plates into the COVID-CXNet model (with a CheXNet/DenseNet-121 baseline) and out to a clinical decision-support workstation showing a saliency heatmap and structured findings.

Problem

In early 2020, COVID-19 chest X-ray data was scattered, small-scale, and inconsistent, which made it nearly impossible to train robust detectors or compare methods fairly. Many published 'COVID-vs-normal' classifiers were quietly memorizing dataset artifacts rather than disease features.

Approach

We assembled what was, at the time, the largest publicly available COVID-19 frontal chest X-ray (CXR) collection by harmonizing multiple public sources, deduplicating cases, and standardizing preprocessing. On top of this dataset we trained COVID-CXNet, a CheXNet-based detector, and paired it with explicit calibration and class-activation analysis to verify that predictions track radiologically meaningful regions rather than dataset shortcuts.
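To make the deduplication step concrete, here is a minimal sketch of near-duplicate detection with an average hash, a common perceptual-hashing trick. It is an illustration under stated assumptions, not the procedure used to build the dataset: the function names, the 8x8 hash size, and the `max_distance` threshold are all hypothetical, and images are assumed to be 2-D grayscale arrays at least 8 pixels on each side.

```python
import numpy as np


def average_hash(image: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Return a flat boolean average-hash of a 2-D grayscale image."""
    h, w = image.shape
    # Crop so both dimensions divide evenly into hash_size blocks.
    bh, bw = h // hash_size, w // hash_size
    cropped = image[: bh * hash_size, : bw * hash_size].astype(np.float64)
    # Block-average the image down to hash_size x hash_size.
    small = cropped.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    # One bit per block: is the block brighter than the global mean?
    return (small > small.mean()).ravel()


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))


def deduplicate(images, max_distance: int = 4):
    """Return indices of images kept after dropping near-duplicates.

    The first occurrence wins; max_distance is an illustrative threshold.
    """
    kept, hashes = [], []
    for i, img in enumerate(images):
        h = average_hash(img)
        if all(hamming(h, other) > max_distance for other in hashes):
            kept.append(i)
            hashes.append(h)
    return kept
```

Because the hash compares each block against the image's own mean, a uniform rescaling of pixel intensities leaves it unchanged, which is useful when the same radiograph has been reposted with different exposure. It will not catch crops or rotations; those would need a stronger matching method.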

Key results

Open-sourced the largest curated frontal CXR COVID dataset of its time, used by dozens of follow-up studies.
Detection accuracy competitive with much larger proprietary models, with attention maps that overlap with radiologist regions of interest.
Highlighted concrete shortcut-learning risks (lateral markers, hospital-specific text overlays) whose removal the community widely adopted as part of standard preprocessing.
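One way to quantify how well an attention map overlaps a radiologist's region of interest is intersection-over-union (IoU) between the hottest part of the map and a binary ROI mask. The sketch below is an assumption-laden illustration rather than the evaluation used in the paper: `cam_roi_iou` and `rel_threshold` are hypothetical names, and thresholding at half of the peak activation is an arbitrary choice.

```python
import numpy as np


def cam_roi_iou(cam: np.ndarray, roi_mask: np.ndarray, rel_threshold: float = 0.5) -> float:
    """IoU between the high-activation region of a CAM and a binary ROI mask.

    Pixels count as "hot" when they reach rel_threshold * max(cam).
    """
    if cam.shape != roi_mask.shape:
        raise ValueError("cam and roi_mask must have the same shape")
    peak = cam.max()
    if peak > 0:
        hot = cam >= rel_threshold * peak
    else:
        # A flat or non-positive map has no hot region.
        hot = np.zeros(cam.shape, dtype=bool)
    roi = roi_mask.astype(bool)
    union = np.logical_or(hot, roi).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(hot, roi).sum() / union)
```

An IoU near 1 means the hot region and the annotated region largely coincide; a value near 0 on a correctly classified image is a cue to inspect what the model is actually looking at.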

Takeaways

Dataset curation is research: a careful open dataset can move a sub-field faster than a new architecture.
Class-activation maps are a cheap but powerful sanity check against shortcut learning in medical imaging.
Reproducibility and provenance matter more than peak accuracy when results inform clinical conversations.
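The class-activation sanity check mentioned above can be automated in a rough way: lateral markers and text overlays tend to sit near the image edges, so a map that concentrates its mass in the outer frame deserves a closer look. The following is a minimal sketch of that idea, assuming a 2-D saliency array; the function name and the 10% border width are hypothetical choices, not values from the paper.

```python
import numpy as np


def border_saliency_fraction(cam: np.ndarray, border: float = 0.1) -> float:
    """Fraction of non-negative saliency that lies in the outer frame.

    border is the frame width as a fraction of each image dimension.
    """
    # Negative activations are ignored so the result is a true fraction.
    sal = np.clip(cam.astype(np.float64), 0.0, None)
    total = sal.sum()
    if total == 0:
        return 0.0
    h, w = sal.shape
    bh, bw = int(round(h * border)), int(round(w * border))
    inner = sal[bh : h - bh, bw : w - bw].sum()
    return float(1.0 - inner / total)
```

Averaging this fraction per class or per source hospital gives a quick signal: if one source consistently shows high border saliency, the model may be keying on that source's markers or overlays rather than on lung findings.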

Related publications