This retrospective study assessed a deep learning framework that uses convolutional neural networks for fracture classification and YOLOv8-based object detection for fracture localization, comparing three transfer-learning classification backbones: ResNet18, MobileNetV3-Small, and EfficientNet-B0. The population, sample size, setting, and follow-up were not reported. The primary outcomes were fracture classification and localization performance, assessed with metrics such as AUROC, average precision, accuracy, and mAP.
MobileNetV3-Small was the top-performing backbone for overall classification, although classification discrimination was generally low. For localization, performance varied across the YOLOv8 detector variants, with the highest test-set mAP values at the 0.5 IoU threshold (mAP@0.5) and the greatest variation between classes across anatomical fracture types. Localization was stronger and more consistent than classification, but effect sizes, absolute values, p-values, and confidence intervals were not reported.
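mAP@0.5 and the stricter mAP@0.5:0.95 differ only in which intersection-over-union (IoU) thresholds a predicted box must reach to count as a true positive. The following is a minimal single-class-per-call sketch of how these metrics are commonly computed, not the Ultralytics implementation or the authors' code. It assumes boxes in (x1, y1, x2, y2) pixel format, greedy matching of detections to ground-truth boxes in descending-confidence order, and all-point interpolated AP; COCO-style evaluators use 101-point interpolation, so values can differ slightly. All function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]
# Hypothetical detection record: (image_id, confidence, box).
Detection = Tuple[str, float, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_ap(dets: Sequence[Detection], gts: Dict[str, List[Box]], iou_thr: float) -> float:
    """Single-class AP at one IoU threshold, using all-point interpolation."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    hits: List[int] = []  # 1 = true positive, 0 = false positive
    for img, _conf, box in sorted(dets, key=lambda d: -d[1]):
        # Greedily match to the still-unmatched ground-truth box with the highest IoU.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if not matched[img][j] and overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[img][best_j] = True
            hits.append(1)
        else:
            hits.append(0)
    # Cumulative precision and recall down the confidence-ranked list.
    precision, recall, tp = [], [], 0
    for k, h in enumerate(hits, start=1):
        tp += h
        precision.append(tp / k)
        recall.append(tp / n_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])
    # Area under the envelope: precision summed over each recall increment.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def mean_ap(dets_by_class, gts_by_class, thresholds: Sequence[float]) -> float:
    """Average AP over every class and every IoU threshold."""
    aps = [detection_ap(dets_by_class.get(c, []), gts_by_class[c], t)
           for c in gts_by_class for t in thresholds]
    return sum(aps) / len(aps)


MAP50 = [0.5]                                             # mAP@0.5
MAP50_95 = [round(0.5 + 0.05 * i, 2) for i in range(10)]  # mAP@0.5:0.95

if __name__ == "__main__":
    gts = {"fracture": {"img1": [(0, 0, 10, 10)]}}
    dets = {"fracture": [("img1", 0.9, (0, 0, 10, 8))]}  # IoU with the ground truth is 0.8
    print(mean_ap(dets, gts, MAP50), mean_ap(dets, gts, MAP50_95))  # 1.0 0.7
```

In the usage example, a box with IoU 0.8 counts as a hit at every threshold up to 0.8 but misses at 0.85, 0.9, and 0.95, which is why mAP@0.5:0.95 rewards tighter localization than mAP@0.5.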
Safety and tolerability data were not reported. Key limitations include generally low classification discrimination, variable YOLOv8 detector performance, and marked variation across fracture types; clinical translation will require external validation, prospective assessment, and comparison with expert readers. Practice relevance was not reported, and because the evidence is observational, causal claims are not warranted. The framework shows promise for localization but remains experimental and requires rigorous validation before clinical use.
Original Abstract
Detecting and localizing fractures in radiographic images is essential for effective trauma diagnosis and image interpretation. Despite the potential of deep learning in musculoskeletal imaging, classification quality and localization stability remain significant challenges.
The purpose of this work was to design and test a deep learning system that classifies fractures in radiographic images and localizes them, using convolutional neural networks and YOLOv8-based object detection.
A retrospective secondary analysis was performed on publicly available, de-identified radiographic data. For fracture classification, three transfer-learning backbones (ResNet18, MobileNetV3-Small, and EfficientNet-B0) were trained using repeated stratified cross-validation with early stopping. Model performance was evaluated with the area under the receiver operating characteristic curve (AUROC), average precision (AP), Brier score, accuracy, precision, recall, specificity, and F1-score. Temperature scaling was used for probability calibration, and nested threshold optimization was used to compare performance at different operating points. For fracture localization, YOLOv8n, YOLOv8s, and YOLOv8m detectors were trained and compared on the validation and test sets using precision, recall, mAP@0.5, and mAP@0.5:0.95.
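The classification metrics listed above can be computed from per-image fracture probabilities as in the following minimal, dependency-free sketch. It assumes binary labels (1 = fracture, 0 = no fracture), a default cutoff of 0.5, and that both classes are present in the data; the function names are hypothetical and this is not the authors' code. In practice, scikit-learn's `roc_auc_score`, `average_precision_score`, and `brier_score_loss` compute the corresponding quantities.

```python
from typing import Dict, Sequence


def confusion_metrics(y: Sequence[int], p: Sequence[float], thr: float = 0.5) -> Dict[str, float]:
    """Threshold-dependent metrics; an image is predicted positive if p >= thr."""
    tp = sum(1 for t, s in zip(y, p) if s >= thr and t == 1)
    fp = sum(1 for t, s in zip(y, p) if s >= thr and t == 0)
    fn = sum(1 for t, s in zip(y, p) if s < thr and t == 1)
    tn = sum(1 for t, s in zip(y, p) if s < thr and t == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y), "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((s - t) ** 2 for t, s in zip(y, p)) / len(y)


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y, p) if t == 1]
    neg = [s for t, s in zip(y, p) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y: Sequence[int], p: Sequence[float]) -> float:
    """AP = sum of (R_n - R_(n-1)) * P_n, evaluated at each distinct score threshold."""
    ranked = sorted(zip(p, y), key=lambda sy: -sy[0])
    n_pos = sum(y)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Tied scores share one threshold, so only evaluate at the end of a tie run.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


if __name__ == "__main__":
    y = [1, 0, 1, 0, 1, 0]
    p = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    print(round(auroc(y, p), 4), round(average_precision(y, p), 4), round(brier_score(y, p), 4))  # 0.6667 0.7556 0.2467
    print(confusion_metrics(y, p))
```

AUROC and AP depend only on how the scores rank images, whereas accuracy, precision, recall, specificity, and F1 depend on the chosen cutoff, and the Brier score depends on the probability values themselves; this is why the study reports calibration and threshold analyses separately from discrimination.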
MobileNetV3-Small was the top-performing backbone in overall performance, although classification discrimination was generally low. Calibration analysis showed that temperature scaling changed the probability distributions and reliability properties, and threshold optimization revealed significant differences in sensitivity, precision, specificity, and F1-score across decision cutoffs. In the localization experiment, performance varied across the YOLOv8 detector variants, with the highest test-set mAP values at the 0.5 IoU threshold (mAP@0.5) and the greatest variation between classes across anatomical fracture types. These results indicate that, in the current experimental setup, the localization component of the framework performed better and more consistently than the classification component.
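Temperature scaling divides each logit z by a single scalar T > 0 fitted on held-out data, so the calibrated probability is sigmoid(z / T). Because this preserves the ranking of scores, it leaves AUROC and AP unchanged and alters only calibration-sensitive quantities such as the Brier score and reliability curves. The sketch below is illustrative only: it assumes the classifier outputs one fracture logit per image, fits T by grid search over negative log-likelihood (an assumption; gradient-based optimizers are also common), and picks the cutoff that maximizes F1, a hypothetical selection criterion since the abstract does not state which one was used. All names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nll(logits: Sequence[float], labels: Sequence[int], T: float) -> float:
    """Mean binary cross-entropy of sigmoid(z / T) against 0/1 labels."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        q = min(max(sigmoid(z / T), eps), 1.0 - eps)
        total -= y * math.log(q) + (1 - y) * math.log(1.0 - q)
    return total / len(labels)


def fit_temperature(logits, labels, t_min=0.05, t_max=20.0, steps=400) -> float:
    """Choose T on held-out data by grid search over log-spaced candidates."""
    lo, hi = math.log(t_min), math.log(t_max)
    grid = [math.exp(lo + i * (hi - lo) / (steps - 1)) for i in range(steps)]
    return min(grid, key=lambda T: nll(logits, labels, T))


def f1_at(probs, labels, thr: float) -> float:
    """F1-score when images with probability >= thr are called fractures."""
    tp = sum(1 for q, y in zip(probs, labels) if q >= thr and y == 1)
    fp = sum(1 for q, y in zip(probs, labels) if q >= thr and y == 0)
    fn = sum(1 for q, y in zip(probs, labels) if q < thr and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def pick_threshold(probs, labels) -> float:
    """Cutoff (among observed probabilities) that maximizes F1; use inner-fold data only."""
    return max(sorted(set(probs)), key=lambda t: f1_at(probs, labels, t))


if __name__ == "__main__":
    # Overconfident validation logits: each side is correct only 2 times out of 3.
    val_logits, val_labels = [3, 3, 3, -3, -3, -3], [1, 1, 0, 0, 0, 1]
    T = fit_temperature(val_logits, val_labels)
    # T should land close to 3 / ln 2 (about 4.33), pulling sigmoid(3 / T) toward 2/3.
    print(round(T, 2), round(sigmoid(3 / T), 3))
    print(pick_threshold([0.9, 0.8, 0.6, 0.4, 0.3, 0.1], [1, 1, 0, 1, 0, 0]))  # 0.4
```

In a nested scheme, `fit_temperature` and `pick_threshold` would be run only on the inner validation data of each outer fold, and the chosen T and cutoff then applied unchanged to the outer test fold, so that the reported operating-point metrics are not tuned on the same data they are measured on.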
The presented framework integrates fracture classification, probability calibration, threshold analysis, and radiographic localization. Although the classification component showed poor discrimination, YOLOv8-based localization performed comparatively better in this setting, supporting the usefulness of detector-based fracture localization in this context. Clinical translation will require further external validation, prospective evaluation, and comparison with expert readers.