YOLO26 ships NMS-free dual-head architecture across five scales with 40.9-57.5 mAP at 1.7-11.8 ms on T4 TensorRT
YOLO26 unifies real-time vision under an NMS-free dual-head design that removes Distribution Focal Loss (DFL) and introduces MuSGD, Progressive Loss, and STAL. It advances the COCO accuracy-latency Pareto front while supporting five tasks in one pipeline. The shift reduces reliance on the post-processing stages common in prior detectors.
Ultralytics released YOLO26 on arXiv (2606.03748) with coordinated changes to architecture and training. The model replaces standard NMS post-processing with a native dual-head design, drops DFL to shrink the detection head, and trains with MuSGD, Progressive Loss, and STAL assignment. These changes produce consistent gains on detection, segmentation, pose, and oriented bounding box tasks, and they also support the open-vocabulary YOLOE-26 extension.
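To make the effect of dropping DFL concrete, the sketch below contrasts a DFL-style decode, where each box side is the expected bin index under a softmax distribution, with a direct regression that outputs one value per side. The bin count of 16 and all function names are illustrative assumptions; they are not taken from the YOLO26 paper or the Ultralytics code.

```python
import math

REG_MAX = 16  # assumed number of distance bins per box side (illustrative)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_decode(side_logits):
    """DFL-style decode: expected bin index under the softmax distribution."""
    probs = softmax(side_logits)
    return sum(i * p for i, p in enumerate(probs))


def direct_decode(side_value):
    """DFL-free decode: the head predicts the distance itself."""
    return side_value


def head_outputs_per_anchor(use_dfl):
    """Regression channels per anchor: 4 sides times bins, or just 4 sides."""
    return 4 * REG_MAX if use_dfl else 4


# A distribution sharply peaked at bin 5 decodes to roughly 5.
peaked = [0.0] * REG_MAX
peaked[5] = 20.0
print(round(dfl_decode(peaked), 3))
print(head_outputs_per_anchor(True), head_outputs_per_anchor(False))
```

Under these assumptions the regression branch shrinks from 64 channels per anchor to 4, which is the kind of head reduction the paragraph above describes.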
COCO benchmarks show the accuracy-latency frontier advancing over prior YOLO releases and competing real-time detectors. Latency is measured with TensorRT on T4 hardware and ranges from 1.7 to 11.8 ms, while mAP ranges from 40.9 to 57.5 across the five scales, n through x, without task-specific retraining. On LVIS minival, YOLOE-26x reports 40.6 AP under text prompting, extending the unified multi-task coverage to open-vocabulary detection.
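The idea of an accuracy-latency Pareto front can be sketched in a few lines: a model sits on the front if no other model is at least as fast and at least as accurate while being strictly better on one axis. In the example, pairing the reported range endpoints with the n and x scales is an assumption, and the two "hypothetical" entries are invented placeholders, not measured results.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    latency_ms: float  # lower is better
    map_pct: float     # higher is better


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.map_pct >= b.map_pct
    strictly_better = a.latency_ms < b.latency_ms or a.map_pct > b.map_pct
    return no_worse and strictly_better


def pareto_front(models):
    """Return the non-dominated models, sorted by latency."""
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(front, key=lambda m: m.latency_ms)


candidates = [
    # Assumed pairing of the reported range endpoints with the n and x scales.
    Model("yolo26n", 1.7, 40.9),
    Model("yolo26x", 11.8, 57.5),
    # Hypothetical comparison points, invented purely for illustration.
    Model("hypothetical-a", 2.5, 39.0),
    Model("hypothetical-b", 6.0, 50.0),
]

for m in pareto_front(candidates):
    print(m.name, m.latency_ms, m.map_pct)
```

Here "hypothetical-a" is dominated because it is both slower and less accurate than the assumed n-scale point, so it drops off the front.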
The design eliminates fragmented pipelines that previously required separate NMS stages, heavy heads, and uneven small-object assignment. End-to-end operation simplifies deployment across hardware and reduces inference overhead. Task-specific heads integrate directly into the same backbone, collapsing separate model families into one training and export pipeline.
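As a rough illustration of the post-processing stage that an NMS-free head makes unnecessary, the sketch below places classic greedy NMS next to the simpler filtering that suffices when the model already emits one confident box per object. The box format, thresholds, and function names are assumptions for illustration and do not reflect the Ultralytics implementation.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets, iou_thr=0.5):
    """Classic post-processing: keep the best box, drop overlapping ones."""
    kept = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept


def end_to_end_filter(dets, conf_thr=0.25, max_det=300):
    """NMS-free path: only a confidence cut and a top-k cap remain."""
    confident = [d for d in dets if d[1] >= conf_thr]
    return sorted(confident, key=lambda d: d[1], reverse=True)[:max_det]


# Dense-head style output: two near-duplicate boxes for one object.
dense = [
    ((10, 10, 50, 50), 0.90),
    ((12, 11, 51, 52), 0.80),
    ((100, 100, 140, 150), 0.70),
]
# One-to-one style output: the duplicate already carries a low score.
one_to_one = [
    ((10, 10, 50, 50), 0.90),
    ((12, 11, 51, 52), 0.05),
    ((100, 100, 140, 150), 0.70),
]

print(len(greedy_nms(dense)), len(end_to_end_filter(one_to_one)))
```

Both paths end with two detections, but the second needs no pairwise IoU comparisons, which is where the deployment simplification comes from.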
Future releases will likely extend the same dual-head pattern to video and 3D tasks. The code is already public in the Ultralytics repository.
Glenn Jocher: YOLO26x will exceed 58.5 mAP on COCO val within four months of public weights release
Sources (3)
- [1] Primary Source (https://arxiv.org/abs/2606.03748)
- [2] Supporting Source (https://github.com/ultralytics/ultralytics)
- [3] Supporting Source (https://paperswithcode.com/dataset/coco)