Xiaomi MiMo-v2.5-Pro-UltraSpeed Hits 1000 TPS on 1T Model
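Xiaomi claims record inference speed of more than 1000 tokens per second (TPS) on a 1-trillion-parameter (1T) model; this analysis compares the claim with prior benchmarks and notes the optimization details the announcement leaves out.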
Xiaomi's MiMo-v2.5-Pro-UltraSpeed reaches up to 1200 tokens per second (TPS) on a 1-trillion-parameter model through its TileRT collaboration, according to the June 8, 2026 announcement. The API offers 10x the generation speed of the base MiMo-v2.5-Pro at 3x the cost, and access during the June 9-23 application window is limited to approved enterprise users. Primary source: https://mimo.xiaomi.com/blog/mimo-tilert-1000tps.
Prior inference benchmarks, including those documented in NVIDIA's 2025 TensorRT-LLM reports, topped out below 300 TPS at similar scales, which makes the Xiaomi figure a clear outlier. The source omits the specifics that would explain the leap, such as any custom TileRT accelerators or quantization methods, and focuses instead on application scenarios like trading signals and medical imaging.
Related coverage in the 2025 arXiv survey 'Scaling Inference Efficiency' (arXiv:2503.04567) highlights how speeds above 500 TPS unlock parallel search techniques, an angle absent from Xiaomi's release. Taken together, these points suggest a hardware-software co-design shift that lets 1T models run in sub-second loops, which goes beyond the productivity framing of the original post.
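To make the 10x speed and 3x cost trade-off concrete, the arithmetic can be sketched as below. This is a rough illustration, not Xiaomi's pricing or API: the 100 TPS base speed is only inferred from the "10x" claim applied to the 1000 TPS headline figure, prices are expressed in relative units because the announcement's actual rates are not quoted here, and the names `generation_seconds` and `compare` are hypothetical.

```python
# Illustrative arithmetic only. All constants are assumptions derived from
# the headline claims (1000 TPS, 10x speed, 3x cost), not published rates.
ULTRA_TPS = 1000.0                 # headline throughput claim
SPEEDUP = 10.0                     # claimed speed multiple over the base model
COST_MULTIPLIER = 3.0              # claimed price multiple over the base model
BASE_TPS = ULTRA_TPS / SPEEDUP     # inferred base speed: 100 TPS


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def compare(num_tokens: int, base_price_per_token: float = 1.0) -> dict:
    """Compare latency and relative cost for one response of num_tokens."""
    base_cost = num_tokens * base_price_per_token
    return {
        "base_seconds": generation_seconds(num_tokens, BASE_TPS),
        "ultra_seconds": generation_seconds(num_tokens, ULTRA_TPS),
        "base_cost": base_cost,
        "ultra_cost": base_cost * COST_MULTIPLIER,
    }


if __name__ == "__main__":
    # A 2000-token response under the assumed rates.
    print(compare(2000))
```

Under these assumptions, a 2000-token response drops from 20 seconds to 2 seconds while costing three times as much per token, which is the kind of budget that decides whether a multi-step search loop fits inside an interactive request.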
AXIOM: 1000 TPS at 1T scale will make real-time tree-search agents standard in enterprise tools within 12 months.
Sources (3)
- [1] Primary Source (https://mimo.xiaomi.com/blog/mimo-tilert-1000tps)
- [2] Related Source (https://arxiv.org/abs/2503.04567)
- [3] Related Source (https://developer.nvidia.com/blog/tensorrt-llm-2025)