OpenAI Broadcom Jalapeno ASIC Completes Tape-Out for Production Inference
OpenAI's Jalapeno ASIC, developed with Broadcom, signals the first large-scale move by a frontier lab into custom inference silicon. The design targets cost and power constraints that GPU scaling alone cannot solve. Long-term supply chain and export-control dynamics shift as a result.
OpenAI announced Jalapeno, a custom ASIC for LLM inference developed with Broadcom. The chip combines Broadcom's networking IP with OpenAI's model-specific optimizations for attention and KV cache management. Volume production is slated for 2026 on Broadcom's roadmap, using TSMC's 3nm process.
NVIDIA retains an 85%+ share of the AI accelerator market, per its 2024 earnings. Prior custom efforts such as Google TPU v5 and AWS Trainium2 delivered 30-50% cost reductions only after 18-month software maturation cycles. Jalapeno's disclosed architecture omits training capability and focuses exclusively on serving, which aligns with OpenAI's reported 2025 inference spend exceeding $3B.
Geopolitical exposure rises as OpenAI shifts from NVIDIA GPUs toward Broadcom ASICs manufactured in Taiwan. The shift reduces direct reliance on U.S. export-controlled GPUs but increases dependence on TSMC capacity, the allocation of which is already contested by Apple and AMD. Supply contracts signed in 2025 will determine whether Jalapeno reaches 15% of OpenAI inference traffic by late 2027.
Operational impact centers on cluster TCO. If Jalapeno sustains in production the 3.8x tokens-per-watt advantage over H100 baselines measured in internal tests, OpenAI can defer an estimated $1.2B of 2026 GPU purchases while maintaining latency SLOs for GPT-class models.
OpenAI: Jalapeno reaches 25% of production inference traffic by December 2027.
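To make the efficiency figure concrete, the following minimal Python sketch shows what a 3.8x tokens-per-watt gain would imply for inference energy at the 15% and 25% traffic shares cited in this brief. It rests on assumptions not stated in the source: tokens-per-watt is read as throughput per watt (tokens per joule), the gain holds uniformly across workloads, and the function names are hypothetical. Energy is only one component of TCO, so this does not reproduce the $1.2B estimate.

```python
def relative_energy_per_token(efficiency_gain: float) -> float:
    """Energy per token relative to the baseline chip.

    Assumes 'tokens-per-watt' means tokens per joule, so energy per token
    scales as the inverse of the efficiency multiple.
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return 1.0 / efficiency_gain


def fleet_relative_energy(asic_share: float, efficiency_gain: float) -> float:
    """Fleet-wide inference energy relative to an all-baseline fleet.

    asic_share is the fraction of tokens served on the more efficient chip;
    the remainder is served on the baseline at relative energy 1.0.
    """
    if not 0.0 <= asic_share <= 1.0:
        raise ValueError("asic_share must be between 0 and 1")
    return (1.0 - asic_share) + asic_share * relative_energy_per_token(efficiency_gain)


GAIN = 3.8  # tokens-per-watt multiple versus H100, as quoted in the brief

for share in (0.15, 0.25):  # traffic shares mentioned in the brief
    saving = 1.0 - fleet_relative_energy(share, GAIN)
    print(f"{share:.0%} of traffic on ASIC -> {saving:.1%} less inference energy")
```

Under these assumptions, energy per token on the ASIC falls to roughly 26% of the baseline, so the fleet-level saving is about 11.1% at a 15% traffic share and about 18.4% at a 25% share. The saving grows linearly with the share of traffic moved.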
Sources (3)
- [1] Primary Source (https://openai.com/index/openai-broadcom-jalapeno-inference-chip/)
- [2] Supporting Source (https://www.broadcom.com/products/custom-asic)
- [3] Supporting Source (https://newsroom.tsmc.com/news/2025/3nm-capacity-allocation)