OpenAI Jalapeño Inference Chip Ships Early Silicon with Broadcom
OpenAI's Jalapeño is the first publicly announced custom inference ASIC from a frontier AI lab, developed with Broadcom to cut inference costs and reduce dependence on Nvidia. Early perf-per-watt gains echo results from Google's TPU and Amazon's Inferentia. The move extends OpenAI's vertical integration across the full stack, from silicon to product.
OpenAI announced Jalapeño on June 24, 2026, following an October 2025 partnership filing. The ASIC targets inference only; pre-training stays on Nvidia GPUs. According to OpenAI, early silicon shows measurable efficiency gains on agentic workloads such as Codex, in line with Greg Brockman's stated focus on underserved, latency-sensitive tasks.
Benchmark data remain limited to OpenAI's own statements; no third-party MLPerf results exist yet. Historical patterns from Google TPU v4 and Amazon Inferentia 2 indicate perf-per-watt uplifts of 1.4x to 2.1x after two tape-outs when workloads are co-designed with the hardware, suggesting Jalapeño may follow a similar trajectory once volume production begins.
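To make the cited range concrete, here is a minimal sketch that applies the 1.4x to 2.1x uplift to a baseline perf-per-watt figure. It is illustrative arithmetic only: the baseline of 10,000 tokens/s at 700 W is a hypothetical placeholder, not a measured GPU or Jalapeño number, and the function names are invented for this example.

```python
# Sketch: project perf-per-watt for a co-designed ASIC by applying the
# 1.4x to 2.1x uplift range reported for TPU v4 and Inferentia 2.
# Baseline numbers are HYPOTHETICAL placeholders, not measured data.
from typing import Tuple


def perf_per_watt(tokens_per_second: float, watts: float) -> float:
    """Return throughput per watt (tokens/s per W, i.e. tokens per joule)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tokens_per_second / watts


def projected_range(baseline: float, low: float = 1.4, high: float = 2.1) -> Tuple[float, float]:
    """Scale a baseline perf-per-watt figure by a multiplicative uplift range."""
    return baseline * low, baseline * high


# Hypothetical baseline accelerator: 10,000 tokens/s at 700 W.
baseline = perf_per_watt(10_000, 700)
low, high = projected_range(baseline)
print(f"baseline {baseline:.2f} tok/s/W -> projected {low:.2f} to {high:.2f} tok/s/W")
# prints: baseline 14.29 tok/s/W -> projected 20.00 to 30.00 tok/s/W
```

Because the uplift is multiplicative, the projected band moves with whatever baseline is measured; published MLPerf throughput and power data would replace the placeholder once available.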
Vertical integration extends OpenAI's control from model weights down through kernels, memory hierarchy, and scheduling. This mirrors the path taken by Google and Amazon, where custom silicon reduced external GPU spend by 20 to 30 percent within three years. Exposure to Nvidia's supply chain therefore declines, although the new constraint becomes the 5 nm or 3 nm foundry capacity Broadcom can secure, since Broadcom designs chips but does not fabricate them.
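One way to read the 20 to 30 percent figure is as a fraction of a baseline budget. This minimal sketch assumes a hypothetical $10B annual external GPU spend; that number is a placeholder, not an OpenAI or hyperscaler figure, and `reduced_spend` is a name invented for the example.

```python
# Sketch: external GPU spend remaining after custom silicon displaces part
# of the fleet, using the 20 to 30 percent reduction range cited above.
# The baseline budget is a HYPOTHETICAL placeholder, not a reported figure.


def reduced_spend(baseline_spend: float, reduction_pct: float) -> float:
    """Return spend after a percentage reduction (gross, before ASIC costs)."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return baseline_spend * (1 - reduction_pct / 100)


baseline = 10e9  # hypothetical: $10B per year on external GPUs

for pct in (20, 30):
    remaining = reduced_spend(baseline, pct)
    saved = baseline - remaining
    print(f"{pct}% cut: ${remaining / 1e9:.1f}B remaining, ${saved / 1e9:.1f}B saved")
```

The result is a gross figure: it does not subtract the design and procurement costs of the custom silicon itself, which offset part of the saving.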
Next steps include a second revision optimized for longer context windows and possible on-device variants. If the pattern set by earlier hyperscalers holds, deployment metrics should appear in OpenAI's 2027 infrastructure filings.
OpenAI: Jalapeño will handle at least 15 percent of production inference traffic by December 2027.
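Whether a 15 percent target is met depends on how traffic is counted, which the statement does not specify. This sketch assumes traffic is measured in tokens served per backend; the `TrafficSample` type, the backend labels, and the sample counts are all hypothetical.

```python
# Sketch: check a backend's share of production inference traffic against a
# 15 percent target. ASSUMPTION: traffic is counted in tokens served; the
# source does not say whether it means tokens, requests, or compute.
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TrafficSample:
    backend: str  # hypothetical label, e.g. "jalapeno" or "gpu"
    tokens: int   # tokens served by that backend in the sampling window


def traffic_share(samples: Iterable[TrafficSample], backend: str) -> float:
    """Fraction of all tokens served by `backend` (0.0 if there is no traffic)."""
    samples = list(samples)
    total = sum(s.tokens for s in samples)
    if total == 0:
        return 0.0
    served = sum(s.tokens for s in samples if s.backend == backend)
    return served / total


TARGET = 0.15  # "at least 15 percent"

# Hypothetical window: 820 tokens served on GPUs, 180 on the ASIC.
window = [TrafficSample("gpu", 820), TrafficSample("jalapeno", 180)]
share = traffic_share(window, "jalapeno")
print(f"share {share:.1%}, meets target: {share >= TARGET}")
```

Counting requests instead of tokens can yield a different share if the ASIC serves shorter or longer requests on average than the GPU fleet.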
Sources (3)
- [1] OpenAI Jalapeño Announcement (https://openai.com/index/jalapeno-inference-processor)
- [2] Broadcom Q2 2026 10-Q, Custom ASIC Segment (https://sec.gov/Archives/edgar/data/0000791907/000079190726000023/brcm-20260630.htm)
- [3] MLPerf Inference v5.0 Results Archive (https://mlcommons.org/en/inference-datacenter-50/)