AI token speed will jump from ~50 tokens/sec to ~1,000 tokens/sec (Cerebras)
Predictor: Emad Mostaque · ep#238 "Meta Buys Moltbook, GPT 5.4, and Fruitfly Brain Upload | Moonshots Live at The Abundance Summit 238" · source
Prediction text
AI token speed will jump from ~50 tokens/sec to ~1,000 tokens/sec (Cerebras)
Verbatim quote
it's like 50 tokens a second or something like when we use GPT 5.4 Pro extended... You're going from 50 tokens a second of this level of knowledge to 1,000. So in codeex now if you use 5.3 fast it's a thousand tokens a second
Predictor: Emad Mostaque
Calibration plot (stated vs observed)
Evidence about this node from Emad Mostaque is multiplied by κ in /api/intake. A lower κ gives the evidence less weight; κ is floored at 0.10 (effectively silenced) and capped at 1.00 (full weight).
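The κ weighting described above can be sketched as a clamp followed by a multiplication. This is only an illustration: the function names are hypothetical, and treating a piece of evidence as a single number (for example a log-odds shift) is an assumption, since the page does not say how /api/intake represents evidence.

```python
KAPPA_FLOOR = 0.10  # from the page: below this the predictor is effectively silenced
KAPPA_CAP = 1.00    # from the page: full weight


def clamp_kappa(kappa: float) -> float:
    """Keep kappa inside the [floor, cap] range stated on the page."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def weight_evidence(evidence: float, kappa: float) -> float:
    """Scale one piece of evidence by the predictor's clamped kappa.

    `evidence` is a hypothetical scalar (e.g. a log-odds shift); the real
    representation used by /api/intake is not documented here.
    """
    return evidence * clamp_kappa(kappa)


if __name__ == "__main__":
    for raw_kappa in (0.03, 0.5, 1.7):
        print(raw_kappa, clamp_kappa(raw_kappa), weight_evidence(2.0, raw_kappa))
```

A predictor with a poor track record therefore never drops to zero influence; the floor keeps a tenth of the original evidence in play.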
Reference class
This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.
Probability over time
Milestone chain
- 2025-10-01 · hit · OpenAI-Cerebras 750 MW partnership announced
  How: OpenAI press release confirms multi-year deployment of 750 MW of Cerebras wafer-scale systems
  Source: https://openai.com/index/cerebras-partnership/
  Conf: 97%
- 2026-02-01 · hit · Cerebras CS-3 delivers 1,800+ tokens/sec on Llama 3.3 70B
  How: Independent benchmarks confirm Cerebras CS-3 at 1,800+ tokens/sec on Llama 3.3 70B (~10-20x GPU baseline of 50-200 tok/s)
  Source: https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream
  Conf: 95%
  Notes: HIT, exceeds the 1,000 tok/s prediction by ~80%.
- 2026-03-01 · hit · GPT-OSS-120B running at 3,000 tokens/sec on Cerebras
  How: OpenAI/Cerebras confirms gpt-oss-120B running at 3,000 tok/sec
  Source: https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream
  Conf: 95%
  Notes: HIT, 3x the 1,000 tok/s prediction. 60x baseline GPT-5 throughput.
- 2026-03-01 → 2026-09-30 · pending · AWS-Cerebras inference cloud collaboration GA
  How: AWS/Cerebras collaboration announced March 2026 reaches GA inference availability for enterprise customers
  Source: https://press.aboutamazon.com/aws/2026/3/aws-and-cerebras-collaboration-aims-to-set-a-new-standard-for-ai-inference-speed-and-performance-in-the-cloud
  Conf: 80%
- 2026-07-15 · pending · Q1 window check-in (25%)
- 2026-06-01 → 2026-12-31 · pending · ChatGPT user-facing 1,000+ tok/s mode rollout
  How: OpenAI rolls out user-facing inference mode (Codex, ChatGPT high-speed) at 1,000+ tok/s on Cerebras infrastructure
  Source: https://www.startuphub.ai/ai-news/ai-video/2026/openais-10-billion-cerebras-deal-signals-the-true-ai-battleground-is-inference-speed/
  Conf: 85%
  Notes: Mostaque's specific 'Codex 5.3 fast at 1,000 tok/s' claim, central to the prediction.
- 2026-09-29 · pending · Q2 window check-in (50%)
- 2026-12-14 · pending · Q3 window check-in (75%)
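The three quartile check-ins in the chain above are evenly spaced, 76 days apart, which is what you would get by placing them at 25%, 50% and 75% of a 304-day resolution window. The sketch below reproduces those dates under that reading. The window boundaries (2026-04-30 to 2027-02-28) are inferred from the spacing and are not stated on the page, and the function name is hypothetical.

```python
from datetime import date, timedelta

# Assumption: window boundaries back-solved from the 76-day spacing of the
# check-ins; the page itself does not state them.
ASSUMED_WINDOW_START = date(2026, 4, 30)
ASSUMED_WINDOW_END = date(2027, 2, 28)


def quartile_checkins(start: date, end: date) -> list[date]:
    """Return the dates at 25%, 50% and 75% of the way through a window."""
    span_days = (end - start).days
    return [start + timedelta(days=round(span_days * q)) for q in (0.25, 0.50, 0.75)]


if __name__ == "__main__":
    for label, when in zip(("Q1", "Q2", "Q3"),
                           quartile_checkins(ASSUMED_WINDOW_START, ASSUMED_WINDOW_END)):
        print(label, when.isoformat())
```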
No downstream cascades — this prediction is a leaf in the dependency graph.
What if this resolves?
Click a button to clamp this prediction and run a Gibbs sample. Returns the predictions whose marginals shift most. ~30s per run; ideal for stress-testing "if X resolves, what else moves?"
Evidence chain
Network propagation neighbors
Top incoming (parents)
Edges that influence THIS node's belief
| Kind | Node | Their prob | P(c\|s=T) | P(c\|s=F) | Δ implied |
|---|---|---|---|---|---|
| prereq | S_AGI_FAST_2027 AGI fast: drop-in remote worker by 2027-09 | 30.0% | 0.500 | 0.050 | -0.187 |
| killer | TK03 AI Regulatory Moratorium (EU/US Capability Freeze) | 10.0% | 0.050 | 0.500 | +0.083 |
| killer | TK01 AGI Capability Plateau (2026-27 Training Stall) | 15.0% | 0.050 | 0.500 | +0.061 |
| killer | TK14 Superbubble Pop (S&P 500 -40%, Moonshot Capital Evaporates) | 20.0% | 0.050 | 0.500 | +0.038 |
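The "Δ implied" column in the table above is consistent with a simple calculation: marginalise this node over each parent using the law of total probability, P(c) = P(c|s=T)·P(s) + P(c|s=F)·(1 − P(s)), then subtract the node's current belief. The sketch below reproduces the column under that reading. The current belief of 0.372 is back-solved from the table rows and is an assumption, not a figure stated on the page, and the function names are hypothetical.

```python
def marginal_from_parent(p_parent: float, p_c_given_true: float,
                         p_c_given_false: float) -> float:
    """Law of total probability over one binary parent."""
    return p_c_given_true * p_parent + p_c_given_false * (1.0 - p_parent)


# Assumption: back-solved from the table, not stated on the page.
ASSUMED_CURRENT_BELIEF = 0.372

# (node id, parent probability, P(c|s=T), P(c|s=F)) copied from the table.
EDGES = [
    ("S_AGI_FAST_2027", 0.30, 0.500, 0.050),  # prereq
    ("TK03", 0.10, 0.050, 0.500),             # killer
    ("TK01", 0.15, 0.050, 0.500),             # killer
    ("TK14", 0.20, 0.050, 0.500),             # killer
]


def implied_deltas(edges, current_belief: float) -> dict[str, float]:
    """Difference between each parent-implied marginal and the current belief."""
    return {
        name: marginal_from_parent(p, t, f) - current_belief
        for name, p, t, f in edges
    }


if __name__ == "__main__":
    for name, delta in implied_deltas(EDGES, ASSUMED_CURRENT_BELIEF).items():
        print(f"{name}: {delta:+.3f}")
```

Read this way, the prerequisite pulls the belief down because it is only 30% likely, while each killer pushes it up slightly because the killer itself is unlikely to fire.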
Top outgoing (children)
Predictions THIS node influences
No outgoing edges.
Ticker exposure
Beneficiaries (23)
Adverse (6)
Prerequisites (4)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| prereq | S_AGI_FAST_2027 | AGI fast: drop-in remote worker by 2027-09 | agi_general_capability | — |
| killer | TK14 | Superbubble Pop (S&P 500 -40%, Moonshot Capital Evaporates) | — | — |
| killer | TK01 | AGI Capability Plateau (2026-27 Training Stall) | — | — |
| killer | TK03 | AI Regulatory Moratorium (EU/US Capability Freeze) | — | — |
Dependents (0)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| No dependents | | | | |
Linked documents (10)
Raw metadata
{
"nia": false,
"qty": "50 -> 1,000 tokens/sec (20x)",
"url": "https://www.youtube.com/watch?v=d__HRChE2ZE",
"mode": "PREDICTION",
"role": "Host",
"context": "OpenAI also just did a deal with Cerebris. So when you're using it right now, it looks like when you're dealing with again a human on the other side, it's like 50 tokens a second or something... You're going from 50 tokens a second of this level of knowledge to 1,000.",
"verbatim": "it's like 50 tokens a second or something like when we use GPT 5.4 Pro extended... You're going from 50 tokens a second of this level of knowledge to 1,000. So in codeex now if you use 5.3 fast it's a thousand tokens a second",
"conv_cues": "you're going from",
"direction": "UP",
"timeframe": "Imminent",
"conv_level": "HIGH",
"milestones": [
{
"kind": "llm_pre_event",
"label": "OpenAI-Cerebras 750 MW partnership announced",
"source": "https://openai.com/index/cerebras-partnership/",
"status": "hit",
"weight": 0.4,
"ordinal": -8,
"source_id": null,
"confidence": 0.97,
"source_url": "https://openai.com/index/cerebras-partnership/",
"expected_date": "2025-10-01",
"observed_date": "2025-10-01",
"research_origin": "deep_research",
"measurement_criterion": "OpenAI press release confirms multi-year deployment of 750 MW of Cerebras wafer-scale systems"
},
{
"kind": "llm_pre_event",
"label": "Cerebras CS-3 delivers 1,800+ tokens/sec on Llama 3.3 70B",
"notes": "HIT — exceeds the 1,000 tok/s prediction by ~80%.",
"source": "https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream",
"status": "hit",
"weight": 0.4,
"ordinal": -7,
"source_id": null,
"confidence": 0.95,
"source_url": "https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream",
"expected_date": "2026-02-01",
"observed_date": "2026-02-01",
"research_origin": "deep_research",
"measurement_criterion": "Independent benchmarks confirm Cerebras CS-3 at 1,800+ tokens/sec on Llama 3.3 70B (~10-20x GPU baseline of 50-200 tok/s)"
},
{
"kind": "llm_pre_event",
"label": "GPT-OSS-120B running at 3,000 tokens/sec on Cerebras",
"notes": "HIT — 3x the 1,000 tok/s prediction. 60x baseline GPT-5 throughput.",
"source": "https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream",
"status": "hit",
"weight": 0.4,
"ordinal": -6,
"source_id": null,
"confidence": 0.95,
"source_url": "https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream",
"expected_date": "2026-03-01",
"observed_date": "2026-03-01",
"research_origin": "deep_research",
"measurement_criterion": "OpenAI/Cerebras confirms gpt-oss-120B running at 3,000 tok/sec"
},
{
"kind": "llm_post_event",
"label": "AWS-Cerebras inference cloud collaboration GA",
"source": "https://press.aboutamazon.com/aws/2026/3/aws-and-cerebras-collaboration-aims-to-set-a-new-standard-for-ai-inference-speed-and-performance-in-the-cloud",
"status": "pending",
"weight": 0.4,
"ordinal": -5,
"source_id": null,
"confidence": 0.8,
"source_url": "https://press.aboutamazon.com/aws/2026/3/aws-and-cerebras-collaboration-aims-to-set-a-new-standard-for-ai-inference-speed-and-performance-in-the-cloud",
"expected_date": "2026-06-15",
"research_origin": "deep_research",
"expected_date_range": {
"to": "2026-09-30",
"from": "2026-03-01"
},
"measurement_criterion": "AWS/Cerebras collaboration announced March 2026 reaches GA inference availability for enterprise customers"
},
{
"kind": "quartile_checkpoint",
"label": "Q1 window check-in (25%)
... (truncated)