238_024 · prediction · AI · AI-timing

AI token speed will jump from ~50 tokens/sec to ~1,000 tokens/sec (Cerebras)

Predictor: Emad Mostaque · ep#238 "Meta Buys Moltbook, GPT 5.4, and Fruitfly Brain Upload | Moonshots Live at The Abundance Summit 238" · source

Prior probability

50.0%

Current probability

37.2%

evolves via intake + LBP

Conviction

4/5

Signal quality

Resolution

pending

Window

2026-04-30 – 2027-09-30

Edges in / out

4 / 0

Tickers exposed

Prediction text

AI token speed will jump from ~50 tokens/sec to ~1,000 tokens/sec (Cerebras) | it's like 50 tokens a second or something like when we use GPT 5.4 Pro extended... You're going from 50 tokens a second of this level of knowledge to 1,000. So in codeex now if you use 5.3 fast it's a thousand tokens a second

Verbatim quote

From episode "Meta Buys Moltbook, GPT 5.4, and Fruitfly Brain Upload | Moonshots Live at The Abundance Summit 238"

it's like 50 tokens a second or something like when we use GPT 5.4 Pro extended... You're going from 50 tokens a second of this level of knowledge to 1,000. So in codeex now if you use 5.3 fast it's a thousand tokens a second

Predictor: Emad Mostaque

κ + Brier as of 2026-05-22

κ (discount)

0.722

Brier

0.0073

excellent

Hits / Misses

3 / 0

of 4 resolved

Hit rate

75.0%

Calibration plot (stated vs observed)

Evidence about this node from Emad Mostaque is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
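
A minimal sketch of how such a κ discount could work, assuming the intake step scales the log-likelihood ratio of a piece of evidence before a Bayesian odds update. Only the multiplication by κ and the 0.10 / 1.00 bounds come from the description above; the log-odds form and the names `clamp_kappa`, `discounted_update`, and `likelihood_ratio` are illustrative assumptions, not the actual /api/intake code.

```python
import math

KAPPA_FLOOR = 0.10  # from the text: effectively silenced
KAPPA_CAP = 1.00    # from the text: full weight


def clamp_kappa(kappa: float) -> float:
    """Keep the discount inside the stated [0.10, 1.00] range."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def discounted_update(prior: float, likelihood_ratio: float, kappa: float) -> float:
    """Assumed form: posterior log-odds = prior log-odds + kappa * log(LR)."""
    k = clamp_kappa(kappa)
    log_odds = math.log(prior / (1.0 - prior)) + k * math.log(likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical evidence with likelihood ratio 2.0, applied to a 50% prior.
full_weight = discounted_update(0.5, 2.0, 1.0)
discounted = discounted_update(0.5, 2.0, 0.722)
print(round(full_weight, 4), round(discounted, 4))
```

Under this assumed form, a κ of 0.722 moves the probability in the same direction as full-weight evidence, but by less.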

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

5 prob_history rows

Legend: intake v2 · milestone miss sweep · lbp propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 37.2%

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.

Leading chain: 3 fired ✓ · 5 pending

2025-10-01 · hit · OpenAI-Cerebras 750 MW partnership announced
How: OpenAI press release confirms multi-year deployment of 750 MW of Cerebras wafer-scale systems
Source: https://openai.com/index/cerebras-partnership/ · conf 97%

2026-02-01 · hit · Cerebras CS-3 delivers 1,800+ tokens/sec on Llama 3.3 70B
How: Independent benchmarks confirm Cerebras CS-3 at 1,800+ tokens/sec on Llama 3.3 70B (~10-20x GPU baseline of 50-200 tok/s)
Source: https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream · conf 95%
Notes: HIT (exceeds the 1,000 tok/s prediction by ~80%).

2026-03-01 · hit · GPT-OSS-120B running at 3,000 tokens/sec on Cerebras
How: OpenAI/Cerebras confirms gpt-oss-120B running at 3,000 tok/sec
Source: https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream · conf 95%
Notes: HIT (3x the 1,000 tok/s prediction; 60x baseline GPT-5 throughput).

2026-03-01 → 2026-09-30 · pending · AWS-Cerebras inference cloud collaboration GA
How: AWS/Cerebras collaboration announced March 2026 reaches GA inference availability for enterprise customers
Source: https://press.aboutamazon.com/aws/2026/3/aws-and-cerebras-collaboration-aims-to-set-a-new-standard-for-ai-inference-speed-and-performance-in-the-cloud · conf 80%

2026-07-15 · pending · Q1 window check-in (25%)

2026-06-01 → 2026-12-31 · pending · ChatGPT user-facing 1,000+ tok/s mode rollout
How: OpenAI rolls out user-facing inference mode (Codex, ChatGPT high-speed) at 1,000+ tok/s on Cerebras infrastructure
Source: https://www.startuphub.ai/ai-news/ai-video/2026/openais-10-billion-cerebras-deal-signals-the-true-ai-battleground-is-inference-speed/ · conf 85%
Notes: Mostaque's specific 'Codex 5.3 fast at 1,000 tok/s' claim, which is central to the prediction.

2026-09-29 · pending · Q2 window check-in (50%)

2026-12-14 · pending · Q3 window check-in (75%)

2027-03-01 · pending · AI token speed will jump from ~50 tokens/sec to ~1,000 tokens/sec (Cerebras)

No downstream cascades — this prediction is a leaf in the dependency graph.
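
The three check-in dates in the chain are consistent with fixed fractions (25%, 50%, 75%) of the span from the window start, 2026-04-30, to the resolution row's date, 2027-03-01, rounded down to whole days. The sketch below reproduces them under that assumption. The choice of 2027-03-01 as the end point and the helper `checkpoint_dates` are inferred from the listed dates; the logic of scripts/ops/derive_milestones.py is not shown on this page and may differ.

```python
from datetime import date, timedelta


def checkpoint_dates(start: date, end: date, fractions=(0.25, 0.50, 0.75)):
    """Return the dates that fall at the given fractions of [start, end].

    Fractional days are truncated (assumption inferred from the listed dates).
    """
    span_days = (end - start).days
    return [start + timedelta(days=int(span_days * f)) for f in fractions]


# Window start from the page header; end taken from the resolution row.
checkpoints = checkpoint_dates(date(2026, 4, 30), date(2027, 3, 1))
print([d.isoformat() for d in checkpoints])
# → ['2026-07-15', '2026-09-29', '2026-12-14']
```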

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample to surface the predictions whose marginals shift most under that assumption (live posterior: 37%). Each run takes ~30s; it is ideal for stress-testing "if X resolves, what else moves?"

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

LBP · 2026-05-17T02:00:01Z · 37.2% · -1.1pp

Network propagation: 38.3% → 37.2%

5-iter LBP, residual 0.00689 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e607fa96

LBP · 2026-05-10T02:00:02Z · 38.3% · -2.2pp

Network propagation: 40.5% → 38.3%

6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29

LBP · 2026-05-03T02:00:01Z · 40.5% · -4.5pp

Network propagation: 45.0% → 40.5%

6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9

LBP · 2026-04-30T16:39:51Z · 45.0% · -1.7pp

Network propagation: 46.7% → 45.0%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3

LBP · 2026-04-30T02:18:57Z · 46.7% · -3.3pp

Network propagation: 50.0% → 46.7%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef
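
As a quick arithmetic check, the per-run deltas and the cumulative move from the 50.0% prior can be recomputed from the logged before/after values. The rows below are transcribed from the five LBP entries above, oldest first; the variable names are illustrative and this is not part of the propagation code.

```python
# (timestamp, before %, after %) transcribed from the LBP rows, oldest first.
history = [
    ("2026-04-30T02:18:57Z", 50.0, 46.7),
    ("2026-04-30T16:39:51Z", 46.7, 45.0),
    ("2026-05-03T02:00:01Z", 45.0, 40.5),
    ("2026-05-10T02:00:02Z", 40.5, 38.3),
    ("2026-05-17T02:00:01Z", 38.3, 37.2),
]

# Per-run change in percentage points, rounded as displayed on the page.
deltas = [round(after - before, 1) for _, before, after in history]

# Each run should start where the previous one ended.
continuous = all(
    history[i][2] == history[i + 1][1] for i in range(len(history) - 1)
)

# Cumulative move from the analyst-seed prior to the current value.
total = round(history[-1][2] - history[0][1], 1)

print(deltas, continuous, total)
```

The five runs account for the full 12.8pp drop from the prior to the current 37.2%, which matches the page showing no intake-driven updates in this chain.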

Network propagation neighbors

Top edges sorted by latest LBP cross-impact

Top incoming (parents)

Edges that influence THIS node's belief

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
prereq	S_AGI_FAST_2027 AGI fast: drop-in remote worker by 2027-09	30.0%	0.500	0.050	-0.187
killer	TK03 AI Regulatory Moratorium (EU/US Capability Freeze)	10.0%	0.050	0.500	+0.083
killer	TK01 AGI Capability Plateau (2026-27 Training Stall)	15.0%	0.050	0.500	+0.061
killer	TK14 Superbubble Pop (S&P 500 -40%, Moonshot Capital Evaporates)	20.0%	0.050	0.500	+0.038
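
The Δ implied column is consistent with marginalizing this node over each parent taken on its own and comparing the result against the current probability of 37.2%, that is, Δ = P(c|s=T)·P(s) + P(c|s=F)·(1 − P(s)) − P_current. The sketch below reproduces the four rows under that reading. The formula is inferred from the table values rather than taken from the propagation code, and `implied_delta` is an illustrative name.

```python
CURRENT = 0.372  # current probability of this node, from the page header


def implied_delta(p_parent, p_c_given_true, p_c_given_false, current=CURRENT):
    """Single-parent marginal of the child minus its current probability."""
    implied = p_c_given_true * p_parent + p_c_given_false * (1.0 - p_parent)
    return implied - current


# node id -> (parent probability, P(c|s=T), P(c|s=F)), copied from the table.
edges = {
    "S_AGI_FAST_2027": (0.30, 0.500, 0.050),
    "TK03": (0.10, 0.050, 0.500),
    "TK01": (0.15, 0.050, 0.500),
    "TK14": (0.20, 0.050, 0.500),
}

for node, (p, given_true, given_false) in edges.items():
    print(f"{node}: {implied_delta(p, given_true, given_false):+.4f}")
```

For the prereq edge this gives 0.5·0.30 + 0.05·0.70 = 0.185, or 0.187 below the current value; the killer edges come out positive because each killer is unlikely and the node's probability is higher when a killer does not fire. The results agree with the table to within rounding of the displayed probabilities.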

Top outgoing (children)

Predictions THIS node influences

No outgoing edges.

Ticker exposure

33 ticker(s) linked

Beneficiaries (23)

SOUN CRWV SITM NVDA ARM GTLB BBAI TSM APLD CEVA AI MSFT MRVL SFTBY ORCL QCOM AVGO BABA AMD GOOGL IBM AMZN META

Adverse (6)

WNS CHGG CTSH IBM INFY ACN

Prerequisites (4)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
prereq	S_AGI_FAST_2027	AGI fast: drop-in remote worker by 2027-09	agi_general_capability	—
killer	TK14	Superbubble Pop (S&P 500 -40%, Moonshot Capital Evaporates)	—	—
killer	TK01	AGI Capability Plateau (2026-27 Training Stall)	—	—
killer	TK03	AI Regulatory Moratorium (EU/US Capability Freeze)	—	—

Dependents (0)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
No dependents

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.628	arxiv	DeepTokenEEG Enhancing Mild Cognitive Impairment and Alzheimers Classification via Tokenized EEG Features	—	mentions	pending	2026-05-14
0.628	arxiv	Gated Subspace Inference for Transformer Acceleration	—	mentions	pending	2026-05-04
0.628	arxiv	Cascade Token Selection for Transformer Attention Acceleration	—	mentions	pending	2026-05-04
0.614	manifold	How much additional mana will be printed by the 2x trader bonuses in 7 days? (total trader bonuses divided by 2)	—	mentions	pending	2026-05-15
0.604	arxiv	A Comprehensive Analysis of Tokenization and Self-Supervised Learning in End-to-End Automatic Speech Recognition applied on French Language	—	mentions	pending	2026-05-05
0.592	github_release	facebookresearch/projectaria_tools 1.5.0	—	mentions	pending	2024-03-21
0.592	arxiv	A Paradigm for Interpreting Metrics and Identifying Critical Errors in Automatic Speech Recognition	—	mentions	pending	2026-05-05
0.579	arxiv	SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation	—	mentions	pending	2026-06-01
0.573	polymarket	Counter-Strike: HOTU vs INOX Division (BO3) - CCT Europe Series 1 Playoffs	100%	mentions	pending	2026-05-12
0.570	github_release	facebookresearch/faiss v1.5.3	—	mentions	pending	2019-06-24
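
For reference, the Sim column is described as a cosine similarity. A minimal sketch of that measure follows, assuming each prediction and document is represented by an embedding vector; the toy vectors and the names `cosine_similarity`, `prediction_vec`, and `docs` are hypothetical, and the embedding model used by the site is not stated on this page.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional embeddings, purely for illustration.
prediction_vec = [1.0, 1.0, 0.0]
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 0.0, 1.0],
}

# Rank documents by similarity to the prediction, highest first.
ranked = sorted(
    ((cosine_similarity(prediction_vec, vec), name) for name, vec in docs.items()),
    reverse=True,
)
print(ranked)
```

A similarity score alone does not establish topical relevance, which is why each row also carries a Reviewed status.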

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "qty": "50 -> 1,000 tokens/sec (20x)",
  "url": "https://www.youtube.com/watch?v=d__HRChE2ZE",
  "mode": "PREDICTION",
  "role": "Host",
  "context": "OpenAI also just did a deal with Cerebris. So when you're using it right now, it looks like when you're dealing with again a human on the other side, it's like 50 tokens a second or something... You're going from 50 tokens a second of this level of knowledge to 1,000.",
  "verbatim": "it's like 50 tokens a second or something like when we use GPT 5.4 Pro extended... You're going from 50 tokens a second of this level of knowledge to 1,000. So in codeex now if you use 5.3 fast it's a thousand tokens a second",
  "conv_cues": "you're going from",
  "direction": "UP",
  "timeframe": "Imminent",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "OpenAI-Cerebras 750 MW partnership announced",
      "source": "https://openai.com/index/cerebras-partnership/",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -8,
      "source_id": null,
      "confidence": 0.97,
      "source_url": "https://openai.com/index/cerebras-partnership/",
      "expected_date": "2025-10-01",
      "observed_date": "2025-10-01",
      "research_origin": "deep_research",
      "measurement_criterion": "OpenAI press release confirms multi-year deployment of 750 MW of Cerebras wafer-scale systems"
    },
    {
      "kind": "llm_pre_event",
      "label": "Cerebras CS-3 delivers 1,800+ tokens/sec on Llama 3.3 70B",
      "notes": "HIT — exceeds the 1,000 tok/s prediction by ~80%.",
      "source": "https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.95,
      "source_url": "https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream",
      "expected_date": "2026-02-01",
      "observed_date": "2026-02-01",
      "research_origin": "deep_research",
      "measurement_criterion": "Independent benchmarks confirm Cerebras CS-3 at 1,800+ tokens/sec on Llama 3.3 70B (~10-20x GPU baseline of 50-200 tok/s)"
    },
    {
      "kind": "llm_pre_event",
      "label": "GPT-OSS-120B running at 3,000 tokens/sec on Cerebras",
      "notes": "HIT — 3x the 1,000 tok/s prediction. 60x baseline GPT-5 throughput.",
      "source": "https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -6,
      "source_id": null,
      "confidence": 0.95,
      "source_url": "https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream",
      "expected_date": "2026-03-01",
      "observed_date": "2026-03-01",
      "research_origin": "deep_research",
      "measurement_criterion": "OpenAI/Cerebras confirms gpt-oss-120B running at 3,000 tok/sec"
    },
    {
      "kind": "llm_post_event",
      "label": "AWS-Cerebras inference cloud collaboration GA",
      "source": "https://press.aboutamazon.com/aws/2026/3/aws-and-cerebras-collaboration-aims-to-set-a-new-standard-for-ai-inference-speed-and-performance-in-the-cloud",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -5,
      "source_id": null,
      "confidence": 0.8,
      "source_url": "https://press.aboutamazon.com/aws/2026/3/aws-and-cerebras-collaboration-aims-to-set-a-new-standard-for-ai-inference-speed-and-performance-in-the-cloud",
      "expected_date": "2026-06-15",
      "research_origin": "deep_research",
      "expected_date_range": {
        "to": "2026-09-30",
        "from": "2026-03-01"
      },
      "measurement_criterion": "AWS/Cerebras collaboration announced March 2026 reaches GA inference availability for enterprise customers"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q1 window check-in (25%)
... (truncated)