CMQ_013 · prediction · AI · OOM-scaling

A 5-OOM (100,000x) effective-compute leap will occur between 2024 and 2027, closing the gap from GPT-4 (high-schooler level) to a fully automated AI researcher.

Predictor: Leopold Aschenbrenner

Prior probability

55.0%

Current probability

40.2%

evolves via intake + LBP (loopy belief propagation)

Conviction

5/5

Signal quality

Resolution

pending

Window

2024-01-01 – 2027-10-31

Edges in / out

5 / 0

Tickers exposed

Prediction text

A 5-OOM (100,000x) effective-compute leap will occur between 2024 and 2027, closing the gap from GPT-4 (high-schooler level) to a fully automated AI researcher.

Key catalyst: Epoch AI FLOP tracking; frontier model releases

Watch events: Frontier training FLOPs; published algorithmic-efficiency papers; emergent capability benchmarks.

Resolution evidence

Status: pending

Epoch AI tracking shows frontier training compute doubling roughly every 6 months over 2022-2025; algorithmic efficiency gains (MoE, distillation) have also been material.
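A 6-month doubling time converts to (12 / 6) × log10(2) ≈ 0.60 OOM/yr, slightly above the ~0.5 OOM/yr physical-compute rate in this node's thesis context. The sketch below makes that conversion and the additive OOM budget explicit. The 3-year span and the 0.5 OOM/yr algorithmic rate are illustrative assumptions, and whatever remains of the 5-OOM target is simply attributed to unhobbling.

```python
import math


def ooms_per_year(doubling_months: float) -> float:
    """Convert a doubling time in months into orders of magnitude (OOM) per year."""
    doublings_per_year = 12.0 / doubling_months
    return doublings_per_year * math.log10(2)


def ooms_over(years: float, rate: float) -> float:
    """Growth multiplies, so OOMs (log10 of the multiplier) simply add up over time."""
    return rate * years


def unhobbling_gap(target_ooms: float, years: float,
                   physical_rate: float, algorithmic_rate: float) -> float:
    """OOMs still needed from unhobbling once compute and algorithms are counted."""
    return target_ooms - ooms_over(years, physical_rate) - ooms_over(years, algorithmic_rate)


physical = ooms_per_year(6.0)
print(f"physical compute: {physical:.2f} OOM/yr")                    # 0.60 OOM/yr
print(f"3-year multiplier: {10 ** ooms_over(3.0, physical):.0f}x")   # 64x
print(f"left for unhobbling: {unhobbling_gap(5.0, 3.0, physical, 0.5):.2f} OOM")  # 1.69 OOM
```

Because OOMs are logarithms, the three vectors (physical compute, algorithmic efficiency, unhobbling) combine by addition even though the underlying gains multiply.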

Predictor: Leopold Aschenbrenner

κ + Brier as of 2026-05-22


κ (discount)

0.688

Brier

0.0417

excellent

Hits / Misses

2 / 0

of 3 resolved

Hit rate

66.7%

Calibration plot (stated vs observed)

Evidence about this node from Leopold Aschenbrenner has its log-likelihood ratio (LLR) multiplied by κ in /api/intake. Lower κ means less weight; κ is floored at 0.10 (effectively silenced) and capped at 1.00 (full weight).
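A minimal sketch of the κ discount. The multiply-the-LLR rule is read off the raw metadata further down this page (llr -0.4055 with κ 0.6875 becomes adjusted_llr -0.2788); the function names, and the assumption that κ is clamped to its floor and cap before use, are illustrative rather than the /api/intake implementation.

```python
import math

KAPPA_FLOOR = 0.10  # "effectively silenced"
KAPPA_CAP = 1.00    # full weight


def clamp_kappa(kappa: float) -> float:
    """Keep kappa inside [0.10, 1.00], matching the stated floor and cap."""
    return min(KAPPA_CAP, max(KAPPA_FLOOR, kappa))


def discounted_llr(likelihood_ratio: float, kappa: float) -> float:
    """Scale one piece of evidence's log-likelihood ratio by the predictor's kappa.

    Multiplying the LLR by kappa is the same as raising the likelihood ratio to the
    power kappa, so kappa < 1 pulls every update toward "no information" (LR = 1).
    """
    return clamp_kappa(kappa) * math.log(likelihood_ratio)


# A "weak" missed checkpoint in the raw metadata carries llr = ln(2/3).
print(round(discounted_llr(2 / 3, 0.6875), 6))  # -0.278757
```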

Reference class: regulatory_freeze_window

Linked via embedding similarity 0.649


Major-country regulatory pause/moratorium on AI capability research lasting >6 months

Base rate

5.0%

0/4 historical

Inside weight

—

Outside weight

—

no pull

inside 40.2% → blend 40.2% (Δ 0.0pp)

Tetlock-style outside view: TRF is the fraction of the forecast window still remaining. At TRF=1 (just predicted), the outside view dominates (w_in=0.3); at TRF=0 (deadline), the inside view dominates (w_in=1.0). The blend regularizes overconfident inside views toward the historical base rate.
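The recorded miss-sweep numbers on this page (trf 0.3903, inside_weight 0.7268, inside_posterior 0.2269, blended_posterior 0.1551) are consistent with a linear schedule w_in = 1 - 0.7 × TRF and a blend taken in log-odds space rather than in probability space. A sketch under those two inferred assumptions; the function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def inside_weight(trf: float) -> float:
    """Assumed linear schedule: w_in = 0.3 at TRF=1 (just predicted), 1.0 at TRF=0 (deadline)."""
    trf = min(1.0, max(0.0, trf))
    return 1.0 - 0.7 * trf


def blend(inside_p: float, base_rate: float, trf: float) -> float:
    """Weighted average of the inside view and the base rate, taken in log-odds space."""
    w_in = inside_weight(trf)
    return sigmoid(w_in * logit(inside_p) + (1.0 - w_in) * logit(base_rate))


# Values from the metadata_milestone_miss_sweep entry in the evidence chain.
p = blend(0.2268916488382616, 0.05, 0.3903346852891947)
print(round(p, 4))  # 0.1551
```

A linear mix of the same inputs would give about 0.179, not the recorded 0.155, which is why the log-odds form is the better fit to the logged values.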

Probability over time

9 prob_history rows

Series: intake v2 · milestone miss sweep · LBP propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 40.2%

Milestone chain

Pre-event signals (upstream prerequisites and window checkpoints) → resolution event → downstream cascades. Statuses and dates update from linked nodes and are re-derived nightly by scripts/ops/derive_milestones.py.

Leading chain: 2 fired ✓ · 3 overdue ⏱ · 1 pending

2024-09-11 · overdue · Q1 window check-in (25%)
2025-05-24 · overdue · Q2 window check-in (50%)
2026-01-31 · hit · Test-time compute / reasoning OOM unlocked via o1, o3, R1
How: Three or more frontier reasoning models shipped (o1, o3, DeepSeek R1), demonstrating Aschenbrenner's predicted unhobbling OOM
Source: Stockalarm Pro / Dwarkesh Patel (Aschenbrenner test-time compute call validated) · conf 95%
Notes: HIT. o1 (Sep 2024), o3 (late 2024/early 2025), and R1 (Jan 2025) shipped. Unhobbling axis validated.
2026-02-02 · overdue · Q3 window check-in (75%)
2026-04-30 · hit · Aschenbrenner 1GW-per-cluster prediction validated by 2026
How: Public confirmation that a 1GW-class AI training cluster is operational, validating Aschenbrenner's 'compute scaling 0.5 OOM/year' axis
Source: Stockalarm Pro, 'Situational Awareness Two Years Later' ('1 GW per cluster by 2026: hit') · conf 95%
Notes: HIT. 1GW cluster milestone confirmed; 10GW under construction. Compute axis on track for the 5-OOM stack.
2026-06-01 → 2026-12-31 · pending · Frontier model demonstrates 1 full OOM effective compute over GPT-4
How: Public release of a model with ≥10x effective compute vs GPT-4 (per Epoch AI FLOP estimation); would mark cumulative ~3 OOM gain since 2024
Source: Epoch AI tracking / OpenAI, Anthropic, Google releases · conf 75%
2026-10-15 · pending · A 5-OOM (100,000x) effective-compute leap will occur between 2024 and 2027, closing the gap from GPT-4 (high-schooler level) to a fully automated AI researcher.
2026-09-01 → 2027-03-31 · pending · Algorithmic efficiency gains tracked at 0.5+ OOM/year through 2026
How: Epoch AI or a peer publication confirms algorithmic efficiency improved by ≥0.5 OOM YoY over the 2025-2026 window
Source: Epoch AI compute-efficiency research · conf 65%
2027-01-01 → 2027-12-31 · pending · Fully automated AI researcher milestone: autonomous research agent
How: A frontier lab publicly demonstrates an AI system performing end-to-end ML research (hypothesis, experiment, paper) at a level matching a mid-tier human researcher
Source: Anthropic, OpenAI, DeepMind research demonstrations · conf 40%
Notes: Cascade endpoint of the Aschenbrenner thesis; high uncertainty, but on his original 2027 timeline.

What if this resolves?

Clamp this prediction to TRUE or FALSE and run counterfactual Gibbs sampling. This surfaces the predictions whose marginals shift most under that assumption.

(live posterior: 40%)

Each run takes about 30 s; useful for stress-testing "if X resolves, what else moves?"
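The clamp-and-compare workflow can be sketched with a plain Gibbs sampler. This assumes a toy Bayesian network built only from the single incoming edge listed under "Network propagation neighbors" (TK02 at 12%, P(c|s=T)=0.050, P(c|s=F)=0.550); the network dictionary, the node keys, and all function names are hypothetical, and this is not the production sampler. Each free node is resampled from its conditional given its Markov blanket (its own CPT times its children's CPTs), and marginals with and without the clamp are compared.

```python
import random

# Hypothetical toy network: node -> (parents, cpt). The cpt maps a tuple of parent
# values to P(node=True | parents).
NETWORK = {
    "TK02": ((), {(): 0.12}),
    "CMQ_013": (("TK02",), {(True,): 0.05, (False,): 0.55}),
}


def local_prob(node, state):
    """P(node = state[node] | its parents' values in state)."""
    parents, cpt = NETWORK[node]
    p_true = cpt[tuple(state[p] for p in parents)]
    return p_true if state[node] else 1.0 - p_true


def blanket_weight(node, value, state):
    """Unnormalized P(node=value | Markov blanket): own CPT times children's CPTs."""
    trial = dict(state)
    trial[node] = value
    weight = local_prob(node, trial)
    for child, (parents, _) in NETWORK.items():
        if node in parents:
            weight *= local_prob(child, trial)
    return weight


def gibbs_marginals(clamp, n_sweeps=20_000, burn_in=1_000, seed=0):
    """Estimate P(node=True) for every node while the nodes in `clamp` stay fixed."""
    rng = random.Random(seed)
    state = {n: rng.random() < 0.5 for n in NETWORK}
    state.update(clamp)
    free = [n for n in NETWORK if n not in clamp]
    counts = dict.fromkeys(NETWORK, 0)
    for sweep in range(burn_in + n_sweeps):
        for node in free:
            w_true = blanket_weight(node, True, state)
            w_false = blanket_weight(node, False, state)
            state[node] = rng.random() < w_true / (w_true + w_false)
        if sweep >= burn_in:
            for node in NETWORK:
                counts[node] += state[node]
    return {n: counts[n] / n_sweeps for n in NETWORK}


def top_shifts(clamp, k=5, **kwargs):
    """Rank unclamped nodes by |counterfactual marginal - baseline marginal|."""
    baseline = gibbs_marginals({}, **kwargs)
    counterfactual = gibbs_marginals(clamp, **kwargs)
    shifts = {n: counterfactual[n] - baseline[n] for n in NETWORK if n not in clamp}
    return sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


shifts = top_shifts({"CMQ_013": True})
print(shifts)
```

Clamping CMQ_013 to TRUE should drive TK02 toward 0.006 / 0.49 ≈ 1.2% (the exact Bayes answer), a shift of about -10.8pp from its 12% prior. In this toy, CMQ_013's unclamped marginal is 0.49 rather than 40.2% because no intrinsic evidence is modeled.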

Evidence chain

Every probability update with full Bayesian provenance, in reverse chronological order (latest first)

LBP · 2026-05-24T02:00:02Z · 40.2% (+2.1pp)

Network propagation: 38.2% → 40.2%

4-iter LBP, residual 0.01000 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 806b02f8
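The "damping 0.5" and "residual" fields in these LBP entries can be illustrated with a damped fixed-point iteration. This is deliberately not loopy belief propagation (there is no message passing and the parent-implied probability is held fixed), and reading w_intrinsic as the weight on a node's own belief versus the parent-implied probability is an assumption. The function names are hypothetical and the example is not expected to reproduce the logged values.

```python
def damped_update(old: float, proposed: float, damping: float = 0.5) -> float:
    """Keep a `damping` share of the previous value to suppress oscillation."""
    return damping * old + (1.0 - damping) * proposed


def relax_belief(intrinsic, implied_from_parents, start,
                 damping=0.5, w_intrinsic=0.5, tol=0.01, max_iter=50):
    """Iterate damped updates until the per-step residual drops below tol.

    Hypothetical reading of w_intrinsic: weight on the node's own belief versus
    the probability implied by its (fixed) parents.
    """
    belief, residual = start, float("inf")
    for iteration in range(1, max_iter + 1):
        proposed = w_intrinsic * intrinsic + (1.0 - w_intrinsic) * implied_from_parents
        new_belief = damped_update(belief, proposed, damping)
        residual = abs(new_belief - belief)
        belief = new_belief
        if residual < tol:
            return belief, iteration, residual
    return belief, max_iter, residual


belief, iters, residual = relax_belief(intrinsic=0.382, implied_from_parents=0.49, start=0.382)
print(f"{belief:.5f} after {iters} iterations (residual {residual:.5f})")
```

With damping 0.5 the distance to the fixed point halves each step, so starting from 0.382 with a target of 0.436 the residual falls below 0.01 on the third iteration.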

LBP · 2026-05-17T02:00:01Z · 38.2% (+4.0pp)

Network propagation: 34.1% → 38.2%

5-iter LBP, residual 0.00689 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e607fa96

LBP · 2026-05-10T02:00:02Z · 34.1% (+7.4pp)

Network propagation: 26.8% → 34.1%

6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29

LBP · 2026-05-03T02:00:01Z · 26.8% (+11.2pp)

Network propagation: 15.5% → 26.8%

6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9

metadata_milestone_miss_sweep · 2026-05-02T22:07:21Z · 15.5% (-24.9pp)

metadata_milestone_miss_sweep bayesian_v2 n=3 inside=0.227 blend=0.155 LLR=-0.836 κ=0.69 w_in=0.73 regulatory_freeze_window

Raw metadata

{
  "trf": 0.3903346852891947,
  "kappa": 0.6875,
  "base_rate": 0.05,
  "predictor": "Leopold Aschenbrenner",
  "total_llr": -1.2163953243244932,
  "grace_days": 7,
  "bayesian_v2": true,
  "prior_logit": -0.3896748373344811,
  "bayes_factor": "2.3:1 against",
  "blend_reason": "blend 72% inside / 27% outside (TRF=0.390, base_rate=0.050 from regulatory_freeze_window)",
  "inside_prior": 0.4037955794452396,
  "kappa_source": "predictor_table",
  "n_milestones": 3,
  "blend_applied": true,
  "contributions": [
    {
      "llr": -0.4054651081081644,
      "kind": "quartile_checkpoint",
      "kappa": 0.6875,
      "label": "Q1 window check-in (25%)",
      "weight": 0.05,
      "strength": "weak",
      "confidence": null,
      "source_url": null,
      "adjusted_llr": -0.278757261824363,
      "expected_date": "2024-09-11",
      "measurement_criterion": null
    },
    {
      "llr": -0.4054651081081644,
      "kind": "quartile_checkpoint",
      "kappa": 0.6875,
      "label": "Q2 window check-in (50%)",
      "weight": 0.05,
      "strength": "weak",
      "confidence": null,
      "source_url": null,
      "adjusted_llr": -0.278757261824363,
      "expected_date": "2025-05-24",
      "measurement_criterion": null
    },
    {
      "llr": -0.4054651081081644,
      "kind": "quartile_checkpoint",
      "kappa": 0.6875,
      "label": "Q3 window check-in (75%)",
      "weight": 0.05,
      "strength": "weak",
      "confidence": null,
      "source_url": null,
      "adjusted_llr": -0.278757261824363,
      "expected_date": "2026-02-02",
      "measurement_criterion": null
    }
  ],
  "evidence_kind": "metadata_milestone_miss_sweep",
  "inside_source": "history_v2",
  "inside_weight": 0.7267657202975637,
  "outside_weight": 0.2732342797024363,
  "posterior_prob": 0.15505421390054963,
  "posterior_logit": -1.2259466228075702,
  "predictor_brier": 0.04167,
  "inside_posterior": 0.2268916488382616,
  "blended_posterior": 0.15505421390054963,
  "reference_class_id": "regulatory_freeze_window",
  "total_adjusted_llr": -0.836271785473089,
  "predictor_n_resolved": 3
}
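The miss-sweep fields above can be re-derived from one another. This sketch assumes each contribution's adjusted_llr is llr × kappa, that the adjusted values are summed and added to prior_logit, and that bayes_factor is exp(|total_adjusted_llr|); those rules are inferred from the recorded numbers, not taken from the bayesian_v2 source, and the function name is hypothetical.

```python
import math


def miss_sweep(prior_logit, contributions):
    """Re-derive the inside posterior from a prior logit and kappa-discounted LLRs."""
    total_llr = sum(c["llr"] for c in contributions)
    total_adjusted = sum(c["llr"] * c["kappa"] for c in contributions)
    posterior_logit = prior_logit + total_adjusted
    inside_posterior = 1.0 / (1.0 + math.exp(-posterior_logit))
    factor = math.exp(abs(total_adjusted))
    direction = "against" if total_adjusted < 0 else "for"
    return {
        "total_llr": total_llr,
        "total_adjusted_llr": total_adjusted,
        "posterior_logit": posterior_logit,
        "inside_posterior": inside_posterior,
        "bayes_factor": f"{factor:.1f}:1 {direction}",
    }


# The three weak quartile-checkpoint misses share the same llr and kappa.
checkpoint = {"llr": -0.4054651081081644, "kappa": 0.6875}
result = miss_sweep(-0.3896748373344811, [checkpoint] * 3)
print(result["bayes_factor"])  # 2.3:1 against
```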

LBP · 2026-04-30T16:39:51Z · 40.4% (+7.6pp)

Network propagation: 32.8% → 40.4%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3

legacy v1 · 2026-04-30T16:13:50Z · 32.8% (-7.6pp)

reference_class_assigned bayesian_v2 inside=0.550 blend=0.328 w_in=0.71 regulatory_freeze_window

LBP · 2026-04-30T02:18:57Z · 40.4% (+7.6pp)

Network propagation: 32.8% → 40.4%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef

legacy v1 · 2026-04-30T01:56:50Z · 32.8% (-22.2pp)

reference_class_assigned bayesian_v2 inside=0.550 blend=0.328 w_in=0.71 regulatory_freeze_window

Network propagation neighbors

Top edges sorted by latest LBP cross-impact


Top incoming (parents)

Edges that influence THIS node's belief

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
killer	TK02 AI Compute Supply Shock (TSMC/Taiwan Disruption)	12.0%	0.050	0.550	+0.088
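The "Δ implied" column matches the law of total probability applied to the parent's probability, minus this node's current 40.2%. That definition is inferred from the numbers (0.12 × 0.05 + 0.88 × 0.55 = 0.490, and 0.490 - 0.402 = +0.088), not documented on this page; the function name is hypothetical.

```python
def implied_child_prob(parent_prob: float, p_if_true: float, p_if_false: float) -> float:
    """Law of total probability: P(c) = P(s) P(c|s=T) + (1 - P(s)) P(c|s=F)."""
    return parent_prob * p_if_true + (1.0 - parent_prob) * p_if_false


current = 0.402  # this node's current probability
implied = implied_child_prob(0.12, 0.050, 0.550)
print(f"implied {implied:.3f}, delta {implied - current:+.3f}")  # implied 0.490, delta +0.088
```

Because TK02 is a killer edge (P(c|s=T) is far below P(c|s=F)), a rising TK02 lowers the implied value: at a 50% TK02 probability it would be 0.300.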

Top outgoing (children)

Predictions THIS node influences

No outgoing edges.

Ticker exposure

11 ticker(s) linked

Beneficiaries (11)

AI BBAI GTLB NVDA SOUN IBM META AMZN MSFT GOOGL ORCL

Prerequisites (5)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
correlate	S_AGI_MID_2029	AGI mid: Kurzweil 2029 path	agi_general_capability	—
correlate	S_AGI_FAST_2027	AGI fast: drop-in remote worker by 2027-09	agi_general_capability	—
correlate	S_AGI_WINTER_2036PLUS	AGI delayed: capability plateau or AI winter	agi_general_capability	—
correlate	S_AI_PAUSE_2026	Major-country AI pause beginning 2026	ai_regulatory_pause	—
killer	TK02	AI Compute Supply Shock (TSMC/Taiwan Disruption)	—	—

Dependents (0)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
No dependents

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.699	manifold	Which Epoch AI FrontierMath open problem will be solved next?	—	mentions	pending	2026-04-23
0.654	manifold	Will a Kessler Syndrome cascade begin in Low Earth Orbit before 2046?	33%	mentions	pending	2026-04-24
0.654	manifold	GPT 5.6 release date?	—	mentions	pending	2026-05-16
0.635	arxiv	FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models	—	mentions	pending	2026-05-05
0.629	arxiv	Why Do Time Series Models Need Long Context Windows?	—	mentions	pending	2026-06-01
0.629	arxiv	Uncertainty Reliability Under Domain Shift: An Investigation for Data-Driven Blood Pressure Estimation in Photoplethysmography	—	mentions	pending	2026-05-18
0.626	arxiv	QLIF-CAST: Quantum Leaky-Integrate-and-Fire for Time-Series Weather Forecasting	—	mentions	pending	2026-05-18
0.623	arxiv	Uncertainty-Driven Anomaly Detection for Psychotic Relapse Using Smartwatches: Forecasting and Multi-Task Learning Fusion	—	mentions	pending	2026-05-13
0.621	arxiv	DAD4TS: Data-Augmentation-Oriented Diffusion Model for Time-Series Forecasting with Small-Scale Data	—	mentions	pending	2026-05-18
0.618	arxiv	Adaptive Oscillatory-State Alignment for Time Series Forecasting	—	mentions	pending	2026-06-04
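Auto-linking by cosine similarity can be sketched as a top-k ranking over embedding vectors. The embedding model, the toy 3-dimensional vectors, the document IDs, and the `min_sim` floor are assumptions for illustration; real embeddings are much higher-dimensional, and whether the linking pipeline applies a similarity floor is not stated.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_documents(node_vec, doc_vecs, k=10, min_sim=0.0):
    """Return up to k (similarity, doc_id) pairs, highest similarity first."""
    scored = [(cosine_similarity(node_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    kept = [pair for pair in scored if pair[0] >= min_sim]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:k]


# Hypothetical toy embeddings.
node = [0.9, 0.1, 0.3]
docs = {
    "doc_a": [0.8, 0.2, 0.35],
    "doc_b": [0.1, 0.9, 0.2],
    "doc_c": [0.5, 0.5, 0.5],
}
for sim, doc_id in link_documents(node, docs, k=2, min_sim=0.6):
    print(f"{sim:.3f}  {doc_id}")
```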

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "qty": "5 OOMs (100,000x)",
  "mode": "FORECAST",
  "role": "Cited-Researcher",
  "caveats": "Unhobbling gains hardest to quantify; algorithmic efficiency may saturate.",
  "context": "Same size jump as GPT-2 → GPT-4 (5 OOM). Derived from three vectors: physical compute (~0.5 OOM/yr), algorithmic efficiency (~0.5 OOM/yr), and unhobbling (RLHF, CoT, tools, memory).",
  "to_year": 2027,
  "conv_cues": "specific quantitative target; derivation provided",
  "direction": "NUMERIC_TARGET",
  "from_year": 2024,
  "timeframe": "2024-2027",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "quartile_checkpoint",
      "label": "Q1 window check-in (25%)",
      "status": "overdue",
      "weight": 0.05,
      "ordinal": -6,
      "source_id": null,
      "expected_date": "2024-09-11",
      "observed_date": null,
      "miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
      "miss_emitted_by": "metadata_milestone_sweep"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q2 window check-in (50%)",
      "status": "overdue",
      "weight": 0.05,
      "ordinal": -5,
      "source_id": null,
      "expected_date": "2025-05-24",
      "observed_date": null,
      "miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
      "miss_emitted_by": "metadata_milestone_sweep"
    },
    {
      "kind": "llm_pre_event",
      "label": "Test-time compute / reasoning OOM unlocked via o1, o3, R1",
      "notes": "HIT — o1 (Sep 2024), o3 (late 2024/early 2025), R1 (Jan 2025) shipped. Unhobbling axis validated.",
      "source": "Stockalarm Pro / Dwarkesh Patel — Aschenbrenner test-time compute call validated",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -4,
      "source_id": null,
      "confidence": 0.95,
      "source_url": "https://www.dwarkesh.com/p/leopold-aschenbrenner",
      "expected_date": "2026-01-31",
      "observed_date": "2026-01-31",
      "research_origin": "deep_research",
      "measurement_criterion": "Three+ frontier reasoning models shipped (o1, o3, DeepSeek R1) demonstrating Aschenbrenner's predicted unhobbling OOM"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q3 window check-in (75%)",
      "status": "overdue",
      "weight": 0.05,
      "ordinal": -3,
      "source_id": null,
      "expected_date": "2026-02-02",
      "observed_date": null,
      "miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
      "miss_emitted_by": "metadata_milestone_sweep"
    },
    {
      "kind": "llm_pre_event",
      "label": "Aschenbrenner 1GW per cluster prediction validated by 2026",
      "notes": "HIT — 1GW cluster milestone confirmed; 10GW under construction. Compute axis on track for 5-OOM stack.",
      "source": "Stockalarm Pro 'Situational Awareness Two Years Later' — '1 GW per cluster by 2026: hit'",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -2,
      "source_id": null,
      "confidence": 0.95,
      "source_url": "https://pro.stockalarm.io/blog/situational-awareness-two-years-later",
      "expected_date": "2026-04-30",
      "observed_date": "2026-04-30",
      "research_origin": "deep_research",
      "measurement_criterion": "Public confirmation of 1GW-class AI training cluster operational, validating Aschenbrenner's 'compute scaling 0.5 OOM/year' axis"
    },
    {
      "kind": "llm_pre_event",
      "label": "Frontier model demonstrates 1 full OOM effective compute over GPT-4",
      "source": "Epoch AI tracking / OpenAI, Anthropic, Google releases",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -1,
      "source_id": null,
      "confidence": 0.75,
      "source_url": "https://epochai.org/data/notable-ai-models",
      "expected_date": "2026-09-15",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2026-12-31",
        "from": "2026-06-01"
      },
      "measurement_criterion": "Public release of model with ≥10x effective compute vs GPT-4 (per Epoch AI FLOP estima
... (truncated)