SEM_002 · prediction · AI · AI-scaling

By 2025-2026, AI model outputs will outpace the cognitive capabilities of college graduates (driven by hundreds of millions of GPUs).

Predictor: Leopold Aschenbrenner

Prior probability

75.0%

Current probability

55.8%

evolves via intake + LBP

Conviction

4/5

Signal quality

Resolution

partial

Window

2025-01-01 – 2026-09-30

Edges in / out

9 / 5

Tickers exposed

Prediction text

By 2025-2026, AI model outputs will outpace the cognitive capabilities of college graduates (driven by hundreds of millions of GPUs). | hundreds of millions of Graphic Processing Units (GPUs) humming across vast solar farms in Nevada and shale fields in Pennsylvania | Frontier model benchmark releases

Key catalyst: Frontier model benchmark releases

Watch events: Next-gen model benchmark scores (GPQA, HLE, SWE-Bench); agentic reliability metrics

Verbatim quote

From episode "Forecasts and Strategic Vectors in the Global Semiconductor and Compute Manufacturing Ecosystem (2023-2026)"

hundreds of millions of Graphic Processing Units (GPUs) humming across vast solar farms in Nevada and shale fields in Pennsylvania

Resolution evidence

Status: partial

GPT-5, Claude Opus 4.x, and Gemini 3 had already outperformed median college graduates on MMLU, GPQA, and HLE by late 2025; Aschenbrenner's thesis is largely vindicated on knowledge-work benchmarks.

Predictor: Leopold Aschenbrenner

κ + Brier as of 2026-05-22


κ (discount)

0.688

Brier

0.0417

excellent

Hits / Misses

2 / 0

of 3 resolved

Hit rate

66.7%

Calibration plot (stated vs observed)

Each piece of evidence about this Leopold Aschenbrenner prediction has its log-likelihood ratio (LLR) multiplied by κ in /api/intake. A lower κ means less weight; κ floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
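A minimal sketch of the κ discount, assuming κ simply scales each contribution's LLR after being clamped to [0.10, 1.00]. That reading is consistent with the raw metadata in the evidence chain below (llr 0.6931 × κ 0.6875 = adjusted_llr 0.4765), but the helper name `discount_llr` is hypothetical and this is not the /api/intake code.

```python
import math

# Clamp bounds stated on this page; the helper itself is a hypothetical illustration.
KAPPA_FLOOR = 0.10  # effectively silenced
KAPPA_CAP = 1.00    # full weight


def discount_llr(llr: float, kappa: float) -> float:
    """Scale an evidence log-likelihood ratio by the predictor's clamped discount kappa."""
    k = min(max(kappa, KAPPA_FLOOR), KAPPA_CAP)
    return k * llr


# 2026-05-21 intake event: raw LLR = ln 2 (a 2:1 likelihood ratio), kappa = 0.6875.
adjusted = discount_llr(math.log(2), 0.6875)
bayes_factor = math.exp(adjusted)
print(f"adjusted LLR = {adjusted:.4f}, Bayes factor = {bayes_factor:.2f}:1")
```

This prints an adjusted LLR of 0.4765 and a Bayes factor of about 1.61:1, which the metadata rounds to "1.6:1 favoring". Discounting in log space means κ = 0.6875 turns a 2:1 likelihood ratio into 2^0.6875 ≈ 1.61:1 rather than shaving a fixed number of points off the probability.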

Reference class: regulatory_freeze_window

Linked via embedding similarity 0.635


Major-country regulatory pause/moratorium on AI capability research lasting >6 months

Base rate

5.0%

0/4 historical

Inside weight

0.856

TRF=0.21

Outside weight

0.144

pulling toward base rate

inside 68.3% → blend 55.8% (Δ -12.5pp)

Tetlock-style outside view: at TRF=1 (just predicted), outside view dominates (w_in=0.3). At TRF=0 (deadline), inside view dominates (w_in=1.0). The blend regularizes overconfident inside views toward the historical base rate.
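The blend can be sketched as follows. Two details are inferred from the numbers on this page rather than documented: w_in appears to fall linearly from 1.0 at TRF=0 to 0.3 at TRF=1 (w_in = 1 - 0.7 × TRF gives the 0.856 shown), and the pooling appears to happen in log-odds space, since a plain probability average of 68.3% and 5% with these weights would give about 59%, not 55.8%. TRF is assumed here to be the fraction of the 2025-01-01 – 2026-09-30 window still remaining; all function names are hypothetical.

```python
import math
from datetime import datetime


def logit(p: float) -> float:
    """Log-odds of a probability in (0, 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def time_remaining_fraction(start: datetime, deadline: datetime, now: datetime) -> float:
    """Fraction of the prediction window still ahead: 1.0 at the start, 0.0 at the deadline."""
    total = (deadline - start).total_seconds()
    remaining = (deadline - now).total_seconds()
    return min(max(remaining / total, 0.0), 1.0)


def inside_weight(trf: float, w_min: float = 0.3) -> float:
    """Assumed linear ramp: w_in = 1.0 at TRF=0 and w_min at TRF=1."""
    return 1.0 - (1.0 - w_min) * trf


def blend(inside_p: float, base_rate: float, w_in: float) -> float:
    """Pool the inside view and the reference-class base rate in log-odds space."""
    return sigmoid(w_in * logit(inside_p) + (1.0 - w_in) * logit(base_rate))


trf = time_remaining_fraction(datetime(2025, 1, 1), datetime(2026, 9, 30),
                              datetime(2026, 5, 21, 23, 15, 16))
w_in = inside_weight(trf)
p = blend(0.6833323021138943, 0.05, w_in)
print(f"TRF={trf:.4f} w_in={w_in:.4f} blend={p:.4f}")
```

With the 2026-05-21 intake timestamp this yields TRF ≈ 0.2057, w_in ≈ 0.856, and a blend of ≈ 0.558, matching the intake metadata. Blending log-odds rather than probabilities lets a 5% base rate pull hard on a confident inside view even with only 14% weight.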

Probability over time

6 prob_history rows

intake v2 · milestone miss sweep · LBP propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 55.8%

Milestone chain

Pre-event signals (upstream prerequisites and window checkpoints) → resolution event → downstream cascades. Statuses and dates update from linked nodes and are re-derived nightly by scripts/ops/derive_milestones.py.

Leading chain: 3 overdue ⏱

2025-05-02 · overdue · Q1 window check-in (25%)
2025-08-31 · overdue · Q2 window check-in (50%)
2025-12-30 · overdue · Q3 window check-in (75%)
2026-05-01 · partial · By 2025-2026, AI model outputs will outpace the cognitive capabilities of college graduates (driven by hundreds of millions of GPUs).
2026-01-01 → 2026-09-30 · pending · GPT-5 / Claude Opus 5 release with claimed PhD-level reasoning
How: OpenAI or Anthropic releases a successor model claiming PhD-level performance on at least 3 expert benchmarks (GPQA, MMLU, HumanEval, SWE-Bench, FrontierMath)
Source: Anthropic blog, OpenAI blog, conference keynotes · conf 85%
Notes: By April 2026, OpenAI's GPT-5.2 was already demonstrating physics breakthroughs (per SEM_033 research).
2026-01-01 → 2026-09-30 · pending · MMLU saturated (≥95%) by all frontier models
How: The top 5 frontier models (Anthropic, OpenAI, DeepMind, Meta, DeepSeek) all score ≥95% on MMLU; saturation marks the 'better than college-grad' threshold
Source: Papers With Code MMLU leaderboard · conf 85%
Notes: GPT-4 Turbo at ~88%, Claude 3.5 Opus ~88-91%; saturation by 2026 has likely already happened.
2026-04-01 → 2026-12-31 · pending · GPQA-Diamond benchmark crosses 90% by frontier model
How: A frontier model achieves ≥90% on GPQA-Diamond (Graduate-Level Google-Proof Q&A: expert-written biology, physics, and chemistry questions designed so that skilled non-experts cannot answer them even with web access)
Source: Papers With Code GPQA leaderboard, Anthropic/OpenAI evals pages · conf 70%
2026-06-01 → 2027-06-30 · pending · Aschenbrenner (or peer) publishes 'Situational Awareness II' or similar treatise marking AGI threshold
How: An influential AI researcher (Aschenbrenner, Sutskever, Amodei, or a peer) publishes an essay or book arguing that the AGI threshold has been crossed
Source: situational-awareness.ai, research lab blogs · conf 50%
2028-06-25 · pending · We're exiting the industrial age permanently as recursive self-improvement unfolds.
2030-09-27 · pending · Most large companies' business models will be disrupted in 2-5 years
2063-06-21 · pending · Peter's 14-year-old son Milan will never get a driver's license.
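The three check-in dates in this chain are consistent with splitting the span from the window start (2025-01-01) to the resolution event's expected date (2026-05-01) at 25%, 50%, and 75%, truncating to whole days, and with the miss sweep flagging any checkpoint whose expected date has passed without an observation. scripts/ops/derive_milestones.py is not shown here, so the sketch below is an assumption that merely reproduces those dates; both function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional


def quartile_checkpoints(window_start: date, event_date: date) -> List[date]:
    """Return 25/50/75% checkpoints between window start and event date, truncated to whole days."""
    span_days = (event_date - window_start).days
    return [window_start + timedelta(days=int(span_days * q)) for q in (0.25, 0.5, 0.75)]


def is_overdue(expected: date, observed: Optional[date], as_of: date) -> bool:
    """A checkpoint is overdue once its expected date has passed with nothing observed."""
    return observed is None and expected < as_of


checkpoints = quartile_checkpoints(date(2025, 1, 1), date(2026, 5, 1))
sweep_day = date(2026, 5, 2)  # day the milestone miss sweep ran, per the raw metadata
for cp in checkpoints:
    print(cp.isoformat(), "overdue" if is_overdue(cp, None, sweep_day) else "on track")
```

Under this reading the checkpoints track the event's expected date rather than the 2026-09-30 window end, which is why all three were already overdue when the sweep ran.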

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample (~30 s per run) to surface the predictions whose marginals shift most under that assumption; this is useful for stress-testing "if X resolves, what else moves?"

(live posterior: 56%)

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

intake_event_update · 2026-05-21T23:15:16Z · 55.8% (-1.4pp)

intake:7afeeb9a-f217-4dd2-b910-24ff14bdfc39 bayesian_v2 inside=0.683 blend=0.558 LLR=0.477 κ=0.69 w_in=0.86 regulatory_freeze_window

Raw metadata

{
  "trf": 0.2057002584634281,
  "kappa": 0.6875,
  "base_rate": 0.05,
  "predictor": "Leopold Aschenbrenner",
  "total_llr": 0.6931471805599453,
  "bayesian_v2": true,
  "prior_logit": 0.2925896353230114,
  "bayes_factor": "1.6:1 favoring",
  "blend_reason": "blend 86% inside / 14% outside (TRF=0.206, base_rate=0.050 from regulatory_freeze_window)",
  "inside_prior": 0.57263,
  "kappa_source": "predictor_table",
  "blend_applied": true,
  "contributions": [
    {
      "llr": 0.6931471805599453,
      "kappa": 0.6875,
      "label": "Frontier model benchmarks (GPT-5.5, Claude Mythos, Gemini 3.1) clearly past college-grad threshold in technical domains.",
      "adjusted_llr": 0.4765386866349624
    }
  ],
  "evidence_kind": "intake_event_update",
  "inside_source": "history_v2",
  "inside_weight": 0.8560098190756003,
  "outside_weight": 0.1439901809243997,
  "posterior_prob": 0.5583358951206584,
  "evidence_origin": "daily_intake",
  "llm_suggestions": [
    {
      "polarity": "corroborates",
      "status_change": "unchanged",
      "evidence_strength": "moderate",
      "delta_prob_suggestion": 0.05
    }
  ],
  "posterior_logit": 0.7691283219579738,
  "predictor_brier": 0.04167,
  "evidence_doc_ids": [],
  "inside_posterior": 0.6833323021138943,
  "blended_posterior": 0.5583358951206584,
  "reference_class_id": "regulatory_freeze_window",
  "total_adjusted_llr": 0.4765386866349624,
  "predictor_n_resolved": 3
}
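The fields above fit a standard log-odds Bayesian update: prior_logit (the logit of inside_prior 0.57263) plus total_adjusted_llr gives posterior_logit (0.2926 + 0.4765 = 0.7691), and the logistic function maps that back to inside_posterior 0.6833. A minimal sketch under that reading; `bayes_update` is a hypothetical name, not the bayesian_v2 code.

```python
import math
from typing import List


def logit(p: float) -> float:
    """Log-odds of a probability in (0, 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def bayes_update(prior_p: float, adjusted_llrs: List[float]) -> float:
    """Add kappa-discounted evidence LLRs to the prior log-odds and return the posterior."""
    return sigmoid(logit(prior_p) + sum(adjusted_llrs))


inside = bayes_update(0.57263, [0.4765386866349624])
print(f"inside posterior = {inside:.4f}")
```

Because independent evidence adds in log-odds, several contributions can be summed before the single sigmoid call, which is what total_adjusted_llr suggests.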

resolution_terminal · 2026-05-01T00:00:00Z · 50.0% (-7.3pp)

resolution_terminal partial outcome=0.5 pre_resolution=0.573

Raw metadata

{
  "source": "backfill_resolution_history.py",
  "status": "partial",
  "bayesian_v2": false,
  "outcome_prob": 0.5,
  "evidence_kind": "resolution_terminal",
  "posterior_prob": 0.5,
  "delta_to_outcome": -0.07262999999999997,
  "inside_posterior": 0.57263,
  "validation_notes": "GPT-5/Claude Opus 4.x/Gemini 3 already outperform median college grads on MMLU/GPQA/HLE by late 2025. Aschenbrenner thesis largely vindicated on knowledge-work benchmarks.",
  "validation_status": "hit",
  "pre_resolution_prob": 0.57263,
  "resolution_evidence": "GPT-5/Claude Opus 4.x/Gemini 3 already outperform median college grads on MMLU/GPQA/HLE by late 2025. Aschenbrenner thesis largely vindicated on knowledge-work benchmarks.",
  "does_not_update_current_prob": true
}

LBP · 2026-04-30T16:39:51Z · 57.3% (+3.5pp)

Network propagation: 53.7% → 57.3%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3

legacy v1 · 2026-04-30T16:13:50Z · 53.7% (-3.5pp)

reference_class_assigned bayesian_v2 inside=0.750 blend=0.537 w_in=0.77 regulatory_freeze_window

LBP · 2026-04-30T02:18:57Z · 57.2% (+3.5pp)

Network propagation: 53.7% → 57.2%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef

legacy v1 · 2026-04-30T01:56:50Z · 53.7% (-21.3pp)

reference_class_assigned bayesian_v2 inside=0.750 blend=0.537 w_in=0.76 regulatory_freeze_window

Network propagation neighbors

Top edges sorted by latest LBP cross-impact


Top incoming (parents)

Edges that influence THIS node's belief

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
killer	TK03 AI Regulatory Moratorium (EU/US Capability Freeze)	10.0%	0.050	0.750	+0.122
killer	TK02 AI Compute Supply Shock (TSMC/Taiwan Disruption)	12.0%	0.050	0.750	+0.108
killer	TK01 AGI Capability Plateau (2026-27 Training Stall)	15.0%	0.050	0.750	+0.087
killer	TK09 Energy Grid Cap (Data Center Power Wall)	35.0%	0.050	0.750	-0.053
killer	TK05 Rate Regime Persistence (10y > 5% through 2028)	30.0%	0.050	0.750	-0.018

Top outgoing (children)

Predictions THIS node influences

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
prereq	232_055 We're exiting the industrial age permanently as recursive se — Peter Diamandis	35.5%	0.700	0.050	+0.058
prereq	244_019 Peter's son won't need a driver's license in 2 years — Peter Diamandis	48.4%	0.920	0.050	+0.051
prereq	230_020 Peter's 14-year-old son Milan will never get a driver's lice — Peter Diamandis	34.7%	0.650	0.050	+0.038
prereq	242_031 Most large companies' business models will be disrupted in 2 — Peter Diamandis	36.1%	0.650	0.050	+0.024
prereq	247_023 AI will be able to do everything a white collar worker does — Dave Blundin	40.8%	0.720	0.050	+0.016
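The Δ implied column in both tables can be reproduced, to the precision shown, with the law of total probability: the child's implied probability is P(s)·P(c|s=T) + (1 - P(s))·P(c|s=F), and Δ is that value minus the child's current probability. For incoming edges the child is this node (55.8%); for outgoing edges this node is the source. This is a reading of the displayed numbers, not the LBP implementation, and `implied_delta` is a hypothetical helper.

```python
THIS_NODE = 0.558  # current probability of SEM_002 as displayed


def implied_delta(source_p: float, p_c_if_true: float, p_c_if_false: float, child_p: float) -> float:
    """Child probability implied by one edge (law of total probability) minus its current value."""
    implied = source_p * p_c_if_true + (1.0 - source_p) * p_c_if_false
    return implied - child_p


# Parents: (parent prob, P(c|s=T), P(c|s=F)); the child is SEM_002.
incoming = {
    "TK03": (0.10, 0.05, 0.75),
    "TK02": (0.12, 0.05, 0.75),
    "TK01": (0.15, 0.05, 0.75),
    "TK09": (0.35, 0.05, 0.75),
    "TK05": (0.30, 0.05, 0.75),
}
# Children: (child prob, P(c|s=T), P(c|s=F)); the source is SEM_002.
outgoing = {
    "232_055": (0.355, 0.70, 0.05),
    "244_019": (0.484, 0.92, 0.05),
    "230_020": (0.347, 0.65, 0.05),
    "242_031": (0.361, 0.65, 0.05),
    "247_023": (0.408, 0.72, 0.05),
}

deltas = {name: implied_delta(p_s, t, f, THIS_NODE) for name, (p_s, t, f) in incoming.items()}
deltas.update({name: implied_delta(THIS_NODE, t, f, p_c) for name, (p_c, t, f) in outgoing.items()})
for name, delta in deltas.items():
    print(f"{name}: {delta:+.3f}")
```

Read this way, a positive Δ on a killer edge means the parent's low probability implies more support for this node than it currently has; TK09 and TK05 sit high enough that they pull the other way.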

Ticker exposure

37 ticker(s) linked

Beneficiaries (24)

MU WULF IREN EQIX ALAB APLD ASMIY ASML PLAB NVDA NBIS CRWV AAPL AMT AMZN DELL GOOGL IRM LNVGY META MSFT ORCL SFTBY STX

Adverse (6)

ACN GEN CHGG IBM WNS LRN

Prerequisites (9)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
correlate	S_AGI_MID_2029	AGI mid: Kurzweil 2029 path	agi_general_capability	—
correlate	S_COMPUTE_100GW_2030	Compute: 100GW national-scale by Dec 2030	compute_scale	—
correlate	S_AGI_WINTER_2036PLUS	AGI delayed: capability plateau or AI winter	agi_general_capability	—
correlate	S_AI_PAUSE_2026	Major-country AI pause beginning 2026	ai_regulatory_pause	—
killer	TK09	Energy Grid Cap (Data Center Power Wall)	—	—
killer	TK05	Rate Regime Persistence (10y > 5% through 2028)	—	—
killer	TK01	AGI Capability Plateau (2026-27 Training Stall)	—	—
killer	TK02	AI Compute Supply Shock (TSMC/Taiwan Disruption)	—	—
killer	TK03	AI Regulatory Moratorium (EU/US Capability Freeze)	—	—

Dependents (5)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
prereq	244_019	Peter's son won't need a driver's license in 2 years	Auto/Transport	—
prereq	247_023	AI will be able to do everything a white collar worker does imminently	AI	—
prereq	232_055	We're exiting the industrial age permanently as recursive self-improvement unfolds.	AI	—
prereq	242_031	Most large companies' business models will be disrupted in 2-5 years	Markets/Stocks	—
prereq	230_020	Peter's 14-year-old son Milan will never get a driver's license.	Auto/Transport	—

Validations (1)

Resolution events

Observed at	Status	By	Notes
2026-04-29	hit	thesis_timeline_v1.0_import	GPT-5/Claude Opus 4.x/Gemini 3 already outperform median college grads on MMLU/GPQA/HLE by late 2025. Aschenbrenner thesis largely vindicated on knowledge-work benchmarks.

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.720	manifold	June 2026 AI model releases	—	mentions	pending	2026-05-28
0.715	manifold	May 2026 AI model releases	—	mentions	pending	2026-04-30
0.704	manifold	Will any China-domestic AI chip reach ≥80% of NVIDIA H100 perf on a public benchmark before 2026-12-31?	14%	mentions	pending	2026-05-04
0.690	arxiv	Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation	—	mentions	pending	2026-05-05
0.671	arxiv	Can AI Weather Models Predict Beyond Two Weeks? A Quantitative Benchmark and Analysis of Long Rollouts	—	mentions	pending	2026-05-28
0.645	manifold	What will I get in 2026 IMO	—	mentions	pending	2026-06-04
0.634	arxiv	GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction	—	mentions	pending	2026-05-13
0.618	arxiv	Evaluating Skill and Stability of ArchesWeather and ArchesWeatherGen under Multi-Decadal Climate Simulations	—	mentions	pending	2026-05-28
0.606	arxiv	Intercomparison of Machine Learning Algorithms for Remote Sensing-based In-season Crop Mapping	—	mentions	pending	2026-06-04
0.604	gdelt	394371	—	mentions	pending	2026-04-30
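Linking by cosine similarity can be sketched as below. The page does not say which embedding model, similarity threshold, or cut-off the linker uses, so the toy vectors, the `top_k=10` default (matching the ten documents listed), and both function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_documents(node_vec: Sequence[float], docs: Dict[str, Sequence[float]],
                   top_k: int = 10) -> List[Tuple[str, float]]:
    """Rank documents by similarity to the node embedding and keep the top_k."""
    scored = [(doc_id, cosine_similarity(node_vec, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional embeddings; real text embeddings typically have hundreds of dimensions.
node = [0.9, 0.1, 0.0]
docs = {
    "ai-model-releases": [0.8, 0.2, 0.1],
    "crop-mapping": [0.1, 0.2, 0.9],
    "chip-benchmark": [0.6, 0.5, 0.1],
}
for doc_id, sim in link_documents(node, docs, top_k=2):
    print(f"{sim:.3f}  {doc_id}")
```

Similarity only measures topical overlap, which is consistent with every linked document above still showing polarity "mentions" and review "pending".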

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "mode": "PREDICTION",
  "role": "Guest-VC/Researcher",
  "context": "Aschenbrenner forecasts machine-model output outpacing college-grad cognition, powered by GPU swarms on Nevada solar farms and Pennsylvania shale fields.",
  "to_year": 2026,
  "verbatim": "hundreds of millions of Graphic Processing Units (GPUs) humming across vast solar farms in Nevada and shale fields in Pennsylvania",
  "conv_cues": "drives; mathematically reflects",
  "direction": "HAPPEN",
  "from_year": 2025,
  "timeframe": "2025-2026",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "quartile_checkpoint",
      "label": "Q1 window check-in (25%)",
      "status": "overdue",
      "weight": 0.05,
      "ordinal": -3,
      "source_id": null,
      "expected_date": "2025-05-02",
      "observed_date": null,
      "miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
      "miss_emitted_by": "metadata_milestone_sweep"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q2 window check-in (50%)",
      "status": "overdue",
      "weight": 0.05,
      "ordinal": -2,
      "source_id": null,
      "expected_date": "2025-08-31",
      "observed_date": null,
      "miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
      "miss_emitted_by": "metadata_milestone_sweep"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q3 window check-in (75%)",
      "status": "overdue",
      "weight": 0.05,
      "ordinal": -1,
      "source_id": null,
      "expected_date": "2025-12-30",
      "observed_date": null,
      "miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
      "miss_emitted_by": "metadata_milestone_sweep"
    },
    {
      "kind": "event",
      "label": "By 2025-2026, AI model outputs will outpace the cognitive capabilities of college graduates (driven by hundreds of millions of GPUs).",
      "status": "partial",
      "weight": 1,
      "ordinal": 0,
      "source_id": "SEM_002",
      "expected_date": "2026-05-01",
      "observed_date": "2026-05-01"
    },
    {
      "kind": "llm_pre_event",
      "label": "GPT-5 / Claude Opus 5 release with claimed PhD-level reasoning",
      "notes": "By April 2026, OpenAI's GPT-5.2 already demonstrating physics breakthroughs (per SEM_033 research).",
      "source": "Anthropic blog, OpenAI blog, conference keynotes",
      "status": "pending",
      "weight": 0.4,
      "ordinal": 1,
      "source_id": null,
      "confidence": 0.85,
      "expected_date": "2026-05-17",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2026-09-30",
        "from": "2026-01-01"
      },
      "measurement_criterion": "OpenAI or Anthropic releases successor model claiming PhD-level performance on at least 3 expert benchmarks (GPQA, MMLU, HumanEval, SWE-Bench, FrontierMath)"
    },
    {
      "kind": "llm_pre_event",
      "label": "MMLU saturated (≥95%) by all frontier models",
      "notes": "GPT-4 Turbo at ~88%, Claude 3.5 Opus ~88-91%; saturation by 2026 likely already happened.",
      "source": "Papers With Code MMLU leaderboard",
      "status": "pending",
      "weight": 0.4,
      "ordinal": 2,
      "source_id": null,
      "confidence": 0.85,
      "expected_date": "2026-05-17",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2026-09-30",
        "from": "2026-01-01"
      },
      "measurement_criterion": "Top 5 frontier models (Anthropic, OpenAI, DeepMind, Meta, DeepSeek) all score ≥95% on MMLU — saturation marks 'better than college-grad' threshold"
    },
    {
      "kind": "llm_pre_event",
      "label": "GPQA-Diamond benchmark crosses 90% by frontier model",
      "source": "Papers With Code GPQA leaderboard, Anthropic/OpenAI evals pages",
      "status": "pending",
      "weight": 0.4,
      "ordinal": 3,
      "source_id": null,
      "confidence": 0.7,
      "expected_date": "2026-08-16",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2026-1
... (truncated)