By 2025-2026, AI model outputs will outpace the cognitive capabilities of college graduates (driven by hundreds of millions of GPUs).
Predictor: Leopold Aschenbrenner
Prediction text
By 2025-2026, AI model outputs will outpace the cognitive capabilities of college graduates (driven by hundreds of millions of GPUs).
Key catalyst: Frontier model benchmark releases
Watch events: Next-gen model benchmark scores (GPQA, HLE, SWE-Bench); agentic reliability metrics
Verbatim quote
hundreds of millions of Graphic Processing Units (GPUs) humming across vast solar farms in Nevada and shale fields in Pennsylvania
Resolution evidence
GPT-5/Claude Opus 4.x/Gemini 3 already outperform median college grads on MMLU/GPQA/HLE by late 2025. Aschenbrenner thesis largely vindicated on knowledge-work benchmarks.
Calibration plot (stated vs observed)
Evidence about this node from Leopold Aschenbrenner is multiplied by κ in /api/intake. A lower κ means less weight; κ is floored at 0.10 (effectively silenced) and capped at 1.00 (full weight).
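The κ weighting can be illustrated with a minimal sketch. It clamps κ to the stated [0.10, 1.00] range, scales an evidence log-likelihood ratio by it, and adds the result to the prior log-odds. The function names are hypothetical, and the additive log-odds update is an assumption inferred from the `llr`, `kappa`, `adjusted_llr`, `prior_logit` and `posterior_logit` fields in the raw metadata on this page; the actual /api/intake code is not shown here.

```python
import math

KAPPA_FLOOR = 0.10  # stated floor: evidence is effectively silenced
KAPPA_CAP = 1.00    # stated cap: evidence counts at full weight


def clamp_kappa(kappa: float) -> float:
    """Keep kappa inside the stated [0.10, 1.00] range."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def adjust_llr(llr: float, kappa: float) -> float:
    """Scale a log-likelihood ratio by the predictor's clamped kappa."""
    return clamp_kappa(kappa) * llr


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def update_logit(prior_logit: float, adjusted_llrs: list) -> float:
    """Assumed rule: posterior log-odds = prior log-odds + sum of adjusted LLRs."""
    return prior_logit + sum(adjusted_llrs)


# Values copied from the raw metadata for this node.
adjusted = adjust_llr(llr=0.6931471805599453, kappa=0.6875)
posterior_logit = update_logit(0.2925896353230114, [adjusted])
inside_posterior = sigmoid(posterior_logit)
print(round(adjusted, 4), round(posterior_logit, 4), round(inside_posterior, 4))
# prints: 0.4765 0.7691 0.6833
```

Under these assumptions the sketch reproduces the `adjusted_llr`, `posterior_logit` and `inside_posterior` values recorded for this node. That suggests κ acts as a multiplier on log-odds evidence, not on probabilities.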
Reference class: regulatory_freeze_window
Major-country regulatory pause/moratorium on AI capability research lasting >6 months
Tetlock-style outside view: at TRF=1 (just predicted), the outside view dominates (inside-view weight w_in=0.3). At TRF=0 (deadline), the inside view dominates (w_in=1.0). The blend regularizes overconfident inside views toward the historical base rate.
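A minimal sketch of the inside/outside blend follows. It rests on two assumptions that are not stated on this page: that w_in ramps linearly between the two quoted endpoints (0.3 at TRF=1, 1.0 at TRF=0), and that the views are averaged in log-odds space. Both were chosen because they reproduce the `inside_weight` and `blended_posterior` fields in the raw metadata. The function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-x))


def inside_weight(trf: float) -> float:
    """Linear ramp between the stated endpoints: 0.3 at TRF=1, 1.0 at TRF=0."""
    trf = max(0.0, min(1.0, trf))
    return 1.0 - 0.7 * trf


def blend(inside_prob: float, base_rate: float, trf: float) -> float:
    """Weighted average of inside and outside views in log-odds space (assumption)."""
    w_in = inside_weight(trf)
    z = w_in * logit(inside_prob) + (1.0 - w_in) * logit(base_rate)
    return sigmoid(z)


# Values copied from the raw metadata for this node.
w_in = inside_weight(0.2057002584634281)
blended = blend(0.6833323021138943, 0.05, 0.2057002584634281)
print(round(w_in, 3), round(blended, 3))
# prints: 0.856 0.558
```

With TRF≈0.206 the inside view carries about 86% of the weight, which matches the "blend 86% inside / 14% outside" note in `blend_reason`.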
Probability over time
Milestone chain
- 2025-05-02 | overdue | Q1 window check-in (25%)
- 2025-08-31 | overdue | Q2 window check-in (50%)
- 2025-12-30 | overdue | Q3 window check-in (75%)
- 2026-01-01 → 2026-09-30 | pending | GPT-5 / Claude Opus 5 release with claimed PhD-level reasoning
  - How: OpenAI or Anthropic releases successor model claiming PhD-level performance on at least 3 expert benchmarks (GPQA, MMLU, HumanEval, SWE-Bench, FrontierMath)
  - Source: Anthropic blog, OpenAI blog, conference keynotes
  - Confidence: 85%
  - Notes: By April 2026, OpenAI's GPT-5.2 already demonstrating physics breakthroughs (per SEM_033 research).
- 2026-01-01 → 2026-09-30 | pending | MMLU saturated (≥95%) by all frontier models
  - How: Top 5 frontier models (Anthropic, OpenAI, DeepMind, Meta, DeepSeek) all score ≥95% on MMLU; saturation marks the 'better than college-grad' threshold
  - Source: Papers With Code MMLU leaderboard
  - Confidence: 85%
  - Notes: GPT-4 Turbo at ~88%, Claude 3.5 Opus ~88-91%; saturation by 2026 likely already happened.
- 2026-04-01 → 2026-12-31 | pending | GPQA-Diamond benchmark crosses 90% by frontier model
  - How: Frontier model achieves ≥90% on GPQA-Diamond (Graduate-Level Google-Proof Q&A, designed to be very difficult for non-experts)
  - Source: Papers With Code GPQA leaderboard, Anthropic/OpenAI evals pages
  - Confidence: 70%
- 2026-06-01 → 2027-06-30 | pending | Aschenbrenner (or peer) publishes 'Situational Awareness II' or similar treatise marking AGI threshold
  - How: Influential AI researcher (Aschenbrenner, Sutskever, Amodei, peer) publishes essay or book arguing AGI threshold has been crossed
  - Source: situational-awareness.ai, research lab blogs
  - Confidence: 50%
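The three quartile check-in dates in the chain above can be reproduced with a short sketch. It assumes the prediction window runs from the start of `from_year` (2025-01-01) to the event's `expected_date` (2026-05-01), and that fractional days are truncated. Both are inferences from the metadata, and `quartile_checkpoints` is a hypothetical helper, not the tracker's own function.

```python
from datetime import date, timedelta


def quartile_checkpoints(start: date, end: date, fractions=(0.25, 0.50, 0.75)):
    """Return the date at each fraction of the window, truncating to whole days."""
    total_days = (end - start).days
    return [start + timedelta(days=int(total_days * f)) for f in fractions]


# Assumed window: start of from_year (2025) to the event's expected_date.
checkpoints = quartile_checkpoints(date(2025, 1, 1), date(2026, 5, 1))
print([d.isoformat() for d in checkpoints])
# prints: ['2025-05-02', '2025-08-31', '2025-12-30']
```

The window is 485 days long, so the 25%, 50% and 75% marks fall 121, 242 and 363 days after the start.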
Evidence chain
Raw metadata
{
"trf": 0.2057002584634281,
"kappa": 0.6875,
"base_rate": 0.05,
"predictor": "Leopold Aschenbrenner",
"total_llr": 0.6931471805599453,
"bayesian_v2": true,
"prior_logit": 0.2925896353230114,
"bayes_factor": "1.6:1 favoring",
"blend_reason": "blend 86% inside / 14% outside (TRF=0.206, base_rate=0.050 from regulatory_freeze_window)",
"inside_prior": 0.57263,
"kappa_source": "predictor_table",
"blend_applied": true,
"contributions": [
{
"llr": 0.6931471805599453,
"kappa": 0.6875,
"label": "Frontier model benchmarks (GPT-5.5, Claude Mythos, Gemini 3.1) clearly past college-grad threshold in technical domains.",
"adjusted_llr": 0.4765386866349624
}
],
"evidence_kind": "intake_event_update",
"inside_source": "history_v2",
"inside_weight": 0.8560098190756003,
"outside_weight": 0.1439901809243997,
"posterior_prob": 0.5583358951206584,
"evidence_origin": "daily_intake",
"llm_suggestions": [
{
"polarity": "corroborates",
"status_change": "unchanged",
"evidence_strength": "moderate",
"delta_prob_suggestion": 0.05
}
],
"posterior_logit": 0.7691283219579738,
"predictor_brier": 0.04167,
"evidence_doc_ids": [],
"inside_posterior": 0.6833323021138943,
"blended_posterior": 0.5583358951206584,
"reference_class_id": "regulatory_freeze_window",
"total_adjusted_llr": 0.4765386866349624,
"predictor_n_resolved": 3
}
Raw metadata
{
"source": "backfill_resolution_history.py",
"status": "partial",
"bayesian_v2": false,
"outcome_prob": 0.5,
"evidence_kind": "resolution_terminal",
"posterior_prob": 0.5,
"delta_to_outcome": -0.07262999999999997,
"inside_posterior": 0.57263,
"validation_notes": "GPT-5/Claude Opus 4.x/Gemini 3 already outperform median college grads on MMLU/GPQA/HLE by late 2025. Aschenbrenner thesis largely vindicated on knowledge-work benchmarks.",
"validation_status": "hit",
"pre_resolution_prob": 0.57263,
"resolution_evidence": "GPT-5/Claude Opus 4.x/Gemini 3 already outperform median college grads on MMLU/GPQA/HLE by late 2025. Aschenbrenner thesis largely vindicated on knowledge-work benchmarks.",
"does_not_update_current_prob": true
}
Network propagation neighbors
Top incoming (parents)
Edges that influence THIS node's belief
| Kind | Node | Their prob | P(c\|s=T) | P(c\|s=F) | Δ implied |
|---|---|---|---|---|---|
| killer | TK03 AI Regulatory Moratorium (EU/US Capability Freeze) | 10.0% | 0.050 | 0.750 | +0.122 |
| killer | TK02 AI Compute Supply Shock (TSMC/Taiwan Disruption) | 12.0% | 0.050 | 0.750 | +0.108 |
| killer | TK01 AGI Capability Plateau (2026-27 Training Stall) | 15.0% | 0.050 | 0.750 | +0.087 |
| killer | TK09 Energy Grid Cap (Data Center Power Wall) | 35.0% | 0.050 | 0.750 | -0.053 |
| killer | TK05 Rate Regime Persistence (10y > 5% through 2028) | 30.0% | 0.050 | 0.750 | -0.018 |
Top outgoing (children)
Predictions THIS node influences
| Kind | Node | Their prob | P(c\|s=T) | P(c\|s=F) | Δ implied |
|---|---|---|---|---|---|
| prereq | 232_055 We're exiting the industrial age permanently as recursive se — Peter Diamandis | 35.5% | 0.700 | 0.050 | +0.058 |
| prereq | 244_019 Peter's son won't need a driver's license in 2 years — Peter Diamandis | 48.4% | 0.920 | 0.050 | +0.051 |
| prereq | 230_020 Peter's 14-year-old son Milan will never get a driver's lice — Peter Diamandis | 34.7% | 0.650 | 0.050 | +0.038 |
| prereq | 242_031 Most large companies' business models will be disrupted in 2 — Peter Diamandis | 36.1% | 0.650 | 0.050 | +0.024 |
| prereq | 247_023 AI will be able to do everything a white collar worker does — Dave Blundin | 40.8% | 0.720 | 0.050 | +0.016 |
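The Δ implied column in both tables appears consistent with a simple law-of-total-probability calculation: compute the target's marginal from the source's probability and the two conditionals, then subtract the target's current probability. The sketch below is a reconstruction under that assumption, with hypothetical function names; it is not confirmed as the system's propagation rule.

```python
def implied_marginal(p_source: float, p_given_true: float, p_given_false: float) -> float:
    """Law of total probability: P(c) = P(s)P(c|s=T) + (1 - P(s))P(c|s=F)."""
    return p_source * p_given_true + (1.0 - p_source) * p_given_false


def implied_delta(p_source: float, p_given_true: float,
                  p_given_false: float, p_target_now: float) -> float:
    """Gap between the edge-implied marginal and the target's current probability."""
    return implied_marginal(p_source, p_given_true, p_given_false) - p_target_now


THIS_NODE = 0.5583358951206584  # blended_posterior from the raw metadata

# Incoming "killer" edge TK03: source at 10.0%, target is this node.
delta_tk03 = implied_delta(0.10, 0.050, 0.750, THIS_NODE)
# Outgoing "prereq" edge to 232_055: source is this node, target at 35.5%.
delta_232_055 = implied_delta(THIS_NODE, 0.700, 0.050, 0.355)
print(round(delta_tk03, 3), round(delta_232_055, 3))
# prints: 0.122 0.058
```

For a killer edge the conditionals are inverted (a low P(c|s=T), a high P(c|s=F)). An unlikely killer therefore implies a marginal above the node's current probability, and a more likely one such as TK09 at 35.0% pulls it below.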
Ticker exposure
Beneficiaries (24)
Adverse (6)
Prerequisites (9)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| correlate | S_AGI_MID_2029 | AGI mid: Kurzweil 2029 path | agi_general_capability | — |
| correlate | S_COMPUTE_100GW_2030 | Compute: 100GW national-scale by Dec 2030 | compute_scale | — |
| correlate | S_AGI_WINTER_2036PLUS | AGI delayed: capability plateau or AI winter | agi_general_capability | — |
| correlate | S_AI_PAUSE_2026 | Major-country AI pause beginning 2026 | ai_regulatory_pause | — |
| killer | TK09 | Energy Grid Cap (Data Center Power Wall) | — | — |
| killer | TK05 | Rate Regime Persistence (10y > 5% through 2028) | — | — |
| killer | TK01 | AGI Capability Plateau (2026-27 Training Stall) | — | — |
| killer | TK02 | AI Compute Supply Shock (TSMC/Taiwan Disruption) | — | — |
| killer | TK03 | AI Regulatory Moratorium (EU/US Capability Freeze) | — | — |
Dependents (5)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| prereq | 244_019 | Peter's son won't need a driver's license in 2 years | Auto/Transport | — |
| prereq | 247_023 | AI will be able to do everything a white collar worker does imminently | AI | — |
| prereq | 232_055 | We're exiting the industrial age permanently as recursive self-improvement unfolds. | AI | — |
| prereq | 242_031 | Most large companies' business models will be disrupted in 2-5 years | Markets/Stocks | — |
| prereq | 230_020 | Peter's 14-year-old son Milan will never get a driver's license. | Auto/Transport | — |
Validations (1)
| Observed at | Status | By | Notes |
|---|---|---|---|
| 2026-04-29 | hit | thesis_timeline_v1.0_import | GPT-5/Claude Opus 4.x/Gemini 3 already outperform median college grads on MMLU/GPQA/HLE by late 2025. Aschenbrenner thesis largely vindicated on knowledge-work benchmarks. |
Linked documents (10)
Raw metadata
{
"nia": false,
"mode": "PREDICTION",
"role": "Guest-VC/Researcher",
"context": "Aschenbrenner forecasts machine-model output outpacing college-grad cognition, powered by GPU swarms on Nevada solar farms and Pennsylvania shale fields.",
"to_year": 2026,
"verbatim": "hundreds of millions of Graphic Processing Units (GPUs) humming across vast solar farms in Nevada and shale fields in Pennsylvania",
"conv_cues": "drives; mathematically reflects",
"direction": "HAPPEN",
"from_year": 2025,
"timeframe": "2025-2026",
"conv_level": "HIGH",
"milestones": [
{
"kind": "quartile_checkpoint",
"label": "Q1 window check-in (25%)",
"status": "overdue",
"weight": 0.05,
"ordinal": -3,
"source_id": null,
"expected_date": "2025-05-02",
"observed_date": null,
"miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
"miss_emitted_by": "metadata_milestone_sweep"
},
{
"kind": "quartile_checkpoint",
"label": "Q2 window check-in (50%)",
"status": "overdue",
"weight": 0.05,
"ordinal": -2,
"source_id": null,
"expected_date": "2025-08-31",
"observed_date": null,
"miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
"miss_emitted_by": "metadata_milestone_sweep"
},
{
"kind": "quartile_checkpoint",
"label": "Q3 window check-in (75%)",
"status": "overdue",
"weight": 0.05,
"ordinal": -1,
"source_id": null,
"expected_date": "2025-12-30",
"observed_date": null,
"miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
"miss_emitted_by": "metadata_milestone_sweep"
},
{
"kind": "event",
"label": "By 2025-2026, AI model outputs will outpace the cognitive capabilities of college graduates (driven by hundreds of millions of GPUs).",
"status": "partial",
"weight": 1,
"ordinal": 0,
"source_id": "SEM_002",
"expected_date": "2026-05-01",
"observed_date": "2026-05-01"
},
{
"kind": "llm_pre_event",
"label": "GPT-5 / Claude Opus 5 release with claimed PhD-level reasoning",
"notes": "By April 2026, OpenAI's GPT-5.2 already demonstrating physics breakthroughs (per SEM_033 research).",
"source": "Anthropic blog, OpenAI blog, conference keynotes",
"status": "pending",
"weight": 0.4,
"ordinal": 1,
"source_id": null,
"confidence": 0.85,
"expected_date": "2026-05-17",
"research_origin": "training",
"expected_date_range": {
"to": "2026-09-30",
"from": "2026-01-01"
},
"measurement_criterion": "OpenAI or Anthropic releases successor model claiming PhD-level performance on at least 3 expert benchmarks (GPQA, MMLU, HumanEval, SWE-Bench, FrontierMath)"
},
{
"kind": "llm_pre_event",
"label": "MMLU saturated (≥95%) by all frontier models",
"notes": "GPT-4 Turbo at ~88%, Claude 3.5 Opus ~88-91%; saturation by 2026 likely already happened.",
"source": "Papers With Code MMLU leaderboard",
"status": "pending",
"weight": 0.4,
"ordinal": 2,
"source_id": null,
"confidence": 0.85,
"expected_date": "2026-05-17",
"research_origin": "training",
"expected_date_range": {
"to": "2026-09-30",
"from": "2026-01-01"
},
"measurement_criterion": "Top 5 frontier models (Anthropic, OpenAI, DeepMind, Meta, DeepSeek) all score ≥95% on MMLU — saturation marks 'better than college-grad' threshold"
},
{
"kind": "llm_pre_event",
"label": "GPQA-Diamond benchmark crosses 90% by frontier model",
"source": "Papers With Code GPQA leaderboard, Anthropic/OpenAI evals pages",
"status": "pending",
"weight": 0.4,
"ordinal": 3,
"source_id": null,
"confidence": 0.7,
"expected_date": "2026-08-16",
"research_origin": "training",
"expected_date_range": {
"to": "2026-1
... (truncated)