AI_004 · prediction · AI · smarter-than-smartest-human

AI will be 'smarter than the smartest human' by 2026, driven by fully automated recursive self-improvement loops compounding cognitive gains without human intervention.

Predictor: Elon Musk

Prior probability

25.0%

Current probability

16.5%

evolves via intake + LBP

Conviction

5/5

Signal quality

Resolution

pending

Window

2026-01-01 – 2026-11-30

Edges in / out

1 / 0

Tickers exposed

Prediction text

AI will be 'smarter than the smartest human' by 2026, driven by fully automated recursive self-improvement loops compounding cognitive gains without human intervention. | Frontier-model generalist benchmark sweep

Key catalyst: Frontier-model generalist benchmark sweep

Watch events: Grok 5 / GPT-6 benchmark disclosures; recursive-self-improvement demos

Resolution evidence

Status: pending

xAI Grok 4.2, OpenAI o-series, and Claude Opus 4.7 all match or exceed expert humans on narrow benchmarks (IMO, USAMO, FrontierMath, GDPval); generalist parity with a single human remains more contested.

Predictor: Elon Musk

κ + Brier as of 2026-05-22


κ (discount)

0.688

Brier

0.0142

excellent

Hits / Misses

1 / 0

of 3 resolved

Hit rate

33.3%

Calibration plot (stated vs observed)

Evidence about this node from Elon Musk is multiplied by κ in /api/intake. A lower κ means less weight; κ floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
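
The κ discount described above can be sketched as a clamp followed by a multiplication. This is a minimal illustration, not the /api/intake implementation: it assumes the quantity being scaled is a log-likelihood ratio (which matches the `llr` and `adjusted_llr` fields in the Raw metadata for this node), and the names `clamp_kappa` and `discount_evidence` are hypothetical.

```python
KAPPA_FLOOR = 0.10  # effectively silenced
KAPPA_CAP = 1.00    # full weight


def clamp_kappa(kappa: float) -> float:
    """Keep kappa inside the [0.10, 1.00] range stated on this page."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def discount_evidence(llr: float, kappa: float) -> float:
    """Scale a log-likelihood ratio by the predictor's clamped kappa.

    Assumption: the evidence is expressed as an LLR, as in the node's metadata.
    """
    return clamp_kappa(kappa) * llr


# Values taken from the Q1 check-in contribution in the Raw metadata.
adjusted = discount_evidence(-0.4054651081081644, 0.6429)
print(adjusted)
```

With the metadata's κ of 0.6429, each missed check-in keeps roughly 64% of its raw log-likelihood weight.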

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

1 prob_history row

Legend: intake v2 · milestone miss sweep · lbp propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 16.5%

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.

Leading chain: 2 fired ✓ · 2 overdue ⏱ · 3 pending

2026-02-22 · overdue · Q1 window check-in (25%)
2026-03-05 · hit · Major model release exceeds human OSWorld baseline by clear margin
How: Frontier AI model exceeds 72.4% human-expert OSWorld baseline by >=2pp, signaling superhuman computer-use capability
Source: https://nerdleveltech.com/gpt-5-4-beats-humans-computer-use-ai-agents (GPT-5.4 hits 75.0%) · conf 95%
2026-04-16 · overdue · Q2 window check-in (50%)
2026-04-29 · hit · Holo3-35B-A3B leads OSWorld with 82.6%
How: OSWorld-Verified leaderboard shows top model at >=82% accuracy, exceeding human baseline by >=10pp
Source: https://benchlm.ai/benchmarks/osWorldVerified (Holo3-35B-A3B 82.6%) · conf 92%
2026-06-07 · pending · Q3 window check-in (75%)
2026-06-30 · pending · Anthropic CEO publicly maintains 2026 AI > most humans timeline
How: Dario Amodei reaffirms in published interview/essay that AI surpasses human intelligence in most domains by end-2026 or early-2027
Source: https://www.bloomberg.com/news/newsletters/2024-10-18/anthropic-ceo-thinks-ai-may-outsmart-most-humans-as-soon-as-2026 (Amodei 2026 timeline) · conf 80%
2026-05-01 → 2026-09-30 · pending · OpenAI articulates 'hundreds of thousands of automated research interns' plan
How: OpenAI executive publicly outlines path to 100K+ automated research agents within 9 months, indicating material progress on RSI loop
Source: https://openai.com/index/next-phase-of-enterprise-ai/ (OpenAI roadmap) · conf 70%
2026-07-30 · pending · AI will be 'smarter than the smartest human' by 2026, driven by fully automated recursive self-improvement loops compounding cognitive gains
2026-09-01 → 2026-11-30 · pending · Frontier benchmark composite (MMLU/SWE/MATH/OSWorld) exceeds expert humans on 4 of 4
How: At least one frontier model exceeds expert-human baseline on 4 of 4 standard capability benchmarks (MMLU, SWE-Bench Verified, MATH, OSWorld)
Source: https://hai.stanford.edu/ai-index/2026-ai-index-report/technical-performance (Stanford AI Index 2026) · conf 60%
2026-10-01 → 2027-06-30 · pending · Cascade: AGI-popular-narrative drives capex acceleration past $700B/yr
How: Hyperscaler combined annual AI capex exceeds $700B run-rate as direct response to claimed AGI/ASI capability milestones
Source: https://aijourn.com/700-billion-ai-capex-in-2026-following-the-capital-flows-from-hyperscalers-to-chipmakers/ (2026 capex of $700B) · conf 65%
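
The three quartile check-in dates in this chain are consistent with dividing the span from the window start (2026-01-01) to the dated resolution event (2026-07-30) into quarters and truncating to whole days. That rule is inferred from the dates shown on this page, not read from scripts/ops/derive_milestones.py, so the sketch below is an assumption about the derivation; `quartile_checkpoints` is a hypothetical name.

```python
from datetime import date, timedelta


def quartile_checkpoints(start: date, event: date) -> list[date]:
    """Return the 25%, 50%, and 75% check-in dates between start and event.

    Assumption: fractional days are truncated (int), not rounded.
    """
    span_days = (event - start).days
    return [start + timedelta(days=int(span_days * q)) for q in (0.25, 0.50, 0.75)]


checkpoints = quartile_checkpoints(date(2026, 1, 1), date(2026, 7, 30))
for label, day in zip(("Q1 (25%)", "Q2 (50%)", "Q3 (75%)"), checkpoints):
    print(label, day.isoformat())
```

Under this assumption the 210-day span yields 2026-02-22, 2026-04-16, and 2026-06-07, matching the Q1, Q2, and Q3 check-ins listed above.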

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample (~30s per run). The run returns the predictions whose marginals shift most under that assumption, which makes it useful for stress-testing "if X resolves, what else moves?"

(live posterior: 17%)

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

metadata_milestone_miss_sweep · 2026-05-02T22:07:21Z · 16.5% · -8.5pp

metadata_milestone_miss_sweep bayesian_v2 n=2 inside=0.165 blend=0.165 LLR=-0.521 κ=0.64 no_blend

Raw metadata

{
  "trf": 0.6338685427014515,
  "kappa": 0.6429,
  "base_rate": null,
  "predictor": "Elon Musk",
  "total_llr": -0.8109302162163288,
  "grace_days": 7,
  "bayesian_v2": true,
  "prior_logit": -1.0986122886681098,
  "bayes_factor": "1.7:1 against",
  "blend_reason": "no reference_class linked",
  "inside_prior": 0.25,
  "kappa_source": "predictor_table",
  "n_milestones": 2,
  "blend_applied": false,
  "contributions": [
    {
      "llr": -0.4054651081081644,
      "kind": "quartile_checkpoint",
      "kappa": 0.6429,
      "label": "Q1 window check-in (25%)",
      "weight": 0.05,
      "strength": "weak",
      "confidence": null,
      "source_url": null,
      "adjusted_llr": -0.2606735180027389,
      "expected_date": "2026-02-22",
      "measurement_criterion": null
    },
    {
      "llr": -0.4054651081081644,
      "kind": "quartile_checkpoint",
      "kappa": 0.6429,
      "label": "Q2 window check-in (50%)",
      "weight": 0.05,
      "strength": "weak",
      "confidence": null,
      "source_url": null,
      "adjusted_llr": -0.2606735180027389,
      "expected_date": "2026-04-16",
      "measurement_criterion": null
    }
  ],
  "evidence_kind": "metadata_milestone_miss_sweep",
  "inside_source": "prior_prob",
  "inside_weight": 0.5562920201089838,
  "outside_weight": 0.4437079798910162,
  "posterior_prob": 0.1652104798916139,
  "posterior_logit": -1.6199593246735877,
  "predictor_brier": 0.01,
  "inside_posterior": 0.1652104798916139,
  "blended_posterior": 0.1652104798916139,
  "reference_class_id": null,
  "total_adjusted_llr": -0.5213470360054778,
  "predictor_n_resolved": 2
}
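
The figures in this record can be reproduced with a log-odds update. The sketch below is a reconstruction from the field values, not the system's own code: it assumes each contribution's `adjusted_llr` is `kappa * llr` (the per-milestone `weight` of 0.05 does not appear to enter the calculation) and that the posterior logit is the prior logit plus the summed adjusted LLRs. No blending is applied because no reference class is linked.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


inside_prior = 0.25
kappa = 0.6429
# Raw LLRs for the two overdue quartile check-ins (each equals ln(2/3)).
raw_llrs = [-0.4054651081081644, -0.4054651081081644]

adjusted_llrs = [kappa * llr for llr in raw_llrs]
total_adjusted_llr = sum(adjusted_llrs)

posterior_logit = logit(inside_prior) + total_adjusted_llr
posterior_prob = sigmoid(posterior_logit)

# Bayes factor against the prediction, expressed as odds against.
odds_against = math.exp(-total_adjusted_llr)

print(f"total_adjusted_llr = {total_adjusted_llr:.4f}")
print(f"posterior_prob     = {posterior_prob:.4f}")
print(f"bayes_factor       = {odds_against:.1f}:1 against")
```

This recovers `total_adjusted_llr` ≈ -0.5213, `posterior_prob` ≈ 0.1652, and the "1.7:1 against" Bayes factor, which corresponds to the move from 25.0% to 16.5% (-8.5pp) shown in the evidence chain.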

Network propagation neighbors

Top edges sorted by latest LBP cross-impact


No propagation data yet. Run inference/.venv/bin/python scripts/ops/run_loopy_belief_propagation.py on the droplet, or wait for the Sunday 02:00 UTC weekly cron.

Ticker exposure

1 ticker(s) linked

Beneficiaries (1)

GOOGL

Prerequisites (1)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
correlate	S_AI_PAUSE_2026	Major-country AI pause beginning 2026	ai_regulatory_pause	—

Dependents (0)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
No dependents

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.675	manifold	Will "How Go Players Disempower Themselves to AI" make the top fifty posts in LessWrong's 2026 Annual Review?	34%	mentions	pending	2026-05-02
0.656	gdelt	intelligence trust the equation that will decide australias ai winners 625399	—	mentions	pending	2026-04-30
0.654	manifold	Will "Cognitive Security as an AI Safety Cause Area" make the top fifty posts in LessWrong's 2026 Annual Review?	11%	mentions	pending	2026-05-26
0.634	arxiv	Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection	—	mentions	pending	2026-06-04
0.630	manifold	Will an AI-generated movie hit Netflix Top 10 in 2026?	—	mentions	pending	2026-05-18
0.621	manifold	Will "Natural Language Autoencoders Produce Unsuper..." make the top fifty posts in LessWrong's 2026 Annual Review?	37%	mentions	pending	2026-05-07
0.614	manifold	Will "Protecting Cognitive Integrity: Our internal ..." make the top fifty posts in LessWrong's 2026 Annual Review?	15%	mentions	pending	2026-04-26
0.581	manifold	Will "Mnemonic portraits for 19,023 human genes" make the top fifty posts in LessWrong's 2026 Annual Review?	18%	mentions	pending	2026-05-29
0.577	manifold	Chess: Winner of GCT Super Rapid & Blitz 2026	—	mentions	pending	2026-05-05
0.554	manifold	Will "Claude, Author of the Humanitas" make the top fifty posts in LessWrong's 2026 Annual Review?	17%	mentions	pending	2026-05-27

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "qty": "smartest-human-parity",
  "mode": "FORECAST",
  "role": "Cited-CEO",
  "context": "Distinct from INF_073 (AI smarter than all humanity combined by 2030/2031) — this is the lower-bound 2026 prediction that any-individual-human-parity is reached first. Musk has repeatedly revised this forward from 2028 (2022) to 2026 (2024).",
  "to_year": 2026,
  "conv_cues": "CEO FIRST_PERSON; specific year; superlative",
  "direction": "HAPPEN",
  "from_year": 2026,
  "timeframe": "2026",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "quartile_checkpoint",
      "label": "Q1 window check-in (25%)",
      "status": "overdue",
      "weight": 0.05,
      "ordinal": -7,
      "source_id": null,
      "expected_date": "2026-02-22",
      "observed_date": null,
      "miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
      "miss_emitted_by": "metadata_milestone_sweep"
    },
    {
      "kind": "llm_pre_event",
      "label": "Major model release exceeds human OSWorld baseline by clear margin",
      "source": "https://nerdleveltech.com/gpt-5-4-beats-humans-computer-use-ai-agents — GPT-5.4 hits 75.0%",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -6,
      "source_id": null,
      "confidence": 0.95,
      "source_url": "https://nerdleveltech.com/gpt-5-4-beats-humans-computer-use-ai-agents",
      "expected_date": "2026-03-05",
      "observed_date": "2026-03-05",
      "research_origin": "deep_research",
      "measurement_criterion": "Frontier AI model exceeds 72.4% human-expert OSWorld baseline by >=2pp, signaling superhuman computer-use capability"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q2 window check-in (50%)",
      "status": "overdue",
      "weight": 0.05,
      "ordinal": -5,
      "source_id": null,
      "expected_date": "2026-04-16",
      "observed_date": null,
      "miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
      "miss_emitted_by": "metadata_milestone_sweep"
    },
    {
      "kind": "llm_pre_event",
      "label": "Holo3-35B-A3B leads OSWorld with 82.6%",
      "source": "https://benchlm.ai/benchmarks/osWorldVerified — Holo3-35B-A3B 82.6%",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -4,
      "source_id": null,
      "confidence": 0.92,
      "source_url": "https://benchlm.ai/benchmarks/osWorldVerified",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29",
      "research_origin": "deep_research",
      "measurement_criterion": "OSWorld-Verified leaderboard shows top model at >=82% accuracy, exceeding human baseline by >=10pp"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q3 window check-in (75%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -3,
      "source_id": null,
      "expected_date": "2026-06-07",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "Anthropic CEO publicly maintains 2026 AI > most humans timeline",
      "source": "https://www.bloomberg.com/news/newsletters/2024-10-18/anthropic-ceo-thinks-ai-may-outsmart-most-humans-as-soon-as-2026 — Amodei 2026 timeline",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -2,
      "source_id": null,
      "confidence": 0.8,
      "source_url": "https://www.bloomberg.com/news/newsletters/2024-10-18/anthropic-ceo-thinks-ai-may-outsmart-most-humans-as-soon-as-2026",
      "expected_date": "2026-06-30",
      "research_origin": "deep_research",
      "measurement_criterion": "Dario Amodei reaffirms in published interview/essay that AI surpasses human intelligence in most domains by end-2026 or early-2027"
    },
    {
      "kind": "llm_pre_event",
      "label": "OpenAI articulates 'hundreds of thousands of automated research interns' plan",
      "source": "https://openai.com/index/next-phase-of-enterprise-ai/ — OpenAI roadmap",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -1,
      "source_id": null,
      "confidence": 0.
... (truncated)