INF_073 · prediction · AI · AI-smarter-than-humanity-2030

AI will become smarter than any single human by end of 2025 or 2026, and smarter than all of humanity combined by 2030 or 2031.

Predictor: Elon Musk

Prior probability

30.0%

Current probability

48.2%

evolves via intake + LBP

Conviction

4/5

Signal quality

Resolution

pending

Window

2025-01-01 – 2031-08-31

Edges in / out

3 / 0

Tickers exposed

Prediction text

AI will become smarter than any single human by end of 2025 or 2026, and smarter than all of humanity combined by 2030 or 2031. | Grok 5 or Gemini 4 capability evaluation

Key catalyst: Grok 5 or Gemini 4 capability evaluation

Watch events: Grok 5+ capability disclosures; frontier benchmark ensembles

Resolution evidence

Status: pending

Evidence to date: xAI Grok 4.2; Colossus 200K-GPU training runs. As of Apr 2026, frontier models approach but do not surpass human-cognition benchmarks on generalist tasks.

Predictor: Elon Musk

κ + Brier as of 2026-05-22


κ (discount)

0.688

Brier

0.0142

excellent

Hits / Misses

1 / 0

of 3 resolved

Hit rate

33.3%

Calibration plot (stated vs observed)

Evidence about this node from Elon Musk is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
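
As a rough illustration of the weighting rule above, the sketch below clamps κ to the stated floor and cap and scales a log-likelihood ratio by it. The function names are hypothetical, and the multiplicative form is inferred from this page's raw metadata (llr × kappa = adjusted_llr); the actual /api/intake code may differ.

```python
KAPPA_FLOOR = 0.10  # stated on this page: effectively silenced
KAPPA_CAP = 1.00    # stated on this page: full weight


def clamp_kappa(kappa: float) -> float:
    """Keep the predictor discount inside [KAPPA_FLOOR, KAPPA_CAP]."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def discount_llr(llr: float, kappa: float) -> float:
    """Scale a log-likelihood ratio by the clamped discount (assumed form)."""
    return clamp_kappa(kappa) * llr


# Values copied from the milestone-miss entry in the raw metadata on this page.
raw_llr = -0.4054651081081644
kappa_at_sweep = 0.6429
adjusted = discount_llr(raw_llr, kappa_at_sweep)
print(f"adjusted LLR: {adjusted:.4f}")

# Out-of-range discounts are pulled back to the floor or the cap.
print(clamp_kappa(0.05), clamp_kappa(1.7))
```

The first value agrees with the adjusted_llr recorded for the Q1 check-in miss, which was weighted with κ = 0.6429 rather than the current 0.688.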

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

5 prob_history rows

Legend: intake v2 · milestone miss sweep · lbp propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 48.2%

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.

Leading chain: 2 fired ✓ · 1 overdue ⏱ · 6 pending

2024-12-20 · hit · OpenAI o3 surpasses 87.5% on ARC-AGI (above human baseline)
How: OpenAI o3 publishes ARC-AGI score 85% or higher (human baseline) on official leaderboard
Source: https://www.veracalloway.com/blog/ai-culture/agi-timeline/ (o3 hit 87.5% Dec 2024) · conf 99%
Notes: First clear single-domain superhuman benchmark. Necessary but not sufficient for 'smarter than any single human'.

2026-02-23 · overdue · Q1 window check-in (25%)

2026-03-24 · hit · GPT-5.5 (Spud) completes pretraining (March 2026)
How: OpenAI internal communications / leaked reporting confirms GPT-5.5 pretraining completion in March 2026
Source: https://www.veracalloway.com/blog/ai-culture/agi-timeline/ · conf 85%

2026-06-01 → 2027-12-31 · pending · Frontier model passes adversarial Turing test (high-quality, 2-hour panel)
How: Independent academic study (Stanford/MIT/Oxford) confirms frontier model passes 2-hour adversarial Turing test with expert judges and 50% pass rate or higher
Source: Manifold market reference at https://aimultiple.com/artificial-general-intelligence-singularity-timing · conf 45%

2026-06-01 → 2027-12-31 · pending · AI matches or exceeds human expert performance across 5+ professional domains simultaneously
How: Independent study (Apollo/METR/AI Index) shows frontier model matches expert humans in 5+ distinct professional benchmarks (medical diagnosis, legal reasoning, programming, mathematical proof, scientific writing)
Source: Stanford AI Index annual report · conf 50%

2027-04-18 · pending · Q2 window check-in (50%)

2028-06-09 · pending · Q3 window check-in (75%)

2027-06-01 → 2030-11-30 · pending · Anthropic / OpenAI publicly claim 'superintelligence' or AGI achieved
How: Major lab CEO (Altman, Amodei, Hassabis, Musk) publicly declares AGI/superintelligence achieved with technical justification accepted by 50% or more AI safety researchers
Source: Twitter/X, lab blog posts, AI safety community polls · conf 30%
Notes: Cascade for 'smarter than ALL of humanity' (2030-2031 horizon).

2029-03-31 · pending · Scenario fires: AGI mid: Kurzweil 2029 path

2029-08-02 · pending · AI will become smarter than any single human by end of 2025 or 2026, and smarter than all of humanity combined by 2030 or 2031.

No downstream cascades — this prediction is a leaf in the dependency graph.
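
The three quartile check-in dates in the chain are consistent with placing checkpoints at 25%, 50%, and 75% of the span between the window start (2025-01-01) and the resolution event date listed in the chain (2029-08-02), with fractional days rounded down. That rule is an inference from the dates shown here, not something the page documents, and the function name below is hypothetical. Note that the span ends at the chain's resolution date, not at the 2031-08-31 window end shown in the header.

```python
from datetime import date, timedelta


def checkpoint_dates(start: date, end: date, fractions=(0.25, 0.50, 0.75)):
    """Return check-in dates at the given fractions of the start-to-end span.

    Fractional day counts are truncated (assumed rounding rule).
    """
    span_days = (end - start).days
    return [start + timedelta(days=int(span_days * f)) for f in fractions]


window_start = date(2025, 1, 1)       # from the Window field on this page
resolution_event = date(2029, 8, 2)   # last entry in the milestone chain

checkins = checkpoint_dates(window_start, resolution_event)
for label, day in zip(("Q1", "Q2", "Q3"), checkins):
    print(label, day.isoformat())
```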

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample. Surfaces the predictions whose marginals shift most under that assumption.

(live posterior: 48%)

Each run takes about 30 seconds, which makes it suited to stress-testing "if X resolves, what else moves?"
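
To make the clamp-and-sample idea concrete, here is a toy sketch of Gibbs sampling on a three-node binary model with one node held fixed. The node IDs are taken from the prerequisites table on this page, but every bias and coupling value is invented for illustration, and the function is hypothetical; the production sampler, graph, and edge weights are not shown on this page.

```python
import math
import random


def gibbs_marginals(bias, coupling, clamp=None, sweeps=4000, burn_in=500, seed=0):
    """Estimate P(x = 1) per node in a binary pairwise model via Gibbs sampling.

    bias[n] is a log-odds term for node n; coupling[(a, b)] is added to the
    log-odds of either endpoint when the other endpoint is 1.
    clamp maps node -> 0 or 1 and is never resampled.
    """
    rng = random.Random(seed)
    clamp = clamp or {}
    nodes = list(bias)
    state = {n: clamp.get(n, rng.randint(0, 1)) for n in nodes}
    counts = {n: 0 for n in nodes}
    for sweep in range(sweeps):
        for n in nodes:
            if n in clamp:
                continue  # clamped nodes keep their assumed outcome
            log_odds = bias[n]
            for (a, b), w in coupling.items():
                if a == n:
                    log_odds += w * state[b]
                elif b == n:
                    log_odds += w * state[a]
            p_one = 1.0 / (1.0 + math.exp(-log_odds))
            state[n] = 1 if rng.random() < p_one else 0
        if sweep >= burn_in:
            for n in nodes:
                counts[n] += state[n]
    kept = sweeps - burn_in
    return {n: counts[n] / kept for n in nodes}


# Hypothetical parameters: NOT the real network weights.
bias = {"INF_073": -0.1, "S_AGI_MID_2029": -0.5, "S_AI_PAUSE_2026": -1.0}
coupling = {
    ("INF_073", "S_AGI_MID_2029"): 1.5,
    ("INF_073", "S_AI_PAUSE_2026"): -1.0,
}

if_true = gibbs_marginals(bias, coupling, clamp={"INF_073": 1})
if_false = gibbs_marginals(bias, coupling, clamp={"INF_073": 0})

# Rank the other nodes by how far their marginal moves between the two clamps.
shifts = sorted(
    ((n, if_true[n] - if_false[n]) for n in bias if n != "INF_073"),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, delta in shifts:
    print(f"{name}: {delta:+.3f}")
```

Running both clamps with the same seed keeps the comparison from being dominated by sampling noise in a model this small.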

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

LBP · 2026-05-24T02:00:02Z · 48.2% · +1.7pp

Network propagation: 46.5% → 48.2%

4-iter LBP, residual 0.01000 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 806b02f8

LBP · 2026-05-17T02:00:01Z · 46.5% · +3.4pp

Network propagation: 43.0% → 46.5%

5-iter LBP, residual 0.00689 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e607fa96

LBP · 2026-05-10T02:00:02Z · 43.0% · +6.7pp

Network propagation: 36.4% → 43.0%

6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29

LBP · 2026-05-03T02:00:01Z · 36.4% · +11.5pp

Network propagation: 24.8% → 36.4%

6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9

metadata_milestone_miss_sweep · 2026-05-02T22:07:21Z · 24.8% · -5.2pp

metadata_milestone_miss_sweep bayesian_v2 n=1 inside=0.248 blend=0.248 LLR=-0.261 κ=0.64 no_blend

Raw metadata

{
  "trf": 0.799867745466331,
  "kappa": 0.6429,
  "base_rate": null,
  "predictor": "Elon Musk",
  "total_llr": -0.4054651081081644,
  "grace_days": 7,
  "bayesian_v2": true,
  "prior_logit": -0.8472978603872036,
  "bayes_factor": "1.3:1 against",
  "blend_reason": "no reference_class linked",
  "inside_prior": 0.3,
  "kappa_source": "predictor_table",
  "n_milestones": 1,
  "blend_applied": false,
  "contributions": [
    {
      "llr": -0.4054651081081644,
      "kind": "quartile_checkpoint",
      "kappa": 0.6429,
      "label": "Q1 window check-in (25%)",
      "weight": 0.05,
      "strength": "weak",
      "confidence": null,
      "source_url": null,
      "adjusted_llr": -0.2606735180027389,
      "expected_date": "2026-02-23",
      "measurement_criterion": null
    }
  ],
  "evidence_kind": "metadata_milestone_miss_sweep",
  "inside_source": "prior_prob",
  "inside_weight": 0.4400925781735683,
  "outside_weight": 0.5599074218264317,
  "posterior_prob": 0.24824927974330024,
  "posterior_logit": -1.1079713783899425,
  "predictor_brier": 0.01,
  "inside_posterior": 0.24824927974330024,
  "blended_posterior": 0.24824927974330024,
  "reference_class_id": null,
  "total_adjusted_llr": -0.2606735180027389,
  "predictor_n_resolved": 2
}
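
The numbers in this metadata block can be checked by hand: the posterior is the prior in log-odds space plus the κ-adjusted log-likelihood ratio, mapped back to a probability. The sketch below reproduces that arithmetic from the values shown above; the helper names are hypothetical, and it covers only the no-blend case recorded here (no reference class linked).

```python
import math


def logit(p: float) -> float:
    """Probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Values copied from the raw metadata above.
inside_prior = 0.3
total_adjusted_llr = -0.2606735180027389

prior_logit = logit(inside_prior)
posterior_logit = prior_logit + total_adjusted_llr
posterior_prob = sigmoid(posterior_logit)

# Odds ratio implied by the adjusted LLR; the metadata reports it as "1.3:1 against".
bayes_factor_against = math.exp(-total_adjusted_llr)

print(f"prior_logit     = {prior_logit:.6f}")
print(f"posterior_logit = {posterior_logit:.6f}")
print(f"posterior_prob  = {posterior_prob:.6f}")
print(f"bayes factor    = {bayes_factor_against:.1f}:1 against")
```

This matches the 30.0% → 24.8% move (-5.2pp) listed for the milestone miss sweep.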

Network propagation neighbors

Top edges sorted by latest LBP cross-impact


No propagation data yet. Run inference/.venv/bin/python scripts/ops/run_loopy_belief_propagation.py on the droplet, or wait for the Sunday 02:00 UTC weekly cron.
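
The LBP entries in the evidence chain report an iteration count, a residual, and a damping of 0.5. For readers unfamiliar with those terms, the sketch below runs damped sum-product loopy belief propagation on a tiny binary pairwise graph. It is a generic textbook-style illustration, not the lbp_v3 method: the agreement potential, the node names A and B, and the residual definition (largest message change in an iteration) are all assumptions, and w_intrinsic is not modelled because the page does not define it.

```python
import math


def loopy_bp(prior, edges, damping=0.5, tol=1e-4, max_iters=100):
    """Damped sum-product LBP on a binary pairwise model.

    prior[n] is P(n = 1) before propagation. edges[(i, j)] = w uses the
    potential exp(w) when both endpoints agree and 1.0 otherwise (assumed form).
    Returns (beliefs, iterations, last_residual).
    """
    nodes = list(prior)
    phi = {n: (1.0 - prior[n], prior[n]) for n in nodes}
    psi, neighbors = {}, {n: [] for n in nodes}
    for (i, j), w in edges.items():
        table = ((math.exp(w), 1.0), (1.0, math.exp(w)))
        psi[(i, j)] = table
        psi[(j, i)] = table  # symmetric table, valid in both directions
        neighbors[i].append(j)
        neighbors[j].append(i)

    msgs = {key: (0.5, 0.5) for key in psi}
    residual, iters = float("inf"), 0
    while iters < max_iters and residual > tol:
        new_msgs, residual = {}, 0.0
        for (i, j), old in msgs.items():
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= msgs[(k, i)][xi]
                    total += phi[i][xi] * psi[(i, j)][xi][xj] * incoming
                out.append(total)
            z = out[0] + out[1]
            cand = (out[0] / z, out[1] / z)
            # Damping: move only part of the way toward the candidate message.
            damped = tuple(damping * o + (1.0 - damping) * c for o, c in zip(old, cand))
            residual = max(residual, abs(damped[1] - old[1]))
            new_msgs[(i, j)] = damped
        msgs = new_msgs
        iters += 1

    beliefs = {}
    for n in nodes:
        b0, b1 = phi[n]
        for k in neighbors[n]:
            b0 *= msgs[(k, n)][0]
            b1 *= msgs[(k, n)][1]
        beliefs[n] = b1 / (b0 + b1)
    return beliefs, iters, residual


# Hypothetical two-node example: a confident neighbor pulls a 30% node upward.
beliefs, iters, residual = loopy_bp({"A": 0.3, "B": 0.8}, {("A", "B"): 1.0})
print(f"A: {beliefs['A']:.3f}  B: {beliefs['B']:.3f}  ({iters} iters, residual {residual:.5f})")
```

Damping trades speed for stability: each message moves only halfway toward its new value per iteration, which helps on graphs with cycles where undamped updates can oscillate.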

Prerequisites (3)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
correlate	S_NO_AI_PAUSE_5Y	No major AI pause through 2031	ai_regulatory_pause	—
correlate	S_AGI_MID_2029	AGI mid: Kurzweil 2029 path	agi_general_capability	—
correlate	S_AI_PAUSE_2026	Major-country AI pause beginning 2026	ai_regulatory_pause	—

Dependents (0)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
No dependents

Linked documents (4)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.681	manifold	at the end of 2026, will an AI be able to generate a full high-quality tv ep to a prompt?	35%	mentions	pending	2026-05-31
0.644	manifold	Will I get a girlfriend in the year 2026	15%	mentions	pending	2026-04-23
0.633	manifold	Will Claude replace Grok on X in 2026?	9%	mentions	pending	2026-05-07
0.567	manifold	Will The Democrats Win Every Presidential Election from 2028 to 2040?	11%	mentions	pending	2026-05-29

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "qty": "smarter than humanity",
  "mode": "FORECAST",
  "role": "Cited-CEO",
  "context": "Musk aggressive-timing thesis. Pairs with INF_071 (Kurzweil 2029 AGI) as upper bound of elite-futurist consensus. Forcing function for trillion-dollar energy buildout.",
  "to_year": 2031,
  "conv_cues": "specific years; CEO FIRST_PERSON; superlative framing",
  "direction": "HAPPEN",
  "from_year": 2025,
  "timeframe": "2025-2031",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "OpenAI o3 surpasses 87.5% on ARC-AGI (above human baseline)",
      "notes": "First clear single-domain superhuman benchmark. Necessary but not sufficient for 'smarter than any single human'.",
      "source": "https://www.veracalloway.com/blog/ai-culture/agi-timeline/ — o3 hit 87.5% Dec 2024",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -9,
      "source_id": null,
      "confidence": 0.99,
      "source_url": "https://www.veracalloway.com/blog/ai-culture/agi-timeline/",
      "expected_date": "2024-12-20",
      "observed_date": "2024-12-20",
      "research_origin": "deep_research",
      "measurement_criterion": "OpenAI o3 publishes ARC-AGI score 85% or higher (human baseline) on official leaderboard"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q1 window check-in (25%)",
      "status": "overdue",
      "weight": 0.05,
      "ordinal": -8,
      "source_id": null,
      "expected_date": "2026-02-23",
      "observed_date": null,
      "miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
      "miss_emitted_by": "metadata_milestone_sweep"
    },
    {
      "kind": "llm_pre_event",
      "label": "GPT-5.5 (Spud) completes pretraining (March 2026)",
      "source": "https://www.veracalloway.com/blog/ai-culture/agi-timeline/",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.85,
      "source_url": "https://www.veracalloway.com/blog/ai-culture/agi-timeline/",
      "expected_date": "2026-03-24",
      "observed_date": "2026-03-24",
      "research_origin": "deep_research",
      "measurement_criterion": "OpenAI internal communications / leaked reporting confirms GPT-5.5 pretraining completion in March 2026"
    },
    {
      "kind": "llm_pre_event",
      "label": "Frontier model passes adversarial Turing test (high-quality, 2-hour panel)",
      "source": "Manifold market reference at https://aimultiple.com/artificial-general-intelligence-singularity-timing",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -6,
      "source_id": null,
      "confidence": 0.45,
      "source_url": "https://aimultiple.com/artificial-general-intelligence-singularity-timing",
      "expected_date": "2027-03-17",
      "research_origin": "deep_research",
      "expected_date_range": {
        "to": "2027-12-31",
        "from": "2026-06-01"
      },
      "measurement_criterion": "Independent academic study (Stanford/MIT/Oxford) confirms frontier model passes 2-hour adversarial Turing test with expert judges and 50% pass rate or higher"
    },
    {
      "kind": "llm_pre_event",
      "label": "AI matches or exceeds human expert performance across 5+ professional domains simultaneously",
      "source": "Stanford AI Index annual report",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -5,
      "source_id": null,
      "confidence": 0.5,
      "expected_date": "2027-03-17",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2027-12-31",
        "from": "2026-06-01"
      },
      "measurement_criterion": "Independent study (Apollo/METR/AI Index) shows frontier model matches expert humans in 5+ distinct professional benchmarks (medical diagnosis, legal reasoning, programming, mathematical proof, scientific writing)"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q2 window check-in (50%)",
      "status": "pending",
      "weight"
... (truncated)