230_040 · prediction · AI · AI-scaling

AI capability/accuracy will improve recursively; output-checking issues will be eliminated quickly.

Predictor: Peter Diamandis · ep#230 "AI CEOs Come Online: Sam Altman's Replacement Plan, Job Loss & 'Solve Everything' Launches |EP #230" · source

Prior probability: 65.0%
Current probability: 50.4% (evolves via intake + LBP)
Conviction: 4/5
Signal quality: B
Resolution: pending
Window: 2026-01-01 – 2027-10-31
Edges in / out: 10 / 5
Tickers exposed: 37

Prediction text

AI capability/accuracy will improve recursively; output-checking issues will be eliminated quickly. | AI is the slowest and most incorrect it will ever be. I know when I'm using my Claudebot or Claude 4.6 if I get something that seems off I will ask it to check itself. and being able to use this in a recursive fashion... we're in a period of recursive self-improvement. I think we're at the steepest part of the curve and it's going to become more and more capable every day. And the idea that we can use, um, AIS to check AIs and in fact uh, to do uh, deeper reasoning is going to eliminate this very quickly.

Predictor: Peter Diamandis

κ + Brier as of 2026-05-22
κ (discount): 0.875
Brier: 0.0367 (excellent)
Hits / Misses: 10 / 0 of 15 resolved
Hit rate: 66.7%
Calibration plot (stated vs observed)

Evidence about this node from Peter Diamandis is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
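The κ weighting described above can be sketched as follows. Only the floor (0.10) and cap (1.00) come from this page. The function names are hypothetical, and treating "evidence" as a log-likelihood ratio added in log-odds space is an assumption, since the page does not say how /api/intake represents evidence.

```python
import math

KAPPA_FLOOR = 0.10  # from the page: effectively silenced
KAPPA_CAP = 1.00    # from the page: full weight


def clamp_kappa(kappa: float) -> float:
    """Keep the predictor discount inside the stated [0.10, 1.00] range."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def apply_discounted_evidence(prior: float, log_lr: float, kappa: float) -> float:
    """Update a probability with evidence scaled by kappa.

    Assumption: evidence is a log-likelihood ratio and the update is
    additive in log-odds space. The real intake endpoint may differ.
    """
    prior_log_odds = math.log(prior / (1.0 - prior))
    posterior_log_odds = prior_log_odds + clamp_kappa(kappa) * log_lr
    return 1.0 / (1.0 + math.exp(-posterior_log_odds))


# Same evidence, weighted at kappa = 0.875 versus full weight.
discounted = apply_discounted_evidence(0.5, math.log(2.0), 0.875)
full = apply_discounted_evidence(0.5, math.log(2.0), 1.0)
print(f"discounted={discounted:.3f} full={full:.3f}")
```

Under this assumption a lower κ shrinks the update toward the prior without changing its direction.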

Reference class: agi_breakthrough_5y

Linked via embedding similarity 0.573

Major capability discontinuity (e.g. AGI by named target year, 5-year horizon)

Base rate: 20.0% (1/5 historical)
Inside weight
Outside weight
no pull
inside 50.4% → blend 50.4% (0.0pp)

Tetlock-style outside view: at TRF=1 (just predicted), outside view dominates (w_in=0.3). At TRF=0 (deadline), inside view dominates (w_in=1.0). The blend regularizes overconfident inside views toward the historical base rate.
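A minimal sketch of the blend follows. The two endpoints (w_in=0.3 at TRF=1, w_in=1.0 at TRF=0) come from the text above. The linear schedule between them and the log-odds pooling are assumptions, since the page states neither formula. Log-odds pooling is used here because it roughly reproduces the logged row inside=0.650, w_in=0.41, base rate 20.0%, blend=0.365 from the evidence chain on this page.

```python
import math


def inside_weight(trf: float, w_at_prediction: float = 0.3,
                  w_at_deadline: float = 1.0) -> float:
    """Weight on the inside view as a function of TRF.

    TRF = 1 means just predicted, TRF = 0 means at the deadline.
    The linear interpolation between the endpoints is an assumption.
    """
    trf = min(1.0, max(0.0, trf))
    return w_at_deadline - (w_at_deadline - w_at_prediction) * trf


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def blend(inside: float, base_rate: float, w_in: float) -> float:
    """Pool inside view and base rate in log-odds space (assumed form)."""
    return sigmoid(w_in * logit(inside) + (1.0 - w_in) * logit(base_rate))


# Logged values: inside 65%, base rate 20%, w_in 0.41.
print(f"blend = {blend(0.65, 0.20, 0.41):.3f}")
```

A plain linear average of the same inputs would give about 0.38, which is further from the logged 0.365, but neither form is confirmed by the page.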

Probability over time

6 prob_history rows
[Chart: probability over time, y-axis 0% to 100%, x-axis 2026-04-30 to 2026-05-10; prior 65%, current = 50.4%. Legend: intake v2, milestone miss sweep, lbp propagation, reference class assigned, legacy v1, prior_prob (analyst seed).]

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.
Leading chain: 7 fired ✓ · 1 pending
  1. 2026-03-15 · hit · Claude 4.6 Sonnet achieves ~4% hallucination rate (lowest in market)
    How: BullshitBench v2 / LLM Hallucination Index 2026 confirms Claude 4.6 ~4% hallucination on 500 factual queries
    Source: https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c · conf 92%
  2. 2026-03-15 · hit · Reasoning Paradox confirmed — chain-of-thought hurts factuality
    How: BullshitBench v2 demonstrates GPT-5.2/Gemini 3 Pro reasoning modes have LOWER factual accuracy than non-reasoning modes
    Source: https://nevirax.com/en/news/chatgpt-vs-claude-alucinaciones-benchmarks-2026 · conf 85%
    Notes: DIRECTIONAL DISCONFIRMATION of the prediction's claim that recursive self-checking 'eliminates' errors quickly. Empirical evidence shows the opposite for several frontier models.
  3. 2026-04-15 · hit · GPT-5.5 ships with 86% hallucination rate (most-capable model worst-calibrated)
    How: AA-Omniscience benchmark records GPT-5.5 at 57% accuracy / 86% hallucination
    Source: https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c · conf 85%
    Notes: Strong counter-evidence to 'eliminate quickly' framing. Capability gains have NOT eliminated hallucination.
  4. 2026-09-01 → 2027-08-31 · pending · Industry-leader hallucination rate drops below 2% on standard factual benchmarks
    How: Top frontier model achieves <=2% hallucination on suprmind / Vectara hallucination benchmark
    Source: https://suprmind.ai/hub/insights/ai-hallucination-statistics-research-report-2026/ · conf 50%
  5. 2026-06-01 → 2027-12-31 · pending · Self-correcting RLHF or constitutional method reduces error rate by 50% vs base
    How: Published research demonstrates self-checking or constitutional AI cuts hallucination >=50% vs base model on held-out factual benchmark
    Source: https://platform.claude.com/docs/en/test-and-evaluate/strengthen-guardrails/reduce-hallucinations + Anthropic/OpenAI papers · conf 55%

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample. The run returns the predictions whose marginals shift most under that assumption and takes about 30s. It is suited to stress-testing "if X resolves, what else moves?"
(live posterior: 50%)
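The clamp-and-sample idea can be illustrated on a toy network. This is a sketch under stated assumptions, not the page's sampler: the three-node chain (killer TK03 → this node 230_040 → dependent 232_055), the single-parent structure, and all function names are hypothetical. Only the probabilities (10.0%, 0.050/0.650, 0.700/0.050) are taken from the edge tables on this page.

```python
import random

# Hypothetical chain: TK03 -> 230_040 -> 232_055.
# PRIOR[n] is P(n=True) for a root node.
# CPT[n] = (parent, P(n=True | parent=True), P(n=True | parent=False)).
PRIOR = {"TK03": 0.10}
CPT = {
    "230_040": ("TK03", 0.05, 0.65),
    "232_055": ("230_040", 0.70, 0.05),
}
NODES = ["TK03", "230_040", "232_055"]


def p_true(node, state):
    """P(node=True) given the current value of its parent, if any."""
    if node in PRIOR:
        return PRIOR[node]
    parent, p_t, p_f = CPT[node]
    return p_t if state[parent] else p_f


def local_weight(node, value, state):
    """Unnormalised full conditional: own factor times each child's factor."""
    trial = {**state, node: value}
    p = p_true(node, trial)
    weight = p if value else 1.0 - p
    for child, (parent, _, _) in CPT.items():
        if parent == node:
            pc = p_true(child, trial)
            weight *= pc if trial[child] else 1.0 - pc
    return weight


def gibbs_marginals(clamp, n_sweeps=4000, burn_in=200, seed=0):
    """Estimate marginals of the free nodes with the clamped nodes held fixed."""
    rng = random.Random(seed)
    state = {n: rng.random() < 0.5 for n in NODES}
    state.update(clamp)
    free = [n for n in NODES if n not in clamp]
    counts = {n: 0 for n in free}
    for sweep in range(n_sweeps + burn_in):
        for n in free:
            w_t = local_weight(n, True, state)
            w_f = local_weight(n, False, state)
            state[n] = rng.random() < w_t / (w_t + w_f)
        if sweep >= burn_in:
            for n in free:
                counts[n] += state[n]
    return {n: counts[n] / n_sweeps for n in free}


if_true = gibbs_marginals({"230_040": True})
if_false = gibbs_marginals({"230_040": False})
for n in if_true:
    print(n, f"TRUE={if_true[n]:.3f} FALSE={if_false[n]:.3f}")
```

Ranking the free nodes by the gap between the two runs gives the "shifts most" list. A real network with multi-parent nodes would need a richer conditional table than the single-parent one assumed here.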

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first
LBP · 2026-05-10T02:00:02Z · 50.4% · +1.8pp
Network propagation: 48.6% → 50.4%
6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29
LBP · 2026-05-03T02:00:01Z · 48.6% · +3.8pp
Network propagation: 44.8% → 48.6%
6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9
LBP · 2026-04-30T16:39:51Z · 44.8% · +8.3pp
Network propagation: 36.5% → 44.8%
5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3
legacy v1 · 2026-04-30T16:13:50Z · 36.5% · -9.1pp
reference_class_assigned bayesian_v2 inside=0.650 blend=0.365 w_in=0.41 agi_breakthrough_5y
LBP · 2026-04-30T02:18:57Z · 45.6% · +9.1pp
Network propagation: 36.5% → 45.6%
5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef
legacy v1 · 2026-04-30T01:56:50Z · 36.5% · -28.5pp
reference_class_assigned bayesian_v2 inside=0.650 blend=0.365 w_in=0.41 agi_breakthrough_5y

Network propagation neighbors

Top edges sorted by latest LBP cross-impact

Top incoming (parents)

Edges that influence THIS node's belief

Kind · Node · Title · Their prob · P(c|s=T) · P(c|s=F) · Δ implied
killer · TK03 · AI Regulatory Moratorium (EU/US Capability Freeze) · 10.0% · 0.050 · 0.650 · +0.086
killer · TK02 · AI Compute Supply Shock (TSMC/Taiwan Disruption) · 12.0% · 0.050 · 0.650 · +0.074
killer · TK09 · Energy Grid Cap (Data Center Power Wall) · 35.0% · 0.050 · 0.650 · -0.064
prereq · SEM_014 · Nvidia's Arizona-based TSMC factory successfully fabricated… (Jensen Huang) · 86.1% · 0.650 · 0.050 · +0.057
killer · TK01 · AGI Capability Plateau (2026-27 Training Stall) · 15.0% · 0.050 · 0.650 · +0.056
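The page does not define "Δ implied", but the four killer rows above match a simple reading: marginalize the child's probability over the parent's state, then subtract this node's current probability (50.4%). The sketch below encodes that inferred formula. The function name is hypothetical, and the SEM_014 prereq row comes out near +0.063 rather than the listed +0.057, so treat this as an approximation rather than the system's definition.

```python
def implied_delta(parent_prob: float, p_c_given_true: float,
                  p_c_given_false: float, current_prob: float) -> float:
    """Child probability implied by one parent edge, minus the current value.

    Inferred formula (not stated on the page):
        implied = P(s) * P(c|s=T) + (1 - P(s)) * P(c|s=F)
    """
    implied = parent_prob * p_c_given_true + (1.0 - parent_prob) * p_c_given_false
    return implied - current_prob


CURRENT = 0.504  # this node's current probability

# (node, their prob, P(c|s=T), P(c|s=F)) copied from the incoming-edge rows.
edges = [
    ("TK03", 0.10, 0.05, 0.65),
    ("TK02", 0.12, 0.05, 0.65),
    ("TK09", 0.35, 0.05, 0.65),
    ("TK01", 0.15, 0.05, 0.65),
]
for node, p_s, p_t, p_f in edges:
    print(node, f"{implied_delta(p_s, p_t, p_f, CURRENT):+.3f}")
```

Read this way, a killer with low probability pulls the node up because the most likely branch is the 0.650 "killer does not fire" case.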

Top outgoing (children)

Predictions THIS node influences

Kind · Node · Title · Their prob · P(c|s=T) · P(c|s=F) · Δ implied
prereq · 248_033 · Superhuman AI will make BCI-enhanced humans irrelevant compa… (Dave Blundin) · 36.7% · 0.600 · 0.050 · -0.036
prereq · 232_055 · We're exiting the industrial age permanently as recursive se… (Peter Diamandis) · 35.5% · 0.700 · 0.050 · +0.028
prereq · 244_019 · Peter's son won't need a driver's license in 2 years (Peter Diamandis) · 48.4% · 0.920 · 0.050 · +0.011
prereq · 230_020 · Peter's 14-year-old son Milan will never get a driver's lice… (Peter Diamandis) · 34.7% · 0.650 · 0.050 · +0.010
prereq · 242_031 · Most large companies' business models will be disrupted in 2… (Peter Diamandis) · 36.1% · 0.650 · 0.050 · -0.004

Ticker exposure

37 ticker(s) linked

Beneficiaries (24)

MU, WULF, IREN, EQIX, ALAB, APLD, ASMIY, ASML, PLAB, NVDA, NBIS, CRWV, AAPL, AMT, AMZN, DELL, GOOGL, IRM, LNVGY, META, MSFT, ORCL, SFTBY, STX

Adverse (6)

ACN, GEN, CHGG, IBM, WNS, LRN

Prerequisites (10)

Predictions that must hit first
Type · Pred · Title · Domain · Lag
prereq · SEM_011 · Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips. · Capital Markets
prereq · SEM_027 · Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon. · Capital Markets
prereq · SEM_014 · Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025). · Manufacturing
prereq · SEM_012 · Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) across engineering. · AI/Manufacturing
prereq · SEM_015 · Nvidia agreed to remit 15% of China chip-sale revenue directly to US government in exchange for reversing specific AI chip export bans. · Policy/Semis
killer · TK09 · Energy Grid Cap (Data Center Power Wall)
killer · TK05 · Rate Regime Persistence (10y > 5% through 2028)
killer · TK01 · AGI Capability Plateau (2026-27 Training Stall)
killer · TK02 · AI Compute Supply Shock (TSMC/Taiwan Disruption)
killer · TK03 · AI Regulatory Moratorium (EU/US Capability Freeze)

Dependents (5)

Predictions enabled by this
Type · Pred · Title · Domain · Lag
prereq · 244_019 · Peter's son won't need a driver's license in 2 years · Auto/Transport
prereq · 232_055 · We're exiting the industrial age permanently as recursive self-improvement unfolds. · AI
prereq · 242_031 · Most large companies' business models will be disrupted in 2-5 years · Markets/Stocks
prereq · 230_020 · Peter's 14-year-old son Milan will never get a driver's license. · Auto/Transport
prereq · 248_033 · Superhuman AI will make BCI-enhanced humans irrelevant compared to AI 2 years from today. · AI

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT
Sim · Source · Title · Market prob · Polarity · Reviewed · Published
0.750 · manifold · Will AI continue to improve? · 84% · mentions · pending · 2026-06-01
0.683 · arxiv · Measuring AI Reasoning: A Guide for Researchers · - · mentions · pending · 2026-05-04
0.682 · arxiv · Self-Trained Verification for Training- and Test-Time Self-Improvement · - · mentions · pending · 2026-05-28
0.675 · arxiv · Recursive Agent Optimization · - · mentions · pending · 2026-05-07
0.672 · arxiv · Silent Collapse in Recursive Learning Systems · - · mentions · pending · 2026-05-14
0.660 · arxiv · Learning, Fast and Slow: Towards LLMs That Adapt Continually · - · mentions · pending · 2026-05-12
0.654 · arxiv · Perturbation Dose Responses in Recursive LLM Loops: Raw Switching, Stochastic Floors, and Persistent Escape under Append, Replace, and Dialog Updates · - · mentions · pending · 2026-05-04
0.647 · arxiv · Evaluation Awareness in Language Models Has Limited Effect on Behaviour · - · mentions · pending · 2026-05-07
0.631 · arxiv · Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling · - · mentions · pending · 2026-05-13
0.628 · arxiv · Boosting Self-Consistency with Ranking · - · mentions · pending · 2026-06-03

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook
{
  "nia": false,
  "url": "https://www.youtube.com/watch?v=6P0uTDGDr-I",
  "mode": "PREDICTION",
  "role": "Host",
  "context": "AI is the slowest and most incorrect it will ever be... we're at the steepest part of the curve and it's going to become more and more capable every day... going to eliminate this very quickly.",
  "to_year": 2027,
  "verbatim": "AI is the slowest and most incorrect it will ever be. I know when I'm using my Claudebot or Claude 4.6 if I get something that seems off I will ask it to check itself. and being able to use this in a recursive fashion... we're in a period of recursive self-improvement. I think we're at the steepest part of the curve and it's going to become more and more capable every day. And the idea that we can use, um, AIS to check AIs and in fact uh, to do uh, deeper reasoning is going to eliminate this very quickly.",
  "conv_cues": "going to eliminate this very quickly; steepest part of the curve",
  "direction": "HAPPEN",
  "from_year": 2026,
  "timeframe": "very quickly",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "Claude 4.6 Sonnet achieves ~4% hallucination rate (lowest in market)",
      "source": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -8,
      "source_id": null,
      "confidence": 0.92,
      "source_url": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "expected_date": "2026-03-15",
      "observed_date": "2026-03-15",
      "research_origin": "deep_research",
      "measurement_criterion": "BullshitBench v2 / LLM Hallucination Index 2026 confirms Claude 4.6 ~4% hallucination on 500 factual queries"
    },
    {
      "kind": "llm_pre_event",
      "label": "Reasoning Paradox confirmed — chain-of-thought hurts factuality",
      "notes": "DIRECTIONAL DISCONFIRMATION of the prediction's claim that recursive self-checking 'eliminates' errors quickly. Empirical evidence shows opposite for several frontier models.",
      "source": "https://nevirax.com/en/news/chatgpt-vs-claude-alucinaciones-benchmarks-2026",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.85,
      "source_url": "https://nevirax.com/en/news/chatgpt-vs-claude-alucinaciones-benchmarks-2026",
      "expected_date": "2026-03-15",
      "observed_date": "2026-03-15",
      "research_origin": "deep_research",
      "measurement_criterion": "BullshitBench v2 demonstrates GPT-5.2/Gemini 3 Pro reasoning modes have LOWER factual accuracy than non-reasoning modes"
    },
    {
      "kind": "llm_pre_event",
      "label": "GPT-5.5 ships with 86% hallucination rate (most-capable model worst-calibrated)",
      "notes": "Strong counter-evidence to 'eliminate quickly' framing. Capability gains have NOT eliminated hallucination.",
      "source": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -6,
      "source_id": null,
      "confidence": 0.85,
      "source_url": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "expected_date": "2026-04-15",
      "observed_date": "2026-04-15",
      "research_origin": "deep_research",
      "measurement_criterion": "AA-Omniscience benchmark records GPT-5.5 at 57% accuracy / 86% hallucination"
    },
    {
      "kind": "prereq",
      "label": "Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -5,
      "source_id": "SEM_011",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market ca
... (truncated)