230_040 · prediction · AI · AI-scaling

AI capability/accuracy will improve recursively; output-checking issues will be eliminated quickly.

Predictor: Peter Diamandis · ep#230 "AI CEOs Come Online: Sam Altman's Replacement Plan, Job Loss & 'Solve Everything' Launches |EP #230"

Prior probability

65.0%

Current probability

50.4%

evolves via intake + LBP

Conviction

4/5

Signal quality

Resolution

pending

Window

2026-01-01 – 2027-10-31

Edges in / out

10 / 5

Tickers exposed

Prediction text

AI capability/accuracy will improve recursively; output-checking issues will be eliminated quickly. | AI is the slowest and most incorrect it will ever be. I know when I'm using my Claudebot or Claude 4.6 if I get something that seems off I will ask it to check itself. and being able to use this in a recursive fashion... we're in a period of recursive self-improvement. I think we're at the steepest part of the curve and it's going to become more and more capable every day. And the idea that we can use, um, AIS to check AIs and in fact uh, to do uh, deeper reasoning is going to eliminate this very quickly.

Verbatim quote

From episode "AI CEOs Come Online: Sam Altman's Replacement Plan, Job Loss & 'Solve Everything' Launches |EP #230"

AI is the slowest and most incorrect it will ever be. I know when I'm using my Claudebot or Claude 4.6 if I get something that seems off I will ask it to check itself. and being able to use this in a recursive fashion... we're in a period of recursive self-improvement. I think we're at the steepest part of the curve and it's going to become more and more capable every day. And the idea that we can use, um, AIS to check AIs and in fact uh, to do uh, deeper reasoning is going to eliminate this very quickly.

Predictor: Peter Diamandis

κ + Brier as of 2026-05-22


κ (discount)

0.875

Brier

0.0367

excellent

Hits / Misses

10 / 0

of 15 resolved

Hit rate

66.7%

Calibration plot (stated vs observed)

Evidence about this node from Peter Diamandis is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
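A minimal sketch of the κ discount described above. The floor (0.10) and cap (1.00) come from the text. Everything else is an assumption for illustration: the function names are hypothetical, and treating "evidence" as a likelihood ratio scaled in log-odds space is one plausible reading, not a confirmed description of /api/intake.

```python
import math

KAPPA_FLOOR = 0.10  # from the text: effectively silenced
KAPPA_CAP = 1.00    # from the text: full weight


def clamp_kappa(kappa: float) -> float:
    """Keep kappa inside the documented [0.10, 1.00] range."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def discounted_update(prior: float, likelihood_ratio: float, kappa: float) -> float:
    """ASSUMPTION: evidence is a likelihood ratio, discounted by kappa in log-odds space."""
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += clamp_kappa(kappa) * math.log(likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Same evidence, weighted at this predictor's kappa versus full weight.
print(discounted_update(0.5, 2.0, 0.875))
print(discounted_update(0.5, 2.0, 1.0))
```

Under this reading, a lower κ moves the posterior less for the same evidence, and a likelihood ratio of 1 leaves the prior unchanged at any κ.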

Reference class: agi_breakthrough_5y

Linked via embedding similarity 0.573


Major capability discontinuity (e.g. AGI by named target year, 5-year horizon)

Base rate

20.0%

1/5 historical

Inside weight

—

Outside weight

—

no pull

inside 50.4% → blend 50.4% (Δ 0.0pp)

Tetlock-style outside view: at TRF=1 (just predicted), outside view dominates (w_in=0.3). At TRF=0 (deadline), inside view dominates (w_in=1.0). The blend regularizes overconfident inside views toward the historical base rate.
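A hedged sketch of the inside/outside blend. Only the two endpoints (w_in=0.3 at TRF=1, w_in=1.0 at TRF=0) are stated in the text. The linear interpolation between them and the log-odds blend are assumptions, and the function names are hypothetical. The log-odds form is chosen because it roughly reproduces the logged row `inside=0.650 blend=0.365 w_in=0.41` with the 20.0% base rate, whereas a straight probability average would give about 0.385.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def inside_weight(trf: float) -> float:
    """ASSUMPTION: linear between the stated endpoints (TRF=1 -> 0.3, TRF=0 -> 1.0)."""
    trf = min(1.0, max(0.0, trf))
    return 1.0 - 0.7 * trf


def blend(inside: float, base_rate: float, w_in: float) -> float:
    """ASSUMPTION: weighted average of inside view and base rate in log-odds space."""
    return sigmoid(w_in * logit(inside) + (1.0 - w_in) * logit(base_rate))


# Logged values from the evidence chain: inside=0.650, w_in=0.41, base rate 20.0%.
print(round(blend(0.650, 0.20, 0.41), 3))
```

With w_in=1.0 the blend returns the inside view unchanged, which matches the "no pull" row above (inside 50.4% → blend 50.4%).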

Probability over time

6 prob_history rows

Legend: intake v2 · milestone miss sweep · lbp propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 50.4%

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.

Leading chain: 7 fired ✓ · 1 pending

2026-03-15 · hit · Claude 4.6 Sonnet achieves ~4% hallucination rate (lowest in market)
How: BullshitBench v2 / LLM Hallucination Index 2026 confirms Claude 4.6 ~4% hallucination on 500 factual queries
Source: https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c · conf 92%
2026-03-15 · hit · Reasoning Paradox confirmed — chain-of-thought hurts factuality
How: BullshitBench v2 demonstrates GPT-5.2/Gemini 3 Pro reasoning modes have LOWER factual accuracy than non-reasoning modes
Source: https://nevirax.com/en/news/chatgpt-vs-claude-alucinaciones-benchmarks-2026 · conf 85%
Notes: DIRECTIONAL DISCONFIRMATION of the prediction's claim that recursive self-checking 'eliminates' errors quickly. Empirical evidence shows opposite for several frontier models.
2026-04-15 · hit · GPT-5.5 ships with 86% hallucination rate (most-capable model worst-calibrated)
How: AA-Omniscience benchmark records GPT-5.5 at 57% accuracy / 86% hallucination
Source: https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c · conf 85%
Notes: Strong counter-evidence to 'eliminate quickly' framing. Capability gains have NOT eliminated hallucination.
2026-04-29 · hit · Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.
2026-04-29 · hit · Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.
2026-04-29 · hit · Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).
2026-04-29 · hit · Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) a
2026-06-25 · pending · Nvidia agreed to remit 15% of China chip-sale revenue directly to US government in exchange for reversing specific AI chip export bans.
2027-02-23 · pending · AI capability/accuracy will improve recursively; output-checking issues will be eliminated quickly.
2026-09-01 → 2027-08-31 · pending · Industry-leader hallucination rate drops below 2% on standard factual benchmarks
How: Top frontier model achieves <=2% hallucination on suprmind / Vectara hallucination benchmark
Source: https://suprmind.ai/hub/insights/ai-hallucination-statistics-research-report-2026/ · conf 50%
2026-06-01 → 2027-12-31 · pending · Self-correcting RLHF or constitutional method reduces error rate by 50% vs base
How: Published research demonstrates self-checking or constitutional AI cuts hallucination >=50% vs base model on held-out factual benchmark
Source: https://platform.claude.com/docs/en/test-and-evaluate/strengthen-guardrails/reduce-hallucinations + Anthropic/OpenAI papers · conf 55%
2028-06-25 · pending · We're exiting the industrial age permanently as recursive self-improvement unfolds.
2028-07-20 · pending · Superhuman AI will make BCI-enhanced humans irrelevant compared to AI 2 years from today.
2030-09-27 · pending · Most large companies' business models will be disrupted in 2-5 years
2063-06-21 · pending · Peter's 14-year-old son Milan will never get a driver's license.

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample (~30s per run). The run returns the predictions whose marginals shift most under that assumption, which makes it useful for stress-testing "if X resolves, what else moves?"

(live posterior: 50%)
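A toy sketch of clamped Gibbs sampling on a three-node chain (killer k → this node s → child c). The conditional probabilities are borrowed from the TK03 and 232_055 edge rows on this page. The chain structure, function names, sweep counts, and seed are illustrative assumptions; the production sampler and its network are not described in the source.

```python
import random

# ASSUMPTION: a hypothetical 3-node chain, k -> s -> c, using edge values from this page.
PRIOR_K = 0.10                            # TK03 probability
P_S_GIVEN_K = {True: 0.05, False: 0.65}   # P(s | k) for a killer edge
P_C_GIVEN_S = {True: 0.70, False: 0.05}   # P(c | s) for the 232_055 prereq edge


def bern(p, value):
    """Probability of a boolean value under Bernoulli(p)."""
    return p if value else 1.0 - p


def joint(state):
    """Joint probability of a full assignment to k, s, c."""
    return (bern(PRIOR_K, state["k"])
            * bern(P_S_GIVEN_K[state["k"]], state["s"])
            * bern(P_C_GIVEN_S[state["s"]], state["c"]))


def clamped_gibbs(clamp, sweeps=5000, burn_in=500, seed=0):
    """Resample each free node from its full conditional; clamped nodes never change."""
    rng = random.Random(seed)
    state = {"k": False, "s": False, "c": False}
    state.update(clamp)
    free = [v for v in state if v not in clamp]
    counts = {v: 0 for v in free}
    for sweep in range(burn_in + sweeps):
        for v in free:
            p_true = joint({**state, v: True})
            p_false = joint({**state, v: False})
            state[v] = rng.random() < p_true / (p_true + p_false)
        if sweep >= burn_in:
            for v in free:
                counts[v] += state[v]
    return {v: counts[v] / sweeps for v in free}


# Marginals of the other nodes with this prediction clamped TRUE, then FALSE.
print(clamped_gibbs({"s": True}))
print(clamped_gibbs({"s": False}))
```

Comparing the two runs shows which marginals move most under each clamp, which is the quantity the counterfactual view surfaces.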

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

LBP · 2026-05-10T02:00:02Z · 50.4% · +1.8pp

Network propagation: 48.6% → 50.4%

6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29

LBP · 2026-05-03T02:00:01Z · 48.6% · +3.8pp

Network propagation: 44.8% → 48.6%

6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9

LBP · 2026-04-30T16:39:51Z · 44.8% · +8.3pp

Network propagation: 36.5% → 44.8%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3

legacy v1 · 2026-04-30T16:13:50Z · 36.5% · -9.1pp

reference_class_assigned bayesian_v2 inside=0.650 blend=0.365 w_in=0.41 agi_breakthrough_5y

LBP · 2026-04-30T02:18:57Z · 45.6% · +9.1pp

Network propagation: 36.5% → 45.6%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef

legacy v1 · 2026-04-30T01:56:50Z · 36.5% · -28.5pp

reference_class_assigned bayesian_v2 inside=0.650 blend=0.365 w_in=0.41 agi_breakthrough_5y

Network propagation neighbors

Top edges sorted by latest LBP cross-impact


Top incoming (parents)

Edges that influence THIS node's belief

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
killer	TK03 AI Regulatory Moratorium (EU/US Capability Freeze)	10.0%	0.050	0.650	+0.086
killer	TK02 AI Compute Supply Shock (TSMC/Taiwan Disruption)	12.0%	0.050	0.650	+0.074
killer	TK09 Energy Grid Cap (Data Center Power Wall)	35.0%	0.050	0.650	-0.064
prereq	SEM_014 Nvidia's Arizona-based TSMC factory successfully fabricated — Jensen Huang	86.1%	0.650	0.050	+0.057
killer	TK01 AGI Capability Plateau (2026-27 Training Stall)	15.0%	0.050	0.650	+0.056
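The page does not say how "Δ implied" is computed. One reading, offered as an assumption, is the law of total probability over the parent's state minus this node's current belief. It reproduces the four killer rows above (TK03, TK02, TK09, TK01) to rounding against the current 50.4%. The SEM_014 row comes out slightly different (about +0.063 rather than +0.057), so the real calculation may use other inputs. The function name is hypothetical.

```python
def implied_delta(parent_prob: float, p_c_given_true: float,
                  p_c_given_false: float, current_prob: float) -> float:
    """ASSUMPTION: implied belief from one parent (total probability) minus current belief."""
    implied = parent_prob * p_c_given_true + (1.0 - parent_prob) * p_c_given_false
    return implied - current_prob


CURRENT = 0.504  # this node's current probability

# (label, their prob, P(c|s=T), P(c|s=F)) copied from the incoming-edge table.
rows = [
    ("TK03", 0.10, 0.050, 0.650),
    ("TK02", 0.12, 0.050, 0.650),
    ("TK09", 0.35, 0.050, 0.650),
    ("TK01", 0.15, 0.050, 0.650),
]
for label, prob, p_true, p_false in rows:
    print(label, f"{implied_delta(prob, p_true, p_false, CURRENT):+.3f}")
```

On this reading, a low-probability killer pulls the node up (its absence is likely), while TK09 at 35.0% is likely enough to pull it down.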

Top outgoing (children)

Predictions THIS node influences

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
prereq	248_033 Superhuman AI will make BCI-enhanced humans irrelevant compa — Dave Blundin	36.7%	0.600	0.050	-0.036
prereq	232_055 We're exiting the industrial age permanently as recursive se — Peter Diamandis	35.5%	0.700	0.050	+0.028
prereq	244_019 Peter's son won't need a driver's license in 2 years — Peter Diamandis	48.4%	0.920	0.050	+0.011
prereq	230_020 Peter's 14-year-old son Milan will never get a driver's lice — Peter Diamandis	34.7%	0.650	0.050	+0.010
prereq	242_031 Most large companies' business models will be disrupted in 2 — Peter Diamandis	36.1%	0.650	0.050	-0.004

Ticker exposure

37 ticker(s) linked

Beneficiaries (24)

MU WULF IREN EQIX ALAB APLD ASMIY ASML PLAB NVDA NBIS CRWV AAPL AMT AMZN DELL GOOGL IRM LNVGY META MSFT ORCL SFTBY STX

Adverse (6)

ACN GEN CHGG IBM WNS LRN

Prerequisites (10)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
prereq	SEM_011	Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.	Capital Markets	—
prereq	SEM_027	Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.	Capital Markets	—
prereq	SEM_014	Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).	Manufacturing	—
prereq	SEM_012	Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) across engineering.	AI/Manufacturing	—
prereq	SEM_015	Nvidia agreed to remit 15% of China chip-sale revenue directly to US government in exchange for reversing specific AI chip export bans.	Policy/Semis	—
killer	TK09	Energy Grid Cap (Data Center Power Wall)	—	—
killer	TK05	Rate Regime Persistence (10y > 5% through 2028)	—	—
killer	TK01	AGI Capability Plateau (2026-27 Training Stall)	—	—
killer	TK02	AI Compute Supply Shock (TSMC/Taiwan Disruption)	—	—
killer	TK03	AI Regulatory Moratorium (EU/US Capability Freeze)	—	—

Dependents (5)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
prereq	244_019	Peter's son won't need a driver's license in 2 years	Auto/Transport	—
prereq	232_055	We're exiting the industrial age permanently as recursive self-improvement unfolds.	AI	—
prereq	242_031	Most large companies' business models will be disrupted in 2-5 years	Markets/Stocks	—
prereq	230_020	Peter's 14-year-old son Milan will never get a driver's license.	Auto/Transport	—
prereq	248_033	Superhuman AI will make BCI-enhanced humans irrelevant compared to AI 2 years from today.	AI	—

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.750	manifold	Will AI continue to improve?	84%	mentions	pending	2026-06-01
0.683	arxiv	Measuring AI Reasoning: A Guide for Researchers	—	mentions	pending	2026-05-04
0.682	arxiv	Self-Trained Verification for Training- and Test-Time Self-Improvement	—	mentions	pending	2026-05-28
0.675	arxiv	Recursive Agent Optimization	—	mentions	pending	2026-05-07
0.672	arxiv	Silent Collapse in Recursive Learning Systems	—	mentions	pending	2026-05-14
0.660	arxiv	Learning, Fast and Slow: Towards LLMs That Adapt Continually	—	mentions	pending	2026-05-12
0.654	arxiv	Perturbation Dose Responses in Recursive LLM Loops: Raw Switching, Stochastic Floors, and Persistent Escape under Append, Replace, and Dialog Updates	—	mentions	pending	2026-05-04
0.647	arxiv	Evaluation Awareness in Language Models Has Limited Effect on Behaviour	—	mentions	pending	2026-05-07
0.631	arxiv	Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling	—	mentions	pending	2026-05-13
0.628	arxiv	Boosting Self-Consistency with Ranking	—	mentions	pending	2026-06-03

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "url": "https://www.youtube.com/watch?v=6P0uTDGDr-I",
  "mode": "PREDICTION",
  "role": "Host",
  "context": "AI is the slowest and most incorrect it will ever be... we're at the steepest part of the curve and it's going to become more and more capable every day... going to eliminate this very quickly.",
  "to_year": 2027,
  "verbatim": "AI is the slowest and most incorrect it will ever be. I know when I'm using my Claudebot or Claude 4.6 if I get something that seems off I will ask it to check itself. and being able to use this in a recursive fashion... we're in a period of recursive self-improvement. I think we're at the steepest part of the curve and it's going to become more and more capable every day. And the idea that we can use, um, AIS to check AIs and in fact uh, to do uh, deeper reasoning is going to eliminate this very quickly.",
  "conv_cues": "going to eliminate this very quickly; steepest part of the curve",
  "direction": "HAPPEN",
  "from_year": 2026,
  "timeframe": "very quickly",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "Claude 4.6 Sonnet achieves ~4% hallucination rate (lowest in market)",
      "source": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -8,
      "source_id": null,
      "confidence": 0.92,
      "source_url": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "expected_date": "2026-03-15",
      "observed_date": "2026-03-15",
      "research_origin": "deep_research",
      "measurement_criterion": "BullshitBench v2 / LLM Hallucination Index 2026 confirms Claude 4.6 ~4% hallucination on 500 factual queries"
    },
    {
      "kind": "llm_pre_event",
      "label": "Reasoning Paradox confirmed — chain-of-thought hurts factuality",
      "notes": "DIRECTIONAL DISCONFIRMATION of the prediction's claim that recursive self-checking 'eliminates' errors quickly. Empirical evidence shows opposite for several frontier models.",
      "source": "https://nevirax.com/en/news/chatgpt-vs-claude-alucinaciones-benchmarks-2026",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.85,
      "source_url": "https://nevirax.com/en/news/chatgpt-vs-claude-alucinaciones-benchmarks-2026",
      "expected_date": "2026-03-15",
      "observed_date": "2026-03-15",
      "research_origin": "deep_research",
      "measurement_criterion": "BullshitBench v2 demonstrates GPT-5.2/Gemini 3 Pro reasoning modes have LOWER factual accuracy than non-reasoning modes"
    },
    {
      "kind": "llm_pre_event",
      "label": "GPT-5.5 ships with 86% hallucination rate (most-capable model worst-calibrated)",
      "notes": "Strong counter-evidence to 'eliminate quickly' framing. Capability gains have NOT eliminated hallucination.",
      "source": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -6,
      "source_id": null,
      "confidence": 0.85,
      "source_url": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "expected_date": "2026-04-15",
      "observed_date": "2026-04-15",
      "research_origin": "deep_research",
      "measurement_criterion": "AA-Omniscience benchmark records GPT-5.5 at 57% accuracy / 86% hallucination"
    },
    {
      "kind": "prereq",
      "label": "Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -5,
      "source_id": "SEM_011",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market ca
... (truncated)