AI capability/accuracy will improve recursively; output-checking issues will be eliminated quickly.
Predictor: Peter Diamandis · ep#230 "AI CEOs Come Online: Sam Altman's Replacement Plan, Job Loss & 'Solve Everything' Launches | EP #230"
Prediction text
AI capability/accuracy will improve recursively; output-checking issues will be eliminated quickly.
Verbatim quote
AI is the slowest and most incorrect it will ever be. I know when I'm using my Claudebot or Claude 4.6 if I get something that seems off I will ask it to check itself. and being able to use this in a recursive fashion... we're in a period of recursive self-improvement. I think we're at the steepest part of the curve and it's going to become more and more capable every day. And the idea that we can use, um, AIS to check AIs and in fact uh, to do uh, deeper reasoning is going to eliminate this very quickly.
Predictor: Peter Diamandis
Calibration plot (stated vs observed)
Evidence about this node from Peter Diamandis is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
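
A minimal sketch of the κ weighting described above, assuming κ is clamped to the stated floor and cap before it multiplies the evidence. The function names and the treatment of evidence as a plain number are illustrative assumptions; the page does not show what /api/intake does internally.

```python
KAPPA_FLOOR = 0.10  # stated floor: "effectively silenced"
KAPPA_CAP = 1.00    # stated cap: "full weight"


def clamp_kappa(kappa: float) -> float:
    """Clamp a predictor's kappa into the stated [0.10, 1.00] range."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def weight_evidence(evidence: float, kappa: float) -> float:
    """Scale an evidence value by the clamped kappa.

    `evidence` is a hypothetical numeric strength; its units are not
    specified by the page.
    """
    return evidence * clamp_kappa(kappa)


if __name__ == "__main__":
    for k in (0.02, 0.45, 1.30):
        print(k, clamp_kappa(k), weight_evidence(2.0, k))
```

Clamping before multiplying means a poorly calibrated predictor is discounted but never contributes exactly zero, which matches the "floors at 0.10" wording.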
Reference class: agi_breakthrough_5y
Major capability discontinuity (e.g. AGI by named target year, 5-year horizon)
Tetlock-style outside view: at TRF=1 (just predicted), outside view dominates (w_in=0.3). At TRF=0 (deadline), inside view dominates (w_in=1.0). The blend regularizes overconfident inside views toward the historical base rate.
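
The blend above can be sketched as follows. Only the two endpoints are given (w_in = 0.3 at TRF = 1, w_in = 1.0 at TRF = 0); the linear interpolation between them, the linear mixing of probabilities, and the reading of TRF as a fraction in [0, 1] are assumptions made for illustration, as are the function names.

```python
W_IN_AT_PREDICTION = 0.3  # stated inside-view weight at TRF = 1
W_IN_AT_DEADLINE = 1.0    # stated inside-view weight at TRF = 0


def inside_weight(trf: float) -> float:
    """Inside-view weight for a given TRF, assuming linear interpolation."""
    trf = min(1.0, max(0.0, trf))  # keep TRF inside [0, 1]
    return W_IN_AT_DEADLINE - (W_IN_AT_DEADLINE - W_IN_AT_PREDICTION) * trf


def blended_probability(p_inside: float, p_base_rate: float, trf: float) -> float:
    """Mix the inside view with the reference-class base rate."""
    w_in = inside_weight(trf)
    return w_in * p_inside + (1.0 - w_in) * p_base_rate


if __name__ == "__main__":
    # Hypothetical inputs: a confident inside view against a low base rate.
    for trf in (1.0, 0.5, 0.0):
        print(trf, round(blended_probability(0.9, 0.1, trf), 3))
```

With these hypothetical inputs the blended value starts near the base rate and converges on the inside view as the deadline approaches, which is the regularizing behavior the text describes.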
Probability over time
Milestone chain
- 2026-03-15 · hit · Claude 4.6 Sonnet achieves ~4% hallucination rate (lowest in market)
  - How: BullshitBench v2 / LLM Hallucination Index 2026 confirms Claude 4.6 ~4% hallucination on 500 factual queries
  - Source: https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c
  - conf 92%
- 2026-03-15 · hit · Reasoning Paradox confirmed — chain-of-thought hurts factuality
  - How: BullshitBench v2 demonstrates GPT-5.2/Gemini 3 Pro reasoning modes have LOWER factual accuracy than non-reasoning modes
  - Source: https://nevirax.com/en/news/chatgpt-vs-claude-alucinaciones-benchmarks-2026
  - conf 85%
  - Notes: DIRECTIONAL DISCONFIRMATION of the prediction's claim that recursive self-checking 'eliminates' errors quickly. Empirical evidence shows opposite for several frontier models.
- 2026-04-15 · hit · GPT-5.5 ships with 86% hallucination rate (most-capable model worst-calibrated)
  - How: AA-Omniscience benchmark records GPT-5.5 at 57% accuracy / 86% hallucination
  - Source: https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c
  - conf 85%
  - Notes: Strong counter-evidence to 'eliminate quickly' framing. Capability gains have NOT eliminated hallucination.
- 2026-09-01 → 2027-08-31 · pending · Industry-leader hallucination rate drops below 2% on standard factual benchmarks
  - How: Top frontier model achieves <=2% hallucination on suprmind / Vectara hallucination benchmark
  - Source: https://suprmind.ai/hub/insights/ai-hallucination-statistics-research-report-2026/
  - conf 50%
- 2026-06-01 → 2027-12-31 · pending · Self-correcting RLHF or constitutional method reduces error rate by 50% vs base
  - How: Published research demonstrates self-checking or constitutional AI cuts hallucination >=50% vs base model on held-out factual benchmark
  - Source: https://platform.claude.com/docs/en/test-and-evaluate/strengthen-guardrails/reduce-hallucinations + Anthropic/OpenAI papers
  - conf 55%
What if this resolves?
Click a button to clamp this prediction and run a Gibbs sample. Returns the predictions whose marginals shift most. ~30s per run; ideal for stress-testing "if X resolves, what else moves?"
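
To make "clamp and run a Gibbs sample" concrete, here is a toy sketch on a three-node chain s -> c -> d with d clamped. The numbers are borrowed from two rows of the edge tables on this page (TK09 as s, this node as c, 244_019 as d); treating them as an isolated chain with no other parents is an assumption for illustration, and nothing here reflects how the site's sampler is actually implemented.

```python
import random


def normalize(weight_true: float, weight_false: float) -> float:
    """Turn two unnormalized weights into P(True)."""
    return weight_true / (weight_true + weight_false)


def gibbs_chain(p_s, c_given_s, d_given_c, clamp_d,
                n_samples=20000, burn_in=1000, seed=0):
    """Estimate P(s=True) and P(c=True) on s -> c -> d with d clamped.

    c_given_s and d_given_c map the parent's truth value to P(child=True).
    """
    rng = random.Random(seed)
    s, c = True, True  # arbitrary starting state
    s_hits = c_hits = 0
    for i in range(n_samples + burn_in):
        # Resample s from P(s | c), proportional to P(s) * P(c | s).
        like_t = c_given_s[True] if c else 1 - c_given_s[True]
        like_f = c_given_s[False] if c else 1 - c_given_s[False]
        s = rng.random() < normalize(p_s * like_t, (1 - p_s) * like_f)
        # Resample c from P(c | s, d), proportional to P(c | s) * P(d | c).
        prior_c = c_given_s[s]
        like_t = d_given_c[True] if clamp_d else 1 - d_given_c[True]
        like_f = d_given_c[False] if clamp_d else 1 - d_given_c[False]
        c = rng.random() < normalize(prior_c * like_t, (1 - prior_c) * like_f)
        if i >= burn_in:
            s_hits += s
            c_hits += c
    return s_hits / n_samples, c_hits / n_samples


if __name__ == "__main__":
    p_s, p_c = gibbs_chain(
        p_s=0.35,                               # TK09 "Their prob"
        c_given_s={True: 0.05, False: 0.65},    # TK09 killer edge
        d_given_c={True: 0.92, False: 0.05},    # 244_019 prereq edge
        clamp_d=True,
    )
    print(f"P(s | d=True) ~ {p_s:.3f}, P(c | d=True) ~ {p_c:.3f}")
```

Comparing the clamped estimates against each node's unclamped marginal gives the kind of "which marginals shift most" ranking the panel describes.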
Evidence chain
Network propagation neighbors
Top incoming (parents)
Edges that influence THIS node's belief
| Kind | Node | Their prob | P(c\|s=T) | P(c\|s=F) | Δ implied |
|---|---|---|---|---|---|
| killer | TK03 AI Regulatory Moratorium (EU/US Capability Freeze) | 10.0% | 0.050 | 0.650 | +0.086 |
| killer | TK02 AI Compute Supply Shock (TSMC/Taiwan Disruption) | 12.0% | 0.050 | 0.650 | +0.074 |
| killer | TK09 Energy Grid Cap (Data Center Power Wall) | 35.0% | 0.050 | 0.650 | -0.064 |
| prereq | SEM_014 Nvidia's Arizona-based TSMC factory successfully fabricated — Jensen Huang | 86.1% | 0.650 | 0.050 | +0.057 |
| killer | TK01 AGI Capability Plateau (2026-27 Training Stall) | 15.0% | 0.050 | 0.650 | +0.056 |
Top outgoing (children)
Predictions THIS node influences
| Kind | Node | Their prob | P(c\|s=T) | P(c\|s=F) | Δ implied |
|---|---|---|---|---|---|
| prereq | 248_033 Superhuman AI will make BCI-enhanced humans irrelevant compa — Dave Blundin | 36.7% | 0.600 | 0.050 | -0.036 |
| prereq | 232_055 We're exiting the industrial age permanently as recursive se — Peter Diamandis | 35.5% | 0.700 | 0.050 | +0.028 |
| prereq | 244_019 Peter's son won't need a driver's license in 2 years — Peter Diamandis | 48.4% | 0.920 | 0.050 | +0.011 |
| prereq | 230_020 Peter's 14-year-old son Milan will never get a driver's lice — Peter Diamandis | 34.7% | 0.650 | 0.050 | +0.010 |
| prereq | 242_031 Most large companies' business models will be disrupted in 2 — Peter Diamandis | 36.1% | 0.650 | 0.050 | -0.004 |
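
The edge tables above give, for each edge, the source node's probability and the two conditionals P(c|s=T) and P(c|s=F). The page does not define the Δ implied column. One reading consistent with the column names is that it is the target's marginal implied by the law of total probability minus the target's current probability. The sketch below works under that assumption; the function names and the `p_target_current` value in the example are hypothetical.

```python
def implied_marginal(p_source: float, p_c_given_true: float,
                     p_c_given_false: float) -> float:
    """P(c) implied by one edge via the law of total probability."""
    return p_c_given_true * p_source + p_c_given_false * (1.0 - p_source)


def delta_implied(p_source: float, p_c_given_true: float,
                  p_c_given_false: float, p_target_current: float) -> float:
    """Assumed meaning of the table column: implied marginal minus current."""
    return implied_marginal(p_source, p_c_given_true, p_c_given_false) - p_target_current


if __name__ == "__main__":
    # TK03 row: source at 10.0%, P(c|s=T) = 0.050, P(c|s=F) = 0.650.
    print(round(implied_marginal(0.10, 0.050, 0.650), 3))
    # Hypothetical current probability for the target node.
    print(round(delta_implied(0.10, 0.050, 0.650, p_target_current=0.50), 3))
```

A killer edge has P(c|s=T) well below P(c|s=F), so a low-probability killer pulls the implied marginal up, while a prereq edge does the opposite when the prerequisite is unlikely.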
Ticker exposure
Beneficiaries (24)
Adverse (6)
Prerequisites (10)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| prereq | SEM_011 | Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips. | Capital Markets | — |
| prereq | SEM_027 | Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon. | Capital Markets | — |
| prereq | SEM_014 | Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025). | Manufacturing | — |
| prereq | SEM_012 | Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) across engineering. | AI/Manufacturing | — |
| prereq | SEM_015 | Nvidia agreed to remit 15% of China chip-sale revenue directly to US government in exchange for reversing specific AI chip export bans. | Policy/Semis | — |
| killer | TK09 | Energy Grid Cap (Data Center Power Wall) | — | — |
| killer | TK05 | Rate Regime Persistence (10y > 5% through 2028) | — | — |
| killer | TK01 | AGI Capability Plateau (2026-27 Training Stall) | — | — |
| killer | TK02 | AI Compute Supply Shock (TSMC/Taiwan Disruption) | — | — |
| killer | TK03 | AI Regulatory Moratorium (EU/US Capability Freeze) | — | — |
Dependents (5)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| prereq | 244_019 | Peter's son won't need a driver's license in 2 years | Auto/Transport | — |
| prereq | 232_055 | We're exiting the industrial age permanently as recursive self-improvement unfolds. | AI | — |
| prereq | 242_031 | Most large companies' business models will be disrupted in 2-5 years | Markets/Stocks | — |
| prereq | 230_020 | Peter's 14-year-old son Milan will never get a driver's license. | Auto/Transport | — |
| prereq | 248_033 | Superhuman AI will make BCI-enhanced humans irrelevant compared to AI 2 years from today. | AI | — |
Linked documents (10)
Raw metadata
{
  "nia": false,
  "url": "https://www.youtube.com/watch?v=6P0uTDGDr-I",
  "mode": "PREDICTION",
  "role": "Host",
  "context": "AI is the slowest and most incorrect it will ever be... we're at the steepest part of the curve and it's going to become more and more capable every day... going to eliminate this very quickly.",
  "to_year": 2027,
  "verbatim": "AI is the slowest and most incorrect it will ever be. I know when I'm using my Claudebot or Claude 4.6 if I get something that seems off I will ask it to check itself. and being able to use this in a recursive fashion... we're in a period of recursive self-improvement. I think we're at the steepest part of the curve and it's going to become more and more capable every day. And the idea that we can use, um, AIS to check AIs and in fact uh, to do uh, deeper reasoning is going to eliminate this very quickly.",
  "conv_cues": "going to eliminate this very quickly; steepest part of the curve",
  "direction": "HAPPEN",
  "from_year": 2026,
  "timeframe": "very quickly",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "Claude 4.6 Sonnet achieves ~4% hallucination rate (lowest in market)",
      "source": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -8,
      "source_id": null,
      "confidence": 0.92,
      "source_url": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "expected_date": "2026-03-15",
      "observed_date": "2026-03-15",
      "research_origin": "deep_research",
      "measurement_criterion": "BullshitBench v2 / LLM Hallucination Index 2026 confirms Claude 4.6 ~4% hallucination on 500 factual queries"
    },
    {
      "kind": "llm_pre_event",
      "label": "Reasoning Paradox confirmed — chain-of-thought hurts factuality",
      "notes": "DIRECTIONAL DISCONFIRMATION of the prediction's claim that recursive self-checking 'eliminates' errors quickly. Empirical evidence shows opposite for several frontier models.",
      "source": "https://nevirax.com/en/news/chatgpt-vs-claude-alucinaciones-benchmarks-2026",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.85,
      "source_url": "https://nevirax.com/en/news/chatgpt-vs-claude-alucinaciones-benchmarks-2026",
      "expected_date": "2026-03-15",
      "observed_date": "2026-03-15",
      "research_origin": "deep_research",
      "measurement_criterion": "BullshitBench v2 demonstrates GPT-5.2/Gemini 3 Pro reasoning modes have LOWER factual accuracy than non-reasoning modes"
    },
    {
      "kind": "llm_pre_event",
      "label": "GPT-5.5 ships with 86% hallucination rate (most-capable model worst-calibrated)",
      "notes": "Strong counter-evidence to 'eliminate quickly' framing. Capability gains have NOT eliminated hallucination.",
      "source": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -6,
      "source_id": null,
      "confidence": 0.85,
      "source_url": "https://medium.com/@anyapi.ai/llm-hallucination-index-2026-why-claude-4-6-7b2d13ed9f0c",
      "expected_date": "2026-04-15",
      "observed_date": "2026-04-15",
      "research_origin": "deep_research",
      "measurement_criterion": "AA-Omniscience benchmark records GPT-5.5 at 57% accuracy / 86% hallucination"
    },
    {
      "kind": "prereq",
      "label": "Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -5,
      "source_id": "SEM_011",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market ca
... (truncated)