231_014 · prediction · AI · AI-scaling

Remaining research math problems will be solved within next couple months.

Predictor: Dave Blundin · ep#231 "Top AI News: Sonnet 4.6, Grok 4.2, Gemini 3 Deep Think, and OpenClaw | EP #231" · source

Prior probability

60.0%

Current probability

49.5%

evolves via intake + LBP

Conviction

4/5

Signal quality

Resolution

pending

Window

2026-01-01 – 2026-11-30

Edges in / out

10 / 5

Tickers exposed

Prediction text

Remaining research math problems will be solved within next couple months. | If it can solve six out of 10, it can solve all within the next couple months. It'll happen in massive parallel.

Verbatim quote

From episode "Top AI News: Sonnet 4.6, Grok 4.2, Gemini 3 Deep Think, and OpenClaw | EP #231"

If it can solve six out of 10, it can solve all within the next couple months. It'll happen in massive parallel.

Predictor: Dave Blundin

κ + Brier as of 2026-05-22


κ (discount)

0.821

Brier

0.0491

excellent

Hits / Misses

3 / 2

of 9 resolved

Hit rate

33.3%
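For reference, a Brier score is conventionally the mean squared difference between stated probabilities and binary outcomes: 0 is perfect, and a constant 50% forecast scores 0.25. The sketch below assumes the dashboard uses that standard definition. The forecasts and outcomes are made-up illustration values, not this predictor's record.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: four stated probabilities and how each resolved.
example_forecasts = [0.9, 0.8, 0.2, 0.6]
example_outcomes = [1, 1, 0, 0]
example_brier = brier_score(example_forecasts, example_outcomes)
print(f"Brier = {example_brier:.4f}")
```

Lower is better, so confident misses (like the 0.6 forecast that resolved false) dominate the score.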

Calibration plot (stated vs observed)

Evidence about this node from Dave Blundin is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
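The κ discount described above can be sketched as follows. The page does not show how /api/intake applies the multiplier, so this sketch assumes evidence arrives as a log-likelihood ratio and that κ scales it in log-odds space. The function names are hypothetical; only the 0.10 floor and 1.00 cap come from the text.

```python
import math

KAPPA_FLOOR = 0.10  # from the page: effectively silenced
KAPPA_CAP = 1.00    # from the page: full weight


def clamp_kappa(kappa):
    """Keep kappa inside the stated [0.10, 1.00] range."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def discounted_update(prior, log_likelihood_ratio, kappa):
    """Shift the prior in log-odds space by kappa * LLR (assumed form)."""
    prior_logit = math.log(prior / (1.0 - prior))
    posterior_logit = prior_logit + clamp_kappa(kappa) * log_likelihood_ratio
    return 1.0 / (1.0 + math.exp(-posterior_logit))


# Hypothetical evidence that doubles the odds, applied to a 60% prior.
full_weight = discounted_update(0.60, math.log(2.0), 1.0)
with_kappa = discounted_update(0.60, math.log(2.0), 0.821)
print(f"full weight: {full_weight:.3f}, kappa=0.821: {with_kappa:.3f}")
```

Under this assumption, a lower κ pulls the posterior back toward the prior rather than changing the direction of the update.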

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

4 prob_history rows

Legend: intake v2 · milestone miss sweep · lbp propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 49.5%

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.

Leading chain: 7 fired ✓ · 2 pending

2026-04-15 · hit · Frontier models score >=99% on AIME 2025/2026 competition math
How: Top frontier models (GPT-5.4, Claude Opus 4.6, Gemini 3 Flash) all score >=95% on AIME 2025 / 2026
Source: https://benchlm.ai/math · conf 99%
Notes: HIT for COMPETITION math; prediction targets RESEARCH math which is harder.
2026-04-15 · hit · FrontierMath Tier 1-3 solve rate >40% by GPT-5.2/Claude Opus 4.6
How: Public benchmark confirms top frontier models solve >=40% of FrontierMath Tier 1-3 problems
Source: https://epoch.ai/frontiermath · conf 90%
Notes: Mid-progress: prediction targets ALL 10 of a specific problem set; FrontierMath is broader. Partial validation only.
2026-04-15 · hit · Aletheia (Gemini Deep Think) achieves publishable PhD-level result in arithmetic geometry
How: Google DeepMind publicly announces Aletheia produces publishable research-grade result in mathematics
Source: https://spectrum.ieee.org/ai-math-benchmarks · conf 85%
2026-04-29 · hit · Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.
2026-04-29 · hit · Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.
2026-04-29 · hit · Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).
2026-04-29 · hit · Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) across engineering.
2026-06-25 · pending · Nvidia agreed to remit 15% of China chip-sale revenue directly to US government in exchange for reversing specific AI chip export bans.
2026-05-01 → 2026-09-30 · pending · First Proof Challenge: AI solves >=1 of 10 expert-curated math problems
How: Frontier model produces verified proof for >=1 of the 11-mathematician First Proof Challenge problems
Source: https://spectrum.ieee.org/ai-math-benchmarks · conf 50%
Notes: First Proof Challenge proposed Feb 2026 by 11 distinguished mathematicians.
2026-07-18 · pending · Remaining research math problems will be solved within next couple months.
2026-06-01 → 2026-11-30 · pending · All 10 of Blundin-referenced research math problems solved by AI
How: Public reporting confirms a frontier AI solves all 10 of the specific 'remaining research math problems' Blundin referenced (originally said 6/10 already)
Source: Lab announcements, FrontierMath reporting · conf 30%
Notes: Cascade — exact resolution of prediction. Specific problem set not publicly named, so this is hard to verify without anchor.
2028-06-25 · pending · We're exiting the industrial age permanently as recursive self-improvement unfolds.
2030-09-27 · pending · Most large companies' business models will be disrupted in 2-5 years
2063-06-21 · pending · Peter's 14-year-old son Milan will never get a driver's license.

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample. Surfaces the predictions whose marginals shift most under that assumption.

(live posterior: 50%)

Each run takes about 30s; useful for stress-testing "if X resolves, what else moves?"

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

LBP · 2026-05-10T02:00:02Z · 49.5% · -1.2pp

Network propagation: 50.7% → 49.5%

6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29

LBP · 2026-05-03T02:00:01Z · 50.7% · -2.2pp

Network propagation: 52.9% → 50.7%

6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9

LBP · 2026-04-30T16:39:51Z · 52.9% · -2.9pp

Network propagation: 55.8% → 52.9%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3

LBP · 2026-04-30T02:18:57Z · 55.8% · -4.2pp

Network propagation: 60.0% → 55.8%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef
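The rows above report `damping 0.5`, `w_intrinsic 0.5`, an iteration count, and a residual, but not the update rule itself. The sketch below shows one common way such parameters are used, and is an assumption rather than the lbp_v3 implementation: the node's target belief blends its own (intrinsic) probability with a belief implied by its neighbors, and each iteration moves only part of the way toward that target. It is not loopy belief propagation itself, only the damping step for a single node with neighbors held fixed. All names and the example network belief of 0.40 are hypothetical.

```python
def damped_step(current, intrinsic, network_belief, damping=0.5, w_intrinsic=0.5):
    """One damped move toward a blend of intrinsic and network-implied belief."""
    target = w_intrinsic * intrinsic + (1.0 - w_intrinsic) * network_belief
    return damping * current + (1.0 - damping) * target


def propagate(current, intrinsic, network_belief, tol=0.01, max_iter=50):
    """Iterate damped steps until the change per step (residual) drops below tol."""
    residual = float("inf")
    iterations = 0
    while iterations < max_iter and residual >= tol:
        updated = damped_step(current, intrinsic, network_belief)
        residual = abs(updated - current)
        current = updated
        iterations += 1
    return current, iterations, residual


# Hypothetical inputs: a 60% seed and neighbors that imply 40%.
belief, n_iter, final_residual = propagate(0.60, 0.60, 0.40)
print(f"belief={belief:.4f} after {n_iter} iterations, residual={final_residual:.5f}")
```

Damping trades speed for stability: a higher value takes smaller steps per run, which matches the gradual drift from 60.0% to 49.5% across the four runs listed, though the real cause of that drift is not stated on this page.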

Network propagation neighbors

Top edges sorted by latest LBP cross-impact


Top incoming (parents)

Edges that influence THIS node's belief

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
killer	TK09 Energy Grid Cap (Data Center Power Wall)	35.0%	0.050	0.600	-0.088
prereq	SEM_015 Nvidia agreed to remit 15% of China chip-sale revenue direct — Jensen Huang	66.3%	0.600	0.050	-0.076
prereq	SEM_027 Nvidia Data Center revenue +66% YoY, contributing ~90% of $5 — Joseph Moore	68.3%	0.600	0.050	-0.075
killer	TK05 Rate Regime Persistence (10y > 5% through 2028)	30.0%	0.050	0.600	-0.060
killer	TK03 AI Regulatory Moratorium (EU/US Capability Freeze)	10.0%	0.050	0.600	+0.050
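The Δ implied column is not defined on this page. One reading that reproduces the three killer rows is the law of total probability: take the parent's probability, compute the child marginal from the two conditionals, and subtract this node's current 49.5%. The sketch below checks that reading against the table. The function name is hypothetical, and the prereq rows come out slightly different under it (about -0.080 for SEM_015 against the listed -0.076), so the production formula may include other terms.

```python
def implied_delta(parent_prob, p_child_given_true, p_child_given_false, child_prob):
    """Child marginal implied by one parent, minus the child's current belief."""
    marginal = (p_child_given_true * parent_prob
                + p_child_given_false * (1.0 - parent_prob))
    return marginal - child_prob


CURRENT_PROB = 0.495  # this node's current probability, from the page

# (parent prob, P(c|s=T), P(c|s=F)) copied from the killer rows above.
killer_rows = {
    "TK09": (0.350, 0.050, 0.600),
    "TK05": (0.300, 0.050, 0.600),
    "TK03": (0.100, 0.050, 0.600),
}

deltas = {
    name: implied_delta(prob, p_true, p_false, CURRENT_PROB)
    for name, (prob, p_true, p_false) in killer_rows.items()
}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
```

For a killer, a lower parent probability raises the implied marginal, which is why TK03 at 10.0% is the only row with a positive Δ.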

Top outgoing (children)

Predictions THIS node influences

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
prereq	247_023 AI will be able to do everything a white collar worker does — Dave Blundin	40.8%	0.720	0.050	-0.031
prereq	242_031 Most large companies' business models will be disrupted in 2 — Peter Diamandis	36.1%	0.650	0.050	-0.018
prereq	232_055 We're exiting the industrial age permanently as recursive se — Peter Diamandis	35.5%	0.700	0.050	+0.013
prereq	244_019 Peter's son won't need a driver's license in 2 years — Peter Diamandis	48.4%	0.920	0.050	-0.010
prereq	230_020 Peter's 14-year-old son Milan will never get a driver's lice — Peter Diamandis	34.7%	0.650	0.050	-0.004

Ticker exposure

37 ticker(s) linked

Beneficiaries (24)

MU WULF IREN EQIX ALAB APLD ASMIY ASML PLAB NVDA NBIS CRWV AAPL AMT AMZN DELL GOOGL IRM LNVGY META MSFT ORCL SFTBY STX

Adverse (6)

ACN GEN CHGG IBM WNS LRN

Prerequisites (10)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
prereq	SEM_011	Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.	Capital Markets	—
prereq	SEM_027	Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.	Capital Markets	—
prereq	SEM_014	Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).	Manufacturing	—
prereq	SEM_012	Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) across engineering.	AI/Manufacturing	—
prereq	SEM_015	Nvidia agreed to remit 15% of China chip-sale revenue directly to US government in exchange for reversing specific AI chip export bans.	Policy/Semis	—
killer	TK09	Energy Grid Cap (Data Center Power Wall)	—	—
killer	TK05	Rate Regime Persistence (10y > 5% through 2028)	—	—
killer	TK01	AGI Capability Plateau (2026-27 Training Stall)	—	—
killer	TK02	AI Compute Supply Shock (TSMC/Taiwan Disruption)	—	—
killer	TK03	AI Regulatory Moratorium (EU/US Capability Freeze)	—	—

Dependents (5)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
prereq	244_019	Peter's son won't need a driver's license in 2 years	Auto/Transport	—
prereq	247_023	AI will be able to do everything a white collar worker does imminently	AI	—
prereq	232_055	We're exiting the industrial age permanently as recursive self-improvement unfolds.	AI	—
prereq	242_031	Most large companies' business models will be disrupted in 2-5 years	Markets/Stocks	—
prereq	230_020	Peter's 14-year-old son Milan will never get a driver's license.	Auto/Transport	—

Linked documents (5)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.638	manifold	Will any ARML 2026 problem in the individual round have exactly 67 or 41 solves, if one has 121 solves resolves 67%.	13%	mentions	pending	2026-05-25
0.630	manifold	Will I solve an Erdos problem?	6%	mentions	pending	2026-04-27
0.577	manifold	What will my mathcounts state score be?	—	mentions	pending	2026-04-26
0.568	polymarket	Sabres vs. Bruins	53%	mentions	pending	2026-04-29
0.560	manifold	Will I present my APUSH project before the end of the school year?	73%	mentions	pending	2026-05-09

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "qty": "all 10",
  "url": "https://www.youtube.com/watch?v=HklyjXKYFng",
  "mode": "PREDICTION",
  "role": "Host",
  "context": "here it'll happen instantaneously. If it can solve six out of 10, it can solve all within the next couple months. It'll happen in massive parallel. There's no limit to the to the number of parallel agents up to the to the number of GPUs that are available.",
  "to_year": 2026,
  "verbatim": "If it can solve six out of 10, it can solve all within the next couple months. It'll happen in massive parallel.",
  "conv_cues": "it can; it'll happen",
  "direction": "HAPPEN",
  "from_year": 2026,
  "timeframe": "next couple months",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "Frontier models score >=99% on AIME 2025/2026 competition math",
      "notes": "HIT for COMPETITION math; prediction targets RESEARCH math which is harder.",
      "source": "https://benchlm.ai/math",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -9,
      "source_id": null,
      "confidence": 0.99,
      "source_url": "https://benchlm.ai/math",
      "expected_date": "2026-04-15",
      "observed_date": "2026-04-15",
      "research_origin": "deep_research",
      "measurement_criterion": "Top frontier models (GPT-5.4, Claude Opus 4.6, Gemini 3 Flash) all score >=95% on AIME 2025 / 2026"
    },
    {
      "kind": "llm_pre_event",
      "label": "FrontierMath Tier 1-3 solve rate >40% by GPT-5.2/Claude Opus 4.6",
      "notes": "Mid-progress: prediction targets ALL 10 of a specific problem set; FrontierMath is broader. Partial validation only.",
      "source": "https://epoch.ai/frontiermath",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -8,
      "source_id": null,
      "confidence": 0.9,
      "source_url": "https://epoch.ai/frontiermath",
      "expected_date": "2026-04-15",
      "observed_date": "2026-04-15",
      "research_origin": "deep_research",
      "measurement_criterion": "Public benchmark confirms top frontier models solve >=40% of FrontierMath Tier 1-3 problems"
    },
    {
      "kind": "llm_pre_event",
      "label": "Aletheia (Gemini Deep Think) achieves publishable PhD-level result in arithmetic geometry",
      "source": "https://spectrum.ieee.org/ai-math-benchmarks",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.85,
      "source_url": "https://spectrum.ieee.org/ai-math-benchmarks",
      "expected_date": "2026-04-15",
      "observed_date": "2026-04-15",
      "research_origin": "deep_research",
      "measurement_criterion": "Google DeepMind publicly announces Aletheia produces publishable research-grade result in mathematics"
    },
    {
      "kind": "prereq",
      "label": "Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -6,
      "source_id": "SEM_011",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -5,
      "source_id": "SEM_027",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -4,
      "source_id": "SEM_014",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Clau
... (truncated)