238_020 · prediction · AI · AI-timing

Math field is 'cooked' — AI will solve research-level mathematics (first open hard math problem imminently)

Predictor: Alex Wissner-Gross · ep#238 "Meta Buys Moltbook, GPT 5.4, and Fruitfly Brain Upload | Moonshots Live at The Abundance Summit 238" · source

Prior probability: 72.0%
Current probability: 60.7% (evolves via intake and loopy belief propagation, LBP)
Conviction: 4/5
Signal quality: B
Resolution: partial
Window: 2026-06-01 – 2026-06-30
Edges in / out: 8 / 8
Tickers exposed: 33

Prediction text

Math field is 'cooked' — AI will solve research-level mathematics (first open hard math problem imminently) | math is cooked. We're we're seeing I think 38% capability... And there are even rumors even in the past 24 to 48 hours that the next tier up the so-called open problems benchmark that 5.4 is reportedly rumored to be on the verge of solving the first open hard math problem.

Verbatim quote

From episode "Meta Buys Moltbook, GPT 5.4, and Fruitfly Brain Upload | Moonshots Live at The Abundance Summit 238"
math is cooked. We're we're seeing I think 38% capability... And there are even rumors even in the past 24 to 48 hours that the next tier up the so-called open problems benchmark that 5.4 is reportedly rumored to be on the verge of solving the first open hard math problem.

Resolution evidence

Status: partial

GPT-5.4 claimed 38% on Frontier Math Tier 4; DeepMind AlphaProof trajectory to IMO gold 2024-2025. Math not fully 'cooked' but progressing rapidly.

Predictor: Alex Wissner-Gross

κ + Brier as of 2026-05-22
κ (discount): 0.844
Brier: 0.0341 (excellent)
Hits / Misses: 6 / 1 (of 11 resolved)
Hit rate: 54.5%
Calibration plot (stated vs observed)
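The Brier score is the mean squared gap between stated probabilities and outcomes, and the hit rate here is hits over all resolved predictions (6 / 11 = 54.5%), so the four resolved predictions that are neither hits nor misses still count in the denominator. The page doesn't list the 11 resolved forecasts, so the minimal sketch below cannot reproduce 0.0341; it only shows the arithmetic. The helper names and the equal-width calibration binning are illustrative assumptions, and how partial outcomes are scored (the resolution_terminal entry uses 0.5) is not confirmed for the Brier calculation.

```python
def brier_score(stated, outcomes):
    """Mean squared gap between stated probabilities and outcomes (1 = hit, 0 = miss)."""
    if not stated or len(stated) != len(outcomes):
        raise ValueError("stated and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(stated, outcomes)) / len(stated)


def hit_rate(hits, resolved):
    """Share of resolved predictions counted as hits."""
    return hits / resolved


def calibration_bins(stated, outcomes, n_bins=5):
    """Equal-width bins of stated probability -> (mean stated, observed rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(stated, outcomes):
        # A stated probability of exactly 1.0 goes into the top bin
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    return [
        (sum(p for p, _ in b) / len(b), sum(o for _, o in b) / len(b), len(b))
        for b in bins
        if b
    ]


print(round(hit_rate(6, 11), 3))                   # the page's 54.5%
print(brier_score([0.9, 0.2], [1, 0]))             # toy forecasts, not this predictor's
print(calibration_bins([0.9, 0.85, 0.2], [1, 0, 0]))
```

Each calibration bin pairs an average stated probability with the observed hit frequency, which is what a "stated vs observed" plot would chart.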

Evidence about this node from Alex Wissner-Gross is multiplied by κ in /api/intake. Lower κ means less weight; κ is floored at 0.10 (effectively silenced) and capped at 1.00 (full weight).
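The page says only that evidence is "multiplied by κ". One common way to discount a source, assumed here rather than taken from /api/intake, is to clamp κ to the stated [0.10, 1.00] range and scale the evidence's log-likelihood ratio by it in log-odds space. The function names below are hypothetical.

```python
import math

KAPPA_FLOOR, KAPPA_CAP = 0.10, 1.00  # bounds stated on this page


def clamp_kappa(kappa):
    """Keep κ within [0.10, 1.00]."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discounted_update(prior, likelihood_ratio, kappa):
    """Assumed scheme: scale the evidence's log-likelihood ratio by κ in log-odds space."""
    k = clamp_kappa(kappa)
    return sigmoid(logit(prior) + k * math.log(likelihood_ratio))


print(round(discounted_update(0.5, 3.0, 1.0), 3))    # full weight
print(round(discounted_update(0.5, 3.0, 0.844), 3))  # this predictor's κ
print(round(discounted_update(0.5, 3.0, 0.02), 3))   # below the floor, treated as 0.10
```

With κ = 0.844 the same evidence moves the belief most of the way a full-weight source would; at the 0.10 floor it barely moves it, which matches the "effectively silenced" description.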

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

5 prob_history rows
[Chart: probability over time, 2026-04-30 to 2026-05-10 (y-axis 0-100%); prior 72%, current 60.7%. Series: intake v2, milestone miss sweep, LBP propagation, reference class assigned, legacy v1, prior_prob (analyst seed).]

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Statuses and dates update from linked nodes and are re-derived nightly by scripts/ops/derive_milestones.py.
Leading chain: 5 fired ✓
  1. 2026-07-15 → 2026-08-31 · pending · AI achieves IMO Gold (top-30 score) on 2026 problems
    How: AI lab announces system achieving IMO Gold Medal score (typically 25/42+ on 2026 problems), with results published by ImoGrandChallenge or peer evaluator
    Source: DeepMind AlphaProof, OpenAI math results announcements · conf 75%
    Notes: DeepMind already achieved Silver in 2024; Gold by 2026 is the natural progression. ImoGrandChallenge.org tracks this.
  2. 2026-06-01 → 2026-12-31 · pending · FrontierMath benchmark passes 50% by frontier model
    How: Top model on FrontierMath benchmark crosses 50% accuracy (current best as of late 2025 was ~25-35%)
    Source: https://epoch.ai/frontiermath (Epoch AI's FrontierMath leaderboard); Anthropic/OpenAI/DeepMind blog announcements · conf 65%
    Notes: FrontierMath is the canonical 'research-level math' benchmark — 200 problems by working mathematicians, designed to be hard.
  3. 2026-06-01 → 2027-03-31 · pending · First open math problem solved by AI publicly announced
    How: Frontier AI lab announces solution to a previously-open math problem (Erdős, Millennium, or peer-reviewed open conjecture) with verification by mathematicians
    Source: Lab blog posts, arXiv preprints with mathematician co-authors, Quanta Magazine coverage · conf 45%
    Notes: In ep#238, Wissner-Gross cited 'rumors' that GPT-5.4 was on the verge of solving the first open hard math problem. If such a solution is confirmed, this milestone would already be a HIT.
  4. 2026-09-01 → 2027-12-31 · pending · Mathematician community publishes paper acknowledging AI as research collaborator
    How: Peer-reviewed math paper lists AI system as essential research collaborator (not just tool) with acknowledgment from working mathematicians (e.g., Terence Tao writeup style)
    Source: arXiv math papers, Annals of Mathematics, Tao's blog (terrytao.wordpress.com) · conf 60%
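The "Leading chain: 5 fired" count can be read off the milestones array in the raw metadata: five upstream prereqs (ordinals -5 to -1) are marked hit. The sketch below is a hypothetical helper, not scripts/ops/derive_milestones.py; treating "fired" as "negative ordinal and status hit" is an assumption.

```python
# Entries copied (trimmed to four fields) from the "milestones" array in the
# raw metadata on this page; the derivation logic below is hypothetical.
MILESTONES = [
    {"kind": "prereq", "source_id": "SEM_012", "status": "hit", "ordinal": -5},
    {"kind": "prereq", "source_id": "SEM_008", "status": "hit", "ordinal": -4},
    {"kind": "prereq", "source_id": "234_012", "status": "hit", "ordinal": -3},
    {"kind": "prereq", "source_id": "SEM_042", "status": "hit", "ordinal": -2},
    {"kind": "prereq", "source_id": "238_009", "status": "hit", "ordinal": -1},
    {"kind": "event", "source_id": "238_020", "status": "partial", "ordinal": 0},
    {"kind": "cascade", "source_id": "239_004", "status": "pending", "ordinal": 1},
    {"kind": "llm_pre_event", "source_id": None, "status": "pending", "ordinal": 2},
]


def summarize_chain(milestones):
    """Return (number of fired upstream milestones, ids of pending ones in order)."""
    ordered = sorted(milestones, key=lambda m: m["ordinal"])
    # Assumption: "fired" = an upstream (negative-ordinal) milestone marked hit
    fired = [m for m in ordered if m["ordinal"] < 0 and m["status"] == "hit"]
    pending = [m["source_id"] for m in ordered if m["status"] == "pending"]
    return len(fired), pending


print(summarize_chain(MILESTONES))
```

LLM-generated pre-event milestones carry no source_id, so they show up as None in the pending list.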

What if this resolves?

Clamp this prediction to TRUE or FALSE and run a counterfactual Gibbs sample (~30 s per run) to surface the predictions whose marginals shift most under that assumption. Useful for stress-testing "if X resolves, what else moves?"
(live posterior: 61%)
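A minimal sketch of what "clamp and Gibbs-sample" means, on a hypothetical three-node chain A → X → C, where X stands in for this prediction. The network, edge probabilities (styled on the edge tables below) and function names are illustrative; this is not the dashboard's sampler. Clamping X fixes its state, and the other nodes are resampled from their conditionals until their marginals can be read off.

```python
import random

# Hypothetical chain A -> X -> C; X plays the role of this prediction.
P_A = 0.67                                # prior P(A=T)
P_X_GIVEN_A = {True: 0.72, False: 0.05}   # P(X=T | A)
P_C_GIVEN_X = {True: 0.70, False: 0.05}   # P(C=T | X)


def edge_lik(table, parent, child):
    """P(child state | parent state) for a binary edge."""
    p = table[parent]
    return p if child else 1.0 - p


def draw(weight_true, weight_false, rng):
    """Draw True with probability weight_true / (weight_true + weight_false)."""
    return rng.random() < weight_true / (weight_true + weight_false)


def gibbs(n_samples=20000, burn_in=1000, clamp_x=None, seed=0):
    """Estimate P(A=T) and P(C=T); if clamp_x is set, X is held fixed."""
    rng = random.Random(seed)
    a, c = True, True
    x = True if clamp_x is None else clamp_x
    hits_a = hits_c = 0
    for step in range(burn_in + n_samples):
        # A | X is proportional to P(A) * P(X | A)
        a = draw(P_A * edge_lik(P_X_GIVEN_A, True, x),
                 (1 - P_A) * edge_lik(P_X_GIVEN_A, False, x), rng)
        # X | A, C is proportional to P(X | A) * P(C | X), unless clamped
        if clamp_x is None:
            x = draw(edge_lik(P_X_GIVEN_A, a, True) * edge_lik(P_C_GIVEN_X, True, c),
                     edge_lik(P_X_GIVEN_A, a, False) * edge_lik(P_C_GIVEN_X, False, c),
                     rng)
        # C | X is just P(C | X)
        c = rng.random() < P_C_GIVEN_X[x]
        if step >= burn_in:
            hits_a += a
            hits_c += c
    return hits_a / n_samples, hits_c / n_samples


base = gibbs()
clamped = gibbs(clamp_x=True)
print(f"P(A=T): {base[0]:.3f} -> {clamped[0]:.3f}")
print(f"P(C=T): {base[1]:.3f} -> {clamped[1]:.3f}")
```

For this toy network the exact answers are P(A=T) = 0.67 and P(C=T) ≈ 0.374 unclamped, versus ≈ 0.967 and 0.70 with X clamped TRUE, so the sampled estimates should land near those values. Clamping moves both the upstream parent (by Bayes' rule) and the downstream child.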

Evidence chain

Every probability update with full Bayesian provenance, in reverse chronological order (latest first)
LBP · 2026-05-10T02:00:02Z · 60.7% (-1.5 pp)
Network propagation: 62.2% → 60.7%
6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29
LBP · 2026-05-03T02:00:01Z · 62.2% (-2.3 pp)
Network propagation: 64.5% → 62.2%
6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9
resolution_terminal · 2026-05-01T00:00:00Z · 50.0% (-14.5 pp)
resolution_terminal partial outcome=0.5 pre_resolution=0.645
Raw metadata
{
  "source": "backfill_resolution_history.py",
  "status": "partial",
  "bayesian_v2": false,
  "outcome_prob": 0.5,
  "evidence_kind": "resolution_terminal",
  "posterior_prob": 0.5,
  "delta_to_outcome": -0.14456000000000002,
  "inside_posterior": 0.64456,
  "validation_notes": "GPT-5.4 claimed 38% on Frontier Math Tier 4; DeepMind AlphaProof trajectory to IMO gold 2024-2025. Math not fully 'cooked' but progressing rapidly.",
  "validation_status": "hit",
  "pre_resolution_prob": 0.64456,
  "resolution_evidence": "GPT-5.4 claimed 38% on Frontier Math Tier 4; DeepMind AlphaProof trajectory to IMO gold 2024-2025. Math not fully 'cooked' but progressing rapidly.",
  "does_not_update_current_prob": true
}
LBP · 2026-04-30T16:39:51Z · 64.5% (-3.1 pp)
Network propagation: 67.5% → 64.5%
5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3
LBP · 2026-04-30T02:18:57Z · 67.5% (-4.5 pp)
Network propagation: 72.0% → 67.5%
5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef
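Each LBP entry records an iteration count, a residual, damping 0.5 and w_intrinsic 0.5. The sketch below is a simplified, hypothetical damped fixed-point propagation, not the lbp_v3 method (whose message-passing details aren't shown here), meant only to illustrate how those knobs could interact: w_intrinsic blends a node's own estimate with its parent-implied estimate, damping moves each belief only part of the way per sweep, and iteration stops once the largest change (the residual) falls below a tolerance. The toy numbers are borrowed from this page but do not reproduce the logged runs.

```python
def implied(p_parent, p_if_true, p_if_false):
    """P(child) implied by one parent via the law of total probability."""
    return p_if_true * p_parent + p_if_false * (1.0 - p_parent)


def propagate(intrinsic, edges, damping=0.5, w_intrinsic=0.5, tol=1e-3, max_iter=50):
    """Damped fixed-point propagation (hypothetical; not the lbp_v3 algorithm).

    intrinsic: {node: intrinsic probability}
    edges: [(parent, child, P(child=T | parent=T), P(child=T | parent=F))]
    Returns (beliefs, iterations run, final residual).
    """
    belief = dict(intrinsic)
    residual, iters = float("inf"), 0
    while residual > tol and iters < max_iter:
        new = {}
        for node, p0 in intrinsic.items():
            # Parent-implied estimates, computed from the previous sweep's beliefs
            pulls = [implied(belief[s], t, f) for s, c, t, f in edges if c == node]
            target = p0
            if pulls:
                # Blend the node's own estimate with the average parent-implied one
                target = w_intrinsic * p0 + (1 - w_intrinsic) * sum(pulls) / len(pulls)
            # Damping: move only part of the way from the old belief to the target
            new[node] = damping * belief[node] + (1 - damping) * target
        residual = max(abs(new[n] - belief[n]) for n in belief)
        belief, iters = new, iters + 1
    return belief, iters, residual


beliefs, n_iter, res = propagate(
    {"parent": 0.671, "this": 0.72},
    [("parent", "this", 0.72, 0.05)],
)
print(n_iter, round(res, 5), round(beliefs["this"], 3))
```

With damping 0.5 the gap to the fixed point halves every sweep, which is why a small residual is reached within a handful of iterations, as in the 5- and 6-iteration runs logged above.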

Network propagation neighbors

Top edges sorted by latest LBP cross-impact

Top incoming (parents)

Edges that influence THIS node's belief

Kind · Node · Their prob · P(c|s=T) · P(c|s=F) · Δ implied
prereq · 234_012: Anthropic revenue will cross OpenAI revenue in middle of 202… (Peter Diamandis) · 67.1% · 0.720 · 0.050 · -0.111
prereq · SEM_042: 2025 will be the definitive year that agentic systems finall… (Kevin Weil) · 73.8% · 0.720 · 0.050 · -0.069
prereq · SEM_012: Nvidia quadrupled chip production output while only doubling… (Jensen Huang) · 75.0% · 0.720 · 0.050 · -0.059
prereq · SEM_008: Training runs costing $10 billion for a single model will co… (Dario Amodei) · 76.9% · 0.720 · 0.050 · -0.047
killer · TK03: AI Regulatory Moratorium (EU/US Capability Freeze) · 10.0% · 0.050 · 0.720 · +0.046
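The page doesn't state how "Δ implied" is computed. One reading consistent with the incoming rows is the law of total probability over the parent's state, implied = P(c|s=T)·p_s + P(c|s=F)·(1 - p_s), minus this node's current probability (0.607). It reproduces the TK03 row exactly (+0.046) but the prerequisite rows only approximately (for example -0.107 versus the listed -0.111 for 234_012), so treat it as an assumption rather than the dashboard's formula; the function names are hypothetical.

```python
def implied_prob(p_parent, p_c_if_true, p_c_if_false):
    """P(child) implied by a parent via the law of total probability."""
    return p_c_if_true * p_parent + p_c_if_false * (1.0 - p_parent)


def delta_implied(p_parent, p_c_if_true, p_c_if_false, p_child_now):
    """Hypothetical reading of the 'Δ implied' column: implied minus current."""
    return implied_prob(p_parent, p_c_if_true, p_c_if_false) - p_child_now


THIS_NODE_NOW = 0.607  # current probability of 238_020 shown on this page

# (their prob, P(c|s=T), P(c|s=F)) for two rows of the incoming table
rows = {
    "234_012": (0.671, 0.720, 0.050),
    "TK03": (0.100, 0.050, 0.720),
}
for node, (p, t, f) in rows.items():
    print(node, f"{delta_implied(p, t, f, THIS_NODE_NOW):+.3f}")
```

For a killer edge the conditional probabilities are reversed (the node is unlikely if the killer happens), so a low-probability killer pulls this node's implied probability up, which is why TK03 carries a positive Δ.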

Top outgoing (children)

Predictions THIS node influences

Kind · Node · Their prob · P(c|s=T) · P(c|s=F) · Δ implied
prereq · 239_004: xAI/Grok will catch up and exceed competitors on coding by m… (Elon Musk) · 40.2% · 0.500 · 0.050 · -0.084
prereq · 232_055: We're exiting the industrial age permanently as recursive se… (Peter Diamandis) · 35.5% · 0.700 · 0.050 · +0.083
prereq · 235_030: Ray Kurzweil predicts Longevity Escape Velocity (LEV) by 203… (Ray Kurzweil) · 39.2% · 0.750 · 0.050 · +0.075
prereq · 241_038: Chinese AI strategy is edge computing focused vs US AGI/ASI… (Eric Schmidt) · 43.3% · 0.600 · 0.050 · -0.055
prereq · 241_043: ASI will arrive within 2 years to 5 years to this next decad… (Peter Diamandis) · 35.9% · 0.650 · 0.050 · +0.049

Ticker exposure

33 ticker(s) linked

Beneficiaries (23)

SOUN, CRWV, SITM, NVDA, ARM, GTLB, BBAI, TSM, APLD, CEVA, AI, MSFT, MRVL, SFTBY, ORCL, QCOM, AVGO, BABA, AMD, GOOGL, IBM, AMZN, META

Adverse (6)

WNS, CHGG, CTSH, IBM, INFY, ACN

Prerequisites (8)

Predictions that must hit first
Type · Pred · Title · Domain · Lag
prereq · SEM_008 · Training runs costing $10 billion for a single model will commence sometime in 2025. · AI
prereq · 238_009 · Recursive self-improvement is already happening now (no longer three years out) · AI
prereq · 234_012 · Anthropic revenue will cross OpenAI revenue in middle of 2026 · Markets/Stocks
prereq · SEM_012 · Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) across engineering. · AI/Manufacturing
prereq · SEM_042 · 2025 will be the definitive year that agentic systems finally hit the mainstream. · AI/Agents
killer · TK14 · Superbubble Pop (S&P 500 -40%, Moonshot Capital Evaporates)
killer · TK01 · AGI Capability Plateau (2026-27 Training Stall)
killer · TK03 · AI Regulatory Moratorium (EU/US Capability Freeze)

Dependents (8)

Predictions enabled by this
Type · Pred · Title · Domain · Lag
prereq · 235_030 · Ray Kurzweil predicts Longevity Escape Velocity (LEV) by 2033. · Biotech/Longevity
prereq · 232_055 · We're exiting the industrial age permanently as recursive self-improvement unfolds. · AI
prereq · 241_043 · ASI will arrive within 2 years to 5 years to this next decade · AI
prereq · 231_013 · Math is cooked (will be solved), physics cooked, biology char broiled. · AI
prereq · 241_038 · Chinese AI strategy is edge computing focused vs US AGI/ASI centered · AI
prereq · 241_025 · Elon Musk predicts launch per hour cadence to populate satellite constellations · Space
prereq · CMQ_002 · By 2028, AI systems will reach 'independent researcher' level — driving autonomous scientific discoveries without human intervention. · AI
prereq · 239_004 · xAI/Grok will catch up and exceed competitors on coding by mid-2026 · AI

Expected milestones (1)

From Sheet 17 Monitoring Triggers
Expected by · Description · Status
2026-06-30 · [Capability 2026-06] OpenClaw agents by statistical chance. [238_020] Math field is 'cooked' — AI will solve research-level mathematics (first open ha… · pending

Validations (1)

Resolution events
Observed at · Status · By · Notes
2026-04-29 · hit · thesis_timeline_v1.0_import · GPT-5.4 claimed 38% on Frontier Math Tier 4; DeepMind AlphaProof trajectory to IMO gold 2024-2025. Math not fully 'cooked' but progressing rapidly.

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT
Sim · Source · Title · Market prob · Polarity · Reviewed · Published
0.676 · arxiv · Benchmarks in Leipzig · n/a · mentions · pending · 2026-06-04
0.669 · manifold · Will research-level math become a sport akin to chess before 2035? · 12% · mentions · pending · 2026-05-09
0.623 · arxiv · Knowing What to Solve Before How: Preplan Empowered LLM Mathematical Reasoning · n/a · mentions · pending · 2026-05-28
0.592 · manifold · Will I solve an Erdos problem? · 6% · mentions · pending · 2026-04-27
0.580 · github_release · facebookresearch/hydra v1.0.3 · n/a · mentions · pending · 2020-09-23
0.570 · manifold · Will a particular friend of mine crack anyone while at math camp? · 8% · mentions · pending · 2026-04-24
0.565 · manifold · What will my mathcounts state score be? · n/a · mentions · pending · 2026-04-26
0.555 · polymarket · Boston Red Sox vs. New York Yankees · 43% · mentions · pending · 2026-05-30
0.554 · polymarket · Chicago Cubs vs. Chicago White Sox · 55% · mentions · pending · 2026-05-09
0.553 · polymarket · Boston Red Sox vs. New York Yankees · 49% · mentions · pending · 2026-05-31
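The Sim column is a cosine similarity: the dot product of two embedding vectors divided by the product of their lengths. The page doesn't say which embedding model produces the vectors, so the toy 3-dimensional vectors and helper names below are assumptions; the sketch only shows the scoring and ranking step.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u · v) / (|u| |v|); 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(node_vec, docs, top_k=10):
    """docs: [(title, embedding)] -> top_k [(similarity, title)], highest first."""
    scored = [(cosine_similarity(node_vec, vec), title) for title, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy "embeddings"; real ones would come from a text-embedding model.
node = [0.9, 0.1, 0.0]
docs = [("math benchmark paper", [0.8, 0.2, 0.1]), ("baseball market", [0.1, 0.2, 0.9])]
for sim, title in rank_documents(node, docs):
    print(f"{sim:.3f}  {title}")
```

Cosine similarity only measures closeness in embedding space, not topical relevance, so a fixed top-k can still admit off-topic rows such as the sports markets in the table above.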

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook
{
  "nia": false,
  "qty": "38% on Frontier Math Tier 4",
  "url": "https://www.youtube.com/watch?v=d__HRChE2ZE",
  "mode": "PREDICTION",
  "role": "Host",
  "caveats": "rumors",
  "context": "now with GPT 5.4 turned up to maximum reasoning capability, we're seeing finally, and this was a prediction I think in our prediction episode, math is cooked. We're we're seeing I think 38% capability... And there are even rumors even in the past 24 to 48 hours that the next tier up the so-called open problems benchmark that 5.4 is reportedly rumored to be on the verge of solving the first open hard math problem.",
  "to_year": 2026,
  "verbatim": "math is cooked. We're we're seeing I think 38% capability... And there are even rumors even in the past 24 to 48 hours that the next tier up the so-called open problems benchmark that 5.4 is reportedly rumored to be on the verge of solving the first open hard math problem.",
  "conv_cues": "math is cooked; on the verge",
  "direction": "HAPPEN",
  "from_year": 2026,
  "timeframe": "Imminent",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "prereq",
      "label": "Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) a",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -5,
      "source_id": "SEM_012",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Training runs costing $10 billion for a single model will commence sometime in 2025.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -4,
      "source_id": "SEM_008",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Anthropic revenue will cross OpenAI revenue in middle of 2026",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -3,
      "source_id": "234_012",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "2025 will be the definitive year that agentic systems finally hit the mainstream.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -2,
      "source_id": "SEM_042",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Recursive self-improvement is already happening now (no longer three years out)",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -1,
      "source_id": "238_009",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "event",
      "label": "Math field is 'cooked' — AI will solve research-level mathematics (first open hard math problem imminently)",
      "status": "partial",
      "weight": 1,
      "ordinal": 0,
      "source_id": "238_020",
      "expected_date": "2026-05-01",
      "observed_date": "2026-05-01"
    },
    {
      "kind": "cascade",
      "label": "xAI/Grok will catch up and exceed competitors on coding by mid-2026",
      "status": "pending",
      "weight": 0.5,
      "ordinal": 1,
      "source_id": "239_004",
      "expected_date": "2026-06-20",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "AI achieves IMO Gold (top-30 score) on 2026 problems",
      "notes": "DeepMind already achieved Silver in 2024; Gold by 2026 is the natural progression. ImoGrandChallenge.org tracks this.",
      "source": "DeepMind AlphaProof, OpenAI math results announcements",
      "status": "pending",
      "weight": 0.4,
      "ordinal": 2,
      "source_id": null,
      "confidence": 0.75,
      "expected_date": "2026-08-07",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2026-08-31",
        "from": "2026-07-15"
      },
      "measurement_criterion": "AI lab announces system achieving IMO Gold Medal score (typically 25/42+ on 2026 problems), w
... (truncated)