238_020 · prediction · AI · AI-timing

Math field is 'cooked' — AI will solve research-level mathematics (first open hard math problem imminently)

Predictor: Alex Wissner-Gross · ep#238 "Meta Buys Moltbook, GPT 5.4, and Fruitfly Brain Upload | Moonshots Live at The Abundance Summit 238" · source

Prior probability

72.0%

Current probability

60.7%

evolves via intake + LBP

Conviction

4/5

Signal quality

Resolution

partial

Window

2026-06-01 – 2026-06-30

Edges in / out

8 / 8

Tickers exposed

Prediction text

Math field is 'cooked' — AI will solve research-level mathematics (first open hard math problem imminently) | math is cooked. We're we're seeing I think 38% capability... And there are even rumors even in the past 24 to 48 hours that the next tier up the so-called open problems benchmark that 5.4 is reportedly rumored to be on the verge of solving the first open hard math problem.

Verbatim quote

From episode "Meta Buys Moltbook, GPT 5.4, and Fruitfly Brain Upload | Moonshots Live at The Abundance Summit 238"

math is cooked. We're we're seeing I think 38% capability... And there are even rumors even in the past 24 to 48 hours that the next tier up the so-called open problems benchmark that 5.4 is reportedly rumored to be on the verge of solving the first open hard math problem.

Resolution evidence

Status: partial

GPT-5.4 claimed 38% on Frontier Math Tier 4; DeepMind AlphaProof trajectory to IMO gold 2024-2025. Math not fully 'cooked' but progressing rapidly.

Predictor: Alex Wissner-Gross

κ + Brier as of 2026-05-22

κ (discount)

0.844

Brier

0.0341

excellent

Hits / Misses

6 / 1

of 11 resolved

Hit rate

54.5%
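
The hit rate shown above is consistent with hits divided by all resolved predictions (6 of 11), not hits over hits plus misses. The sketch below reproduces that figure and shows the standard Brier score (mean squared error between stated probability and outcome) for reference. The function names are hypothetical, and the per-prediction forecasts behind the 0.0341 figure are not on this page, so no attempt is made to reproduce it.

```python
from typing import Sequence


def hit_rate(hits: int, resolved: int) -> float:
    """Share of resolved predictions scored as hits."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return hits / resolved


def brier_score(forecasts: Sequence[float], outcomes: Sequence[float]) -> float:
    """Mean squared difference between stated probabilities and outcomes.

    Outcomes are 1.0 for a hit and 0.0 for a miss. Scoring a partial as 0.5
    is an assumption borrowed from the resolution metadata, not a confirmed rule.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Figures from the page: 6 hits of 11 resolved.
print(f"{hit_rate(6, 11):.1%}")  # 54.5%
```

Lower Brier scores are better; a forecaster who always says 50% scores 0.25.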

Calibration plot (stated vs observed)

Evidence about this node from Alex Wissner-Gross is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
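
A minimal sketch of the floor and cap described above. The bounds (0.10 and 1.00) come from the page; the helper names and the idea of scaling a single numeric evidence weight are assumptions, since the page does not show how /api/intake represents evidence internally.

```python
KAPPA_FLOOR = 0.10  # from the page: effectively silenced
KAPPA_CAP = 1.00    # from the page: full weight


def clamp_kappa(kappa: float) -> float:
    """Keep the predictor discount inside [KAPPA_FLOOR, KAPPA_CAP]."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def discounted_evidence(evidence_weight: float, kappa: float) -> float:
    """Scale an evidence weight by the clamped discount (hypothetical helper)."""
    return evidence_weight * clamp_kappa(kappa)


# With the kappa shown for this predictor, evidence keeps about 84% of its weight.
print(discounted_evidence(1.0, 0.844))
```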

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

5 prob_history rows

Legend: intake v2 · milestone miss sweep · lbp propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 60.7%

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.

Leading chain: 5 fired ✓

2026-04-29 · hit · Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) a
2026-04-29 · hit · Training runs costing $10 billion for a single model will commence sometime in 2025.
2026-04-29 · hit · Anthropic revenue will cross OpenAI revenue in middle of 2026
2026-04-29 · hit · 2025 will be the definitive year that agentic systems finally hit the mainstream.
2026-04-29 · hit · Recursive self-improvement is already happening now (no longer three years out)
2026-05-01 · partial · Math field is 'cooked' — AI will solve research-level mathematics (first open hard math problem imminently)
2026-06-20 · pending · xAI/Grok will catch up and exceed competitors on coding by mid-2026
2026-07-15 → 2026-08-31 · pending · AI achieves IMO Gold (top-30 score) on 2026 problems
    How: AI lab announces system achieving IMO Gold Medal score (typically 25/42+ on 2026 problems), with results published by ImoGrandChallenge or peer evaluator
    Source: DeepMind AlphaProof, OpenAI math results announcements · conf 75%
    Notes: DeepMind already achieved Silver in 2024; Gold by 2026 is the natural progression. ImoGrandChallenge.org tracks this.
2026-06-01 → 2026-12-31 · pending · FrontierMath benchmark passes 50% by frontier model
    How: Top model on FrontierMath benchmark crosses 50% accuracy (current best as of late 2025 was ~25-35%)
    Source: https://epoch.ai/frontiermath — Epoch AI's FrontierMath leaderboard. Anthropic/OpenAI/DeepMind blog announcements. · conf 65%
    Notes: FrontierMath is the canonical 'research-level math' benchmark — 200 problems by working mathematicians, designed to be hard.
2026-06-01 → 2027-03-31 · pending · First open math problem solved by AI publicly announced
    How: Frontier AI lab announces solution to a previously-open math problem (Erdős, Millennium, or peer-reviewed open conjecture) with verification by mathematicians
    Source: Lab blog posts, arXiv preprints with mathematician co-authors, Quanta Magazine coverage · conf 45%
    Notes: Wissner-Gross referenced 'rumors' of GPT-5.4 solving an open problem in late 2025. If true, this would already be a HIT.
2026-09-01 → 2027-12-31 · pending · Mathematician community publishes paper acknowledging AI as research collaborator
    How: Peer-reviewed math paper lists AI system as essential research collaborator (not just tool) with acknowledgment from working mathematicians (e.g., Terence Tao writeup style)
    Source: arXiv math papers, Annals of Mathematics, Tao's blog (terrytao.wordpress.com) · conf 60%
2027-06-26 · pending · Math is cooked (will be solved), physics cooked, biology char broiled.
2028-06-25 · pending · We're exiting the industrial age permanently as recursive self-improvement unfolds.
2028-09-07 · pending · By 2028, AI systems will reach 'independent researcher' level — driving autonomous scientific discoveries without human intervention.
2033-07-30 · pending · Ray Kurzweil predicts Longevity Escape Velocity (LEV) by 2033.
2033-08-10 · pending · ASI will arrive within 2 years to 5 years to this next decade

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample. Surfaces the predictions whose marginals shift most under that assumption.

(live posterior: 61%)

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

LBP · 2026-05-10T02:00:02Z · 60.7% · -1.5pp

Network propagation: 62.2% → 60.7%

6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29

LBP · 2026-05-03T02:00:01Z · 62.2% · -2.3pp

Network propagation: 64.5% → 62.2%

6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9

resolution_terminal · 2026-05-01T00:00:00Z · 50.0% · -14.5pp

resolution_terminal partial outcome=0.5 pre_resolution=0.645

Raw metadata

{
  "source": "backfill_resolution_history.py",
  "status": "partial",
  "bayesian_v2": false,
  "outcome_prob": 0.5,
  "evidence_kind": "resolution_terminal",
  "posterior_prob": 0.5,
  "delta_to_outcome": -0.14456000000000002,
  "inside_posterior": 0.64456,
  "validation_notes": "GPT-5.4 claimed 38% on Frontier Math Tier 4; DeepMind AlphaProof trajectory to IMO gold 2024-2025. Math not fully 'cooked' but progressing rapidly.",
  "validation_status": "hit",
  "pre_resolution_prob": 0.64456,
  "resolution_evidence": "GPT-5.4 claimed 38% on Frontier Math Tier 4; DeepMind AlphaProof trajectory to IMO gold 2024-2025. Math not fully 'cooked' but progressing rapidly.",
  "does_not_update_current_prob": true
}
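
The -14.5pp shown for the resolution_terminal row follows from two fields in the metadata above: `outcome_prob` minus `pre_resolution_prob`. A short sketch of that arithmetic, with a hypothetical helper name; the subtraction is inferred from the field values, not taken from backfill_resolution_history.py itself.

```python
def delta_to_outcome_pp(outcome_prob: float, pre_resolution_prob: float) -> str:
    """Format the gap between the terminal outcome and the prior belief in percentage points."""
    delta = outcome_prob - pre_resolution_prob
    return f"{delta * 100:+.1f}pp"


# Values from the raw metadata: partial outcome scored 0.5, belief before resolution 0.64456.
print(delta_to_outcome_pp(0.5, 0.64456))  # -14.5pp
```

Because `does_not_update_current_prob` is true, this delta is recorded for provenance only; the later LBP rows continue from 64.5%, not from 50.0%.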

LBP · 2026-04-30T16:39:51Z · 64.5% · -3.1pp

Network propagation: 67.5% → 64.5%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3

LBP · 2026-04-30T02:18:57Z · 67.5% · -4.5pp

Network propagation: 72.0% → 67.5%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef

Network propagation neighbors

Top edges sorted by latest LBP cross-impact

Top incoming (parents)

Edges that influence THIS node's belief

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
prereq	234_012 Anthropic revenue will cross OpenAI revenue in middle of 202 — Peter Diamandis	67.1%	0.720	0.050	-0.111
prereq	SEM_042 2025 will be the definitive year that agentic systems finall — Kevin Weil	73.8%	0.720	0.050	-0.069
prereq	SEM_012 Nvidia quadrupled chip production output while only doubling — Jensen Huang	75.0%	0.720	0.050	-0.059
prereq	SEM_008 Training runs costing $10 billion for a single model will co — Dario Amodei	76.9%	0.720	0.050	-0.047
killer	TK03 AI Regulatory Moratorium (EU/US Capability Freeze)	10.0%	0.050	0.720	+0.046
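
One plausible reading of the table above: P(c|s=T) and P(c|s=F) are the probabilities of this node given that the parent resolves true or false, and "Δ implied" is the marginal they imply (by the law of total probability) minus this node's current probability. The page does not state the formula, so this is an assumption. With the rounded values shown, it matches the TK03 row and lands close to, but not exactly on, the prereq rows.

```python
def implied_marginal(parent_prob: float, p_given_true: float, p_given_false: float) -> float:
    """Law of total probability: P(c) = P(c|s=T) * P(s) + P(c|s=F) * (1 - P(s))."""
    return p_given_true * parent_prob + p_given_false * (1.0 - parent_prob)


def implied_delta(parent_prob: float, p_given_true: float, p_given_false: float,
                  current_prob: float) -> float:
    """Gap between the edge-implied marginal and the node's current belief (assumed definition)."""
    return implied_marginal(parent_prob, p_given_true, p_given_false) - current_prob


# TK03 row: parent at 10.0%, P(c|s=T)=0.050, P(c|s=F)=0.720, this node at 60.7%.
print(f"{implied_delta(0.10, 0.050, 0.720, 0.607):+.3f}")  # +0.046
```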

Top outgoing (children)

Predictions THIS node influences

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
prereq	239_004 xAI/Grok will catch up and exceed competitors on coding by m — Elon Musk	40.2%	0.500	0.050	-0.084
prereq	232_055 We're exiting the industrial age permanently as recursive se — Peter Diamandis	35.5%	0.700	0.050	+0.083
prereq	235_030 Ray Kurzweil predicts Longevity Escape Velocity (LEV) by 203 — Ray Kurzweil	39.2%	0.750	0.050	+0.075
prereq	241_038 Chinese AI strategy is edge computing focused vs US AGI/ASI — Eric Schmidt	43.3%	0.600	0.050	-0.055
prereq	241_043 ASI will arrive within 2 years to 5 years to this next decad — Peter Diamandis	35.9%	0.650	0.050	+0.049

Ticker exposure

33 ticker(s) linked

Beneficiaries (23)

SOUN CRWV SITM NVDA ARM GTLB BBAI TSM APLD CEVA AI MSFT MRVL SFTBY ORCL QCOM AVGO BABA AMD GOOGL IBM AMZN META

Adverse (6)

WNS CHGG CTSH IBM INFY ACN

Prerequisites (8)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
prereq	SEM_008	Training runs costing $10 billion for a single model will commence sometime in 2025.	AI	—
prereq	238_009	Recursive self-improvement is already happening now (no longer three years out)	AI	—
prereq	234_012	Anthropic revenue will cross OpenAI revenue in middle of 2026	Markets/Stocks	—
prereq	SEM_012	Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) across engineering.	AI/Manufacturing	—
prereq	SEM_042	2025 will be the definitive year that agentic systems finally hit the mainstream.	AI/Agents	—
killer	TK14	Superbubble Pop (S&P 500 -40%, Moonshot Capital Evaporates)	—	—
killer	TK01	AGI Capability Plateau (2026-27 Training Stall)	—	—
killer	TK03	AI Regulatory Moratorium (EU/US Capability Freeze)	—	—

Dependents (8)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
prereq	235_030	Ray Kurzweil predicts Longevity Escape Velocity (LEV) by 2033.	Biotech/Longevity	—
prereq	232_055	We're exiting the industrial age permanently as recursive self-improvement unfolds.	AI	—
prereq	241_043	ASI will arrive within 2 years to 5 years to this next decade	AI	—
prereq	231_013	Math is cooked (will be solved), physics cooked, biology char broiled.	AI	—
prereq	241_038	Chinese AI strategy is edge computing focused vs US AGI/ASI centered	AI	—
prereq	241_025	Elon Musk predicts launch per hour cadence to populate satellite constellations	Space	—
prereq	CMQ_002	By 2028, AI systems will reach 'independent researcher' level — driving autonomous scientific discoveries without human intervention.	AI	—
prereq	239_004	xAI/Grok will catch up and exceed competitors on coding by mid-2026	AI	—

Expected milestones (1)

From Sheet 17 Monitoring Triggers

Expected by	Description	Status
2026-06-30	[Capability 2026-06] OpenClaw agents by statistical chance. [238_020] Math field is 'cooked' — AI will solve research-level mathematics (first open ha	pending

Validations (1)

Resolution events

Observed at	Status	By	Notes
2026-04-29	hit	thesis_timeline_v1.0_import	GPT-5.4 claimed 38% on Frontier Math Tier 4; DeepMind AlphaProof trajectory to IMO gold 2024-2025. Math not fully 'cooked' but progressing rapidly.

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.676	arxiv	Benchmarks in Leipzig	—	mentions	pending	2026-06-04
0.669	manifold	Will research-level math become a sport akin to chess before 2035?	12%	mentions	pending	2026-05-09
0.623	arxiv	Knowing What to Solve Before How: Preplan Empowered LLM Mathematical Reasoning	—	mentions	pending	2026-05-28
0.592	manifold	Will I solve an Erdos problem?	6%	mentions	pending	2026-04-27
0.580	github_release	facebookresearch/hydra v1.0.3	—	mentions	pending	2020-09-23
0.570	manifold	Will a particular friend of mine crack anyone while at math camp?	8%	mentions	pending	2026-04-24
0.565	manifold	What will my mathcounts state score be?	—	mentions	pending	2026-04-26
0.555	polymarket	Boston Red Sox vs. New York Yankees	43%	mentions	pending	2026-05-30
0.554	polymarket	Chicago Cubs vs. Chicago White Sox	55%	mentions	pending	2026-05-09
0.553	polymarket	Boston Red Sox vs. New York Yankees	49%	mentions	pending	2026-05-31

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "qty": "38% on Frontier Math Tier 4",
  "url": "https://www.youtube.com/watch?v=d__HRChE2ZE",
  "mode": "PREDICTION",
  "role": "Host",
  "caveats": "rumors",
  "context": "now with GPT 5.4 turned up to maximum reasoning capability, we're seeing finally, and this was a prediction I think in our prediction episode, math is cooked. We're we're seeing I think 38% capability... And there are even rumors even in the past 24 to 48 hours that the next tier up the so-called open problems benchmark that 5.4 is reportedly rumored to be on the verge of solving the first open hard math problem.",
  "to_year": 2026,
  "verbatim": "math is cooked. We're we're seeing I think 38% capability... And there are even rumors even in the past 24 to 48 hours that the next tier up the so-called open problems benchmark that 5.4 is reportedly rumored to be on the verge of solving the first open hard math problem.",
  "conv_cues": "math is cooked; on the verge",
  "direction": "HAPPEN",
  "from_year": 2026,
  "timeframe": "Imminent",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "prereq",
      "label": "Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) a",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -5,
      "source_id": "SEM_012",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Training runs costing $10 billion for a single model will commence sometime in 2025.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -4,
      "source_id": "SEM_008",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Anthropic revenue will cross OpenAI revenue in middle of 2026",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -3,
      "source_id": "234_012",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "2025 will be the definitive year that agentic systems finally hit the mainstream.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -2,
      "source_id": "SEM_042",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Recursive self-improvement is already happening now (no longer three years out)",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -1,
      "source_id": "238_009",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "event",
      "label": "Math field is 'cooked' — AI will solve research-level mathematics (first open hard math problem imminently)",
      "status": "partial",
      "weight": 1,
      "ordinal": 0,
      "source_id": "238_020",
      "expected_date": "2026-05-01",
      "observed_date": "2026-05-01"
    },
    {
      "kind": "cascade",
      "label": "xAI/Grok will catch up and exceed competitors on coding by mid-2026",
      "status": "pending",
      "weight": 0.5,
      "ordinal": 1,
      "source_id": "239_004",
      "expected_date": "2026-06-20",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "AI achieves IMO Gold (top-30 score) on 2026 problems",
      "notes": "DeepMind already achieved Silver in 2024; Gold by 2026 is the natural progression. ImoGrandChallenge.org tracks this.",
      "source": "DeepMind AlphaProof, OpenAI math results announcements",
      "status": "pending",
      "weight": 0.4,
      "ordinal": 2,
      "source_id": null,
      "confidence": 0.75,
      "expected_date": "2026-08-07",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2026-08-31",
        "from": "2026-07-15"
      },
      "measurement_criterion": "AI lab announces system achieving IMO Gold Medal score (typically 25/42+ on 2026 problems), w
... (truncated)