AUT_002 · prediction · AI · unified-mathematical-substrate

Models excelling at highly structured mathematical benchmarks exhibit a 'unified capability substrate' enabling dominance in seemingly unrelated fields (coding, logical reasoning, scientific discovery) — the 'mathematical skeleton' of the technological...

Predictor: Alex Wissner-Gross

Prior probability

72.0%

Current probability

59.1%

evolves via intake + LBP

Conviction

4/5

Signal quality

Resolution

in_progress

Window

2026-01-01 – 2029-11-30

Edges in / out

6 / 0

Tickers exposed

Prediction text

Models excelling at highly structured mathematical benchmarks exhibit a 'unified capability substrate' enabling dominance in seemingly unrelated fields (coding, logical reasoning, scientific discovery), the 'mathematical skeleton' of the technological singularity. Autonomous agents will seamlessly interface with environmental + biological sensors to continuously monitor, model, and manipulate physical reality via this cross-domain transfer.

Key catalyst: Next frontier-model cross-domain benchmark release

Watch events: Cross-domain-transfer benchmarks (BIG-Bench, GPQA Diamond); physical-world-model scaling

Resolution evidence

Status: in_progress

GPT-5 / Claude 4 / Gemini 3 cross-domain benchmarks (math + coding + science) empirically validate transfer. AlphaFold → AlphaProof → AlphaGeometry demonstrate unified-substrate.

Predictor: Alex Wissner-Gross

κ + Brier as of 2026-05-22

κ (discount)

0.844

Brier

0.0341

excellent

Hits / Misses

6 / 1

of 11 resolved

Hit rate

54.5%
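The hit rate follows directly from the counts shown in this panel (6 hits of 11 resolved). The sketch below reproduces that figure and shows how a Brier score is conventionally computed, assuming the usual definition (mean squared error between stated probabilities and 0/1 outcomes). The sample forecasts are hypothetical and are not the predictor's actual record.

```python
def hit_rate(hits: int, resolved: int) -> float:
    """Fraction of resolved predictions scored as hits."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return hits / resolved


def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Counts taken from the panel: 6 hits of 11 resolved.
print(f"{hit_rate(6, 11):.1%}")  # 54.5%

# Hypothetical forecasts and outcomes, for illustration only.
print(round(brier_score([0.9, 0.8, 0.1], [1, 1, 0]), 4))  # 0.02
```

Lower Brier scores are better; a forecaster who always states 50% scores 0.25 under this definition.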

Calibration plot (stated vs observed)

Evidence about this node from Alex Wissner-Gross is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
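A minimal sketch of the weighting rule described above, assuming the discount is clamped to the stated floor and cap before it multiplies the evidence. The function names and the idea that evidence is a single scalar are assumptions for illustration; the actual /api/intake logic is not shown in the source.

```python
KAPPA_FLOOR = 0.10  # stated floor: effectively silenced
KAPPA_CAP = 1.00    # stated cap: full weight


def clamp_kappa(kappa: float) -> float:
    """Keep the discount inside the [0.10, 1.00] band described above."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def weighted_evidence(evidence: float, kappa: float) -> float:
    """Scale a scalar evidence value by the clamped discount.

    `evidence` is a hypothetical scalar (for example a log-likelihood
    ratio); the source does not say which quantity is multiplied.
    """
    return evidence * clamp_kappa(kappa)


print(clamp_kappa(0.844))  # 0.844 (this predictor's current discount)
print(clamp_kappa(0.02))   # 0.1   (raised to the floor)
print(clamp_kappa(1.7))    # 1.0   (lowered to the cap)
```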

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

4 prob_history rows

Legend: intake v2 · milestone miss sweep · LBP propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 59.1%

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.

Leading chain: 1 fired ✓ · 8 pending

2026-04-24 · hit · Frontier model exceeds 50% on FrontierMath (math benchmark)
How: Any frontier LLM crosses 50% accuracy on Epoch AI's FrontierMath benchmark
Source: https://epoch.ai/benchmarks/frontiermath-tier-4 (GPT-5.5 Pro 52.4%, GPT-5.5 51.7% as of April 2026) · conf 99%

2026-09-15 · pending · Q1 window check-in (25%)

2026-04-01 → 2027-06-30 · pending · Same model leads both math + SWE-bench coding leaderboards simultaneously
How: A single model variant simultaneously occupies top-3 on both FrontierMath and SWE-bench Verified leaderboards
Source: Epoch AI, SWE-bench, Scale Labs leaderboards · conf 80%
Notes: Direct evidence of 'unified capability substrate': math leadership transfers to coding.

2027-05-30 · pending · Q2 window check-in (50%)

2026-06-01 → 2028-12-31 · pending · AI model produces peer-reviewed scientific discovery in non-CS field
How: Peer-reviewed paper attributes a novel discovery (in chemistry, biology, physics, or math) primarily to a frontier LLM/agent system
Source: Nature, Science, peer-reviewed journals tracking AI co-authorship · conf 65%
Notes: Tests cross-domain transfer to scientific discovery, the second pillar of the claim.

2027-09-30 · pending · Scenario fires: AGI fast: drop-in remote worker by 2027-09

2028-02-11 · pending · Q3 window check-in (75%)

2027-01-01 → 2029-10-31 · pending · AI agent integrates with biological/environmental sensor stack in published study
How: Published research demonstrates AI agent autonomously interfacing with biological or environmental sensor network to monitor and act on physical reality
Source: arxiv, IEEE proceedings, robotics journals · conf 55%
Notes: 'Manipulate physical reality via cross-domain transfer' element of the claim.

2027-06-01 → 2029-11-30 · pending · Composite cross-domain leaderboard launched (math+code+science+reasoning)
How: Major eval org (Epoch, METR, Stanford HAI) publishes composite cross-domain benchmark with at least one model scoring ≥80%
Source: Stanford AI Index 2027/2028, Epoch AI · conf 45%
Notes: Cascade: formal recognition of 'unified capability substrate' as measurable thing.

2028-10-26 · pending · Models excelling at highly structured mathematical benchmarks exhibit a 'unified capability substrate' enabling dominance in seemingly unrel

No downstream cascades — this prediction is a leaf in the dependency graph.
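The three window check-in dates in the chain are consistent with evenly spaced fractions (25%, 50%, 75%) of the span from the window start (2026-01-01) to the date of the final resolution entry (2028-10-26), truncated to whole days. The sketch below reproduces them under that assumption. Whether scripts/ops/derive_milestones.py computes them this way is not confirmed by the source, and `checkpoint_dates` is a hypothetical helper name.

```python
from datetime import date, timedelta


def checkpoint_dates(start: date, end: date,
                     fractions=(0.25, 0.50, 0.75)) -> list[date]:
    """Dates at the given fractions of [start, end], truncated to whole days."""
    total_days = (end - start).days
    return [start + timedelta(days=int(total_days * f)) for f in fractions]


start = date(2026, 1, 1)   # window start shown on the page
end = date(2028, 10, 26)   # date of the resolution entry in the chain

for checkpoint in checkpoint_dates(start, end):
    print(checkpoint.isoformat())
# 2026-09-15
# 2027-05-30
# 2028-02-11
```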

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample. The run returns the predictions whose marginals shift most under that assumption, which makes it useful for stress-testing "if X resolves, what else moves?" Each run takes about 30s.

(live posterior: 59%)
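To make the clamping idea concrete, here is a toy Gibbs sampler over a three-node chain a → b → c, with c clamped to an assumed outcome. The network, its probabilities, and all names are hypothetical; the dashboard's own sampler and graph are not shown in the source.

```python
import random

# Hypothetical toy network: a -> b -> c, all binary.
P_A = 0.3
P_B_GIVEN_A = {True: 0.8, False: 0.2}
P_C_GIVEN_B = {True: 0.9, False: 0.1}


def bern(p_true: float, value: bool) -> float:
    """Probability of `value` under a Bernoulli with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def gibbs_marginal_a(clamp_c: bool, sweeps: int = 20000,
                     burn_in: int = 1000, seed: int = 0) -> float:
    """Estimate P(a = True | c = clamp_c) by Gibbs sampling a and b."""
    rng = random.Random(seed)
    a, b = True, True
    hits = 0
    for i in range(burn_in + sweeps):
        # Resample a from P(a | b), proportional to P(a) * P(b | a).
        w_true = bern(P_A, True) * bern(P_B_GIVEN_A[True], b)
        w_false = bern(P_A, False) * bern(P_B_GIVEN_A[False], b)
        a = rng.random() < w_true / (w_true + w_false)
        # Resample b from P(b | a, c), proportional to P(b | a) * P(c | b).
        # c is never resampled: it stays clamped.
        w_true = bern(P_B_GIVEN_A[a], True) * bern(P_C_GIVEN_B[True], clamp_c)
        w_false = bern(P_B_GIVEN_A[a], False) * bern(P_C_GIVEN_B[False], clamp_c)
        b = rng.random() < w_true / (w_true + w_false)
        if i >= burn_in:
            hits += a
    return hits / sweeps


# Exact values by enumeration are about 0.550 (c=True) and 0.131 (c=False).
print(gibbs_marginal_a(True), gibbs_marginal_a(False))
```

Comparing the two estimates against the unclamped prior of 0.3 shows how far the marginal of `a` moves under each assumed resolution, which is the kind of shift the tool ranks.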

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

LBP2026-05-10T02:00:02Z59.1%-1.6pp

Network propagation: 60.7% → 59.1%

6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29

LBP2026-05-03T02:00:01Z60.7%-3.0pp

Network propagation: 63.7% → 60.7%

6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9

LBP2026-04-30T16:39:51Z63.7%-5.5pp

Network propagation: 69.2% → 63.7%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3

LBP2026-04-30T02:18:57Z69.2%-2.8pp

Network propagation: 72.0% → 69.2%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef
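The four propagation steps can be checked arithmetically: each step starts where the previous one ended, and the deltas sum to the gap between the 72.0% prior and the 59.1% current value. The `damped_update` helper illustrates one common damping convention and is an assumption; the source reports damping 0.5 and w_intrinsic 0.5 without defining either.

```python
# (before, after) pairs in percent, oldest first, copied from the evidence chain.
steps = [(72.0, 69.2), (69.2, 63.7), (63.7, 60.7), (60.7, 59.1)]

# Each step should begin at the value where the previous one ended.
contiguous = all(steps[i][1] == steps[i + 1][0] for i in range(len(steps) - 1))
print(contiguous)  # True

deltas = [round(after - before, 1) for before, after in steps]
print(deltas)                 # [-2.8, -5.5, -3.0, -1.6]
print(round(sum(deltas), 1))  # -12.9, i.e. 72.0 -> 59.1


def damped_update(old: float, proposed: float, damping: float = 0.5) -> float:
    """Assumed convention: `damping` is the weight kept on the old belief."""
    return damping * old + (1.0 - damping) * proposed


# Hypothetical values: a proposed belief of 0.50 moves 0.60 halfway.
print(round(damped_update(0.60, 0.50), 2))  # 0.55
```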

Network propagation neighbors

Top edges sorted by latest LBP cross-impact

Top incoming (parents)

Edges that influence THIS node's belief

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
killer	TK06 China-Taiwan Military Conflict	8.0%	0.050	0.720	+0.076
killer	TK11 Autonomous Regulatory Block (Level 4 Halt)	10.0%	0.050	0.720	+0.062
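One reading of the Δ implied column that fits the displayed numbers: marginalize the child over the parent with the law of total probability, then subtract the child's current probability (59.1%). This is an inference from the table, not a documented formula. It reproduces the TK11 row and lands within 0.001 of the TK06 row, a gap that may come from rounding in the displayed values.

```python
def implied_child_prob(p_parent: float, p_c_given_true: float,
                       p_c_given_false: float) -> float:
    """P(c) = P(s) * P(c | s=T) + (1 - P(s)) * P(c | s=F)."""
    return p_parent * p_c_given_true + (1.0 - p_parent) * p_c_given_false


CURRENT_PROB = 0.591  # this node's current probability, as displayed

# (parent prob, P(c|s=T), P(c|s=F)) copied from the table rows.
rows = {
    "TK06": (0.08, 0.050, 0.720),
    "TK11": (0.10, 0.050, 0.720),
}

for name, (p_parent, p_true, p_false) in rows.items():
    implied = implied_child_prob(p_parent, p_true, p_false)
    print(name, round(implied, 4), round(implied - CURRENT_PROB, 3))
# TK06 0.6664 0.075   (table shows +0.076)
# TK11 0.653 0.062    (table shows +0.062)
```

Both parents are "killer" edges: the child is far less likely when the parent is true (0.050 versus 0.720), so a low parent probability implies a child probability above the current value.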

Top outgoing (children)

Predictions THIS node influences

No outgoing edges.

Ticker exposure

10 ticker(s) linked

Beneficiaries (6)

FROG GTLB BABA TEAM GOOGL MSFT

Adverse (4)

UBER PGR TRV ALL

Prerequisites (6)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
correlate	S_AGI_MID_2029	AGI mid: Kurzweil 2029 path	agi_general_capability	—
correlate	S_AGI_FAST_2027	AGI fast: drop-in remote worker by 2027-09	agi_general_capability	—
correlate	S_AGI_SLOW_2031	AGI slow: Schmidt/Hassabis 5-10 year path	agi_general_capability	—
correlate	S_AGI_WINTER_2036PLUS	AGI delayed: capability plateau or AI winter	agi_general_capability	—
killer	TK11	Autonomous Regulatory Block (Level 4 Halt)	—	—
killer	TK06	China-Taiwan Military Conflict	—	—

Dependents (0)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
No dependents

Validations (1)

Resolution events

Observed at	Status	By	Notes
2026-04-29	partial	thesis_timeline_v1.0_import	GPT-5 / Claude 4 / Gemini 3 cross-domain benchmarks (math + coding + science) empirically validate transfer. AlphaFold → AlphaProof → AlphaGeometry demonstrate unified-substrate.

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.702	arxiv	COMPOSE: Composing Future Theorems from Citations and Formal Structure	—	mentions	pending	2026-05-28
0.693	arxiv	Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence	—	mentions	pending	2026-05-31
0.691	arxiv	AI Co-Mathematician: Accelerating Mathematicians with Agentic AI	—	mentions	pending	2026-05-07
0.689	arxiv	ProjectionBench: Evaluating Scientific Hypothesis Generation in LLMs Under Progressive Information Disclosure	—	mentions	pending	2026-05-28
0.686	arxiv	Benchmarks in Leipzig	—	mentions	pending	2026-06-04
0.685	arxiv	OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields	—	mentions	pending	2026-05-28
0.680	arxiv	Formalizing Mathematics at Scale	—	mentions	pending	2026-05-28
0.679	arxiv	Do Physics Foundation Models Learn Generalizable Physics? A Bias-Aware Benchmark Across Physical Regimes and Distribution Shifts	—	mentions	pending	2026-05-28
0.676	arxiv	Text2CAD-Bench: A Benchmark for LLM-based Text-to-Parametric CAD Generation	—	mentions	pending	2026-05-18
0.676	arxiv	A Framework for Graph-Conditioned Hierarchical Shapley Attribution in Patent Valuation	—	mentions	pending	2026-06-01
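The Sim column is described as a cosine similarity. A minimal sketch of that measure follows, assuming each prediction and document is represented as a numeric vector; the embedding model and vector dimensions used by the page are not stated, so the inputs here are hypothetical.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 2-d vectors, for illustration only.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))            # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # 0.0
print(round(cosine_similarity([1.0, 1.0], [1.0, 0.0]), 4))  # 0.7071
```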

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "mode": "FORECAST",
  "role": "Cited-Other",
  "context": "Extends SEM_032 (Wissner-Gross Clay Millennium) and 248_002 (LEO-to-phone). Specific cross-domain-capability-substrate framing.",
  "to_year": 2029,
  "conv_cues": "coined framing; singularity-mathematical-skeleton",
  "direction": "HAPPEN",
  "from_year": 2026,
  "timeframe": "2026-2029",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "Frontier model exceeds 50% on FrontierMath (math benchmark)",
      "source": "https://epoch.ai/benchmarks/frontiermath-tier-4 — GPT-5.5 Pro 52.4%, GPT-5.5 51.7% as of April 2026",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -9,
      "source_id": null,
      "confidence": 0.99,
      "source_url": "https://epoch.ai/benchmarks/frontiermath-tier-4",
      "expected_date": "2026-04-30",
      "observed_date": "2026-04-24",
      "research_origin": "deep_research",
      "measurement_criterion": "Any frontier LLM crosses 50% accuracy on Epoch AI's FrontierMath benchmark"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q1 window check-in (25%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -8,
      "source_id": null,
      "expected_date": "2026-09-15",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "Same model leads both math + SWE-bench coding leaderboards simultaneously",
      "notes": "Direct evidence of 'unified capability substrate' — math leadership transfers to coding.",
      "source": "Epoch AI, SWE-bench, Scale Labs leaderboards",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.8,
      "source_url": "https://labs.scale.com/leaderboard",
      "expected_date": "2026-11-14",
      "research_origin": "deep_research",
      "expected_date_range": {
        "to": "2027-06-30",
        "from": "2026-04-01"
      },
      "measurement_criterion": "A single model variant simultaneously occupies top-3 on both FrontierMath and SWE-bench Verified leaderboards"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q2 window check-in (50%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -6,
      "source_id": null,
      "expected_date": "2027-05-30",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "AI model produces peer-reviewed scientific discovery in non-CS field",
      "notes": "Tests cross-domain transfer to scientific discovery — second pillar of the claim.",
      "source": "Nature, Science, peer-reviewed journals tracking AI co-authorship",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -5,
      "source_id": null,
      "confidence": 0.65,
      "expected_date": "2027-09-16",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2028-12-31",
        "from": "2026-06-01"
      },
      "measurement_criterion": "Peer-reviewed paper attributes a novel discovery (in chemistry, biology, physics, or math) primarily to a frontier LLM/agent system"
    },
    {
      "kind": "scenario_signal",
      "label": "Scenario fires: AGI fast: drop-in remote worker by 2027-09",
      "status": "pending",
      "weight": 0.3,
      "ordinal": -4,
      "source_id": "S_AGI_FAST_2027",
      "expected_date": "2027-09-30",
      "observed_date": null
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q3 window check-in (75%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -3,
      "source_id": null,
      "expected_date": "2028-02-11",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "AI agent integrates with biological/environmental sensor stack in published study",
      "notes": "'Manipulate physical reality via cross-domain transfer' element of the claim.",
      "source": "arxiv, IEEE proceedings, robotics journals",
      "statu
... (truncated)