AUT_002 · prediction · AI · unified-mathematical-substrate

Models excelling at highly structured mathematical benchmarks exhibit a 'unified capability substrate' enabling dominance in seemingly unrelated fields (coding, logical reasoning, scientific discovery) — the 'mathematical skeleton' of the technological...

Predictor: Alex Wissner-Gross

Prior probability: 72.0%
Current probability: 59.1% (evolves via intake + LBP)
Conviction: 4/5
Signal quality: C
Resolution: in_progress
Window: 2026-01-01 to 2029-11-30
Edges in / out: 6 / 0
Tickers exposed: 10

Prediction text

Models excelling at highly structured mathematical benchmarks exhibit a 'unified capability substrate' enabling dominance in seemingly unrelated fields (coding, logical reasoning, scientific discovery), framed as the 'mathematical skeleton' of the technological singularity. Autonomous agents will seamlessly interface with environmental and biological sensors to continuously monitor, model, and manipulate physical reality via this cross-domain transfer.

Key catalyst: Next frontier-model cross-domain benchmark release

Watch events: Cross-domain-transfer benchmarks (BIG-Bench, GPQA Diamond); physical-world-model scaling

Resolution evidence

Status: in_progress

GPT-5 / Claude 4 / Gemini 3 cross-domain benchmarks (math + coding + science) empirically validate transfer. AlphaFold → AlphaProof → AlphaGeometry demonstrate unified-substrate.

Predictor: Alex Wissner-Gross

κ + Brier as of 2026-05-22
κ (discount): 0.844
Brier: 0.0341 (excellent)
Hits / Misses: 6 / 1 (of 11 resolved)
Hit rate: 54.5%
[Calibration plot: stated vs observed]

Evidence about this node from Alex Wissner-Gross is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
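A minimal sketch of how a κ discount could scale incoming evidence. The source states only that evidence is multiplied by κ, with a floor of 0.10 and a cap of 1.00. Applying the multiplier to the log-likelihood ratio in a log-odds update is an assumption, and the names `clamp_kappa` and `discounted_update` are hypothetical, not the /api/intake implementation.

```python
import math

KAPPA_FLOOR = 0.10  # from the text: effectively silenced
KAPPA_CAP = 1.00    # from the text: full weight


def clamp_kappa(kappa: float) -> float:
    """Keep the discount inside [floor, cap]."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def discounted_update(prior: float, likelihood_ratio: float, kappa: float) -> float:
    """Bayesian update in log-odds space with the evidence scaled by kappa.

    Assumption: "multiplied by kappa" means the log-likelihood ratio is scaled.
    """
    k = clamp_kappa(kappa)
    prior_log_odds = math.log(prior / (1.0 - prior))
    posterior_log_odds = prior_log_odds + k * math.log(likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-posterior_log_odds))


# Same evidence, two discount levels: a lower kappa moves the belief less.
full_weight = discounted_update(0.5, 3.0, kappa=1.0)
discounted = discounted_update(0.5, 3.0, kappa=0.844)
```

Under this reading, κ = 1.00 reproduces the ordinary Bayesian update, while κ at the 0.10 floor leaves the prior nearly unchanged.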

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

4 prob_history rows
[Chart: probability (0% to 100%) over time, x-axis ticks at 2026-04-30, 2026-05-03, 2026-05-10; prior 72%, current = 59.1%. Legend: intake v2, milestone miss sweep, lbp propagation, reference class assigned, legacy v1, prior_prob (analyst seed)]

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.
Leading chain: 1 fired ✓ · 8 pending
  1. 2026-04-24 · hit · Frontier model exceeds 50% on FrontierMath (math benchmark)
    How: Any frontier LLM crosses 50% accuracy on Epoch AI's FrontierMath benchmark
    Source: https://epoch.ai/benchmarks/frontiermath-tier-4 (GPT-5.5 Pro 52.4%, GPT-5.5 51.7% as of April 2026) · conf 99%
  2. 2026-09-15 · pending · Q1 window check-in (25%)
  3. 2026-04-01 → 2027-06-30 · pending · Same model leads both math + SWE-bench coding leaderboards simultaneously
    How: A single model variant simultaneously occupies top-3 on both FrontierMath and SWE-bench Verified leaderboards
    Source: Epoch AI, SWE-bench, Scale Labs leaderboards · conf 80%
    Notes: Direct evidence of 'unified capability substrate': math leadership transfers to coding.
  4. 2027-05-30 · pending · Q2 window check-in (50%)
  5. 2026-06-01 → 2028-12-31 · pending · AI model produces peer-reviewed scientific discovery in non-CS field
    How: Peer-reviewed paper attributes a novel discovery (in chemistry, biology, physics, or math) primarily to a frontier LLM/agent system
    Source: Nature, Science, peer-reviewed journals tracking AI co-authorship · conf 65%
    Notes: Tests cross-domain transfer to scientific discovery, the second pillar of the claim.
  6. 2028-02-11 · pending · Q3 window check-in (75%)
  7. 2027-01-01 → 2029-10-31 · pending · AI agent integrates with biological/environmental sensor stack in published study
    How: Published research demonstrates an AI agent autonomously interfacing with a biological or environmental sensor network to monitor and act on physical reality
    Source: arxiv, IEEE proceedings, robotics journals · conf 55%
    Notes: 'Manipulate physical reality via cross-domain transfer' element of the claim.
  8. 2027-06-01 → 2029-11-30 · pending · Composite cross-domain leaderboard launched (math+code+science+reasoning)
    How: Major eval org (Epoch, METR, Stanford HAI) publishes composite cross-domain benchmark with at least one model scoring ≥80%
    Source: Stanford AI Index 2027/2028, Epoch AI · conf 45%
    Notes: Cascade: formal recognition of 'unified capability substrate' as a measurable thing.

No downstream cascades — this prediction is a leaf in the dependency graph.
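A rough sketch of how a "fired / pending" summary could be derived from the milestone records in this node's raw metadata. The field names (`label`, `status`, `ordinal`) come from that metadata; the list is an abridged three-entry sample rather than the full chain, and `summarize_chain` is a hypothetical helper, not the logic in scripts/ops/derive_milestones.py. Sorting by `ordinal` assumes more negative values come earlier, which matches the order shown above.

```python
from collections import Counter

# Three entries abridged from this node's "milestones" metadata (not the full chain).
milestones = [
    {"label": "Q1 window check-in (25%)", "status": "pending", "ordinal": -8},
    {"label": "Frontier model exceeds 50% on FrontierMath (math benchmark)",
     "status": "hit", "ordinal": -9},
    {"label": "Same model leads both math + SWE-bench coding leaderboards simultaneously",
     "status": "pending", "ordinal": -7},
]


def summarize_chain(items):
    """Order milestones by ordinal and count them by status."""
    ordered = sorted(items, key=lambda m: m["ordinal"])
    counts = Counter(m["status"] for m in ordered)
    return ordered, counts


ordered, counts = summarize_chain(milestones)
print(f"{counts['hit']} fired, {counts['pending']} pending")
```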

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample to surface the predictions whose marginals shift most under that assumption. Each run takes about 30 seconds and is suited to stress-testing "if X resolves, what else moves?"
(live posterior: 59%)
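The clamp-and-sample idea can be illustrated on a toy network. This is a sketch under stated assumptions, not the site's sampler: the three-node graph, the rule that any active killer drops the child to 0.05, and the names `joint` and `gibbs_marginals` are all hypothetical. The probabilities are borrowed from this node's two killer edges (8%, 10%, 0.05, 0.72).

```python
import random

# Hypothetical toy network: two "killer" parents (k1, k2) and one child prediction (c).
PRIORS = {"k1": 0.08, "k2": 0.10}
P_CHILD_IF_ANY_KILLER = 0.05
P_CHILD_IF_NO_KILLER = 0.72


def joint(state):
    """Joint probability of a full True/False assignment."""
    p = 1.0
    for name, prior in PRIORS.items():
        p *= prior if state[name] else 1.0 - prior
    any_killer = state["k1"] or state["k2"]
    p_child = P_CHILD_IF_ANY_KILLER if any_killer else P_CHILD_IF_NO_KILLER
    return p * (p_child if state["c"] else 1.0 - p_child)


def gibbs_marginals(clamp, sweeps=20000, burn_in=1000, seed=0):
    """Estimate P(node=True) for every node while clamped nodes stay fixed."""
    rng = random.Random(seed)
    state = {"k1": False, "k2": False, "c": False}
    state.update(clamp)
    free = [n for n in state if n not in clamp]
    totals = {n: 0 for n in state}
    for sweep in range(sweeps + burn_in):
        for node in free:
            # Resample this node from its conditional given all other nodes.
            p_true = joint({**state, node: True})
            p_false = joint({**state, node: False})
            state[node] = rng.random() < p_true / (p_true + p_false)
        if sweep >= burn_in:
            for n in state:
                totals[n] += state[n]
    return {n: totals[n] / sweeps for n in state}


baseline = gibbs_marginals(clamp={})
clamped = gibbs_marginals(clamp={"c": False})
# Rank the unclamped nodes by how far their marginal moved.
shifts = sorted(((abs(clamped[n] - baseline[n]), n) for n in PRIORS), reverse=True)
```

In this toy setup, clamping the child to FALSE raises both killer marginals well above their priors, which is the kind of shift the counterfactual run is meant to surface.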

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first
LBP · 2026-05-10T02:00:02Z · 59.1% · -1.6pp
  Network propagation: 60.7% → 59.1%
  6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29
LBP · 2026-05-03T02:00:01Z · 60.7% · -3.0pp
  Network propagation: 63.7% → 60.7%
  6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9
LBP · 2026-04-30T16:39:51Z · 63.7% · -5.5pp
  Network propagation: 69.2% → 63.7%
  5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3
LBP · 2026-04-30T02:18:57Z · 69.2% · -2.8pp
  Network propagation: 72.0% → 69.2%
  5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef
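The four propagation steps above can be reconciled against the prior and the current probability with simple arithmetic. The sketch below copies the before/after values from the chain; `step_deltas` and `is_continuous` are hypothetical helper names, and nothing here reproduces the LBP computation itself.

```python
# (before %, after %) pairs copied from the evidence chain, oldest first.
steps = [(72.0, 69.2), (69.2, 63.7), (63.7, 60.7), (60.7, 59.1)]


def step_deltas(chain):
    """Percentage-point change of each propagation step."""
    return [round(after - before, 1) for before, after in chain]


def is_continuous(chain):
    """True when every step starts where the previous one ended."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(chain, chain[1:]))


deltas = step_deltas(steps)
net_change = round(steps[-1][1] - steps[0][0], 1)
print(deltas, net_change, is_continuous(steps))
```

The per-step deltas match the listed pp changes, and the net change takes the 72.0% prior to the current 59.1%.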

Network propagation neighbors

Top edges sorted by latest LBP cross-impact

Top incoming (parents)

Edges that influence THIS node's belief

Kind · Node · Their prob · P(c|s=T) · P(c|s=F) · Δ implied
killer · TK06 China-Taiwan Military Conflict · 8.0% · 0.050 · 0.720 · +0.076
killer · TK11 Autonomous Regulatory Block (Level 4 Halt) · 10.0% · 0.050 · 0.720 · +0.062
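One way to read the parent table is through the law of total probability: marginalize this node's probability over each parent's state, then compare with the current 59.1%. The source does not define Δ implied, so treating it as "implied marginal minus current probability" is an assumption, and `implied_child_prob` is a hypothetical helper.

```python
def implied_child_prob(p_parent: float, p_child_if_true: float, p_child_if_false: float) -> float:
    """Law of total probability: marginalize the child over the parent's state."""
    return p_parent * p_child_if_true + (1.0 - p_parent) * p_child_if_false


current = 0.591  # this node's current probability
# node -> (their prob, P(c|s=T), P(c|s=F)), copied from the table above
edges = {"TK06": (0.08, 0.05, 0.72), "TK11": (0.10, 0.05, 0.72)}

for node, (p_parent, p_if_true, p_if_false) in edges.items():
    implied = implied_child_prob(p_parent, p_if_true, p_if_false)
    print(node, round(implied, 4), f"{implied - current:+.3f}")
```

Under this reading the TK11 row reproduces the listed +0.062, while TK06 comes out at +0.075 against the listed +0.076. The small gap may come from rounding in the displayed inputs, or the page may compute the figure differently.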

Top outgoing (children)

Predictions THIS node influences

No outgoing edges.

Ticker exposure

10 ticker(s) linked

Beneficiaries (6)

FROG · GTLB · BABA · TEAM · GOOGL · MSFT

Adverse (4)

UBER · PGR · TRV · ALL

Prerequisites (6)

Predictions that must hit first
Type · Pred · Title · Domain · Lag
correlate · S_AGI_MID_2029 · AGI mid: Kurzweil 2029 path · agi_general_capability · -
correlate · S_AGI_FAST_2027 · AGI fast: drop-in remote worker by 2027-09 · agi_general_capability · -
correlate · S_AGI_SLOW_2031 · AGI slow: Schmidt/Hassabis 5-10 year path · agi_general_capability · -
correlate · S_AGI_WINTER_2036PLUS · AGI delayed: capability plateau or AI winter · agi_general_capability · -
killer · TK11 · Autonomous Regulatory Block (Level 4 Halt) · - · -
killer · TK06 · China-Taiwan Military Conflict · - · -

Dependents (0)

Predictions enabled by this
Type · Pred · Title · Domain · Lag
No dependents

Validations (1)

Resolution events
Observed at · Status · By · Notes
2026-04-29 · partial · thesis_timeline_v1.0_import · GPT-5 / Claude 4 / Gemini 3 cross-domain benchmarks (math + coding + science) empirically validate transfer. AlphaFold → AlphaProof → AlphaGeometry demonstrate unified-substrate.

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT
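For reference, a minimal sketch of cosine similarity between two texts. The source names the metric but not the representation, so the bag-of-words count vectors here are an assumption (a production pipeline would more likely compare embeddings), and `cosine_similarity` is a hypothetical helper.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


score = cosine_similarity(
    "frontier model cross-domain benchmark",
    "cross-domain benchmark for a frontier model",
)
```

Documents would then be linked when their score against the prediction text clears some threshold; the page does not state what that threshold is.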

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook
{
  "nia": false,
  "mode": "FORECAST",
  "role": "Cited-Other",
  "context": "Extends SEM_032 (Wissner-Gross Clay Millennium) and 248_002 (LEO-to-phone). Specific cross-domain-capability-substrate framing.",
  "to_year": 2029,
  "conv_cues": "coined framing; singularity-mathematical-skeleton",
  "direction": "HAPPEN",
  "from_year": 2026,
  "timeframe": "2026-2029",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "Frontier model exceeds 50% on FrontierMath (math benchmark)",
      "source": "https://epoch.ai/benchmarks/frontiermath-tier-4 — GPT-5.5 Pro 52.4%, GPT-5.5 51.7% as of April 2026",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -9,
      "source_id": null,
      "confidence": 0.99,
      "source_url": "https://epoch.ai/benchmarks/frontiermath-tier-4",
      "expected_date": "2026-04-30",
      "observed_date": "2026-04-24",
      "research_origin": "deep_research",
      "measurement_criterion": "Any frontier LLM crosses 50% accuracy on Epoch AI's FrontierMath benchmark"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q1 window check-in (25%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -8,
      "source_id": null,
      "expected_date": "2026-09-15",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "Same model leads both math + SWE-bench coding leaderboards simultaneously",
      "notes": "Direct evidence of 'unified capability substrate' — math leadership transfers to coding.",
      "source": "Epoch AI, SWE-bench, Scale Labs leaderboards",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.8,
      "source_url": "https://labs.scale.com/leaderboard",
      "expected_date": "2026-11-14",
      "research_origin": "deep_research",
      "expected_date_range": {
        "to": "2027-06-30",
        "from": "2026-04-01"
      },
      "measurement_criterion": "A single model variant simultaneously occupies top-3 on both FrontierMath and SWE-bench Verified leaderboards"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q2 window check-in (50%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -6,
      "source_id": null,
      "expected_date": "2027-05-30",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "AI model produces peer-reviewed scientific discovery in non-CS field",
      "notes": "Tests cross-domain transfer to scientific discovery — second pillar of the claim.",
      "source": "Nature, Science, peer-reviewed journals tracking AI co-authorship",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -5,
      "source_id": null,
      "confidence": 0.65,
      "expected_date": "2027-09-16",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2028-12-31",
        "from": "2026-06-01"
      },
      "measurement_criterion": "Peer-reviewed paper attributes a novel discovery (in chemistry, biology, physics, or math) primarily to a frontier LLM/agent system"
    },
    {
      "kind": "scenario_signal",
      "label": "Scenario fires: AGI fast: drop-in remote worker by 2027-09",
      "status": "pending",
      "weight": 0.3,
      "ordinal": -4,
      "source_id": "S_AGI_FAST_2027",
      "expected_date": "2027-09-30",
      "observed_date": null
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q3 window check-in (75%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -3,
      "source_id": null,
      "expected_date": "2028-02-11",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "AI agent integrates with biological/environmental sensor stack in published study",
      "notes": "'Manipulate physical reality via cross-domain transfer' element of the claim.",
      "source": "arxiv, IEEE proceedings, robotics journals",
      "statu
... (truncated)