238_013 · prediction · AI · AGI

Frontier labs will increasingly keep their most capable models secret to self-advance

Predictor: Peter Diamandis · ep#238 "Meta Buys Moltbook, GPT 5.4, and Fruitfly Brain Upload | Moonshots Live at The Abundance Summit 238" · source

Prior probability

65.0%

Current probability

46.0%

evolves via intake + LBP

Conviction

4/5

Signal quality

Resolution

pending

Window

2026-04-30 – 2027-09-30

Edges in / out

6 / 0

Tickers exposed

Prediction text

Frontier labs will increasingly keep their most capable models secret to self-advance | We still don't have the model that they used to win the gold medal in the IMO... that's the first bifurcation that you see. We used to have the frontier model every single time. The moment they got to that, that was the last time.

Watch events: ARC-AGI-2 scores; Frontier Math Tier 4 benchmark; SWE-bench Verified; Humanity's Last Exam

Verbatim quote

From episode "Meta Buys Moltbook, GPT 5.4, and Fruitfly Brain Upload | Moonshots Live at The Abundance Summit 238"

We still don't have the model that they used to win the gold medal in the IMO... that's the first bifurcation that you see. We used to have the frontier model every single time. The moment they got to that, that was the last time.

Predictor: Peter Diamandis

κ + Brier as of 2026-05-22


κ (discount)

0.875

Brier

0.0367

excellent

Hits / Misses

10 / 0

of 15 resolved

Hit rate

66.7%

Calibration plot (stated vs observed)

Evidence about this node from Peter Diamandis is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
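
A minimal sketch of how such a discount could act on a single piece of evidence. The floor and cap come from the text above; everything else is an assumption, since the page does not say how /api/intake applies κ. Here κ is taken to scale the evidence's log-likelihood ratio, and `clamp_kappa` and `discounted_update` are hypothetical names.

```python
KAPPA_FLOOR = 0.10  # from the page: effectively silenced
KAPPA_CAP = 1.00    # from the page: full weight


def clamp_kappa(kappa: float) -> float:
    """Keep the discount inside the documented [0.10, 1.00] range."""
    return min(max(kappa, KAPPA_FLOOR), KAPPA_CAP)


def discounted_update(prior: float, likelihood_ratio: float, kappa: float) -> float:
    """Bayesian odds update with the evidence weight scaled by kappa.

    Assumption (not confirmed by the page): kappa multiplies the
    log-likelihood ratio, i.e. the LR is raised to the power kappa.
    """
    k = clamp_kappa(kappa)
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** k
    return posterior_odds / (1.0 + posterior_odds)


# Illustrative only: a 4:1 likelihood ratio applied to a 50% prior,
# discounted by the predictor's kappa of 0.875.
print(round(discounted_update(0.5, 4.0, 0.875), 3))
```

Under this reading, κ = 1.00 gives the ordinary Bayesian update, and lower values pull the posterior back toward the prior.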

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

5 prob_history rows

Legend: intake v2 · milestone miss sweep · lbp propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 46.0%

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.

Leading chain: 1 fired ✓ · 4 pending

2026-04-30 · hit · OpenAI's IMO-gold-medal model remains unreleased months after milestone
How: OpenAI publicly confirms the math-reasoning model that achieved IMO gold-medal performance has not been released to the public, months after the achievement was reported
Source: https://ai-frontiers.org/articles/the-hidden-ai-frontier · conf 95%
Notes: HIT - 'Hidden AI Frontier' coverage explicitly confirms OpenAI's IMO-gold model 'will not be released for months' - direct corroboration of Diamandis' bifurcation thesis.
2026-07-28 · pending · Q1 window check-in (25%)
2026-10-25 · pending · Q2 window check-in (50%)
2027-01-22 · pending · Q3 window check-in (75%)
2026-06-01 → 2027-09-30 · pending · Frontier lab publicly admits internal-deployment-only model with significant capability gap vs released models
How: OpenAI, Anthropic, Google DeepMind, or xAI publishes a system card / safety policy that explicitly references an internal-only model with capabilities 'meaningfully ahead' of public releases
Source: https://arxiv.org/html/2604.23065 (Internal Model Deployment paper); https://metr.org/common-elements · conf 85%
2027-04-22 · pending · Frontier labs will increasingly keep their most capable models secret to self-advance
2026-09-01 → 2027-12-31 · pending · AI R&D acceleration measurable: lab discloses >=20% productivity gain from internal model use
How: Frontier lab publicly states that internal model use accelerates AI R&D pipeline by >=20% (e.g., training cycles, eval automation, paper synthesis)
Source: Frontier lab blog posts; CEO statements at conferences · conf 70%
Notes: Diamandis' explicit claim was that labs would use these internally to advance themselves faster.
2026-09-01 → 2027-12-31 · pending · Public-vs-internal capability gap formally widens to >=6 months on a major benchmark
How: A frontier lab discloses or insider reporting (The Information, Bloomberg) confirms an internal model achieved a benchmark milestone (FrontierMath, ARC-AGI, SWE-bench Verified) >=6 months before public release
Source: The Information; Bloomberg; arXiv system cards · conf 60%
2026-09-01 → 2027-12-31 · pending · Government / regulator demands disclosure of internal-deployed capabilities
How: US AISI, UK AISI, or EU AI Office issues a formal request, regulation, or executive order requiring frontier labs to disclose internal deployments above a capability threshold
Source: US AISI; UK AISI; EU AI Office press releases · conf 40%
Notes: Cascade - if labs systematically hold back capabilities for self-advance, regulators will respond.
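
The three check-in dates in the chain are consistent with splitting the span from the window open (2026-04-30) to the resolution event's expected date (2027-04-22) into quarters and rounding down to whole days. The sketch below reproduces them under that assumption; whether scripts/ops/derive_milestones.py actually works this way is not stated on the page, and `quartile_checkpoints` is a hypothetical name.

```python
from datetime import date, timedelta


def quartile_checkpoints(start: date, event: date) -> list[date]:
    """Place check-ins at 25%, 50% and 75% of the start-to-event span.

    Assumption: fractional days are truncated, which matches the
    dates listed in the milestone chain.
    """
    span_days = (event - start).days
    return [start + timedelta(days=int(span_days * q)) for q in (0.25, 0.50, 0.75)]


checkpoints = quartile_checkpoints(date(2026, 4, 30), date(2027, 4, 22))
for label, day in zip(("Q1", "Q2", "Q3"), checkpoints):
    print(label, day.isoformat())
```

Note that the span used here ends at the event's expected date, not at the 2027-09-30 window close; using the window close would place the check-ins later than the ones listed.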

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample. Surfaces the predictions whose marginals shift most under that assumption.

(live posterior: 46%)

Each Gibbs run takes about 30 seconds and is best suited to stress-testing "if X resolves, what else moves?"

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

LBP · 2026-05-17T02:00:01Z · 46.0% · -1.6pp

Network propagation: 47.6% → 46.0%

5-iter LBP, residual 0.00689 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e607fa96

LBP · 2026-05-10T02:00:02Z · 47.6% · -3.3pp

Network propagation: 50.9% → 47.6%

6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29

LBP · 2026-05-03T02:00:01Z · 50.9% · -6.4pp

Network propagation: 57.3% → 50.9%

6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9

LBP · 2026-04-30T16:39:51Z · 57.3% · -4.2pp

Network propagation: 61.4% → 57.3%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3

LBP · 2026-04-30T02:18:57Z · 61.4% · -3.6pp

Network propagation: 65.0% → 61.4%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef
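
The five propagation rows can be checked for internal consistency: each run should start from the probability the previous run ended on, and the steps should sum to the full move from the 65.0% prior to the current 46.0%. A small sketch of that audit, using only the rounded figures shown above (`audit_chain` is a hypothetical helper, not part of the tracker):

```python
# (timestamp, probability before, probability after), oldest first
history = [
    ("2026-04-30T02:18:57Z", 65.0, 61.4),
    ("2026-04-30T16:39:51Z", 61.4, 57.3),
    ("2026-05-03T02:00:01Z", 57.3, 50.9),
    ("2026-05-10T02:00:02Z", 50.9, 47.6),
    ("2026-05-17T02:00:01Z", 47.6, 46.0),
]


def audit_chain(rows):
    """Return per-step deltas in percentage points and a contiguity flag."""
    deltas = [round(after - before, 1) for _, before, after in rows]
    contiguous = all(rows[i][2] == rows[i + 1][1] for i in range(len(rows) - 1))
    return deltas, contiguous


deltas, contiguous = audit_chain(history)
total = round(history[-1][2] - history[0][1], 1)
print(deltas, contiguous, total)
```

Recomputed from the rounded values, the second step comes out at -4.1pp where the page shows -4.2pp; a plausible explanation, though not confirmed by the page, is that the displayed delta is taken from unrounded probabilities.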

Network propagation neighbors

Top edges sorted by latest LBP cross-impact


Top incoming (parents)

Edges that influence THIS node's belief

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
prereq	S_AGI_FAST_2027 AGI fast: drop-in remote worker by 2027-09	30.0%	0.650	0.050	-0.230
killer	TK03 AI Regulatory Moratorium (EU/US Capability Freeze)	10.0%	0.050	0.650	+0.130
killer	TK01 AGI Capability Plateau (2026-27 Training Stall)	15.0%	0.050	0.650	+0.100
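
The page does not define "Δ implied", but all three rows are reproduced by one reading: take the marginal this node would have if the parent were its only influence (law of total probability over the parent's state) and subtract the current 46% posterior. A sketch under that assumption, with `implied_delta` as a hypothetical name:

```python
def implied_delta(p_source: float, p_c_given_true: float,
                  p_c_given_false: float, current: float) -> float:
    """Marginal implied by a single parent edge, minus the current belief.

    Assumed reading of the table's last column:
    P(c) = P(s) * P(c|s=T) + (1 - P(s)) * P(c|s=F)
    """
    marginal = p_source * p_c_given_true + (1.0 - p_source) * p_c_given_false
    return marginal - current


CURRENT = 0.46  # live posterior shown on the page

rows = {
    "S_AGI_FAST_2027": (0.30, 0.650, 0.050),  # prereq
    "TK03": (0.10, 0.050, 0.650),             # killer
    "TK01": (0.15, 0.050, 0.650),             # killer
}

for node, (p_s, p_t, p_f) in rows.items():
    print(node, f"{implied_delta(p_s, p_t, p_f, CURRENT):+.3f}")
```

Read this way, the prereq edge pulls the node down because its parent sits at only 30%, while both killer edges pull it up because the killers themselves are unlikely.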

Top outgoing (children)

Predictions THIS node influences

No outgoing edges.

Ticker exposure

21 ticker(s) linked

Beneficiaries (14)

SOUN NVDA GTLB AI BBAI TCEHY AMZN BABA GOOGL IBM META MSFT ORCL SHOP

Adverse (7)

ACN CTSH FRSH CHGG IBM INFY PEGA

Prerequisites (6)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
prereq	S_AGI_FAST_2027	AGI fast: drop-in remote worker by 2027-09	agi_general_capability	—
correlate	S_AGI_MID_2029	AGI mid: Kurzweil 2029 path	agi_general_capability	—
correlate	S_AGI_SLOW_2031	AGI slow: Schmidt/Hassabis 5-10 year path	agi_general_capability	—
correlate	S_AGI_WINTER_2036PLUS	AGI delayed: capability plateau or AI winter	agi_general_capability	—
killer	TK01	AGI Capability Plateau (2026-27 Training Stall)	—	—
killer	TK03	AI Regulatory Moratorium (EU/US Capability Freeze)	—	—

Dependents (0)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
No dependents

Linked documents (1)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.581	manifold	Will I get a Gold Medal on USAMO 2027?	47%	mentions	pending	2026-04-28

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "url": "https://www.youtube.com/watch?v=d__HRChE2ZE",
  "mode": "THESIS",
  "role": "Host",
  "context": "are you going to keep your model secret because you're going to be able to use them to advance your company far faster than anybody else? He said, 'No, no. Our go our job is to get out there to the public.' I don't believe that. We still don't have the model that they used to win the gold medal in the IMO... that's the first bifurcation that you see.",
  "verbatim": "We still don't have the model that they used to win the gold medal in the IMO... that's the first bifurcation that you see. We used to have the frontier model every single time. The moment they got to that, that was the last time.",
  "conv_cues": "first bifurcation; I don't believe that",
  "direction": "HAPPEN",
  "timeframe": "Ongoing",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "OpenAI's IMO-gold-medal model remains unreleased months after milestone",
      "notes": "HIT - 'Hidden AI Frontier' coverage explicitly confirms OpenAI's IMO-gold model 'will not be released for months' - direct corroboration of Diamandis' bifurcation thesis.",
      "source": "https://ai-frontiers.org/articles/the-hidden-ai-frontier",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -5,
      "source_id": null,
      "confidence": 0.95,
      "source_url": "https://ai-frontiers.org/articles/the-hidden-ai-frontier",
      "expected_date": "2026-04-30",
      "observed_date": "2026-04-30",
      "research_origin": "deep_research",
      "measurement_criterion": "OpenAI publicly confirms the math-reasoning model that achieved IMO gold-medal performance has not been released to the public, months after the achievement was reported"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q1 window check-in (25%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -4,
      "source_id": null,
      "expected_date": "2026-07-28",
      "observed_date": null
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q2 window check-in (50%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -3,
      "source_id": null,
      "expected_date": "2026-10-25",
      "observed_date": null
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q3 window check-in (75%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -2,
      "source_id": null,
      "expected_date": "2027-01-22",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "Frontier lab publicly admits internal-deployment-only model with significant capability gap vs released models",
      "source": "https://arxiv.org/html/2604.23065 (Internal Model Deployment paper); https://metr.org/common-elements",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -1,
      "source_id": null,
      "confidence": 0.85,
      "source_url": "https://arxiv.org/html/2604.23065",
      "expected_date": "2027-01-30",
      "research_origin": "deep_research",
      "expected_date_range": {
        "to": "2027-09-30",
        "from": "2026-06-01"
      },
      "measurement_criterion": "OpenAI, Anthropic, Google DeepMind, or xAI publishes a system card / safety policy that explicitly references an internal-only model with capabilities 'meaningfully ahead' of public releases"
    },
    {
      "kind": "event",
      "label": "Frontier labs will increasingly keep their most capable models secret to self-advance",
      "status": "pending",
      "weight": 1,
      "ordinal": 0,
      "source_id": "238_013",
      "expected_date": "2027-04-22",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "AI R&D acceleration measurable: lab discloses >=20% productivity gain from internal model use",
      "notes": "Diamandis' explicit claim was that labs would use these internally to advance themselves faster.",
      "source
... (truncated)