238_013 · prediction · AI · AGI

Frontier labs will increasingly keep their most capable models secret to self-advance

Predictor: Peter Diamandis · ep#238 "Meta Buys Moltbook, GPT 5.4, and Fruitfly Brain Upload | Moonshots Live at The Abundance Summit 238" · source

Prior probability
65.0%
Current probability
46.0%
evolves via intake + LBP
Conviction
4/5
Signal quality
B
Resolution
pending
Window
2026-04-30 – 2027-09-30
Edges in / out
6 / 0
Tickers exposed
21

Prediction text

Frontier labs will increasingly keep their most capable models secret to self-advance | We still don't have the model that they used to win the gold medal in the IMO... that's the first bifocation that you see. We used to have the frontier model every single time. The moment they got to that, that was the last time.

Watch events: ARC-AGI-2 scores; Frontier Math Tier 4 benchmark; SWE-bench Verified; Humanity's Last Exam

Verbatim quote

From episode "Meta Buys Moltbook, GPT 5.4, and Fruitfly Brain Upload | Moonshots Live at The Abundance Summit 238"
We still don't have the model that they used to win the gold medal in the IMO... that's the first bifocation that you see. We used to have the frontier model every single time. The moment they got to that, that was the last time.

Predictor: Peter Diamandis

κ + Brier as of 2026-05-22
κ (discount)
0.875
Brier
0.0367
excellent
Hits / Misses
10 / 0
of 15 resolved
Hit rate
66.7%
Calibration plot (stated vs observed)

Evidence about this node from Peter Diamandis is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
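The floor and cap on κ can be sketched as a simple clamp. How /api/intake applies the multiplier is not stated on this page, so the log-odds scaling below is an assumption for illustration only, and `clamp_kappa` and `discounted_update` are hypothetical names.

```python
import math

KAPPA_FLOOR = 0.10  # "effectively silenced"
KAPPA_CAP = 1.00    # "full weight"


def clamp_kappa(kappa: float) -> float:
    """Keep the predictor discount inside [0.10, 1.00]."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def discounted_update(prior: float, likelihood_ratio: float, kappa: float) -> float:
    """Bayesian update in log-odds space with the evidence scaled by kappa.

    ASSUMPTION: the real intake code may weight evidence differently; this
    only illustrates "lower kappa = less weight".
    """
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += clamp_kappa(kappa) * math.log(likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# The same evidence moves the node less at kappa = 0.875 than at full weight.
full_weight = discounted_update(0.65, 3.0, 1.0)
discounted = discounted_update(0.65, 3.0, 0.875)
```

Under this sketch a κ of 1.00 recovers the ordinary Bayes update, while a κ at the floor leaves only a tenth of the log-evidence.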

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

5 prob_history rows
[Chart: probability over time, y-axis 0% to 100%, x-axis 2026-04-30 to 2026-05-17; prior 65%, current = 46.0%. Legend: intake v2 · milestone miss sweep · lbp propagation · reference class assigned · legacy v1 · prior_prob (analyst seed)]

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.
Leading chain: 1 fired ✓ · 4 pending
  1. 2026-04-30 · hit · OpenAI's IMO-gold-medal model remains unreleased months after milestone
    How: OpenAI publicly confirms the math-reasoning model that achieved IMO gold-medal performance has not been released to the public, months after the achievement was reported
    Source: https://ai-frontiers.org/articles/the-hidden-ai-frontier · conf 95%
    Notes: HIT - 'Hidden AI Frontier' coverage explicitly confirms OpenAI's IMO-gold model 'will not be released for months' - direct corroboration of Diamandis' bifurcation thesis.
  2. 2026-07-28 · pending · Q1 window check-in (25%)
  3. 2026-10-25 · pending · Q2 window check-in (50%)
  4. 2027-01-22 · pending · Q3 window check-in (75%)
  5. 2026-06-01 → 2027-09-30 · pending · Frontier lab publicly admits internal-deployment-only model with significant capability gap vs released models
    How: OpenAI, Anthropic, Google DeepMind, or xAI publishes a system card / safety policy that explicitly references an internal-only model with capabilities 'meaningfully ahead' of public releases
    Source: https://arxiv.org/html/2604.23065 (Internal Model Deployment paper); https://metr.org/common-elements · conf 85%
  6. 2026-09-01 → 2027-12-31 · pending · AI R&D acceleration measurable: lab discloses >=20% productivity gain from internal model use
    How: Frontier lab publicly states that internal model use accelerates AI R&D pipeline by >=20% (e.g., training cycles, eval automation, paper synthesis)
    Source: Frontier lab blog posts; CEO statements at conferences · conf 70%
    Notes: Diamandis' explicit claim was that labs would use these internally to advance themselves faster.
  7. 2026-09-01 → 2027-12-31 · pending · Public-vs-internal capability gap formally widens to >=6 months on a major benchmark
    How: A frontier lab discloses or insider reporting (The Information, Bloomberg) confirms an internal model achieved a benchmark milestone (FrontierMath, ARC-AGI, SWE-bench Verified) >=6 months before public release
    Source: The Information; Bloomberg; arXiv system cards · conf 60%
  8. 2026-09-01 → 2027-12-31 · pending · Government / regulator demands disclosure of internal-deployed capabilities
    How: US AISI, UK AISI, or EU AI Office issues a formal request, regulation, or executive order requiring frontier labs to disclose internal deployments above a capability threshold
    Source: US AISI; UK AISI; EU AI Office press releases · conf 40%
    Notes: Cascade - if labs systematically hold back capabilities for self-advance, regulators will respond.
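The three window check-in dates are consistent with splitting the span from the window start (2026-04-30) to the resolution event's expected date (2027-04-22, taken from the raw metadata) into quarters with floor division. The sketch below reproduces them under that assumption. It is not taken from scripts/ops/derive_milestones.py, and `quartile_checkpoints` is a hypothetical name.

```python
from datetime import date, timedelta


def quartile_checkpoints(start: date, end: date) -> list[date]:
    """Return the 25%, 50%, and 75% points of the span, rounded down to whole days."""
    span_days = (end - start).days
    return [start + timedelta(days=span_days * k // 4) for k in (1, 2, 3)]


# ASSUMPTION: the span ends at the event's expected date, not at the
# window end of 2027-09-30, which would give different dates.
checkpoints = quartile_checkpoints(date(2026, 4, 30), date(2027, 4, 22))
for label, day in zip(("Q1", "Q2", "Q3"), checkpoints):
    print(label, day.isoformat())
```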

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample, which surfaces the predictions whose marginals shift most under that assumption. Each run takes about 30 seconds and is suited to stress-testing "if X resolves, what else moves?"
(live posterior: 46%)
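A minimal sketch of clamped Gibbs sampling on a toy three-node model follows. The probabilities are borrowed from the incoming-edge values on this page, but the way they are combined (a product of a node prior and pairwise edge factors) is an assumption, and `THIS`, `PRIORS`, `EDGES`, and `clamped_gibbs` are hypothetical names rather than the dashboard's implementation.

```python
import random

# Hypothetical model: prior P(node=True) per node, and per edge
# (parent, child) the pair (P(child=T | parent=T), P(child=T | parent=F)).
PRIORS = {"S_AGI_FAST_2027": 0.30, "TK03": 0.10, "THIS": 0.46}
EDGES = {
    ("S_AGI_FAST_2027", "THIS"): (0.650, 0.050),
    ("TK03", "THIS"): (0.050, 0.650),
}


def edge_factor(edge, parent_val, child_val):
    """Probability the edge assigns to child_val given parent_val."""
    p_true = EDGES[edge][0] if parent_val else EDGES[edge][1]
    return p_true if child_val else 1.0 - p_true


def conditional_true(node, state):
    """P(node=True | all other nodes), from the node prior and touching edges."""
    weights = {}
    for value in (True, False):
        w = PRIORS[node] if value else 1.0 - PRIORS[node]
        for parent, child in EDGES:
            if node == parent:
                w *= edge_factor((parent, child), value, state[child])
            elif node == child:
                w *= edge_factor((parent, child), state[parent], value)
        weights[value] = w
    return weights[True] / (weights[True] + weights[False])


def clamped_gibbs(clamp, sweeps=5000, burn_in=500, seed=0):
    """Estimate marginals of the free nodes with the clamped nodes held fixed."""
    rng = random.Random(seed)
    state = {n: clamp.get(n, rng.random() < PRIORS[n]) for n in PRIORS}
    free = [n for n in PRIORS if n not in clamp]
    counts = {n: 0 for n in free}
    for sweep in range(burn_in + sweeps):
        for n in free:
            state[n] = rng.random() < conditional_true(n, state)
        if sweep >= burn_in:
            for n in free:
                counts[n] += state[n]
    return {n: counts[n] / sweeps for n in free}


# Clamp the prediction TRUE and see how the other marginals respond.
marginals = clamped_gibbs({"THIS": True}, sweeps=20000)
```

In this toy model, clamping the node TRUE pushes the prerequisite well above its 30% prior and the killer well below its 10% prior, which is the kind of shift the counterfactual run is meant to surface.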

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first
LBP · 2026-05-17T02:00:01Z · 46.0% · -1.6pp
Network propagation: 47.6% → 46.0%
5-iter LBP, residual 0.00689 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e607fa96
LBP · 2026-05-10T02:00:02Z · 47.6% · -3.3pp
Network propagation: 50.9% → 47.6%
6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29
LBP · 2026-05-03T02:00:01Z · 50.9% · -6.4pp
Network propagation: 57.3% → 50.9%
6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9
LBP · 2026-04-30T16:39:51Z · 57.3% · -4.2pp
Network propagation: 61.4% → 57.3%
5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3
LBP · 2026-04-30T02:18:57Z · 61.4% · -3.6pp
Network propagation: 65.0% → 61.4%
5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef

Network propagation neighbors

Top edges sorted by latest LBP cross-impact

Top incoming (parents)

Edges that influence THIS node's belief

Kind · Node · Their prob · P(c|s=T) · P(c|s=F) · Δ implied
prereq · S_AGI_FAST_2027 (AGI fast: drop-in remote worker by 2027-09) · 30.0% · 0.650 · 0.050 · -0.230
killer · TK03 (AI Regulatory Moratorium (EU/US Capability Freeze)) · 10.0% · 0.050 · 0.650 · +0.130
killer · TK01 (AGI Capability Plateau (2026-27 Training Stall)) · 15.0% · 0.050 · 0.650 · +0.100
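The Δ implied column is consistent with marginalizing each edge's conditional probabilities over the parent's probability and subtracting this node's current 46.0%. The sketch below checks that reading against the three rows. The formula is inferred from the listed numbers rather than documented on this page, and `implied_delta` is a hypothetical name.

```python
CURRENT_PROB = 0.46  # this node's current probability


def implied_delta(parent_prob, p_c_given_true, p_c_given_false, current_prob=CURRENT_PROB):
    """Marginal implied by one parent edge, minus the node's current probability."""
    implied = parent_prob * p_c_given_true + (1.0 - parent_prob) * p_c_given_false
    return implied - current_prob


# (node, their prob, P(c|s=T), P(c|s=F)) as listed in the incoming-edge table.
ROWS = [
    ("S_AGI_FAST_2027", 0.30, 0.650, 0.050),
    ("TK03", 0.10, 0.050, 0.650),
    ("TK01", 0.15, 0.050, 0.650),
]

deltas = {node: implied_delta(p, t, f) for node, p, t, f in ROWS}
for node, delta in deltas.items():
    print(f"{node}: {delta:+.3f}")
```

For the prerequisite, 0.30 × 0.650 + 0.70 × 0.050 = 0.230, which sits 0.230 below the current 0.46. Both killers imply a value above it because they are unlikely to fire.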

Top outgoing (children)

Predictions THIS node influences

No outgoing edges.

Ticker exposure

21 ticker(s) linked

Beneficiaries (14)

SOUN · NVDA · GTLB · AI · BBAI · TCEHY · AMZN · BABA · GOOGL · IBM · META · MSFT · ORCL · SHOP

Adverse (7)

ACN · CTSH · FRSH · CHGG · IBM · INFY · PEGA

Prerequisites (6)

Predictions that must hit first
Type · Pred · Title · Domain · Lag
prereq · S_AGI_FAST_2027 · AGI fast: drop-in remote worker by 2027-09 · agi_general_capability · -
correlate · S_AGI_MID_2029 · AGI mid: Kurzweil 2029 path · agi_general_capability · -
correlate · S_AGI_SLOW_2031 · AGI slow: Schmidt/Hassabis 5-10 year path · agi_general_capability · -
correlate · S_AGI_WINTER_2036PLUS · AGI delayed: capability plateau or AI winter · agi_general_capability · -
killer · TK01 · AGI Capability Plateau (2026-27 Training Stall) · - · -
killer · TK03 · AI Regulatory Moratorium (EU/US Capability Freeze) · - · -

Dependents (0)

Predictions enabled by this
Type · Pred · Title · Domain · Lag
No dependents

Linked documents (1)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT
Sim · Source · Title · Market prob · Polarity · Reviewed · Published
0.581 · manifold · Will I get a Gold Medal on USAMO 2027? · 47% · mentions · pending · 2026-04-28
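The Sim column is described as a cosine similarity. A minimal sketch of that measure over two embedding vectors is shown below. The embedding model and vector size used by the linker are not stated here, so the inputs are placeholders and `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for a prediction embedding and a market-title embedding.
prediction_vec = [0.2, 0.7, 0.1]
market_vec = [0.1, 0.6, 0.4]
score = cosine_similarity(prediction_vec, market_vec)
```

A score near 1 means the two texts point in nearly the same direction in embedding space. Linked rows are still marked for review, so a moderate score such as 0.581 indicates topical overlap, not a confirmed match.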

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook
{
  "nia": false,
  "url": "https://www.youtube.com/watch?v=d__HRChE2ZE",
  "mode": "THESIS",
  "role": "Host",
  "context": "are you going to keep your model secret because you're going to be able to use them to advance your company far faster than anybody else? He said, 'No, no. Our go our job is to get out there to the public.' I don't believe that. We still don't have the model that they used to win the gold medal in the IMO... that's the first bifocation that you see.",
  "verbatim": "We still don't have the model that they used to win the gold medal in the IMO... that's the first bifocation that you see. We used to have the frontier model every single time. The moment they got to that, that was the last time.",
  "conv_cues": "first bifurcation; I don't believe that",
  "direction": "HAPPEN",
  "timeframe": "Ongoing",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "OpenAI's IMO-gold-medal model remains unreleased months after milestone",
      "notes": "HIT - 'Hidden AI Frontier' coverage explicitly confirms OpenAI's IMO-gold model 'will not be released for months' - direct corroboration of Diamandis' bifurcation thesis.",
      "source": "https://ai-frontiers.org/articles/the-hidden-ai-frontier",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -5,
      "source_id": null,
      "confidence": 0.95,
      "source_url": "https://ai-frontiers.org/articles/the-hidden-ai-frontier",
      "expected_date": "2026-04-30",
      "observed_date": "2026-04-30",
      "research_origin": "deep_research",
      "measurement_criterion": "OpenAI publicly confirms the math-reasoning model that achieved IMO gold-medal performance has not been released to the public, months after the achievement was reported"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q1 window check-in (25%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -4,
      "source_id": null,
      "expected_date": "2026-07-28",
      "observed_date": null
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q2 window check-in (50%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -3,
      "source_id": null,
      "expected_date": "2026-10-25",
      "observed_date": null
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q3 window check-in (75%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -2,
      "source_id": null,
      "expected_date": "2027-01-22",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "Frontier lab publicly admits internal-deployment-only model with significant capability gap vs released models",
      "source": "https://arxiv.org/html/2604.23065 (Internal Model Deployment paper); https://metr.org/common-elements",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -1,
      "source_id": null,
      "confidence": 0.85,
      "source_url": "https://arxiv.org/html/2604.23065",
      "expected_date": "2027-01-30",
      "research_origin": "deep_research",
      "expected_date_range": {
        "to": "2027-09-30",
        "from": "2026-06-01"
      },
      "measurement_criterion": "OpenAI, Anthropic, Google DeepMind, or xAI publishes a system card / safety policy that explicitly references an internal-only model with capabilities 'meaningfully ahead' of public releases"
    },
    {
      "kind": "event",
      "label": "Frontier labs will increasingly keep their most capable models secret to self-advance",
      "status": "pending",
      "weight": 1,
      "ordinal": 0,
      "source_id": "238_013",
      "expected_date": "2027-04-22",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "AI R&D acceleration measurable: lab discloses >=20% productivity gain from internal model use",
      "notes": "Diamandis' explicit claim was that labs would use these internally to advance themselves faster.",
      "source
... (truncated)