234_017 · prediction · AI · AI-scaling

OpenAI Codex lead predicts current coding agents will seem primitive in 10 weeks

Predictor: OpenAI Codex Lead · ep#234 "Anthropic vs. The Pentagon, Claude Outpaces ChatGPT, and Consulting Gets Replaced" · source

Prior probability

55.0%

Current probability

41.2%

evolves via intake + LBP

Conviction

4/5

Signal quality

Resolution

pending

Window

2026-01-01 – 2026-10-31

Edges in / out

10 / 5

Tickers exposed

Prediction text

OpenAI codex lead predicts current coding agents will seem primitive in 10 weeks | I'm beyond excited for the next 10 weeks will bring. I think the current state of coding agents will be remembered as being so primitive it'll be funny in comparison.

Watch events: OpenAI next funding round; IPO timing; revenue disclosures

Verbatim quote

From episode "Anthropic vs. The Pentagon, Claude Outpaces ChatGPT, and Consulting Gets Replaced"

I'm beyond excited for the next 10 weeks will bring. I think the current state of coding agents will be remembered as being so primitive it'll be funny in comparison.

Predictor: OpenAI Codex Lead

κ + Brier as of 2026-05-22


κ (discount)

0.500

Brier

—

Hits / Misses

0 / 0

Hit rate

—

Evidence about this node from OpenAI Codex Lead is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
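The discount rule above can be sketched in a few lines. This is a minimal illustration, not the actual /api/intake code: it assumes evidence is expressed as a log-likelihood ratio (as in the evidence chain metadata on this page), and the names `clamp_kappa` and `discount_evidence` are hypothetical.

```python
KAPPA_FLOOR = 0.10  # below this a predictor is effectively silenced
KAPPA_CAP = 1.00    # full weight


def clamp_kappa(kappa: float) -> float:
    """Keep kappa inside the [floor, cap] range described above."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def discount_evidence(llr: float, kappa: float) -> float:
    """Scale a log-likelihood ratio by the clamped predictor discount (assumed form)."""
    return clamp_kappa(kappa) * llr


# With this predictor's kappa of 0.500, evidence counts for half.
halved = discount_evidence(-0.4, 0.5)
```

A predictor with no resolved predictions sits at 0.500 here, so every update attributed to them moves the node half as far as the raw evidence would.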

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

5 prob_history rows

Chart legend: intake v2 · milestone miss sweep · lbp propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 41.2%

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.

Leading chain: 6 fired ✓ · 1 overdue ⏱ · 3 pending

2026-04-24 · hit · OpenAI ships GPT-5.5 with agentic-coding leadership benchmarks
How: OpenAI publicly releases a new Codex model that scores >=80% on Terminal-Bench 2.0 and >=70% on Expert-SWE long-horizon benchmark
Source: OpenAI: Introducing GPT-5.5 (April 24, 2026) · conf 95%
Notes: GPT-5.5 hit 82.7% Terminal-Bench 2.0, 73.1% Expert-SWE, 84.9% GDPval — directly validates the 'primitive in 10 weeks' thesis.

2026-04-29 · hit · Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.

2026-04-29 · hit · Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.

2026-04-29 · hit · Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).

2026-04-29 · hit · Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) a

2026-04-24 · hit · Codex gains long-horizon scheduling and self-wakeup capability
How: OpenAI Codex documentation announces ability to schedule future work and resume tasks autonomously across days/weeks
Source: OpenAI Codex: Codex for (almost) everything · conf 90%

2026-05-15 · overdue · GPT-5.5 1M-token context window enables full-codebase agentic refactors
How: OpenAI API ships 1M-token context for coding model and at least one published case study of agent autonomously modifying a >100k LOC repository
Source: DigitalApplied: GPT-5.5 Complete Guide — Thinking, Pro & 1M Context · conf 85%

2026-06-25 · pending · Nvidia agreed to remit 15% of China chip-sale revenue directly to US government in exchange for reversing specific AI chip export bans.

2026-05-01 → 2026-08-31 · pending · Competing labs ship coding agents matching or exceeding GPT-5.5 by mid-2026
How: At least two of (Anthropic, Google DeepMind, xAI) release a coding-specialized model with public Terminal-Bench 2.0 score >=80% within four months of GPT-5.5
Source: LM Council Benchmarks April 2026 · conf 70%

2026-06-01 → 2026-09-30 · pending · Pre-2026 coding agents publicly characterized as obsolete by GPT-5.5 era developers
How: Major dev-tools blog (Cursor, GitHub, Replit, Anthropic) publishes retrospective explicitly calling 2025-era coding agents 'primitive' or equivalent
Source: Author's prediction (verbatim quote) · conf 60%

2026-09-01 · pending · OpenAI Codex lead predicts current coding agents will seem primitive in 10 weeks

2028-06-25 · pending · We're exiting the industrial age permanently as recursive self-improvement unfolds.

2030-09-27 · pending · Most large companies' business models will be disrupted in 2-5 years

2063-06-21 · pending · Peter's 14-year-old son Milan will never get a driver's license.

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample (~30s per run). The run surfaces the predictions whose marginals shift most under that assumption, which makes it useful for stress-testing "if X resolves, what else moves?"

(live posterior: 41%)

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

metadata_milestone_miss_sweep · 2026-05-30T22:15:00Z · 41.2% · -4.2pp

metadata_milestone_miss_sweep bayesian_v2 n=1 inside=0.412 blend=0.412 LLR=-0.172 κ=0.50 no_blend

Raw metadata

{
  "trf": 0.5051911152205567,
  "kappa": 0.5,
  "base_rate": null,
  "predictor": "OpenAI Codex Lead",
  "total_llr": -0.4054651081081644,
  "grace_days": 7,
  "bayesian_v2": true,
  "prior_logit": -0.18367569873006,
  "bayes_factor": "1.2:1 against",
  "blend_reason": "no reference_class linked",
  "inside_prior": 0.45420973759066463,
  "kappa_source": "predictor_table",
  "n_milestones": 1,
  "blend_applied": false,
  "contributions": [
    {
      "llr": -0.4054651081081644,
      "kind": "llm_pre_event",
      "kappa": 0.425,
      "label": "GPT-5.5 1M-token context window enables full-codebase agentic refactors",
      "weight": 0.4,
      "strength": "weak",
      "confidence": 0.85,
      "source_url": "https://www.digitalapplied.com/blog/gpt-5-5-complete-guide-thinking-pro-1m-context",
      "adjusted_llr": -0.17232267094596987,
      "expected_date": "2026-05-15",
      "measurement_criterion": "OpenAI API ships 1M-token context for coding model and at least one published case study of agent autonomously modifying a >100k LOC repository"
    }
  ],
  "evidence_kind": "metadata_milestone_miss_sweep",
  "inside_source": "history_v2",
  "inside_weight": 0.6463662193456103,
  "outside_weight": 0.35363378065438966,
  "posterior_prob": 0.41192859177871083,
  "posterior_logit": -0.3559983696760299,
  "predictor_brier": null,
  "inside_posterior": 0.41192859177871083,
  "blended_posterior": 0.41192859177871083,
  "reference_class_id": null,
  "total_adjusted_llr": -0.17232267094596987,
  "predictor_n_resolved": 0
}
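The numbers in this record are internally consistent, and the update can be reproduced by hand. The sketch below is an illustration under stated assumptions, not the pipeline's code: it assumes the per-contribution kappa (0.425) is the predictor kappa (0.5) times the milestone confidence (0.85), that the adjusted LLR is added to the prior in logit space, and that the `weight` field does not enter this step. The helper names `logit` and `sigmoid` are introduced here for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


# Values copied from the raw metadata above.
inside_prior = 0.45420973759066463
predictor_kappa = 0.5
confidence = 0.85
llr = -0.4054651081081644  # equals ln(2/3): evidence against, from the missed milestone

# Assumption: contribution kappa = predictor kappa * milestone confidence.
contribution_kappa = predictor_kappa * confidence
adjusted_llr = llr * contribution_kappa

# Bayesian update in log-odds space, then back to a probability.
posterior_logit = logit(inside_prior) + adjusted_llr
posterior_prob = sigmoid(posterior_logit)

# Bayes factor against the prediction, reported above as "1.2:1 against".
bayes_factor_against = math.exp(-adjusted_llr)
```

Under these assumptions the result matches the record's `posterior_logit` and `posterior_prob` fields. Because no reference class is linked, the blended posterior equals the inside posterior.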

LBP · 2026-05-10T02:00:02Z · 45.4% · -1.1pp

Network propagation: 46.5% → 45.4%

6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29

LBP · 2026-05-03T02:00:01Z · 46.5% · -2.0pp

Network propagation: 48.5% → 46.5%

6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9

LBP · 2026-04-30T16:39:51Z · 48.5% · -2.6pp

Network propagation: 51.2% → 48.5%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3

LBP · 2026-04-30T02:18:57Z · 51.2% · -3.8pp

Network propagation: 55.0% → 51.2%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef

Network propagation neighbors

Top edges sorted by latest LBP cross-impact


Top incoming (parents)

Edges that influence THIS node's belief

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
killer	TK03 AI Regulatory Moratorium (EU/US Capability Freeze)	10.0%	0.050	0.550	+0.088
killer	TK02 AI Compute Supply Shock (TSMC/Taiwan Disruption)	12.0%	0.050	0.550	+0.078
prereq	SEM_014 Nvidia's Arizona-based TSMC factory successfully fabricated — Jensen Huang	86.1%	0.550	0.050	+0.064
killer	TK01 AGI Capability Plateau (2026-27 Training Stall)	15.0%	0.050	0.550	+0.063
prereq	SEM_011 Nvidia became the world's first $5 trillion company (late 20 — Jensen Huang	85.5%	0.550	0.050	+0.063
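The page does not say how Δ implied is computed. One plausible reading, offered as an assumption, is the law of total probability: marginalize this node's conditional probabilities over the parent's belief, then subtract this node's current probability. The function name `implied_delta` is hypothetical.

```python
def implied_delta(parent_prob: float, p_if_true: float,
                  p_if_false: float, current: float) -> float:
    """Marginal implied by one parent edge, minus this node's current belief.

    Assumed formula, not confirmed by the source:
        implied = P(s=T) * P(c|s=T) + P(s=F) * P(c|s=F)
    """
    implied = parent_prob * p_if_true + (1.0 - parent_prob) * p_if_false
    return implied - current


CURRENT_PROB = 0.412  # this node's current probability

# (parent prob, P(c|s=T), P(c|s=F)) for the three killer rows in the table above.
killer_rows = {
    "TK03": (0.10, 0.050, 0.550),
    "TK02": (0.12, 0.050, 0.550),
    "TK01": (0.15, 0.050, 0.550),
}

deltas = {
    name: round(implied_delta(p, t, f, CURRENT_PROB), 3)
    for name, (p, t, f) in killer_rows.items()
}
```

This reading reproduces the three killer rows (+0.088, +0.078, +0.063). The two prereq rows come out a few thousandths higher than the table shows, so the live calculation may use parent probabilities from a different snapshot or a different formula.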

Top outgoing (children)

Predictions THIS node influences

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
prereq	247_023 AI will be able to do everything a white collar worker does — Dave Blundin	40.8%	0.720	0.050	-0.072
prereq	244_019 Peter's son won't need a driver's license in 2 years — Peter Diamandis	48.4%	0.920	0.050	-0.064
prereq	242_031 Most large companies' business models will be disrupted in 2 — Peter Diamandis	36.1%	0.650	0.050	-0.056
prereq	230_020 Peter's 14-year-old son Milan will never get a driver's lice — Peter Diamandis	34.7%	0.650	0.050	-0.041
prereq	232_055 We're exiting the industrial age permanently as recursive se — Peter Diamandis	35.5%	0.700	0.050	-0.028

Ticker exposure

37 ticker(s) linked

Beneficiaries (24)

MU WULF IREN EQIX ALAB APLD ASMIY ASML PLAB NVDA NBIS CRWV AAPL AMT AMZN DELL GOOGL IRM LNVGY META MSFT ORCL SFTBY STX

Adverse (6)

ACN GEN CHGG IBM WNS LRN

Prerequisites (10)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
prereq	SEM_011	Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.	Capital Markets	—
prereq	SEM_027	Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.	Capital Markets	—
prereq	SEM_014	Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).	Manufacturing	—
prereq	SEM_012	Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) across engineering.	AI/Manufacturing	—
prereq	SEM_015	Nvidia agreed to remit 15% of China chip-sale revenue directly to US government in exchange for reversing specific AI chip export bans.	Policy/Semis	—
killer	TK09	Energy Grid Cap (Data Center Power Wall)	—	—
killer	TK05	Rate Regime Persistence (10y > 5% through 2028)	—	—
killer	TK01	AGI Capability Plateau (2026-27 Training Stall)	—	—
killer	TK02	AI Compute Supply Shock (TSMC/Taiwan Disruption)	—	—
killer	TK03	AI Regulatory Moratorium (EU/US Capability Freeze)	—	—

Dependents (5)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
prereq	244_019	Peter's son won't need a driver's license in 2 years	Auto/Transport	—
prereq	247_023	AI will be able to do everything a white collar worker does imminently	AI	—
prereq	232_055	We're exiting the industrial age permanently as recursive self-improvement unfolds.	AI	—
prereq	242_031	Most large companies' business models will be disrupted in 2-5 years	Markets/Stocks	—
prereq	230_020	Peter's 14-year-old son Milan will never get a driver's license.	Auto/Transport	—

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.664	github_release	openai/openai-python v2.9.0	—	mentions	pending	2025-12-04
0.655	manifold	which will happen first? (Codeforces Rating)	—	mentions	pending	2026-05-24
0.654	github_release	openai/openai-python v2.15.0	—	mentions	pending	2026-01-09
0.653	github_release	openai/openai-python v2.12.0	—	mentions	pending	2025-12-15
0.651	github_release	openai/openai-python v2.36.0	—	mentions	pending	2026-05-07
0.650	github_release	openai/openai-python v2.23.0	—	mentions	pending	2026-02-24
0.648	github_release	openai/openai-python v2.25.0	—	mentions	pending	2026-03-05
0.648	github_release	openai/openai-python v2.13.0	—	mentions	pending	2025-12-16
0.647	github_release	openai/openai-python v2.33.0	—	mentions	pending	2026-04-28
0.643	github_release	openai/openai-python v2.16.0	—	mentions	pending	2026-01-27

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "qty": "10 weeks",
  "url": "https://www.youtube.com/watch?v=dmtvGKuRE64",
  "mode": "CITED_PREDICTION",
  "role": "Cited-Executive",
  "context": "OpenAI Codex lead predicts rapid evolution of AI agents within 10 weeks. Quote, I'm beyond excited for the next 10 weeks will bring. I think the current state of coding agents will be remembered as being so primitive it'll be funny in comparison.",
  "to_year": 2026,
  "cited_by": "Peter Diamandis",
  "verbatim": "I'm beyond excited for the next 10 weeks will bring. I think the current state of coding agents will be remembered as being so primitive it'll be funny in comparison.",
  "conv_cues": "beyond excited; I think",
  "direction": "HAPPEN",
  "from_year": 2026,
  "timeframe": "By mid-May 2026",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "OpenAI ships GPT-5.5 with agentic-coding leadership benchmarks",
      "notes": "GPT-5.5 hit 82.7% Terminal-Bench 2.0, 73.1% Expert-SWE, 84.9% GDPval — directly validates the 'primitive in 10 weeks' thesis.",
      "source": "OpenAI: Introducing GPT-5.5 (April 24, 2026)",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -10,
      "source_id": null,
      "confidence": 0.95,
      "source_url": "https://openai.com/index/introducing-gpt-5-5/",
      "expected_date": "2026-04-24",
      "observed_date": "2026-04-24",
      "research_origin": "deep_research",
      "measurement_criterion": "OpenAI publicly releases a new Codex model that scores >=80% on Terminal-Bench 2.0 and >=70% on Expert-SWE long-horizon benchmark"
    },
    {
      "kind": "prereq",
      "label": "Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -9,
      "source_id": "SEM_011",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -8,
      "source_id": "SEM_027",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -7,
      "source_id": "SEM_014",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) a",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -6,
      "source_id": "SEM_012",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "llm_pre_event",
      "label": "Codex gains long-horizon scheduling and self-wakeup capability",
      "source": "OpenAI Codex: Codex for (almost) everything",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -5,
      "source_id": null,
      "confidence": 0.9,
      "source_url": "https://openai.com/index/codex-for-almost-everything/",
      "expected_date": "2026-04-30",
      "observed_date": "2026-04-24",
      "research_origin": "deep_research",
      "measurement_criterion": "OpenAI Codex documentation announces ability to schedule future work and resume tasks autonomously across days/weeks"
    },
    {
      "kind": "llm_pre_event",
      "label": "GPT-5.5 1M-token context window enables full-codebase agentic refactors",
      "source": "DigitalApplied: GPT-5.5 Complete Guide — Thinking, Pro & 1M Context",
      "status": "overdue",
      "weight": 0.4,
      "ordinal": -4,
      "source_id": null,
      "
... (truncated)