237_029 · prediction · AI · AI-scaling

AI agents can do productive tasks indefinitely, for days on end, fundamentally changing the AI experience.

Predictor: Dave Blundin · ep#237 "OpenClaw Explained: Baby AGI, Security Threats, Mac Mini Became Everyone's Supercomputer"

Prior probability

60.0%

Current probability

46.6%

evolves via intake + LBP

Conviction

4/5

Signal quality

Resolution

pending

Window

2026-06-01 – 2026-06-30

Edges in / out

10 / 5

Tickers exposed

Prediction text

AI agents can do productive tasks indefinitely, for days on end, fundamentally changing the AI experience. | I you I'm so with you on that too. I you know one thing that's really new in the world is >> it can do productive things indefinitely like days and days and days.

Verbatim quote

From episode "OpenClaw Explained: Baby AGI, Security Threats, Mac Mini Became Everyone's Supercomputer"

I you I'm so with you on that too. I you know one thing that's really new in the world is >> it can do productive things indefinitely like days and days and days.

Predictor: Dave Blundin

κ + Brier as of 2026-05-22

κ (discount)

0.821

Brier

0.0491

excellent

Hits / Misses

3 / 2

of 9 resolved

Hit rate

33.3%

Calibration plot (stated vs observed)

Evidence about this node from Dave Blundin is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
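The weighting rule above can be sketched in a few lines. This is a minimal illustration, not the actual /api/intake code: the function names and the example log-likelihood ratio of -0.4 are hypothetical, and applying κ as a multiplier on a log-likelihood ratio is an assumption. Only the floor (0.10), the cap (1.00), and the κ value (0.821) come from this page.

```python
KAPPA_FLOOR = 0.10  # floor stated on this page: evidence is effectively silenced
KAPPA_CAP = 1.00    # cap stated on this page: evidence counts at full weight


def clamp_kappa(kappa: float) -> float:
    """Clamp a predictor's discount factor into [KAPPA_FLOOR, KAPPA_CAP]."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def weight_evidence(llr: float, kappa: float) -> float:
    """Scale a log-likelihood ratio by the clamped kappa (assumed form)."""
    return clamp_kappa(kappa) * llr


if __name__ == "__main__":
    # Hypothetical evidence strength of -0.4, discounted by kappa = 0.821.
    print(weight_evidence(-0.4, 0.821))
    # A kappa below the floor is raised to 0.10; one above the cap is cut to 1.00.
    print(clamp_kappa(0.02), clamp_kappa(1.7))
```

With κ = 0.821, roughly 82% of the evidence's strength survives; a predictor at the floor would keep only 10%.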

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

7 prob_history rows

Legend: intake v2 · milestone miss sweep · lbp propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 46.6%

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.

Leading chain: 6 fired ✓ · 1 overdue ⏱

2026-01-31 · hit · Claude Code 99.9th percentile turn duration crosses 45 minutes
How: Anthropic reports >=45 minute autonomous turn duration at 99.9th percentile in Claude Code sessions
Source: https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf — 2026 Agentic Coding Trends Report · conf 95%
Notes: HIT — Anthropic reports 25min->45min P99.9 turn duration Oct 2025 to Jan 2026, signaling sustained autonomy uplift.
2026-04-29 · hit · Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.
2026-04-29 · hit · Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.
2026-04-29 · hit · Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).
2026-04-29 · hit · Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) a
2026-04-30 · hit · Long-running Claude scientific computing case studies published
How: Anthropic publishes case studies of Claude running scientific computing workloads autonomously for >12 hours continuously
Source: https://www.anthropic.com/research/long-running-Claude — Long-running Claude for scientific computing · conf 95%
2026-01-01 → 2026-08-31 · overdue · Public benchmark for multi-day agent task completion launches
How: METR, Anthropic, or third-party publishes a benchmark measuring agent capability on tasks expected to take humans >=24 hours of work, with frontier models scoring >=20%
Source: https://www.anthropic.com/research/measuring-agent-autonomy — Measuring AI agent autonomy · conf 85%
2026-06-20 · pending · AI agents can do productive tasks indefinitely, for days on end, fundamentally changing the AI experience.
2026-06-01 → 2027-06-30 · pending · Agent runs autonomously for >=72 hours on production task without human intervention
How: Public case study from Anthropic, OpenAI, or enterprise customer documents agent running autonomously for >=72 hours on production engineering or research task with verified deliverable
Source: Anthropic blog, OpenAI research, enterprise case studies · conf 60%
2026-09-01 → 2027-12-31 · pending · Multi-agent orchestration runs >=1 week continuously
How: Documented multi-agent system completes self-directed task spanning >=7 days with periodic checkpoints but no continuous human supervision
Source: Anthropic CoWork, OpenAI multi-agent research, Manus.im · conf 40%
Notes: Cascade — Blundin's 'days and days' framing requires multi-day, not multi-hour.
2028-06-25 · pending · We're exiting the industrial age permanently as recursive self-improvement unfolds.
2030-09-27 · pending · Most large companies' business models will be disrupted in 2-5 years

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample. Surfaces the predictions whose marginals shift most under that assumption.

(live posterior: 47%)

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

LBP · 2026-05-24T02:00:02Z · 46.6% · +1.3pp

Network propagation: 45.2% → 46.6%

4-iter LBP, residual 0.01000 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 806b02f8

LBP · 2026-05-17T02:00:01Z · 45.2% · +2.7pp

Network propagation: 42.5% → 45.2%

5-iter LBP, residual 0.00689 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e607fa96

metadata_milestone_miss_sweep · 2026-05-10T22:10:52Z · 42.5% · -7.0pp

metadata_milestone_miss_sweep bayesian_v2 n=1 inside=0.425 blend=0.425 LLR=-0.283 κ=0.82 no_blend

Raw metadata

{
  "trf": 1,
  "kappa": 0.8214,
  "base_rate": null,
  "predictor": "Dave Blundin",
  "total_llr": -0.4054651081081644,
  "grace_days": 7,
  "bayesian_v2": true,
  "prior_logit": -0.01913257734403183,
  "bayes_factor": "1.3:1 against",
  "blend_reason": "no reference_class linked",
  "inside_prior": 0.4952170015666818,
  "kappa_source": "predictor_table",
  "n_milestones": 1,
  "blend_applied": false,
  "contributions": [
    {
      "llr": -0.4054651081081644,
      "kind": "llm_pre_event",
      "kappa": 0.69819,
      "label": "Public benchmark for multi-day agent task completion launches",
      "weight": 0.4,
      "strength": "weak",
      "confidence": 0.85,
      "source_url": "https://www.anthropic.com/research/measuring-agent-autonomy",
      "adjusted_llr": -0.2830916838300393,
      "expected_date": "2026-05-02",
      "measurement_criterion": "METR, Anthropic, or third-party publishes a benchmark measuring agent capability on tasks expected to take humans >=24 hours of work, with frontier models scoring >=20%"
    }
  ],
  "evidence_kind": "metadata_milestone_miss_sweep",
  "inside_source": "history_v2",
  "inside_weight": 0.3,
  "outside_weight": 0.7,
  "posterior_prob": 0.4250138342982685,
  "posterior_logit": -0.30222426117407114,
  "predictor_brier": 0.0491,
  "inside_posterior": 0.4250138342982685,
  "blended_posterior": 0.4250138342982685,
  "reference_class_id": null,
  "total_adjusted_llr": -0.2830916838300393,
  "predictor_n_resolved": 9
}
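The numbers in this record fit a simple log-odds update, and the sketch below reproduces them. It is a reconstruction from the fields shown, not the pipeline's own code: the variable names are illustrative, and the relationships (contribution κ = predictor κ × confidence, adjusted LLR = raw LLR × contribution κ, posterior logit = prior logit + adjusted LLR) are inferred because they match the recorded values.

```python
import math


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Values copied from the raw metadata above.
predictor_kappa = 0.8214
confidence = 0.85
raw_llr = -0.4054651081081644        # numerically equal to ln(2/3)
prior_logit = -0.01913257734403183   # log-odds of inside_prior (about 0.495)

# Inferred relationships (assumptions that happen to match the record).
contribution_kappa = predictor_kappa * confidence   # recorded as 0.69819
adjusted_llr = raw_llr * contribution_kappa         # recorded as -0.28309...
posterior_logit = prior_logit + adjusted_llr        # recorded as -0.30222...
posterior_prob = sigmoid(posterior_logit)           # recorded as 0.42501...
bayes_factor_against = math.exp(-adjusted_llr)      # displayed as "1.3:1 against"

if __name__ == "__main__":
    print(f"contribution kappa: {contribution_kappa:.5f}")
    print(f"adjusted LLR:       {adjusted_llr:.6f}")
    print(f"posterior prob:     {posterior_prob:.4f}")
    print(f"Bayes factor:       {bayes_factor_against:.1f}:1 against")
```

Because blend_applied is false (no reference class is linked), the inside posterior passes through unchanged as the final 42.5%.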

LBP · 2026-05-10T02:00:02Z · 49.5% · -1.2pp

Network propagation: 50.7% → 49.5%

6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29

LBP · 2026-05-03T02:00:01Z · 50.7% · -2.2pp

Network propagation: 52.9% → 50.7%

6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9

LBP · 2026-04-30T16:39:51Z · 52.9% · -2.9pp

Network propagation: 55.8% → 52.9%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3

LBP · 2026-04-30T02:18:57Z · 55.8% · -4.2pp

Network propagation: 60.0% → 55.8%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef

Network propagation neighbors

Top edges sorted by latest LBP cross-impact

Top incoming (parents)

Edges that influence THIS node's belief

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
killer	TK03 AI Regulatory Moratorium (EU/US Capability Freeze)	10.0%	0.050	0.600	+0.079
killer	TK02 AI Compute Supply Shock (TSMC/Taiwan Disruption)	12.0%	0.050	0.600	+0.068
killer	TK09 Energy Grid Cap (Data Center Power Wall)	35.0%	0.050	0.600	-0.058
prereq	SEM_014 Nvidia's Arizona-based TSMC factory successfully fabricated — Jensen Huang	86.1%	0.600	0.050	+0.053
killer	TK01 AGI Capability Plateau (2026-27 Training Stall)	15.0%	0.050	0.600	+0.052
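The page does not say how "Δ implied" is derived. One reading that fits the rows above is the law of total probability: each parent implies a probability for this node, and Δ is that implied value minus the node's current 46.6%. The sketch below tests that reading; the function names are illustrative and the formula is an assumption, not the documented method.

```python
def implied_child_prob(p_source: float, p_c_given_true: float,
                       p_c_given_false: float) -> float:
    """Marginal P(c) implied by one parent, via the law of total probability."""
    return p_c_given_true * p_source + p_c_given_false * (1.0 - p_source)


def delta_implied(p_source: float, p_c_given_true: float,
                  p_c_given_false: float, p_child_now: float) -> float:
    """Gap between the parent-implied probability and the child's current belief."""
    implied = implied_child_prob(p_source, p_c_given_true, p_c_given_false)
    return implied - p_child_now


CURRENT_PROB = 0.466  # this node's current probability, from the page

# (node, their prob, P(c|s=T), P(c|s=F)) copied from the first two table rows.
rows = [
    ("TK03", 0.10, 0.050, 0.600),
    ("TK02", 0.12, 0.050, 0.600),
]

if __name__ == "__main__":
    for node, p, t, f in rows:
        print(node, round(delta_implied(p, t, f, CURRENT_PROB), 3))
```

Under this assumption the TK03 and TK02 rows come out at +0.079 and +0.068, matching the table. Other rows agree only to within a few thousandths, which may reflect rounding in the displayed probabilities or a slightly different snapshot of the node's belief.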

Top outgoing (children)

Predictions THIS node influences

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
prereq	248_040 Pausing AI will fail and only accelerate race dynamics. — Alex Wissner-Gross	53.0%	0.920	0.050	-0.069
prereq	247_023 AI will be able to do everything a white collar worker does — Dave Blundin	40.8%	0.720	0.050	-0.041
prereq	242_031 Most large companies' business models will be disrupted in 2 — Peter Diamandis	36.1%	0.650	0.050	-0.027
prereq	244_019 Peter's son won't need a driver's license in 2 years — Peter Diamandis	48.4%	0.920	0.050	-0.023
prereq	232_055 We're exiting the industrial age permanently as recursive se — Peter Diamandis	35.5%	0.700	0.050	+0.003

Ticker exposure

37 ticker(s) linked

Beneficiaries (24)

MU WULF IREN EQIX ALAB APLD ASMIY ASML PLAB NVDA NBIS CRWV AAPL AMT AMZN DELL GOOGL IRM LNVGY META MSFT ORCL SFTBY STX

Adverse (6)

ACN GEN CHGG IBM WNS LRN

Prerequisites (10)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
prereq	SEM_011	Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.	Capital Markets	—
prereq	SEM_027	Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.	Capital Markets	—
prereq	SEM_014	Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).	Manufacturing	—
prereq	SEM_012	Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) across engineering.	AI/Manufacturing	—
prereq	SEM_015	Nvidia agreed to remit 15% of China chip-sale revenue directly to US government in exchange for reversing specific AI chip export bans.	Policy/Semis	—
killer	TK09	Energy Grid Cap (Data Center Power Wall)	—	—
killer	TK05	Rate Regime Persistence (10y > 5% through 2028)	—	—
killer	TK01	AGI Capability Plateau (2026-27 Training Stall)	—	—
killer	TK02	AI Compute Supply Shock (TSMC/Taiwan Disruption)	—	—
killer	TK03	AI Regulatory Moratorium (EU/US Capability Freeze)	—	—

Dependents (5)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
prereq	244_019	Peter's son won't need a driver's license in 2 years	Auto/Transport	—
prereq	248_040	Pausing AI will fail and only accelerate race dynamics.	AI	—
prereq	247_023	AI will be able to do everything a white collar worker does imminently	AI	—
prereq	232_055	We're exiting the industrial age permanently as recursive self-improvement unfolds.	AI	—
prereq	242_031	Most large companies' business models will be disrupted in 2-5 years	Markets/Stocks	—

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.706	arxiv	AIs and Humans with Agency	—	mentions	pending	2026-05-04
0.686	arxiv	Multi-Agent Computer Use	—	mentions	pending	2026-06-01
0.670	arxiv	cotomi Act: Learning to Automate Work by Watching You	—	mentions	pending	2026-05-04
0.661	arxiv	AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents	—	mentions	pending	2026-06-01
0.596	manifold	Will I use Hermes Agent for more than a week?	36%	mentions	pending	2026-05-09
0.577	manifold	On what days will I be productive this week?	—	mentions	pending	2026-05-10
0.577	manifold	On what days will I be productive this week?	—	mentions	pending	2026-05-03
0.577	manifold	On what days will I be productive this week?	—	mentions	pending	2026-04-27
0.577	manifold	On what days will I be productive this week?	—	mentions	pending	2026-06-01
0.573	manifold	On what days will I be productive [rest of the month]	—	mentions	pending	2026-05-16

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "qty": "days",
  "url": "https://www.youtube.com/watch?v=qP73cGLQmCU",
  "mode": "THESIS",
  "role": "Host",
  "caveats": "Mostly present-tense; near-future implication",
  "context": "I you know one thing that's really new in the world is >> it can do productive things indefinitely like days and days and days. >> And when I use my APIs that I loved a month ago, >> I have no idea what the bill is going to be. Like I literally have no idea if I turn it loose.",
  "to_year": 2026,
  "verbatim": "I you I'm so with you on that too. I you know one thing that's really new in the world is >> it can do productive things indefinitely like days and days and days.",
  "conv_cues": "can do productive things indefinitely",
  "direction": "UP",
  "from_year": 2026,
  "timeframe": "present/near-future",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "Claude Code 99.9th percentile turn duration crosses 45 minutes",
      "notes": "HIT — Anthropic reports 25min->45min P99.9 turn duration Oct 2025 to Jan 2026, signaling sustained autonomy uplift.",
      "source": "https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf — 2026 Agentic Coding Trends Report",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.95,
      "source_url": "https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf",
      "expected_date": "2026-01-31",
      "observed_date": "2026-01-31",
      "research_origin": "deep_research",
      "measurement_criterion": "Anthropic reports >=45 minute autonomous turn duration at 99.9th percentile in Claude Code sessions"
    },
    {
      "kind": "prereq",
      "label": "Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -6,
      "source_id": "SEM_011",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -5,
      "source_id": "SEM_027",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -4,
      "source_id": "SEM_014",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "prereq",
      "label": "Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) a",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -3,
      "source_id": "SEM_012",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29"
    },
    {
      "kind": "llm_pre_event",
      "label": "Long-running Claude scientific computing case studies published",
      "source": "https://www.anthropic.com/research/long-running-Claude — Long-running Claude for scientific computing",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -2,
      "source_id": null,
      "confidence": 0.95,
      "source_url": "https://www.anthropic.com/research/long-running-Claude",
      "expected_date": "2026-04-30",
      "observed_date": "2026-04-30",
      "research_origin": "deep_research",
      "measurement_criterion": "Anthropic publishes case studies of Claude running scientific computing workloads autonomously for >12 hours continuously"
    },
    {
      "kind": "llm_pre_event",
      "label": "Public benchmark for multi-day agent task completion launches",
      "source": 
... (truncated)