AI will handle the entire software development process end-to-end within 6-12 months (by late 2026) — humans relegated to reviewer/editor role.
Predictor: Dario Amodei
Prediction text
AI will handle the entire software development process end-to-end within 6-12 months (by late 2026), with humans relegated to a reviewer/editor role.
Key catalyst: Claude Code + Devin-class agent maturity
Watch events: Agentic SWE benchmarks (SWE-bench, Terminal-bench); CS hiring data; Big Tech engineer headcount trajectories.
Resolution evidence
Claude Code, Cursor, and Cognition's Devin are already handling substantial portions of the SDLC; autonomous PR-merging agents are in production at major firms in 2026.
Calibration plot (stated vs observed)
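Evidence about this node from Dario Amodei is multiplied by κ in /api/intake: each item's log-likelihood ratio (LLR) is scaled by κ (currently 0.6429 for this predictor) before it updates the node's log-odds. Lower κ means less weight; κ floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).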
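A minimal sketch of how the κ discount appears to combine with a log-odds update. The floor and cap come from the text above; the rule that κ multiplies each LLR and the discounted total is added to the prior logit is inferred from the raw metadata on this page (`adjusted_llr = llr × kappa`, `posterior_logit = prior_logit + total_adjusted_llr`), not taken from /api/intake's source. The function names are hypothetical.

```python
import math
from typing import Sequence

KAPPA_FLOOR = 0.10  # "effectively silenced"
KAPPA_CAP = 1.00    # "full weight"


def clamp_kappa(kappa: float) -> float:
    """Keep a predictor's evidence multiplier inside [KAPPA_FLOOR, KAPPA_CAP]."""
    return min(KAPPA_CAP, max(KAPPA_FLOOR, kappa))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def apply_evidence(prior_logit: float, llrs: Sequence[float], kappa: float) -> dict:
    """Hypothetical update: scale every LLR by the clamped kappa, then add to the prior logit."""
    k = clamp_kappa(kappa)
    total_adjusted_llr = sum(llr * k for llr in llrs)
    posterior_logit = prior_logit + total_adjusted_llr
    return {
        "total_adjusted_llr": total_adjusted_llr,
        "posterior_logit": posterior_logit,
        "posterior_prob": sigmoid(posterior_logit),
        # exp(LLR) is the Bayes factor; the page shows it rounded, e.g. "1.6:1 favoring"
        "bayes_factor": math.exp(total_adjusted_llr),
    }


# Inputs copied from the daily_intake raw-metadata record (llr = ln 2, kappa = 0.6429).
update = apply_evidence(prior_logit=0.01024395655895965, llrs=[math.log(2)], kappa=0.6429)
print(update)
```

With these inputs the sketch reproduces that record's `total_adjusted_llr` (about 0.4456) and `posterior_prob` (about 0.612).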
Reference class: ai_capability_milestone_2y
AI reaches a specific named capability (intern-level, world-class programmer, etc.) within 2 years of the stated target.
Tetlock-style outside view: at TRF = 1 (prediction just made), the outside view dominates (w_in = 0.3); at TRF = 0 (deadline), the inside view dominates (w_in = 1.0). The blend pulls overconfident inside views toward the reference class's historical base rate.
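A sketch of the inside/outside blend. The endpoints (w_in = 0.3 at TRF = 1, w_in = 1.0 at TRF = 0) come from the text; straight-line interpolation between them is an assumption, though it matches the milestone-miss raw-metadata record, where `trf` 0.8025 pairs with `inside_weight` of about 0.4382. Treating the blend as a linear mix in probability space is also an assumption, and the names are hypothetical.

```python
from typing import Optional, Tuple

W_IN_AT_PREDICTION = 0.3  # inside-view weight at TRF = 1 (prediction just made)
W_IN_AT_DEADLINE = 1.0    # inside-view weight at TRF = 0 (deadline reached)


def inside_weight(trf: float) -> float:
    """Inside-view weight, linearly interpolated between the two stated endpoints."""
    trf = min(1.0, max(0.0, trf))
    return W_IN_AT_DEADLINE + (W_IN_AT_PREDICTION - W_IN_AT_DEADLINE) * trf


def blend(inside_p: float, base_rate: Optional[float], trf: float) -> Tuple[float, float]:
    """Return (blended probability, inside weight).

    Assumption: the blend is a linear mix in probability space. With no
    reference-class base rate, the inside view passes through unchanged.
    """
    w_in = inside_weight(trf)
    if base_rate is None:
        return inside_p, w_in
    return w_in * inside_p + (1.0 - w_in) * base_rate, w_in


print(inside_weight(0.8025098155804883))  # roughly 0.4382
```

This node's raw metadata reports `"blend_reason": "no reference_class linked"`, which corresponds to the `base_rate is None` branch: the weight is computed but the inside posterior is used as-is.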
Probability over time
Milestone chain
- 2026-05-04 [overdue] Q1 window check-in (25%)
- 2026-04-01 → 2026-09-30 [pending] SWE-Bench Verified hits 90%+ score
  How: The top model score on the public SWE-Bench Verified leaderboard reaches ≥90% (the current top is in the 70-80% range as of late 2025).
  Source: Anthropic blog, OpenAI evals page, Papers With Code SWE-Bench page, METR evaluations
  Confidence: 75%
  Notes: SWE-Bench Verified is the canonical "AI handling software dev" benchmark; 90% indicates near-human performance on real GitHub issues.
- 2026-09-04 [pending] Q2 window check-in (50%)
- 2026-06-01 → 2026-12-31 [pending] AI agent demonstrates ≥1 workday autonomous task completion
  How: A public demo or third-party evaluation (METR, Apollo Research) shows an AI agent (Devin-class or Claude Code-class) completing tasks that require ≥8 hours of senior-engineer work without human intervention.
  Source: Anthropic blog, Cognition (Devin) blog, METR.org evaluations, Apollo Research reports
  Confidence: 65%
  Notes: METR's "task length" evaluation is the canonical agent-autonomy metric.
- 2026-07-01 → 2027-04-30 [pending] Major tech company discloses majority of new code is AI-generated
  How: A public statement (earnings call, blog, leaked internal memo) from a top-5 tech company (Microsoft, Google, Meta, Amazon, Apple) that >50% of its new code is AI-generated, or a developer survey showing a similar share at the industry level.
  Source: Earnings call transcripts, GitHub Octoverse report, Stack Overflow Developer Survey, Microsoft/Google blog posts
  Confidence: 55%
  Notes: Sundar Pichai stated 25%+ at Google in Oct 2024; that trajectory implies 50%+ within 2 years.
- 2027-01-05 [pending] Q3 window check-in (75%)
- 2026-10-01 → 2027-10-31 [pending] Software dev employment showing measurable AI-driven shift
  How: BLS quarterly data show software developer employment growth turning negative for two consecutive quarters, OR named layoff announcements totaling ≥50,000 cite AI code automation as the primary driver.
  Source: Bureau of Labor Statistics quarterly reports, layoffs.fyi, Bloomberg labor coverage
  Confidence: 45%
  Notes: Cascade: affects S_AGI_FAST/MID and labor displacement predictions.
- 2026-12-01 → 2027-08-31 [pending] Frontier lab announces human-out-of-loop production code
  How: Anthropic, OpenAI, Google, or a peer announces a major production system in which AI writes, reviews, AND deploys code, with humans only in an oversight role (no human PR review on the majority of changes).
  Source: Lab blog posts, conference keynotes (NeurIPS, ICML, OpenAI DevDay), Reuters/Bloomberg
  Confidence: 50%
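The milestone chain gives each pending event a date range, while the node's raw metadata also stores a single `expected_date`. For the three ranges whose metadata is visible, that date equals the range start plus half its length, rounded down to a whole day. The sketch below reproduces this and pairs it with an overdue check using the 7-day `grace_days` from the milestone-miss record. Both rules are inferred from the data rather than taken from the tracker's code; the strict "after the grace period" comparison and the function names are assumptions.

```python
from datetime import date, timedelta


def expected_date(start: date, end: date) -> date:
    """Midpoint of a milestone window, rounded down to a whole day."""
    return start + timedelta(days=(end - start).days // 2)


def is_overdue(expected: date, today: date, grace_days: int = 7, observed: bool = False) -> bool:
    """A pending milestone counts as overdue once its grace period has fully elapsed."""
    return not observed and today > expected + timedelta(days=grace_days)


print(expected_date(date(2026, 4, 1), date(2026, 9, 30)))  # SWE-Bench Verified window
print(is_overdue(date(2026, 5, 4), today=date(2026, 5, 12)))  # Q1 check-in on the miss date
```

The second call mirrors the Q1 check-in: expected 2026-05-04, with the miss emitted on 2026-05-12, after the 7-day grace period.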
What if this resolves?
Clamps this prediction to a resolved outcome and runs a Gibbs sample, returning the predictions whose marginals shift most. Each run takes about 30 s; use it to stress-test "if X resolves, what else moves?"
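The page does not document its sampler. As a toy illustration of what "clamp and Gibbs sample" means, the sketch below treats the three-node chain from the network tables (238_009 → this node → 234_036) as a small Bayesian network. It reuses the P(c|s=T) and P(c|s=F) columns as conditional probabilities and 238_009's displayed 78.1% as a root prior; those modeling choices and all names are assumptions, not the tracker's model.

```python
import random
from typing import Dict, Optional

# Toy chain: 238_009 (R) -> this node (S) -> 234_036 (C).
P_R = 0.781                              # assumed root prior: 238_009's displayed probability
P_S_GIVEN_R = {True: 0.68, False: 0.05}  # P(S=T | R), from the incoming-edge row
P_C_GIVEN_S = {True: 0.45, False: 0.05}  # P(C=T | S), from the outgoing-edge row


def bern(p_true: float, value: bool) -> float:
    """Probability of `value` under a Bernoulli(p_true) variable."""
    return p_true if value else 1.0 - p_true


def gibbs(n_sweeps: int, clamp_s: Optional[bool] = None, seed: int = 0,
          burn_in: int = 500) -> Dict[str, float]:
    """Estimate marginals by Gibbs sampling, optionally clamping S to a resolved value."""
    rng = random.Random(seed)
    r = True
    s = True if clamp_s is None else clamp_s
    c = False
    totals = {"R": 0, "S": 0, "C": 0}
    for sweep in range(burn_in + n_sweeps):
        # Resample R from P(R | S), proportional to P(R) P(S | R).
        w_t = P_R * bern(P_S_GIVEN_R[True], s)
        w_f = (1.0 - P_R) * bern(P_S_GIVEN_R[False], s)
        r = rng.random() < w_t / (w_t + w_f)
        # Resample S from P(S | R, C), proportional to P(S | R) P(C | S), unless clamped.
        if clamp_s is None:
            w_t = P_S_GIVEN_R[r] * bern(P_C_GIVEN_S[True], c)
            w_f = (1.0 - P_S_GIVEN_R[r]) * bern(P_C_GIVEN_S[False], c)
            s = rng.random() < w_t / (w_t + w_f)
        # Resample C from P(C | S).
        c = rng.random() < P_C_GIVEN_S[s]
        if sweep >= burn_in:
            totals["R"] += r
            totals["S"] += s
            totals["C"] += c
    return {name: count / n_sweeps for name, count in totals.items()}


def biggest_shifts(n_sweeps: int = 20000):
    """Rank the unclamped nodes by how far their marginal moves when S resolves true."""
    base = gibbs(n_sweeps, seed=1)
    clamped = gibbs(n_sweeps, clamp_s=True, seed=2)
    shifts = [(name, clamped[name] - base[name]) for name in ("R", "C")]
    return sorted(shifts, key=lambda item: abs(item[1]), reverse=True)


print(biggest_shifts())
```

For this toy chain the exact answers can be checked by hand: clamping S to true moves R from 0.781 to about 0.980 (0.781 × 0.68 / 0.542) and C from about 0.267 to 0.45, so both shift by roughly 0.18 to 0.20. On three nodes exact enumeration would be cheaper; sampling matters on larger networks where enumeration is impractical.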
Evidence chain
Raw metadata
{
"trf": 0.7889686596425205,
"kappa": 0.6429,
"base_rate": null,
"predictor": "Dario Amodei",
"total_llr": 0.6931471805599453,
"bayesian_v2": true,
"prior_logit": 0.01024395655895965,
"bayes_factor": "1.6:1 favoring",
"blend_reason": "no reference_class linked",
"inside_prior": 0.5025609667444139,
"kappa_source": "predictor_table",
"blend_applied": false,
"contributions": [
{
"llr": 0.6931471805599453,
"kappa": 0.6429,
"label": "End-to-end software automation closer than skeptics estimated.",
"adjusted_llr": 0.4456243223819888
}
],
"evidence_kind": "intake_event_update",
"inside_source": "history_v2",
"inside_weight": 1,
"outside_weight": 0,
"posterior_prob": 0.6120335605622316,
"evidence_origin": "daily_intake",
"llm_suggestions": [
{
"polarity": "corroborates",
"status_change": "unchanged",
"evidence_strength": "moderate",
"delta_prob_suggestion": 0.04
}
],
"posterior_logit": 0.4558682789409485,
"predictor_brier": 0.03445,
"evidence_doc_ids": [],
"inside_posterior": 0.6120335605622316,
"blended_posterior": 0.6120335605622316,
"reference_class_id": null,
"total_adjusted_llr": 0.4456243223819888,
"predictor_n_resolved": 2
}
Raw metadata
{
"trf": 0.8025098155804883,
"kappa": 0.6429,
"base_rate": null,
"predictor": "Dario Amodei",
"total_llr": -0.4054651081081644,
"grace_days": 7,
"bayesian_v2": true,
"prior_logit": 0.2709174745616987,
"bayes_factor": "1.3:1 against",
"blend_reason": "no reference_class linked",
"inside_prior": 0.5673181297530249,
"kappa_source": "predictor_table",
"n_milestones": 1,
"blend_applied": false,
"contributions": [
{
"llr": -0.4054651081081644,
"kind": "quartile_checkpoint",
"kappa": 0.6429,
"label": "Q1 window check-in (25%)",
"weight": 0.05,
"strength": "weak",
"confidence": null,
"source_url": null,
"adjusted_llr": -0.2606735180027389,
"expected_date": "2026-05-04",
"measurement_criterion": null
}
],
"evidence_kind": "metadata_milestone_miss_sweep",
"inside_source": "history_v2",
"inside_weight": 0.43824312909365815,
"outside_weight": 0.5617568709063419,
"posterior_prob": 0.5025609667444139,
"posterior_logit": 0.010243956558959766,
"predictor_brier": 0.03445,
"inside_posterior": 0.5025609667444139,
"blended_posterior": 0.5025609667444139,
"reference_class_id": null,
"total_adjusted_llr": -0.2606735180027389,
"predictor_n_resolved": 2
}
Network propagation neighbors
Top incoming (parents)
Edges that influence THIS node's belief
| Kind | Node | Their prob | P(c\|s=T) | P(c\|s=F) | Δ implied |
|---|---|---|---|---|---|
| prereq | 238_009 Recursive self-improvement is already happening now (no long — Alex Wissner-Gross | 78.1% | 0.680 | 0.050 | -0.023 |
Top outgoing (children)
Predictions THIS node influences
| Kind | Node | Their prob | P(c\|s=T) | P(c\|s=F) | Δ implied |
|---|---|---|---|---|---|
| prereq | 234_036 Job displacement will be issue 6-10 not top 5 in 10 years; A — Alex Wissner-Gross | 28.8% | 0.450 | 0.050 | -0.026 |
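
The P(c|s=T) and P(c|s=F) columns read as the child's probability conditional on the source resolving true or false. Mixing them with the source's probability via the law of total probability gives the child marginal that the edge alone would imply. The page does not state how the Δ implied column is derived, and this sketch does not reproduce those values; the function name is hypothetical.

```python
def implied_prob(p_source: float, p_if_true: float, p_if_false: float) -> float:
    """Law of total probability: P(c) = P(c|s=T) P(s) + P(c|s=F) (1 - P(s))."""
    return p_if_true * p_source + p_if_false * (1.0 - p_source)


# Incoming edge: 238_009 at 78.1% -> this node.
print(round(implied_prob(0.781, 0.680, 0.050), 3))
# Outgoing edge: this node at 61.2% (posterior_prob in the daily_intake record) -> 234_036.
print(round(implied_prob(0.612, 0.450, 0.050), 3))
```

The two calls give about 0.542 and 0.295 respectively.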
Prerequisites (2)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| prereq | 238_009 | Recursive self-improvement is already happening now (no longer three years out) | AI | — |
| correlate | S_AGI_MID_2029 | AGI mid: Kurzweil 2029 path | agi_general_capability | — |
Dependents (1)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| prereq | 234_036 | Job displacement will be issue 6-10 not top 5 in 10 years; AI discoveries will dominate | Labor/Jobs | — |
Linked documents (5)
| Sim | Source | Title | Market prob | Polarity | Reviewed | Published |
|---|---|---|---|---|---|---|
| 0.750 | codex_research_pack | METR - Measuring AI Ability to Complete Long Tasks | — | corroborates | pending | 2025-03-19 |
| 0.750 | codex_research_pack | OECD - Exploring Possible AI Trajectories Through 2030 | — | corroborates | pending | 2026-04-26 |
| 0.670 | arxiv | Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency | — | mentions | pending | 2026-05-28 |
| 0.617 | arxiv | When Surface Form Changes Moderation Decisions: A Paired Study of Code-Mixed Workflow Instability | — | mentions | pending | 2026-06-04 |
| 0.589 | manifold | Will MNX hire a backend developer before Manifest? | 26% | mentions | pending | 2026-04-30 |
Raw metadata
{
"nia": false,
"qty": "end-to-end SWE",
"mode": "FORECAST",
"role": "Cited-CEO",
"context": "Amodei early-2026 forecast; goes beyond code assistants to full SDLC autonomy. Structurally disrupts engineering labor economics.",
"to_year": 2027,
"conv_cues": "will be; CEO; specific timeframe",
"direction": "HAPPEN",
"from_year": 2026,
"timeframe": "6-12 months (by late 2026)",
"conv_level": "HIGH",
"milestones": [
{
"kind": "prereq",
"label": "Recursive self-improvement is already happening now (no longer three years out)",
"status": "hit",
"weight": 0.5,
"ordinal": -9,
"source_id": "238_009",
"expected_date": "2026-04-29",
"observed_date": "2026-04-29"
},
{
"kind": "quartile_checkpoint",
"label": "Q1 window check-in (25%)",
"status": "overdue",
"weight": 0.05,
"ordinal": -8,
"source_id": null,
"expected_date": "2026-05-04",
"observed_date": null,
"miss_emitted_at": "2026-05-12T22:09:45.491809+00:00",
"miss_emitted_by": "metadata_milestone_sweep"
},
{
"kind": "llm_pre_event",
"label": "SWE-Bench Verified hits 90%+ score",
"notes": "SWE-Bench Verified is the canonical 'AI handling software dev' benchmark. 90% indicates near-human performance on real GitHub issues.",
"source": "Anthropic blog, OpenAI evals page, Papers With Code SWE-Bench page, METR evaluations",
"status": "pending",
"weight": 0.4,
"ordinal": -7,
"source_id": null,
"confidence": 0.75,
"expected_date": "2026-07-01",
"research_origin": "training",
"expected_date_range": {
"to": "2026-09-30",
"from": "2026-04-01"
},
"measurement_criterion": "Top model score on SWE-Bench Verified reaches ≥90% on the public leaderboard (current top is in the 70-80% range as of late 2025)"
},
{
"kind": "quartile_checkpoint",
"label": "Q2 window check-in (50%)",
"status": "pending",
"weight": 0.05,
"ordinal": -6,
"source_id": null,
"expected_date": "2026-09-04",
"observed_date": null
},
{
"kind": "llm_pre_event",
"label": "AI agent demonstrates ≥1 workday autonomous task completion",
"notes": "METR's 'task length' evaluation is the canonical agent-autonomy metric.",
"source": "Anthropic blog, Cognition (Devin) blog, METR.org evaluations, Apollo Research reports",
"status": "pending",
"weight": 0.4,
"ordinal": -5,
"source_id": null,
"confidence": 0.65,
"expected_date": "2026-09-15",
"research_origin": "training",
"expected_date_range": {
"to": "2026-12-31",
"from": "2026-06-01"
},
"measurement_criterion": "Public demo or third-party evaluation (METR, Apollo Research) shows an AI agent (Devin-class, Claude Code-class) completing tasks requiring ≥8 hours of senior-engineer work without human intervention"
},
{
"kind": "llm_pre_event",
"label": "Major tech company discloses majority of new code is AI-generated",
"notes": "Sundar Pichai stated 25%+ at Google in Oct 2024. Trajectory implies 50%+ within 2 years.",
"source": "Earnings call transcripts, GitHub Octoverse report, Stack Overflow Developer Survey, Microsoft/Google blog posts",
"status": "pending",
"weight": 0.4,
"ordinal": -4,
"source_id": null,
"confidence": 0.55,
"expected_date": "2026-11-29",
"research_origin": "training",
"expected_date_range": {
"to": "2027-04-30",
"from": "2026-07-01"
},
"measurement_criterion": "Public statement (earnings call, blog, internal memo leak) from top-5 tech company (Microsoft, Google, Meta, Amazon, Apple) that >50% of new code is AI-generated. Or developer survey shows similar at industry level."
},
{
"kind": "quartile_checkpoint",
"label": "Q3 window check-in (75%)
... (truncated)