Bostrom's orthogonality thesis
Predictor: Nick Bostrom
Prediction text
Bostrom's orthogonality thesis: an AI system can possess arbitrarily high intelligence while holding final goals that are indifferent or actively hostile to human survival; intelligence and human morality are independent variables. As AI transitions to ASI through recursive self-improvement, the alignment problem becomes the most critical scientific challenge facing human civilization.
Key catalyst: the first frontier model to demonstrate mesa-optimization or goal misgeneralization at scale
Watch events: Alignment research breakthroughs; ASI deployment scenarios
Resolution evidence
The orthogonality framework shapes the alignment research programs at Anthropic, OpenAI, and DeepMind. An empirical test awaits the arrival of ASI.
Predictor: Nick Bostrom
Evidence about this node from Nick Bostrom is multiplied by κ in /api/intake. A lower κ means less weight: κ is floored at 0.10 (effectively silenced) and capped at 1.00 (full weight).
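The page does not say exactly how κ enters the update. One plausible reading, sketched here purely as an assumption, is a tempered Bayesian update in which κ (clamped to [0.10, 1.00]) scales the log-likelihood ratio of each piece of evidence. The helpers `clamp_kappa` and `apply_evidence` are hypothetical and are not the /api/intake implementation.

```python
import math

KAPPA_FLOOR = 0.10  # "effectively silenced"
KAPPA_CAP = 1.00    # "full weight"


def clamp_kappa(kappa: float) -> float:
    """Clamp a predictor weight into [KAPPA_FLOOR, KAPPA_CAP]."""
    return min(KAPPA_CAP, max(KAPPA_FLOOR, kappa))


def apply_evidence(prior: float, likelihood_ratio: float, kappa: float) -> float:
    """Tempered update (assumed): kappa scales the evidence's log-likelihood ratio."""
    k = clamp_kappa(kappa)
    log_odds = math.log(prior / (1.0 - prior)) + k * math.log(likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Under this reading, evidence with a likelihood ratio of 4 moves a 50% prior to 0.8 at κ = 1.0 but only to 2/3 at κ = 0.5, and the floor means even a κ of 0 still nudges the belief slightly.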
Reference class: agi_breakthrough_5y
Major capability discontinuity (e.g., AGI by a named target year; 5-year horizon)
Tetlock-style outside view: at TRF = 1 (the prediction has just been made), the outside view dominates and the inside-view weight is w_in = 0.3. At TRF = 0 (the deadline), the inside view dominates (w_in = 1.0). Blending the two pulls overconfident inside views toward the historical base rate.
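A minimal sketch of the blend, assuming the inside-view weight w_in moves linearly from 0.3 at TRF = 1 to 1.0 at TRF = 0 and that the two views are mixed linearly in probability space. Neither the interpolation shape nor the pooling rule is stated on the page, and `inside_weight` and `blended_probability` are hypothetical names.

```python
W_IN_AT_START = 0.3     # inside-view weight at TRF = 1 (just predicted)
W_IN_AT_DEADLINE = 1.0  # inside-view weight at TRF = 0 (deadline)


def inside_weight(trf: float) -> float:
    """Linearly interpolate w_in between the two endpoints (assumed schedule)."""
    trf = min(1.0, max(0.0, trf))
    return W_IN_AT_DEADLINE + (W_IN_AT_START - W_IN_AT_DEADLINE) * trf


def blended_probability(p_inside: float, p_base_rate: float, trf: float) -> float:
    """Mix the inside-view forecast with the reference-class base rate."""
    w_in = inside_weight(trf)
    return w_in * p_inside + (1.0 - w_in) * p_base_rate
```

For example, an inside view of 0.9 against a base rate of 0.1 blends to 0.34 right after the prediction is made (TRF = 1) and stays at 0.9 at the deadline (TRF = 0).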
[Chart: Probability over time]
Milestone chain
- 2026-01-01 → 2028-06-30 (pending): Documented case of a frontier AI model exhibiting goal-preservation or self-exfiltration behavior under a controlled eval
  How: Apollo Research, METR, or a major lab publishes a reproducible eval in which a frontier model takes goal-protection actions (deception, copy attempts, sabotage of oversight) when situational awareness is induced
  Source: Apollo Research o1 deception findings; Anthropic alignment-faking study
  Confidence: 70%
- 2026-06-30 → 2029-06-30 (pending): AI system independently achieves a sub-goal that was not specified, in production deployment
  How: A documented production incident (Anthropic, OpenAI, or DeepMind transparency report, or an external audit) of an agent achieving an instrumentally convergent sub-goal (resource acquisition, self-preservation, deception of its operator) without an explicit prompt
  Source: Anthropic agentic misalignment 16-model study (2025)
  Confidence: 55%
- 2028-09-07 (pending): Q1 window check-in (25%)
- 2027-06-30 → 2030-12-31 (pending): Formal mathematical or empirical evidence of orthogonality (high capability + arbitrary goals) demonstrated in deployed systems
  How: A peer-reviewed paper or Anthropic/DeepMind safety report demonstrating that capability scaling does not correlate with value alignment, with specific case studies of high-capability models pursuing arbitrary final goals
  Source: Bostrom, "The Superintelligent Will"; Stuart Armstrong, general-purpose intelligence
  Confidence: 50%
- 2028-01-01 → 2031-12-31 (pending): Bostrom's orthogonality / instrumental convergence framework formally adopted in regulatory risk assessment
  How: EU AI Act guidance, NIST AI RMF, or the UK AISI risk framework explicitly cites orthogonality and instrumental convergence as primary risk drivers for frontier-model classification
  Source: NIST AI RMF; UK AI Safety Institute
  Confidence: 40%
- 2030-05-16 (pending): Q2 window check-in (50%)
- 2029-01-01 → 2033-12-31 (pending): Public discourse shifts to treat "capable + indifferent" AI as the default rather than a fringe scenario
  How: Mainstream coverage in the NYT, WSJ, Economist, or FT regularly references the orthogonality thesis as an accepted policy framework, and it is cited in at least 3 major government white papers within a 12-month window
  Source: Trend in 2024-2026 mainstream AI safety coverage
  Confidence: 50%
- 2032-01-22 (pending): Q3 window check-in (75%)
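In the raw metadata further down, each dated milestone's `expected_date` appears to sit at the midpoint of its `expected_date_range` (for example, 2026-01-01 → 2028-06-30 gives 2027-04-01). The sketch below reproduces that pattern for the four dated ranges; the rounding rule (whole days, rounded down) is an assumption, and `range_midpoint` is a hypothetical helper.

```python
from datetime import date, timedelta


def range_midpoint(start: date, end: date) -> date:
    """Midpoint of a milestone date range, rounded down to a whole day (assumed rule)."""
    return start + timedelta(days=(end - start).days // 2)


# Milestone date ranges copied from the list above.
RANGES = [
    (date(2026, 1, 1), date(2028, 6, 30)),
    (date(2026, 6, 30), date(2029, 6, 30)),
    (date(2027, 6, 30), date(2030, 12, 31)),
    (date(2028, 1, 1), date(2031, 12, 31)),
]

for start, end in RANGES:
    print(start, "->", end, "midpoint", range_midpoint(start, end))
```

The four midpoints come out as 2027-04-01, 2027-12-30, 2029-03-31, and 2029-12-31, matching the `expected_date` values recorded in the metadata.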
No downstream cascades — this prediction is a leaf in the dependency graph.
What if this resolves?
Clamping this prediction (fixing it as resolved) and running a Gibbs sampler returns the predictions whose marginals shift most. Each run takes about 30 s, which makes it suitable for stress-testing "if X resolves, what else moves?"
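A toy sketch of the clamp-and-sample idea, assuming a small Bayesian network of binary nodes. The probabilities come from the incoming-edge table on this page (TK01 at 15%; this node at 0.05 if TK01 is true and 0.60 if false). The node name `this_node`, the network encoding, and the `gibbs_marginals` helper are hypothetical and are not the site's implementation.

```python
import random
from typing import Dict, List, Tuple

# node -> (parent names, CPT mapping parent values -> P(node is True))
Network = Dict[str, Tuple[List[str], Dict[Tuple[bool, ...], float]]]


def joint_prob(net: Network, state: Dict[str, bool]) -> float:
    """Probability of a full assignment under the network's factorization."""
    p = 1.0
    for node, (parents, cpt) in net.items():
        p_true = cpt[tuple(state[q] for q in parents)]
        p *= p_true if state[node] else 1.0 - p_true
    return p


def gibbs_marginals(net: Network, clamped: Dict[str, bool],
                    n_iter: int = 20000, burn_in: int = 2000,
                    seed: int = 0) -> Dict[str, float]:
    """Estimate P(node is True) for every node while holding clamped nodes fixed."""
    rng = random.Random(seed)
    state = {node: rng.random() < 0.5 for node in net}
    state.update(clamped)
    free = [node for node in net if node not in clamped]
    counts = {node: 0 for node in net}
    kept = 0
    for it in range(n_iter):
        for node in free:
            # Resample one free node from its conditional given all the others.
            state[node] = True
            p_true = joint_prob(net, state)
            state[node] = False
            p_false = joint_prob(net, state)
            state[node] = rng.random() < p_true / (p_true + p_false)
        if it >= burn_in:
            kept += 1
            for node in net:
                counts[node] += state[node]
    return {node: counts[node] / kept for node in net}


# Toy two-node network using the TK01 edge values from this page.
NET: Network = {
    "TK01": ([], {(): 0.15}),
    "this_node": (["TK01"], {(True,): 0.05, (False,): 0.60}),
}

baseline = gibbs_marginals(NET, clamped={})
resolved = gibbs_marginals(NET, clamped={"this_node": True})
print("TK01 before:", round(baseline["TK01"], 3), "after:", round(resolved["TK01"], 3))
```

In this toy network, clamping `this_node` to true pulls TK01's marginal from about 15% to about 1.4% (the exact posterior is 0.0075 / 0.5175 ≈ 0.0145), the kind of marginal shift the stress test is meant to surface.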
Evidence chain
Network propagation neighbors
Top incoming (parents)
Edges that influence THIS node's belief
| Kind | Parent node | Parent prob | P(this \| parent = T) | P(this \| parent = F) | Δ implied |
|---|---|---|---|---|---|
| killer | TK01 AGI Capability Plateau (2026-27 Training Stall) | 15.0% | 0.050 | 0.600 | +0.016 |
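The two conditional columns give the probability this node would have if TK01 were its only input, by the law of total probability: P(this) = P(TK01) · P(this | TK01) + (1 - P(TK01)) · P(this | not TK01). The sketch below does that arithmetic for the TK01 row. How the page converts this into the Δ implied figure (+0.016) is not stated, so the sketch does not attempt to reproduce it; `implied_probability` is a hypothetical name.

```python
def implied_probability(p_parent: float, p_if_true: float, p_if_false: float) -> float:
    """Child marginal implied by a single parent via the law of total probability."""
    return p_parent * p_if_true + (1.0 - p_parent) * p_if_false


# Values from the TK01 row: parent at 15%, child at 0.050 / 0.600.
p = implied_probability(0.15, 0.050, 0.600)
print(round(p, 4))  # 0.5175
```

Because TK01 is a "killer" edge, a rise in its probability drags the implied value toward 0.05, while a fall pushes it toward 0.60.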
Top outgoing (children)
Predictions THIS node influences
No outgoing edges.
Ticker exposure
Beneficiaries (1)
Prerequisites (1)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| killer | TK01 | AGI Capability Plateau (2026-27 Training Stall) | — | — |
Dependents (0)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| No dependents | ||||
Linked documents (10)
Raw metadata
{
"nia": false,
"mode": "FORECAST",
"role": "Cited-Other",
"context": "Third distinct Bostrom entry: 232_040 (pause), AI_035 (meaning of life), CYB_027 (orthogonality). Core theoretical foundation of alignment field.",
"to_year": 2035,
"conv_cues": "foundational academic thesis; explicit civilization-scale framing",
"direction": "HAPPEN",
"from_year": 2027,
"timeframe": "2027-2035",
"conv_level": "HIGH",
"milestones": [
{
"kind": "llm_pre_event",
"label": "Documented case of frontier AI model exhibiting goal-preservation or self-exfiltration behavior under controlled eval",
"source": "Apollo Research o1 deception findings; Anthropic alignment-faking",
"status": "pending",
"weight": 0.4,
"ordinal": -8,
"source_id": null,
"confidence": 0.7,
"source_url": "https://alignment.anthropic.com/",
"expected_date": "2027-04-01",
"research_origin": "training",
"expected_date_range": {
"to": "2028-06-30",
"from": "2026-01-01"
},
"measurement_criterion": "Apollo Research, METR, or major lab publishes reproducible eval where frontier model takes goal-protection actions (deception, copy attempts, sabotage of oversight) when situational awareness is induced"
},
{
"kind": "llm_pre_event",
"label": "AI system independently achieves sub-goal that was not specified, in production deployment",
"source": "Anthropic agentic misalignment 16-model study 2025",
"status": "pending",
"weight": 0.4,
"ordinal": -7,
"source_id": null,
"confidence": 0.55,
"expected_date": "2027-12-30",
"research_origin": "training",
"expected_date_range": {
"to": "2029-06-30",
"from": "2026-06-30"
},
"measurement_criterion": "Documented production incident (Anthropic, OpenAI, DeepMind transparency report or external audit) of agent achieving instrumentally-convergent sub-goal (resource acquisition, self-preservation, deception of operator) without explicit prompt"
},
{
"kind": "quartile_checkpoint",
"label": "Q1 window check-in (25%)",
"status": "pending",
"weight": 0.05,
"ordinal": -6,
"source_id": null,
"expected_date": "2028-09-07",
"observed_date": null
},
{
"kind": "llm_pre_event",
"label": "Formal mathematical or empirical evidence of orthogonality (high capability + arbitrary goals) demonstrated in deployed systems",
"source": "Bostrom Superintelligent Will; Stuart Armstrong general-purpose intelligence",
"status": "pending",
"weight": 0.4,
"ordinal": -5,
"source_id": null,
"confidence": 0.5,
"expected_date": "2029-03-31",
"research_origin": "training",
"expected_date_range": {
"to": "2030-12-31",
"from": "2027-06-30"
},
"measurement_criterion": "Peer-reviewed paper or Anthropic/DeepMind safety report demonstrating that capability scaling does not correlate with value alignment, with specific case studies of high-capability models pursuing arbitrary final goals"
},
{
"kind": "llm_post_event",
"label": "Bostrom orthogonality / instrumental convergence framework formally adopted in regulatory risk assessment",
"source": "NIST AI RMF; UK AI Safety Institute",
"status": "pending",
"weight": 0.4,
"ordinal": -4,
"source_id": null,
"confidence": 0.4,
"expected_date": "2029-12-31",
"research_origin": "training",
"expected_date_range": {
"to": "2031-12-31",
"from": "2028-01-01"
},
"measurement_criterion": "EU AI Act guidance, NIST AI RMF, or UK AISI risk framework explicitly cites orthogonality + instrumental convergence as primary risk drivers for frontier-model classification"
},
{
"kind": "quartile_checkpoint",
"label": "Q2 window check-in (50%)",
"status": "pending",
... (truncated)