Humans (weaker intelligences) can successfully align/contain super-intelligences via weak-to-strong supervision.
Predictor: Alex Wissner-Gross · ep#248 "Sam Altman's Attack, Amazon vs. Starlink, and What Opus 4.7 Actually Means | #248" · source
Prediction text
Humans (weaker intelligences) can successfully align/contain super-intelligences via weak-to-strong supervision. | This entire exercise is a proxy for humans, which are either already or about to be effectively the weaker intelligence supervising the stronger intelligence, and that works... I think this bodes very well for a sort of tower of alignment, where the weaker meat bodies, if you will, that are humans unaided biologically are able to contain and align super-intelligences.
Verbatim quote
this this entire exercise is a proxy for humans which are either already or about to be effectively weaker weaker intelligence is supervising the stronger int intelligence that that works... I I think this bodess very well for sort of a a tower of alignment where the weaker uh meat bodies, if you will, that that are humans unaded biologically are able to contain and align super intelligences
Calibration plot (stated vs observed)
Evidence about this node from Alex Wissner-Gross is multiplied by κ in /api/intake. A lower κ means less weight; κ is floored at 0.10 (effectively silenced) and capped at 1.00 (full weight).
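The κ weighting described above can be illustrated with a minimal sketch. Only the floor (0.10) and cap (1.00) come from the text; the function names are hypothetical, and treating "evidence" as a log-likelihood ratio that κ scales before a log-odds update is an assumption, not a description of what /api/intake actually does.

```python
import math

KAPPA_FLOOR = 0.10  # from the text: effectively silenced
KAPPA_CAP = 1.00    # from the text: full weight


def clamp_kappa(kappa: float) -> float:
    """Clamp a calibration weight into the [0.10, 1.00] range stated above."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def weighted_update(prior: float, log_likelihood_ratio: float, kappa: float) -> float:
    """Hypothetical update: scale the evidence by kappa, then add it in log-odds space.

    ASSUMPTION: the real intake endpoint may represent and combine evidence differently.
    """
    k = clamp_kappa(kappa)
    prior_logit = math.log(prior / (1.0 - prior))
    posterior_logit = prior_logit + k * log_likelihood_ratio
    return 1.0 / (1.0 + math.exp(-posterior_logit))


if __name__ == "__main__":
    evidence = math.log(3.0)  # evidence favouring the claim 3:1
    for kappa in (0.02, 0.5, 1.7):
        print(kappa, clamp_kappa(kappa), round(weighted_update(0.5, evidence, kappa), 3))
```

Under this assumption, a predictor at the floor still moves the belief slightly, which matches the "effectively silenced" wording rather than a hard zero.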
Reference class
This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.
Probability over time
Milestone chain
- 2025-06-30 · hit · Mid-2025 benchmark of weak-to-strong methods improves narrow-domain oversight
  How: Independent benchmarks (DeepMind, Anthropic, academic labs) show weak-to-strong methods improving oversight fidelity in narrow domains by mid-2025
  Source: https://www.hushvault.ie/2026/01/27/superalignment-everything-you-need-to-know-for-ai-safety/
  Conf: 85%
  Notes: HIT: narrow-domain oversight gains documented; partially supports the prediction.
- 2024-12-14 · hit · OpenAI Superalignment 'weak-to-strong generalization' research paper baseline
  How: OpenAI's foundational weak-to-strong generalization paper / replication / extension confirms GPT-2-class models can elicit GPT-3.5-level performance from GPT-4 via supervised fine-tuning
  Source: https://openai.com/index/weak-to-strong-generalization/
  Conf: 99%
  Notes: HIT: OpenAI's published research is the canonical evidence base. Wissner-Gross's claim is grounded in this research.
- 2026-04-01 → 2026-12-31 · pending · Anthropic / DeepMind / OpenAI publish post-2025 results on weak-to-strong supervising frontier models
  How: At least one frontier lab publishes results applying weak-to-strong techniques to GPT-5/Claude Mythos/Gemini-3 class models with measurable safety improvement
  Source: Anthropic, DeepMind, OpenAI alignment research blogs
  Conf: 55%
  Notes: Critical for moving from the 2024 GPT-2/GPT-4 demo to an actually superhuman setup.
- 2026-06-01 → 2027-06-30 · pending · Adversarial demonstration: weak-to-strong fails when the strong model is intentionally deceptive
  How: Academic / industry research demonstrates weak-to-strong supervision can be defeated by adversarially-trained strong models that exhibit alignment-faking
  Source: Anthropic alignment-faking research, MATS / Apollo Research papers
  Conf: 65%
  Notes: Cascade: would partially refute Wissner-Gross's optimism. Anthropic's 2024-25 alignment-faking results already lean this direction.
- 2027-01-01 → 2029-12-31 · pending · First superintelligence-class system contained / aligned by weaker supervisor in deployment
  How: Frontier lab demonstrates production-scale supervision of a system characterized as superhuman in a domain by humans/weaker models, with measurable safety properties holding under adversarial testing
  Source: Frontier lab research and deployment reports
  Conf: 20%
  Notes: Cascade: direct test of the entire prediction. Most analysts model this as 2027-2030+ open.
What if this resolves?
Clamping this prediction to a resolved outcome and running a Gibbs sample returns the predictions whose marginals shift most. Each run takes about 30 seconds, which makes it suitable for stress-testing "if X resolves, what else moves?"
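The clamp-and-resample idea can be sketched on a toy network. Everything here is hypothetical: the three-node chain A → B → C, its probabilities, and the function names are illustrative assumptions and say nothing about the real graph or sampler behind this page.

```python
import random

# Hypothetical toy chain A -> B -> C (NOT the real prediction graph).
PRIOR_A = 0.4
# (P(child=True | parent=True), P(child=True | parent=False))
CPT = {"B": (0.6, 0.05), "C": (0.7, 0.05)}
PARENT = {"B": "A", "C": "B"}


def bernoulli(p, value):
    """Probability of a boolean value under a Bernoulli(p)."""
    return p if value else 1.0 - p


def joint(state):
    """Joint probability of one full assignment of the toy chain."""
    prob = bernoulli(PRIOR_A, state["A"])
    for child, parent in PARENT.items():
        p_true, p_false = CPT[child]
        prob *= bernoulli(p_true if state[parent] else p_false, state[child])
    return prob


def conditional_true(node, state):
    """P(node=True | all other nodes), from the two candidate joints."""
    w_true = joint({**state, node: True})
    w_false = joint({**state, node: False})
    return w_true / (w_true + w_false)


def gibbs_marginals(clamp=None, n_samples=20000, burn_in=1000, seed=0):
    """Estimate marginals by Gibbs sampling; clamped nodes are never resampled."""
    rng = random.Random(seed)
    clamp = clamp or {}
    state = {"A": False, "B": False, "C": False, **clamp}
    free = [n for n in state if n not in clamp]
    counts = {n: 0 for n in state}
    for step in range(burn_in + n_samples):
        for node in free:
            state[node] = rng.random() < conditional_true(node, state)
        if step >= burn_in:
            for n in state:
                counts[n] += state[n]
    return {n: counts[n] / n_samples for n in state}


def biggest_shifts(clamp):
    """Rank unclamped nodes by how far their marginal moves after clamping."""
    base = gibbs_marginals()
    clamped = gibbs_marginals(clamp=clamp)
    shifts = {n: clamped[n] - base[n] for n in base if n not in clamp}
    return sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    for node, shift in biggest_shifts({"A": True}):
        print(f"{node}: {shift:+.3f}")
```

In this toy chain the direct child B moves more than the grandchild C, since the clamp's effect is diluted at each hop.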
Evidence chain
Network propagation neighbors
Top incoming (parents)
Edges that influence THIS node's belief
| Kind | Node | Their prob | P(c|s=T) | P(c|s=F) | Δ implied |
|---|---|---|---|---|---|
| prereq | 234_012 Anthropic revenue will cross OpenAI revenue in middle of 202 — Peter Diamandis | 67.1% | 0.500 | 0.050 | -0.074 |
| prereq | SEM_042 2025 will be the definitive year that agentic systems finall — Kevin Weil | 73.8% | 0.500 | 0.050 | -0.045 |
| prereq | SEM_012 Nvidia quadrupled chip production output while only doubling — Jensen Huang | 75.0% | 0.500 | 0.050 | -0.039 |
| killer | TK03 AI Regulatory Moratorium (EU/US Capability Freeze) | 10.0% | 0.050 | 0.500 | +0.032 |
| prereq | SEM_008 Training runs costing $10 billion for a single model will co — Dario Amodei | 76.9% | 0.500 | 0.050 | -0.030 |
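As a rough illustration of how the columns in this table could combine, the sketch below applies the law of total probability to each row: P(c) = P(c|s=T)·P(s) + P(c|s=F)·(1 − P(s)). The function name is hypothetical, and reading "Δ implied" as this single-edge marginal minus the node's current belief is an assumption; the current belief is not shown on the page, though the listed Δ values would be consistent with a value of roughly 0.42 to 0.43 under that reading.

```python
def implied_marginal(p_source: float, p_c_given_true: float, p_c_given_false: float) -> float:
    """Marginal P(c) implied by one edge, via the law of total probability."""
    return p_c_given_true * p_source + p_c_given_false * (1.0 - p_source)


# Rows copied from the incoming-edges table: (node id, their prob, P(c|s=T), P(c|s=F)).
INCOMING = [
    ("234_012", 0.671, 0.500, 0.050),
    ("SEM_042", 0.738, 0.500, 0.050),
    ("SEM_012", 0.750, 0.500, 0.050),
    ("TK03", 0.100, 0.050, 0.500),
    ("SEM_008", 0.769, 0.500, 0.050),
]

if __name__ == "__main__":
    for node_id, p_source, p_true, p_false in INCOMING:
        print(f"{node_id}: implied P(c) = {implied_marginal(p_source, p_true, p_false):.4f}")
```

Note that the killer row (TK03) has its conditionals reversed relative to the prereq rows, so a low probability for the killer implies a higher marginal for this node.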
Top outgoing (children)
Predictions THIS node influences
| Kind | Node | Their prob | P(c|s=T) | P(c|s=F) | Δ implied |
|---|---|---|---|---|---|
| prereq | 231_013 Math is cooked (will be solved), physics cooked, biology cha — Alex Wissner-Gross | 35.4% | 0.620 | 0.050 | -0.067 |
| prereq | 241_043 ASI will arrive within 2 years to 5 years to this next decad — Peter Diamandis | 35.9% | 0.650 | 0.050 | -0.059 |
| prereq | CMQ_002 By 2028, AI systems will reach 'independent researcher' leve — Sam Altman | 31.4% | 0.550 | 0.050 | -0.056 |
| prereq | 235_030 Ray Kurzweil predicts Longevity Escape Velocity (LEV) by 203 — Ray Kurzweil | 39.2% | 0.750 | 0.050 | -0.051 |
| prereq | 232_055 We're exiting the industrial age permanently as recursive se — Peter Diamandis | 35.5% | 0.700 | 0.050 | -0.034 |
Ticker exposure
Beneficiaries (23)
Adverse (6)
Prerequisites (8)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| prereq | SEM_008 | Training runs costing $10 billion for a single model will commence sometime in 2025. | AI | — |
| prereq | 238_009 | Recursive self-improvement is already happening now (no longer three years out) | AI | — |
| prereq | 234_012 | Anthropic revenue will cross OpenAI revenue in middle of 2026 | Markets/Stocks | — |
| prereq | SEM_012 | Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) across engineering. | AI/Manufacturing | — |
| prereq | SEM_042 | 2025 will be the definitive year that agentic systems finally hit the mainstream. | AI/Agents | — |
| killer | TK14 | Superbubble Pop (S&P 500 -40%, Moonshot Capital Evaporates) | — | — |
| killer | TK01 | AGI Capability Plateau (2026-27 Training Stall) | — | — |
| killer | TK03 | AI Regulatory Moratorium (EU/US Capability Freeze) | — | — |
Dependents (5)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| prereq | 235_030 | Ray Kurzweil predicts Longevity Escape Velocity (LEV) by 2033. | Biotech/Longevity | — |
| prereq | 232_055 | We're exiting the industrial age permanently as recursive self-improvement unfolds. | AI | — |
| prereq | 241_043 | ASI will arrive within 2 years to 5 years to this next decade | AI | — |
| prereq | 231_013 | Math is cooked (will be solved), physics cooked, biology char broiled. | AI | — |
| prereq | CMQ_002 | By 2028, AI systems will reach 'independent researcher' level — driving autonomous scientific discoveries without human intervention. | AI | — |
Linked documents (7)
| Sim | Source | Title | Market prob | Polarity | Reviewed | Published |
|---|---|---|---|---|---|---|
| 0.723 | arxiv | Automated alignment is harder than you think | — | mentions | pending | 2026-05-07 |
| 0.646 | arxiv | Label Over Logic? How Source Cues Bias Human Fallacy Judgments More Than LLMs | — | mentions | pending | 2026-05-28 |
| 0.638 | arxiv | Why Expert Alignment Is Hard: Evidence from Subjective Evaluation | — | mentions | pending | 2026-05-06 |
| 0.630 | arxiv | Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher | — | mentions | pending | 2026-05-31 |
| 0.587 | manifold | Who is the best human ever to exist (revised edition) | — | mentions | pending | 2026-05-25 |
| 0.558 | manifold | Will a particular friend of mine crack anyone while at math camp? | 8% | mentions | pending | 2026-04-24 |
| 0.546 | arxiv | S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs | — | mentions | pending | 2026-05-18 |
Raw metadata
{
"nia": false,
"url": "https://www.youtube.com/watch?v=LVvleNtllPk",
"mode": "THESIS",
"role": "Host",
"context": "humans unaded biologically are able to contain and align super intelligences that are stronger capability wise.",
"to_year": 2026,
"verbatim": "this this entire exercise is a proxy for humans which are either already or about to be effectively weaker weaker intelligence is supervising the stronger int intelligence that that works... I I think this bodess very well for sort of a a tower of alignment where the weaker uh meat bodies, if you will, that that are humans unaded biologically are able to contain and align super intelligences",
"conv_cues": "bodes very well",
"direction": "HAPPEN",
"from_year": 2026,
"timeframe": "future",
"conv_level": "HIGH",
"milestones": [
{
"kind": "llm_pre_event",
"label": "Mid-2025 benchmark of weak-to-strong methods improves narrow-domain oversight",
"notes": "HIT — narrow-domain oversight gains documented; partially supports the prediction.",
"source": "https://www.hushvault.ie/2026/01/27/superalignment-everything-you-need-to-know-for-ai-safety/",
"status": "hit",
"weight": 0.4,
"ordinal": -7,
"source_id": null,
"confidence": 0.85,
"source_url": "https://www.hushvault.ie/2026/01/27/superalignment-everything-you-need-to-know-for-ai-safety/",
"expected_date": "2025-06-30",
"observed_date": "2025-06-30",
"research_origin": "deep_research",
"measurement_criterion": "Independent benchmarks (DeepMind, Anthropic, academic labs) show weak-to-strong methods improving oversight fidelity in narrow domains by mid-2025"
},
{
"kind": "llm_pre_event",
"label": "OpenAI Superalignment 'weak-to-strong generalization' research paper baseline",
"notes": "HIT — OpenAI's published research is the canonical evidence base. Wissner-Gross's claim is grounded in this research.",
"source": "https://openai.com/index/weak-to-strong-generalization/",
"status": "hit",
"weight": 0.4,
"ordinal": -6,
"source_id": null,
"confidence": 0.99,
"source_url": "https://openai.com/index/weak-to-strong-generalization/",
"expected_date": "2025-12-31",
"observed_date": "2024-12-14",
"research_origin": "deep_research",
"measurement_criterion": "OpenAI's foundational weak-to-strong generalization paper / replication / extension confirms GPT-2-class models can elicit GPT-3.5-level performance from GPT-4 via supervised fine-tuning"
},
{
"kind": "prereq",
"label": "Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) a",
"status": "hit",
"weight": 0.5,
"ordinal": -5,
"source_id": "SEM_012",
"expected_date": "2026-04-29",
"observed_date": "2026-04-29"
},
{
"kind": "prereq",
"label": "Training runs costing $10 billion for a single model will commence sometime in 2025.",
"status": "hit",
"weight": 0.5,
"ordinal": -4,
"source_id": "SEM_008",
"expected_date": "2026-04-29",
"observed_date": "2026-04-29"
},
{
"kind": "prereq",
"label": "Anthropic revenue will cross OpenAI revenue in middle of 2026",
"status": "hit",
"weight": 0.5,
"ordinal": -3,
"source_id": "234_012",
"expected_date": "2026-04-29",
"observed_date": "2026-04-29"
},
{
"kind": "prereq",
"label": "2025 will be the definitive year that agentic systems finally hit the mainstream.",
"status": "hit",
"weight": 0.5,
"ordinal": -2,
"source_id": "SEM_042",
"expected_date": "2026-04-29",
"observed_date": "2026-04-29"
},
{
"kind": "prereq",
"label": "Recursive self-improvement is already happening now (no longer three years out)",
"status": "hit",
... (truncated)