234_048 · prediction · AI · AI-scaling

Next major revolutions in foundation models will come from small language models

Predictor: Alex Wissner-Gross · ep#234 "Anthropic vs. The Pentagon, Claude Outpaces ChatGPT, and Consulting Gets Replaced" · source

Prior probability: 50.0%
Current probability: 41.2% (evolves via intake + LBP)
Conviction: 4/5
Signal quality: B
Resolution: pending
Window: 2026-04-30 – 2040-11-30
Edges in / out: 6 / 0
Tickers exposed: 37

Prediction text

Next major revolutions in foundation models will come from small language models | I strongly suspect that the next major revolutions, like o1-level revolutions, in foundation models will come from the small side, because it's so much more accessible and so much easier for researchers to make progress.

Verbatim quote

From episode "Anthropic vs. The Pentagon, Claude Outpaces ChatGPT, and Consulting Gets Replaced"
I I I strongly suspect that the next major revolutions in in like 01 level revolutions in in foundation models will come from the small side because it's so much more accessible and so much easier for researchers to make progress

Predictor: Alex Wissner-Gross

κ + Brier as of 2026-05-22
κ (discount): 0.844
Brier: 0.0341 (excellent)
Hits / Misses: 6 / 1 (of 11 resolved)
Hit rate: 54.5%
Calibration plot (stated vs observed)
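
For reference, a minimal sketch of how a Brier score and a hit rate are conventionally computed. The forecast/outcome pairs below are made-up illustration data, not this predictor's record, and the function names are hypothetical. The 54.5% hit rate shown above is consistent with 6 hits divided by 11 resolved predictions.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between stated probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def hit_rate_pct(hits, resolved):
    """Hits as a percentage of resolved predictions, rounded to one decimal."""
    return round(100.0 * hits / resolved, 1)


# Hypothetical forecasts (stated probability) and outcomes (1 = happened).
example_forecasts = [0.9, 0.2]
example_outcomes = [1, 0]
example_brier = brier_score(example_forecasts, example_outcomes)

# Values taken from the card above: 6 hits of 11 resolved.
card_hit_rate = hit_rate_pct(6, 11)
```

Lower Brier scores are better; 0 is a perfect forecaster and 0.25 is what a constant 50% forecast earns.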

Evidence about this node from Alex Wissner-Gross is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
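
A minimal sketch of how a κ discount with these bounds could be applied. The floor (0.10) and cap (1.00) come from the text above; everything else is an assumption: this page does not say how /api/intake represents evidence, so the sketch treats it as a likelihood ratio and scales its log by κ. The function names are hypothetical.

```python
import math

KAPPA_FLOOR = 0.10  # from the page: effectively silenced
KAPPA_CAP = 1.00    # from the page: full weight


def clamp_kappa(kappa):
    """Keep kappa inside the [floor, cap] range described on the page."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def discounted_update(prior, likelihood_ratio, kappa):
    """Bayesian update in log-odds space with the evidence term scaled by kappa.

    ASSUMPTION: scaling the log likelihood ratio is one plausible reading of
    "multiplied by kappa"; the real intake endpoint may do something else.
    """
    k = clamp_kappa(kappa)
    prior_logit = math.log(prior / (1.0 - prior))
    posterior_logit = prior_logit + k * math.log(likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-posterior_logit))


# With kappa = 1.0 the update is an ordinary Bayes step; with kappa = 0.844
# the same evidence moves the probability less far from the prior.
full_weight = discounted_update(0.5, 3.0, 1.0)
discounted = discounted_update(0.5, 3.0, 0.844)
```

Under this reading, κ at the floor shrinks any piece of evidence to a tenth of its log-odds weight, which matches the "effectively silenced" description.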

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

3 prob_history rows
[Chart: probability (0% to 100%) over time, 2026-04-30 to 2026-05-03; prior 50%, current 41.2%. Legend: intake v2, milestone miss sweep, lbp propagation, reference class assigned, legacy v1, prior_prob (analyst seed).]

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.
Leading chain: 8 pending
  1. 2026-09-01 → 2027-09-30 · pending · An open SLM (<=15B params) matches GPT-4-class on MMLU/HellaSwag standard benchmarks
    How: Public benchmark (MMLU >=86, GPQA >=50) achieved by a model with <=15B activated params, peer-reproduced
    Source: Phi-4-mini 67% MMLU at 3.8B (Microsoft 2025-2026) · conf 75%
  2. 2026-06-01 → 2028-06-30 · pending · First 'O1-class' reasoning architectural breakthrough published from SLM-side research
    How: Peer-reviewed or arXiv paper from non-frontier-lab origin demonstrates novel reasoning paradigm at <=20B params with clear lift over prior SOTA
    Source: Wissner-Gross thesis on accessibility-driven research · conf 60%
  3. 2027-01-01 → 2028-12-31 · pending · Academic research output on SLMs surpasses LLM-scaling output
    How: ArXiv cs.CL submissions tagged for compact/efficient/SLM models exceed those tagged for >100B-scale work, per Semantic Scholar trend
    Source: Wissner-Gross accessibility argument + observed 2024-2026 trend · conf 65%
  4. 2027-06-01 → 2029-12-31 · pending · On-device SLMs reach 100% smartphone shipment penetration
    How: All major mobile OEMs (Apple, Samsung, Google) ship flagship + mid-tier with on-device SLM by default per IDC tracker
    Source: Apple Intelligence + Galaxy AI + Gemini Nano deployment trajectory · conf 80%
  5. 2028-11-10 · pending · Q1 window check-in (25%)
  6. 2028-01-01 → 2030-06-30 · pending · SLM enterprise adoption crosses LLM API spend (industry inflection)
    How: Gartner, IDC or analogous tracker reports majority of enterprise inference spend (>50%) on locally-hosted SLMs
    Source: Gartner 3x SLM-vs-LLM use forecast by 2027 · conf 55%
  7. 2031-05-25 · pending · Q2 window check-in (50%)
  8. 2033-12-05 · pending · Q3 window check-in (75%)

No downstream cascades — this prediction is a leaf in the dependency graph.
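
In the raw metadata for this node, each pre-event milestone's expected_date coincides with the midpoint of its expected_date_range. The sketch below reproduces that relationship for the four milestones whose ranges and expected dates are both visible; whether scripts/ops/derive_milestones.py actually computes it this way is an assumption, and the function name is hypothetical.

```python
from datetime import date


def range_midpoint(start, end):
    """Calendar midpoint of a date range, rounded down to a whole day."""
    return start + (end - start) // 2


# (from, to, expected_date) as listed in the raw metadata for milestones 1-4.
visible_milestones = [
    (date(2026, 9, 1), date(2027, 9, 30), date(2027, 3, 17)),
    (date(2026, 6, 1), date(2028, 6, 30), date(2027, 6, 16)),
    (date(2027, 1, 1), date(2028, 12, 31), date(2028, 1, 1)),
    (date(2027, 6, 1), date(2029, 12, 31), date(2028, 9, 15)),
]

# True when every listed expected_date equals the midpoint of its range.
all_match = all(
    range_midpoint(start, end) == expected
    for start, end, expected in visible_milestones
)
```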

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample. Surfaces the predictions whose marginals shift most under that assumption.
(live posterior: 41%)

Each run takes about 30 s and is suited to stress-testing "if X resolves, what else moves?"
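
A minimal sketch of the clamp-and-resample idea, assuming a tiny pairwise binary model rather than this site's actual graph or sampler (neither is shown on the page). The node names X, Y, Z, the unary and coupling weights, and both function names are hypothetical. One node is fixed to a value, Gibbs sampling is run on the rest, and the resulting marginals are compared with an unclamped run.

```python
import math
import random


def gibbs_marginals(unary, coupling, clamp=None, sweeps=6000, burn_in=500, seed=0):
    """Estimate P(node = 1) for each node of a pairwise binary model by Gibbs sampling."""
    rng = random.Random(seed)
    clamp = clamp or {}
    nodes = list(unary)
    neighbors = {n: [] for n in nodes}
    for (a, b), weight in coupling.items():
        neighbors[a].append((b, weight))
        neighbors[b].append((a, weight))
    state = {n: clamp.get(n, 0) for n in nodes}
    counts = {n: 0 for n in nodes}
    for sweep in range(sweeps):
        for n in nodes:
            if n in clamp:
                continue  # clamped nodes keep their assigned value
            field = unary[n] + sum(w * state[m] for m, w in neighbors[n])
            p_one = 1.0 / (1.0 + math.exp(-field))
            state[n] = 1 if rng.random() < p_one else 0
        if sweep >= burn_in:
            for n in nodes:
                counts[n] += state[n]
    kept = sweeps - burn_in
    return {n: counts[n] / kept for n in nodes}


def clamp_shifts(unary, coupling, node, value):
    """Marginal shifts caused by clamping `node`, largest absolute shift first."""
    base = gibbs_marginals(unary, coupling)
    clamped = gibbs_marginals(unary, coupling, clamp={node: value})
    shifts = {n: clamped[n] - base[n] for n in unary if n != node}
    return sorted(shifts.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical three-node model: X supports Y, Y weakly opposes Z.
unary = {"X": 0.0, "Y": -0.5, "Z": 0.0}
coupling = {("X", "Y"): 1.5, ("Y", "Z"): -1.0}
shifts = clamp_shifts(unary, coupling, "X", 1)
```

In this toy model, clamping X to TRUE moves Y the most because Y is directly coupled to X, while Z moves only through Y.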

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first
LBP · 2026-05-03T02:00:01Z · 41.2% · -2.0pp
Network propagation: 43.2% → 41.2%
6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9
LBP · 2026-04-30T16:39:51Z · 43.2% · -2.3pp
Network propagation: 45.5% → 43.2%
5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3
LBP · 2026-04-30T02:18:57Z · 45.5% · -4.5pp
Network propagation: 50.0% → 45.5%
5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef
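
The internals of the lbp_v1 to lbp_v3 methods are not shown on this page. The sketch below is not loopy belief propagation itself; it is a plain damped fixed-point update for a single node, included only to illustrate what parameters named damping, w_intrinsic, and residual could mean. The blending rule, the stopping tolerance, and the function name are all assumptions, and the output is not expected to reproduce the 41.2% above.

```python
def damped_update(intrinsic, neighbor_implied, w_intrinsic=0.5, damping=0.5,
                  tol=0.01, max_iter=50):
    """Iterate a damped blend of a node's own belief and its neighbors' implied beliefs.

    ASSUMPTION: w_intrinsic weights the node's own belief against the mean of
    the neighbor-implied values, and damping mixes the previous belief with the
    new proposal. Returns (belief, iterations, last_residual).
    """
    neighbor_mean = sum(neighbor_implied) / len(neighbor_implied)
    proposal = w_intrinsic * intrinsic + (1.0 - w_intrinsic) * neighbor_mean
    belief = intrinsic
    residual = float("inf")
    iterations = 0
    for iterations in range(1, max_iter + 1):
        updated = damping * belief + (1.0 - damping) * proposal
        residual = abs(updated - belief)  # size of the change in this iteration
        belief = updated
        if residual < tol:
            break
    return belief, iterations, residual


# Illustrative inputs only: a 50% starting belief and two neighbor-implied values.
belief, iterations, residual = damped_update(0.5, [0.32, 0.3425])
```

With damping 0.5 the remaining gap to the proposal halves on every iteration, so the residual falls geometrically and the loop stops after a handful of passes.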

Network propagation neighbors

Top edges sorted by latest LBP cross-impact

Top incoming (parents)

Edges that influence THIS node's belief

Kind · Node · Their prob · P(c|s=T) · P(c|s=F) · Δ implied
prereq · S_ASI_SLOW_2040PLUS (ASI slow: post-2040 / soft takeoff) · 60.0% · 0.500 · 0.050 · -0.092
killer · TK09 (Energy Grid Cap (Data Center Power Wall)) · 35.0% · 0.050 · 0.500 · -0.070
killer · TK05 (Rate Regime Persistence (10y > 5% through 2028)) · 30.0% · 0.050 · 0.500 · -0.047
killer · TK03 (AI Regulatory Moratorium (EU/US Capability Freeze)) · 10.0% · 0.050 · 0.500 · +0.043
killer · TK02 (AI Compute Supply Shock (TSMC/Taiwan Disruption)) · 12.0% · 0.050 · 0.500 · +0.034
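
The Δ implied column is consistent with a simple total-probability calculation: the parent's probability weights the two conditionals, and the node's current probability (41.2%) is subtracted. This formula is inferred from the numbers in the table, not taken from the site's code, and the function name is hypothetical.

```python
def implied_delta(parent_prob, p_c_given_true, p_c_given_false, current_prob):
    """Probability of this node implied by one parent edge, minus the current probability."""
    implied = parent_prob * p_c_given_true + (1.0 - parent_prob) * p_c_given_false
    return implied - current_prob


CURRENT_PROB = 0.412  # current probability of this node, from the page

# (node, their prob, P(c|s=T), P(c|s=F), Δ implied as displayed)
edge_rows = [
    ("S_ASI_SLOW_2040PLUS", 0.60, 0.50, 0.05, -0.092),
    ("TK09", 0.35, 0.05, 0.50, -0.070),
    ("TK05", 0.30, 0.05, 0.50, -0.047),
    ("TK03", 0.10, 0.05, 0.50, 0.043),
    ("TK02", 0.12, 0.05, 0.50, 0.034),
]

# Difference between the recomputed and displayed delta for each edge.
recomputed_gaps = {
    node: abs(implied_delta(prob, p_true, p_false, CURRENT_PROB) - displayed)
    for node, prob, p_true, p_false, displayed in edge_rows
}
```

For the prereq edge, for example, 0.60 × 0.500 + 0.40 × 0.050 = 0.320, and 0.320 − 0.412 = −0.092. A killer edge inverts the conditionals, so a less likely killer implies a higher probability for this node.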

Top outgoing (children)

Predictions THIS node influences

No outgoing edges.

Ticker exposure

37 ticker(s) linked

Beneficiaries (24)

MU, WULF, IREN, EQIX, ALAB, APLD, ASMIY, ASML, PLAB, NVDA, NBIS, CRWV, AAPL, AMT, AMZN, DELL, GOOGL, IRM, LNVGY, META, MSFT, ORCL, SFTBY, STX

Adverse (6)

ACN, GEN, CHGG, IBM, WNS, LRN

Prerequisites (6)

Predictions that must hit first
Type · Pred · Title · Domain · Lag
prereq · S_ASI_SLOW_2040PLUS · ASI slow: post-2040 / soft takeoff · asi_recursive_self_improvement
killer · TK09 · Energy Grid Cap (Data Center Power Wall)
killer · TK05 · Rate Regime Persistence (10y > 5% through 2028)
killer · TK01 · AGI Capability Plateau (2026-27 Training Stall)
killer · TK02 · AI Compute Supply Shock (TSMC/Taiwan Disruption)
killer · TK03 · AI Regulatory Moratorium (EU/US Capability Freeze)

Dependents (0)

Predictions enabled by this
Type · Pred · Title · Domain · Lag
No dependents

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook
{
  "nia": false,
  "url": "https://www.youtube.com/watch?v=dmtvGKuRE64",
  "mode": "PREDICTION",
  "role": "Host",
  "context": "I I I strongly suspect that the next major revolutions in in like 01 level revolutions in in foundation models will come from the small side because it's so much more accessible and so much easier for researchers to make progress",
  "verbatim": "I I I strongly suspect that the next major revolutions in in like 01 level revolutions in in foundation models will come from the small side because it's so much more accessible and so much easier for researchers to make progress",
  "conv_cues": "strongly suspect",
  "direction": "HAPPEN",
  "timeframe": "Unspecified future",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "An open SLM (<=15B params) matches GPT-4-class on MMLU/HellaSwag standard benchmarks",
      "source": "Phi-4-mini 67% MMLU at 3.8B (Microsoft 2025-2026)",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -8,
      "source_id": null,
      "confidence": 0.75,
      "source_url": "https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/",
      "expected_date": "2027-03-17",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2027-09-30",
        "from": "2026-09-01"
      },
      "measurement_criterion": "Public benchmark (MMLU >=86, GPQA >=50) achieved by a model with <=15B activated params, peer-reproduced"
    },
    {
      "kind": "llm_pre_event",
      "label": "First 'O1-class' reasoning architectural breakthrough published from SLM-side research",
      "source": "Wissner-Gross thesis on accessibility-driven research",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.6,
      "expected_date": "2027-06-16",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2028-06-30",
        "from": "2026-06-01"
      },
      "measurement_criterion": "Peer-reviewed or arXiv paper from non-frontier-lab origin demonstrates novel reasoning paradigm at <=20B params with clear lift over prior SOTA"
    },
    {
      "kind": "llm_pre_event",
      "label": "Academic research output on SLMs surpasses LLM-scaling output",
      "source": "Wissner-Gross accessibility argument + observed 2024-2026 trend",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -6,
      "source_id": null,
      "confidence": 0.65,
      "expected_date": "2028-01-01",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2028-12-31",
        "from": "2027-01-01"
      },
      "measurement_criterion": "ArXiv cs.CL submissions tagged for compact/efficient/SLM models exceed those tagged for >100B-scale work, per Semantic Scholar trend"
    },
    {
      "kind": "llm_pre_event",
      "label": "On-device SLMs reach 100% smartphone shipment penetration",
      "source": "Apple Intelligence + Galaxy AI + Gemini Nano deployment trajectory",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -5,
      "source_id": null,
      "confidence": 0.8,
      "expected_date": "2028-09-15",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2029-12-31",
        "from": "2027-06-01"
      },
      "measurement_criterion": "All major mobile OEMs (Apple, Samsung, Google) ship flagship + mid-tier with on-device SLM by default per IDC tracker"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q1 window check-in (25%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -4,
      "source_id": null,
      "expected_date": "2028-11-10",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "SLM enterprise adoption crosses LLM API spend (industry inflection)",
      "source": "Gartner 3x SLM-vs-LLM use forecast by 2027",
      "status": "pending",
      "weight": 0.4,
... (truncated)