234_048 · prediction · AI · AI-scaling

Next major revolutions in foundation models will come from small language models

Predictor: Alex Wissner-Gross · ep#234 "Anthropic vs. The Pentagon, Claude Outpaces ChatGPT, and Consulting Gets Replaced" · source

Prior probability

50.0%

Current probability

41.2%

evolves via intake + LBP

Conviction

4/5

Signal quality

Resolution

pending

Window

2026-04-30 – 2040-11-30

Edges in / out

6 / 0

Tickers exposed

Prediction text

Next major revolutions in foundation models will come from small language models | I strongly suspect that the next major revolutions, meaning o1-level revolutions, in foundation models will come from the small side, because it's so much more accessible and so much easier for researchers to make progress.

Verbatim quote

From episode "Anthropic vs. The Pentagon, Claude Outpaces ChatGPT, and Consulting Gets Replaced"

I I I strongly suspect that the next major revolutions in in like 01 level revolutions in in foundation models will come from the small side because it's so much more accessible and so much easier for researchers to make progress

Predictor: Alex Wissner-Gross

κ + Brier as of 2026-05-22


κ (discount)

0.844

Brier

0.0341

excellent

Hits / Misses

6 / 1

of 11 resolved

Hit rate

54.5%
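For readers unfamiliar with the two headline numbers, here is a minimal Python sketch. The Brier score is the mean squared difference between stated probabilities and 0/1 outcomes (lower is better), and the 54.5% hit rate is consistent with 6 hits out of 11 resolved predictions. The function names and the sample forecasts are illustrative assumptions; the predictor's individual resolved forecasts are not shown on this page.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def hit_rate(hits, resolved):
    """Share of resolved predictions counted as hits."""
    return hits / resolved


# Hypothetical forecasts, NOT the predictor's actual record.
example_forecasts = [0.9, 0.2]
example_outcomes = [1, 0]
example_brier = brier_score(example_forecasts, example_outcomes)

# Figures taken from the panel above: 6 hits of 11 resolved.
panel_hit_rate_pct = round(100 * hit_rate(6, 11), 1)
```

A perfect forecaster scores 0, and always saying 50% scores 0.25, which gives the 0.0341 figure some context.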

Calibration plot (stated vs observed)

Evidence about this node from Alex Wissner-Gross is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
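The page states only that evidence is multiplied by κ, with a floor of 0.10 and a cap of 1.00. A minimal sketch of one way such a discount could work is shown below. It assumes, purely for illustration, that κ scales the log-likelihood ratio of a piece of evidence before a Bayesian update in log-odds space; the actual /api/intake logic is not shown here, and the function names are hypothetical.

```python
import math

KAPPA_FLOOR = 0.10  # from the page: effectively silenced
KAPPA_CAP = 1.00    # from the page: full weight


def clamp_kappa(kappa):
    """Keep the discount inside the [floor, cap] range described above."""
    return min(KAPPA_CAP, max(KAPPA_FLOOR, kappa))


def discounted_update(prior, likelihood_ratio, kappa):
    """Bayesian update in log-odds space with a kappa-discounted evidence term.

    Assumption: kappa multiplies the log-likelihood ratio. The real intake
    endpoint may apply the discount differently.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    k = clamp_kappa(kappa)
    log_odds = math.log(prior / (1.0 - prior)) + k * math.log(likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# With kappa = 0.844 the same evidence moves the belief less than at full weight.
full_weight = discounted_update(0.5, 2.0, 1.0)
discounted = discounted_update(0.5, 2.0, 0.844)
```

Under this reading, κ = 1.00 leaves the evidence untouched, while κ at the 0.10 floor shrinks any update to a tenth of its log-odds size.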

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

3 prob_history rows

Legend: intake v2 · milestone miss sweep · lbp propagation · reference class assigned · legacy v1 · prior_prob (analyst seed) · current = 41.2%

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.

Leading chain: 8 pending

2026-09-01 → 2027-09-30 · pending · An open SLM (<=15B params) matches GPT-4-class on MMLU/HellaSwag standard benchmarks
How: Public benchmark (MMLU >=86, GPQA >=50) achieved by a model with <=15B activated params, peer-reproduced
Source: Phi-4-mini 67% MMLU at 3.8B (Microsoft 2025-2026) · conf 75%
2026-06-01 → 2028-06-30 · pending · First 'O1-class' reasoning architectural breakthrough published from SLM-side research
How: Peer-reviewed or arXiv paper from non-frontier-lab origin demonstrates novel reasoning paradigm at <=20B params with clear lift over prior SOTA
Source: Wissner-Gross thesis on accessibility-driven research · conf 60%
2027-01-01 → 2028-12-31 · pending · Academic research output on SLMs surpasses LLM-scaling output
How: ArXiv cs.CL submissions tagged for compact/efficient/SLM models exceed those tagged for >100B-scale work, per Semantic Scholar trend
Source: Wissner-Gross accessibility argument + observed 2024-2026 trend · conf 65%
2027-06-01 → 2029-12-31 · pending · On-device SLMs reach 100% smartphone shipment penetration
How: All major mobile OEMs (Apple, Samsung, Google) ship flagship + mid-tier with on-device SLM by default per IDC tracker
Source: Apple Intelligence + Galaxy AI + Gemini Nano deployment trajectory · conf 80%
2028-11-10 · pending · Q1 window check-in (25%)
2028-01-01 → 2030-06-30 · pending · SLM enterprise adoption crosses LLM API spend (industry inflection)
How: Gartner, IDC or analogous tracker reports majority of enterprise inference spend (>50%) on locally-hosted SLMs
Source: Gartner 3x SLM-vs-LLM use forecast by 2027 · conf 55%
2031-05-25 · pending · Q2 window check-in (50%)
2033-12-05 · pending · Q3 window check-in (75%)
2036-06-18 · pending · Next major revolutions in foundation models will come from small language models
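The three window check-in dates can be reproduced with simple date arithmetic. The sketch below assumes the checkpoints are placed at 25%, 50% and 75% of the span from the window start (2026-04-30) to the resolution-event date listed in the chain (2036-06-18), with fractional days truncated. That assumption matches the listed dates, but the actual logic in scripts/ops/derive_milestones.py is not shown on this page, and the function name is hypothetical.

```python
from datetime import date, timedelta


def window_checkpoints(start, end, fractions=(0.25, 0.50, 0.75)):
    """Return dates at the given fractions of the start-to-end span.

    Assumption: fractional days are truncated, not rounded.
    """
    span_days = (end - start).days
    return [start + timedelta(days=int(span_days * f)) for f in fractions]


checkpoints = window_checkpoints(date(2026, 4, 30), date(2036, 6, 18))
for label, day in zip(("Q1", "Q2", "Q3"), checkpoints):
    print(label, day.isoformat())
# prints:
# Q1 2028-11-10
# Q2 2031-05-25
# Q3 2033-12-05
```

Note that the span ends at the expected resolution date rather than the window's closing date of 2040-11-30; using the latter would place the check-ins later than the ones listed.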

No downstream cascades — this prediction is a leaf in the dependency graph.

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample (about 30 s per run). The run surfaces the predictions whose marginals shift most under that assumption, which makes it useful for stress-testing "if X resolves, what else moves?"

(live posterior: 41%)
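To illustrate what clamping a node and Gibbs sampling the rest means, here is a toy sketch on a hypothetical three-node chain a -> b -> c. All probabilities are made up for illustration and the network is not the site's graph. Node c is clamped to an assumed outcome, a and b are resampled in turn from their conditionals, and the sampled marginals are compared against exact enumeration.

```python
import random

# Hypothetical chain a -> b -> c; every number here is illustrative.
P_A = 0.6                                  # P(a = True)
P_B_GIVEN_A = {True: 0.5, False: 0.05}     # P(b = True | a)
P_C_GIVEN_B = {True: 0.8, False: 0.1}      # P(c = True | b)


def bern(p_true, value):
    """Probability of a boolean value under a Bernoulli(p_true)."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c):
    return bern(P_A, a) * bern(P_B_GIVEN_A[a], b) * bern(P_C_GIVEN_B[b], c)


def gibbs_clamped(c_value, n_samples=20000, burn_in=1000, seed=0):
    """Estimate P(a | c) and P(b | c) with c clamped to c_value."""
    rng = random.Random(seed)
    a, b = True, True
    count_a = count_b = 0
    for step in range(burn_in + n_samples):
        # Resample a from P(a | b, c), proportional to the joint.
        w_true, w_false = joint(True, b, c_value), joint(False, b, c_value)
        a = rng.random() < w_true / (w_true + w_false)
        # Resample b from P(b | a, c).
        w_true, w_false = joint(a, True, c_value), joint(a, False, c_value)
        b = rng.random() < w_true / (w_true + w_false)
        if step >= burn_in:
            count_a += a
            count_b += b
    return count_a / n_samples, count_b / n_samples


def exact_clamped(c_value):
    """Exact posterior marginals by enumeration, for comparison."""
    states = [(a, b) for a in (True, False) for b in (True, False)]
    total = sum(joint(a, b, c_value) for a, b in states)
    p_a = sum(joint(True, b, c_value) for b in (True, False)) / total
    p_b = sum(joint(a, True, c_value) for a in (True, False)) / total
    return p_a, p_b


sampled = gibbs_clamped(True)
exact = exact_clamped(True)
```

The nodes whose sampled marginal moves furthest from its unclamped value are the ones a "what moves most" ranking would surface. In this toy chain, clamping c to TRUE lifts both a and b well above their priors.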

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first

LBP · 2026-05-03T02:00:01Z · 41.2% · -2.0pp

Network propagation: 43.2% → 41.2%

6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9

LBP · 2026-04-30T16:39:51Z · 43.2% · -2.3pp

Network propagation: 45.5% → 43.2%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3

LBP · 2026-04-30T02:18:57Z · 45.5% · -4.5pp

Network propagation: 50.0% → 45.5%

5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef
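The run lines above report an iteration count, a residual, a damping factor and a w_intrinsic weight, but not how they combine. The sketch below is a single-node toy, not the site's lbp_v1 to lbp_v3 methods. It assumes that w_intrinsic blends the node's own prior with the value implied by its neighbors, that damping mixes each new estimate with the previous one, and that iteration stops once the change (the residual) falls below a tolerance. All names and the tolerance are hypothetical.

```python
def damped_propagation(intrinsic, neighbor_implied, damping=0.5,
                       w_intrinsic=0.5, tol=0.01, max_iter=50):
    """Damped fixed-point iteration toward a blended target belief.

    Assumptions (not confirmed by the source):
      target = w_intrinsic * intrinsic + (1 - w_intrinsic) * neighbor_implied
      update = damping * old + (1 - damping) * target
    Returns (belief, iterations, last_residual).
    """
    target = w_intrinsic * intrinsic + (1.0 - w_intrinsic) * neighbor_implied
    belief = intrinsic
    residual = float("inf")
    iterations = 0
    while iterations < max_iter and residual >= tol:
        updated = damping * belief + (1.0 - damping) * target
        residual = abs(updated - belief)
        belief = updated
        iterations += 1
    return belief, iterations, residual


# Illustrative inputs: a 50% prior pulled toward a neighbor-implied 32%.
belief, iterations, residual = damped_propagation(0.50, 0.32)
```

Real loopy belief propagation passes messages among many nodes at once. The point of the toy is only that damping slows each step, so a node approaches its target over several iterations instead of jumping there.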

Network propagation neighbors

Top edges sorted by latest LBP cross-impact


Top incoming (parents)

Edges that influence THIS node's belief

Kind	Node	Their prob	P(c|s=T)	P(c|s=F)	Δ implied
prereq	S_ASI_SLOW_2040PLUS ASI slow: post-2040 / soft takeoff	60.0%	0.500	0.050	-0.092
killer	TK09 Energy Grid Cap (Data Center Power Wall)	35.0%	0.050	0.500	-0.070
killer	TK05 Rate Regime Persistence (10y > 5% through 2028)	30.0%	0.050	0.500	-0.047
killer	TK03 AI Regulatory Moratorium (EU/US Capability Freeze)	10.0%	0.050	0.500	+0.043
killer	TK02 AI Compute Supply Shock (TSMC/Taiwan Disruption)	12.0%	0.050	0.500	+0.034
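The "Δ implied" column can be reproduced from the other columns. The sketch below assumes that each edge implies a child probability via the law of total probability, P(c) = P(s)·P(c|s=T) + (1 − P(s))·P(c|s=F), and that Δ is that implied value minus the current probability of 41.2%. This reading matches all five rows to within rounding, though the page does not confirm that the site computes it this way. Function names are hypothetical.

```python
CURRENT_PROB = 0.412  # current probability shown on this page

# (node, parent prob, P(c|s=T), P(c|s=F), listed delta) copied from the table.
EDGES = [
    ("S_ASI_SLOW_2040PLUS", 0.60, 0.500, 0.050, -0.092),
    ("TK09", 0.35, 0.050, 0.500, -0.070),
    ("TK05", 0.30, 0.050, 0.500, -0.047),
    ("TK03", 0.10, 0.050, 0.500, +0.043),
    ("TK02", 0.12, 0.050, 0.500, +0.034),
]


def implied_child_prob(p_parent, p_c_if_true, p_c_if_false):
    """Marginal P(c) implied by one parent edge (law of total probability)."""
    return p_parent * p_c_if_true + (1.0 - p_parent) * p_c_if_false


def implied_delta(p_parent, p_c_if_true, p_c_if_false, current=CURRENT_PROB):
    """Gap between the edge-implied P(c) and the node's current probability."""
    return implied_child_prob(p_parent, p_c_if_true, p_c_if_false) - current


deltas = {
    node: implied_delta(p_s, p_t, p_f)
    for node, p_s, p_t, p_f, _listed in EDGES
}
```

For example, the prereq row gives 0.60 × 0.500 + 0.40 × 0.050 = 0.320, which is 0.092 below the current 0.412. The positive deltas on TK03 and TK02 arise because those killers are themselves unlikely (10% and 12%), so each edge on its own implies a child probability above 41.2%.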

Top outgoing (children)

Predictions THIS node influences

No outgoing edges.

Ticker exposure

37 ticker(s) linked

Beneficiaries (24)

MU WULF IREN EQIX ALAB APLD ASMIY ASML PLAB NVDA NBIS CRWV AAPL AMT AMZN DELL GOOGL IRM LNVGY META MSFT ORCL SFTBY STX

Adverse (6)

ACN GEN CHGG IBM WNS LRN

Prerequisites (6)

Predictions that must hit first

Type	Pred	Title	Domain	Lag
prereq	S_ASI_SLOW_2040PLUS	ASI slow: post-2040 / soft takeoff	asi_recursive_self_improvement	—
killer	TK09	Energy Grid Cap (Data Center Power Wall)	—	—
killer	TK05	Rate Regime Persistence (10y > 5% through 2028)	—	—
killer	TK01	AGI Capability Plateau (2026-27 Training Stall)	—	—
killer	TK02	AI Compute Supply Shock (TSMC/Taiwan Disruption)	—	—
killer	TK03	AI Regulatory Moratorium (EU/US Capability Freeze)	—	—

Dependents (0)

Predictions enabled by this

Type	Pred	Title	Domain	Lag
No dependents

Linked documents (10)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT

Sim	Source	Title	Market prob	Polarity	Reviewed	Published
0.666	arxiv	Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models	—	mentions	pending	2026-05-07
0.652	arxiv	Evaluation of LLMs for Mathematical Formalization in Lean	—	mentions	pending	2026-06-04
0.651	arxiv	Foundation Models for Credit Risk Prediction: A Game Changer?	—	mentions	pending	2026-05-18
0.645	arxiv	ProbeScale: Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference	—	mentions	pending	2026-06-01
0.643	arxiv	A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning	—	mentions	pending	2026-05-13
0.640	arxiv	Large Language Models are Perplexed by some Political Parties	—	mentions	pending	2026-06-04
0.638	arxiv	Visual Fingerprints for LLM Generation Comparison	—	mentions	pending	2026-05-07
0.637	arxiv	Emergent Transfer of a Physics Foundation Model from Simulation to Laboratory Turbulence	—	mentions	pending	2026-05-31
0.632	arxiv	Light or Full Verb? A Minimal-Pair Dataset for Probing Phraseological Competence in Language Models	—	mentions	pending	2026-06-03
0.626	arxiv	Fine-Tuning Small Language Models for Solution-Oriented Windows Event Log Analysis	—	mentions	pending	2026-05-07
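The Sim column is described as cosine similarity. A minimal sketch of that measure follows, assuming each prediction and document is represented as an embedding vector. Which embedding model the site uses is not stated, and the vectors below are toy values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Toy vectors: identical directions score 1, orthogonal directions score 0.
same = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
partial = cosine_similarity([1.0, 1.0], [1.0, 0.0])
```

Because the score measures topical closeness only, a high Sim value does not say whether a document supports or contradicts the prediction, which is consistent with every row above carrying the neutral "mentions" polarity and a pending review.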

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook

{
  "nia": false,
  "url": "https://www.youtube.com/watch?v=dmtvGKuRE64",
  "mode": "PREDICTION",
  "role": "Host",
  "context": "I I I strongly suspect that the next major revolutions in in like 01 level revolutions in in foundation models will come from the small side because it's so much more accessible and so much easier for researchers to make progress",
  "verbatim": "I I I strongly suspect that the next major revolutions in in like 01 level revolutions in in foundation models will come from the small side because it's so much more accessible and so much easier for researchers to make progress",
  "conv_cues": "strongly suspect",
  "direction": "HAPPEN",
  "timeframe": "Unspecified future",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "An open SLM (<=15B params) matches GPT-4-class on MMLU/HellaSwag standard benchmarks",
      "source": "Phi-4-mini 67% MMLU at 3.8B (Microsoft 2025-2026)",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -8,
      "source_id": null,
      "confidence": 0.75,
      "source_url": "https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/",
      "expected_date": "2027-03-17",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2027-09-30",
        "from": "2026-09-01"
      },
      "measurement_criterion": "Public benchmark (MMLU >=86, GPQA >=50) achieved by a model with <=15B activated params, peer-reproduced"
    },
    {
      "kind": "llm_pre_event",
      "label": "First 'O1-class' reasoning architectural breakthrough published from SLM-side research",
      "source": "Wissner-Gross thesis on accessibility-driven research",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.6,
      "expected_date": "2027-06-16",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2028-06-30",
        "from": "2026-06-01"
      },
      "measurement_criterion": "Peer-reviewed or arXiv paper from non-frontier-lab origin demonstrates novel reasoning paradigm at <=20B params with clear lift over prior SOTA"
    },
    {
      "kind": "llm_pre_event",
      "label": "Academic research output on SLMs surpasses LLM-scaling output",
      "source": "Wissner-Gross accessibility argument + observed 2024-2026 trend",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -6,
      "source_id": null,
      "confidence": 0.65,
      "expected_date": "2028-01-01",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2028-12-31",
        "from": "2027-01-01"
      },
      "measurement_criterion": "ArXiv cs.CL submissions tagged for compact/efficient/SLM models exceed those tagged for >100B-scale work, per Semantic Scholar trend"
    },
    {
      "kind": "llm_pre_event",
      "label": "On-device SLMs reach 100% smartphone shipment penetration",
      "source": "Apple Intelligence + Galaxy AI + Gemini Nano deployment trajectory",
      "status": "pending",
      "weight": 0.4,
      "ordinal": -5,
      "source_id": null,
      "confidence": 0.8,
      "expected_date": "2028-09-15",
      "research_origin": "training",
      "expected_date_range": {
        "to": "2029-12-31",
        "from": "2027-06-01"
      },
      "measurement_criterion": "All major mobile OEMs (Apple, Samsung, Google) ship flagship + mid-tier with on-device SLM by default per IDC tracker"
    },
    {
      "kind": "quartile_checkpoint",
      "label": "Q1 window check-in (25%)",
      "status": "pending",
      "weight": 0.05,
      "ordinal": -4,
      "source_id": null,
      "expected_date": "2028-11-10",
      "observed_date": null
    },
    {
      "kind": "llm_pre_event",
      "label": "SLM enterprise adoption crosses LLM API spend (industry inflection)",
      "source": "Gartner 3x SLM-vs-LLM use forecast by 2027",
      "status": "pending",
      "weight": 0.4,
... (truncated)