Leading AI model intelligence has been improving by roughly 2.5 IQ points per month since May 2024 — a sustained compounding rate that rapidly surpasses human-level baselines and mandates continuous hardware refresh cycles.
Predictor: Alex Wissner-Gross
Prediction text
Leading AI model intelligence has been improving by roughly 2.5 IQ points per month since May 2024, a sustained compounding rate that rapidly surpasses human-level baselines and mandates continuous hardware refresh cycles.
Key catalyst: Quarterly model releases; benchmark publications
Watch events: New benchmark results (ARC-AGI, GDPval, HLE) rather than IQ tests; capability-frontier tracking
Resolution evidence
Norwegian Mensa test scores for frontier models rose from ~90 (GPT-4) to ~130+ (GPT-5/Claude Opus 4.7/Gemini 2.5) over ~24 months — consistent with claim in aggregate, though IQ tests are contested as AI benchmarks.
Calibration plot (stated vs observed)
Evidence about this node from Alex Wissner-Gross is multiplied by κ in /api/intake. Lower κ = less weight; floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
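The floor and cap described above can be sketched as a simple clamp. This is a minimal illustration, not the actual `/api/intake` implementation: the function names are hypothetical, and representing evidence as a log-likelihood ratio is an assumption made only so that "multiplied by κ" has a concrete meaning.

```python
KAPPA_FLOOR = 0.10  # from the text: effectively silenced
KAPPA_CAP = 1.00    # from the text: full weight


def clamp_kappa(raw_kappa: float) -> float:
    """Clamp a raw calibration factor into [KAPPA_FLOOR, KAPPA_CAP]."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, raw_kappa))


def weight_evidence(log_likelihood_ratio: float, raw_kappa: float) -> float:
    """Scale one piece of evidence by the clamped kappa.

    Assumption: evidence is a log-likelihood ratio, so multiplying by
    kappa shrinks it toward 0 (no update) for poorly calibrated sources.
    """
    return clamp_kappa(raw_kappa) * log_likelihood_ratio


if __name__ == "__main__":
    for raw in (0.02, 0.5, 1.7):
        print(raw, clamp_kappa(raw), weight_evidence(2.0, raw))
```

Under this reading, a predictor with a very low raw κ still contributes a tenth of the evidence rather than none, which matches the "effectively silenced" wording.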
Reference class
This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.
Probability over time
Milestone chain
- 2024-07-31 (overdue): Q1 window check-in (25%)
- 2025-03-01 (overdue): Q2 window check-in (50%)
- 2025-09-30 (overdue): Q3 window check-in (75%)
- 2026-04-15 (hit): 2026 Stanford AI Index publishes year-over-year benchmark gains
  - How: Stanford HAI releases 2026 AI Index Report containing technical-performance section with YoY benchmark deltas
  - Source: https://hai.stanford.edu/ai-index/2026-ai-index-report
  - Confidence: 95%
- 2026-04-30 (hit): Mensa Norway IQ benchmark top score climbs from 135 to 145
  - How: TrackingAI Mensa Norway leaderboard documents 10+ point year-over-year improvement in top frontier model IQ
  - Source: https://binaryverseai.com/ai-iq-test-2025/
  - Confidence: 90%
  - Notes: HIT. Grok-4.20 / GPT-5.4 Pro tied at 145 in April 2026, up 10 from 135 prior year. Implies ~0.8 IQ/month, below 2.5/month claim but trend confirmed.
- 2026-04-30 (hit): Humanity's Last Exam jumps from 8.8% to 50%+ inside 12 months
  - How: Stanford AI Index / Epoch AI report documents top-model HLE accuracy rising from 8.8% (early 2025) to >=50% (April 2026)
  - Source: https://hai.stanford.edu/ai-index/2026-ai-index-report/technical-performance
  - Confidence: 95%
  - Notes: HIT. HLE accuracy climbed >40 points YoY, far steeper than 2.5 IQ/month equivalent.
- 2026-01-01 → 2026-12-31 (pending): Frontier model releases sustain quarterly cadence through 2026
  - How: At least 4 quarterly frontier-model releases (Anthropic/OpenAI/Google/xAI) in 2026 with measurable benchmark step-ups
  - Source: https://llm-stats.com/ai-trends
  - Confidence: 85%
- 2026-06-01 → 2027-06-30 (pending): Hardware refresh cycle: GPU shipments accelerate to support compounding intelligence
  - How: NVIDIA datacenter revenue grows >=2x YoY in any consecutive 4-quarter window during Wissner-Gross window
  - Source: NVIDIA quarterly earnings, https://investor.nvidia.com
  - Confidence: 70%
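The monthly rates implied by the figures quoted on this page can be checked with a few lines of arithmetic. The sketch below uses only numbers stated above (135 to 145 over one year, and roughly 90 to 130+ over about 24 months); treating "130+" as exactly 130 and "~24 months" as exactly 24 are simplifying assumptions, and the helper name is illustrative.

```python
CLAIMED_RATE = 2.5  # IQ points per month, from the prediction text


def implied_monthly_rate(start_score: float, end_score: float, months: float) -> float:
    """Average score gain per month over the given window."""
    return (end_score - start_score) / months


# Mensa Norway milestone: top score 135 -> 145 year over year.
mensa_yoy = implied_monthly_rate(135, 145, 12)

# Resolution evidence: ~90 -> ~130+ over ~24 months (lower bound on the gain).
resolution_window = implied_monthly_rate(90, 130, 24)

# Gain the claimed rate would produce over one year.
claimed_yearly_gain = CLAIMED_RATE * 12

print(round(mensa_yoy, 2))                 # 0.83
print(round(resolution_window, 2))         # 1.67
print(claimed_yearly_gain)                 # 30.0
print(round(mensa_yoy / CLAIMED_RATE, 2))  # 0.33
```

On these inputs both observed windows fall short of the claimed 2.5 points per month, which agrees with the milestone note that the trend direction is confirmed while the stated rate is not.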
What if this resolves?
Clamping this prediction to a resolved outcome and running a Gibbs sampler returns the predictions whose marginals shift most. Each run takes about 30 seconds and is suited to stress-testing "if X resolves, what else moves?"
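A minimal sketch of the clamp-and-resample idea follows, assuming a toy pairwise binary network. The node names, biases, and couplings are invented for illustration, and the site's actual graph and sampler are not described here, so this shows only the general technique: fix one node, Gibbs-sample the rest, and rank the other nodes by how far their marginals move from the unclamped baseline.

```python
import math
import random

# Hypothetical toy network; none of these values come from the page.
BIAS = {"A": 0.0, "B": 0.0, "C": 0.0}
COUPLING = {("A", "B"): 2.0, ("B", "C"): 1.0}  # symmetric pairwise weights


def neighbors(node):
    """Yield (neighbor, weight) pairs for a node."""
    for (u, v), w in COUPLING.items():
        if u == node:
            yield v, w
        elif v == node:
            yield u, w


def gibbs_marginals(clamped, sweeps=4000, burn_in=500, seed=0):
    """Estimate P(node = 1) by Gibbs sampling with some nodes held fixed."""
    rng = random.Random(seed)
    state = {n: clamped.get(n, rng.randint(0, 1)) for n in BIAS}
    free = [n for n in BIAS if n not in clamped]
    counts = {n: 0 for n in BIAS}
    for sweep in range(sweeps):
        for n in free:
            # Conditional of one binary node given its neighbors.
            field = BIAS[n] + sum(w * state[m] for m, w in neighbors(n))
            p_one = 1.0 / (1.0 + math.exp(-field))
            state[n] = 1 if rng.random() < p_one else 0
        if sweep >= burn_in:
            for n in BIAS:
                counts[n] += state[n]
    kept = sweeps - burn_in
    return {n: counts[n] / kept for n in BIAS}


def biggest_shifts(node, outcome):
    """Rank the other nodes by absolute marginal shift after clamping."""
    base = gibbs_marginals({})
    what_if = gibbs_marginals({node: outcome})
    shifts = {n: what_if[n] - base[n] for n in BIAS if n != node}
    return sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    for name, shift in biggest_shifts("A", 0):
        print(f"{name}: {shift:+.3f}")
```

In this toy setup, clamping `A` to 0 moves its direct neighbor `B` more than the two-hop node `C`, which is the kind of ranking the "what else moves?" view reports.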
Evidence chain
Raw metadata
{
"source": "backfill_resolution_history.py",
"status": "partial",
"bayesian_v2": false,
"outcome_prob": 0.5,
"evidence_kind": "resolution_terminal",
"posterior_prob": 0.5,
"delta_to_outcome": 0.10524,
"inside_posterior": 0.39476,
"validation_notes": "Norwegian Mensa test scores for frontier models rose from ~90 (GPT-4) to ~130+ (GPT-5/Claude Opus 4.7/Gemini 2.5) over ~24 months — consistent with claim in aggregate, though IQ tests are contested as AI benchmarks.",
"validation_status": "hit",
"pre_resolution_prob": 0.39476,
"resolution_evidence": "Norwegian Mensa test scores for frontier models rose from ~90 (GPT-4) to ~130+ (GPT-5/Claude Opus 4.7/Gemini 2.5) over ~24 months — consistent with claim in aggregate, though IQ tests are contested as AI benchmarks.",
"does_not_update_current_prob": true
}
Network propagation neighbors
Top incoming (parents)
Edges that influence THIS node's belief
Top outgoing (children)
Predictions THIS node influences
No outgoing edges.
Prerequisites (5)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| correlate | S_AGI_MID_2029 | AGI mid: Kurzweil 2029 path | agi_general_capability | — |
| correlate | S_AGI_FAST_2027 | AGI fast: drop-in remote worker by 2027-09 | agi_general_capability | — |
| killer | TK09 | Energy Grid Cap (Data Center Power Wall) | — | — |
| killer | TK02 | AI Compute Supply Shock (TSMC/Taiwan Disruption) | — | — |
| killer | TK06 | China-Taiwan Military Conflict | — | — |
Dependents (0)
| Type | Pred | Title | Domain | Lag |
|---|---|---|---|---|
| No dependents | | | | |
Validations (1)
| Observed at | Status | By | Notes |
|---|---|---|---|
| 2026-04-29 | hit | thesis_timeline_v1.0_import | Norwegian Mensa test scores for frontier models rose from ~90 (GPT-4) to ~130+ (GPT-5/Claude Opus 4.7/Gemini 2.5) over ~24 months — consistent with claim in aggregate, though IQ tests are contested as AI benchmarks. |
Linked documents (10)
| Sim | Source | Title | Market prob | Polarity | Reviewed | Published |
|---|---|---|---|---|---|---|
| 0.677 | arxiv | Intelligence Impact Quotient (IIQ): A Framework for Measuring Organizational AI Impact | — | mentions | pending | 2026-05-14 |
| 0.665 | gdelt | intelligence trust the equation that will decide australias ai winners 625399 | — | mentions | pending | 2026-04-30 |
| 0.655 | arxiv | ZAYA1-8B Technical Report | — | mentions | pending | 2026-05-06 |
| 0.648 | arxiv | Implicit Behavioral Decoding from Next-Step Spike Forecasts at Population Scale | — | mentions | pending | 2026-05-13 |
| 0.635 | manifold | Will HackerNews #1 story score go up in 24h? | 52% | mentions | pending | 2026-05-03 |
| 0.633 | github_release | google-deepmind/alphafold v2.2.0 | — | mentions | pending | 2022-03-10 |
| 0.622 | github_release | google-deepmind/alphafold v2.3.1 | — | mentions | pending | 2023-01-12 |
| 0.614 | arxiv | Boosting Self-Consistency with Ranking | — | mentions | pending | 2026-06-03 |
| 0.608 | arxiv | EQUITRIAGE: A Fairness Audit of Gender Bias in LLM-Based Emergency Department Triage | — | mentions | pending | 2026-05-05 |
| 0.590 | manifold | What will my custom Zetamac score (average of 5) be in a week? | — | mentions | pending | 2026-05-16 |
Raw metadata
{
"nia": false,
"qty": "+2.5 IQ pts/mo",
"mode": "OBSERVATION+FORECAST",
"role": "Host",
"context": "Wissner-Gross argues model IQ velocity is the key leading indicator of infrastructure obsolescence: even newly-deployed hardware becomes a legacy platform within 12-18 months.",
"to_year": 2026,
"conv_cues": "measurable velocity; quantified monthly rate",
"direction": "NUMERIC_TARGET",
"from_year": 2024,
"timeframe": "May 2024 - ongoing",
"conv_level": "HIGH",
"milestones": [
{
"kind": "quartile_checkpoint",
"label": "Q1 window check-in (25%)",
"status": "overdue",
"weight": 0.05,
"ordinal": -6,
"source_id": null,
"expected_date": "2024-07-31",
"observed_date": null,
"miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
"miss_emitted_by": "metadata_milestone_sweep"
},
{
"kind": "quartile_checkpoint",
"label": "Q2 window check-in (50%)",
"status": "overdue",
"weight": 0.05,
"ordinal": -5,
"source_id": null,
"expected_date": "2025-03-01",
"observed_date": null,
"miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
"miss_emitted_by": "metadata_milestone_sweep"
},
{
"kind": "quartile_checkpoint",
"label": "Q3 window check-in (75%)",
"status": "overdue",
"weight": 0.05,
"ordinal": -4,
"source_id": null,
"expected_date": "2025-09-30",
"observed_date": null,
"miss_emitted_at": "2026-05-02T22:07:21.384228+00:00",
"miss_emitted_by": "metadata_milestone_sweep"
},
{
"kind": "llm_pre_event",
"label": "2026 Stanford AI Index publishes year-over-year benchmark gains",
"source": "https://hai.stanford.edu/ai-index/2026-ai-index-report",
"status": "hit",
"weight": 0.4,
"ordinal": -3,
"source_id": null,
"confidence": 0.95,
"source_url": "https://hai.stanford.edu/ai-index/2026-ai-index-report",
"expected_date": "2026-04-15",
"observed_date": "2026-04-15",
"research_origin": "deep_research",
"measurement_criterion": "Stanford HAI releases 2026 AI Index Report containing technical-performance section with YoY benchmark deltas"
},
{
"kind": "llm_pre_event",
"label": "Mensa Norway IQ benchmark top score climbs from 135 to 145",
"notes": "HIT — Grok-4.20 / GPT-5.4 Pro tied at 145 in April 2026, up 10 from 135 prior year. Implies ~0.8 IQ/month, below 2.5/month claim but trend confirmed.",
"source": "https://binaryverseai.com/ai-iq-test-2025/",
"status": "hit",
"weight": 0.4,
"ordinal": -2,
"source_id": null,
"confidence": 0.9,
"source_url": "https://binaryverseai.com/ai-iq-test-2025/",
"expected_date": "2026-04-30",
"observed_date": "2026-04-30",
"research_origin": "deep_research",
"measurement_criterion": "TrackingAI Mensa Norway leaderboard documents 10+ point year-over-year improvement in top frontier model IQ"
},
{
"kind": "llm_pre_event",
"label": "Humanity's Last Exam jumps from 8.8% to 50%+ inside 12 months",
"notes": "HIT — HLE accuracy climbed >40 points YoY, far steeper than 2.5 IQ/month equivalent.",
"source": "https://hai.stanford.edu/ai-index/2026-ai-index-report/technical-performance",
"status": "hit",
"weight": 0.4,
"ordinal": -1,
"source_id": null,
"confidence": 0.95,
"source_url": "https://hai.stanford.edu/ai-index/2026-ai-index-report/technical-performance",
"expected_date": "2026-04-30",
"observed_date": "2026-04-30",
"research_origin": "deep_research",
"measurement_criterion": "Stanford AI Index / Epoch AI report documents top-model HLE accuracy rising from 8.8% (early 2025) to >=50% (April 2026)"
},
{
"kind": "event",
"label": "Leading AI model intelligence has been improving by roughly 2.5 IQ points per month since May 2024
... (truncated)