Google DeepMind Releases DiffusionGemma for Fast Local AI

Friday, 26 June 2026 • Source: NVIDIA / Google DeepMind

What happened

Google DeepMind released DiffusionGemma, an experimental 26B-parameter mixture-of-experts (MoE) model with 3.8B active parameters. Its diffusion head generates text in parallel blocks rather than decoding autoregressively, one token at a time. At the same time, NVIDIA optimised the model for local inference on RTX GPUs and DGX Spark.

Context and impact

Diffusion-based text generation is a long-studied alternative to autoregressive decoding (for example Inception Labs' Mercury and Stanford's SEDD), but it has not had a strong production implementation until now. DiffusionGemma is the first mainstream MoE model with this architecture to ship as open weights. For local AI it could be a breakthrough. Token-by-token decoding has to read the model's active weights from memory for every generated token, so on consumer GPUs it tends to be limited by memory bandwidth rather than by compute. Decoding a whole block per pass spreads each weight read over many tokens, which is why parallel block decoding can be significantly faster.
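
The bandwidth argument can be made concrete with a rough count of sequential forward passes. This is a minimal sketch, not DiffusionGemma's actual decoding loop: the block size, the number of denoising steps per block, and both function names are hypothetical, since the source gives no such figures. It assumes each block is refined over a fixed number of passes and that blocks are produced one after another.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Token-by-token decoding: one forward pass per generated token."""
    return num_tokens


def block_diffusion_passes(num_tokens: int, block_size: int, denoise_steps: int) -> int:
    """Block decoding: every block of `block_size` tokens is refined in
    `denoise_steps` forward passes (both values are illustrative assumptions)."""
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * denoise_steps


if __name__ == "__main__":
    tokens = 512  # hypothetical response length
    ar = autoregressive_passes(tokens)
    bd = block_diffusion_passes(tokens, block_size=32, denoise_steps=8)
    print(f"autoregressive: {ar} passes, block diffusion: {bd} passes")
    print(f"reduction in sequential passes: {ar / bd:.1f}x")
```

With these made-up numbers the count drops from 512 sequential passes to 128. The reduction only appears when the number of denoising steps is smaller than the block size. It is also an upper bound on the speedup rather than a prediction: each block pass does more arithmetic than a single-token pass, and in an MoE model the tokens of one block may route to more experts, so more weights are read per pass.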

Details

  • Architecture: 26B parameters, 3.8B active (MoE)
  • Generates text in parallel blocks via a diffusion head
  • Optimisation: NVIDIA RTX GPUs and DGX Spark workstation
  • Classification: experimental; Google positions it as a research preview
  • Part of the open-weight Gemma 3 family
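
For local inference, the two parameter counts above answer different questions: the total decides how much memory the weights occupy, while the active count decides how much is read per token. The following back-of-the-envelope sketch uses only the 26B and 3.8B figures from the list. The bit widths are illustrative assumptions (the source does not say which precisions or quantised formats are offered), and the helper name is hypothetical. KV cache, activations, and runtime overhead are ignored.

```python
TOTAL_PARAMS = 26e9    # total parameters, from the article
ACTIVE_PARAMS = 3.8e9  # active parameters, from the article


def weight_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in GiB, at a given precision."""
    return num_params * bits_per_param / 8 / 2**30


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"active fraction: {active_fraction:.1%}")
    for bits in (16, 8, 4):  # assumed precisions, not confirmed by the source
        total = weight_gib(TOTAL_PARAMS, bits)
        active = weight_gib(ACTIVE_PARAMS, bits)
        print(f"{bits:>2}-bit: all weights ~{total:.1f} GiB, active ~{active:.1f} GiB")
```

At 16 bits the full set of weights comes to roughly 48 GiB, so whether the model fits on a given consumer GPU would depend on quantisation or offloading, even though only about 15% of the parameters are active at a time.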