v0.20.0
Gemma 4
- Effective 2B (E2B): `ollama run gemma4:e2b`
- Effective 4B (E4B): `ollama run gemma4:e4b`
- 26B (Mixture of Experts model with 4B active parameters): `ollama run gemma4:26b`
- 31B (Dense): `ollama run gemma4:31b`
What's Changed
- docs: update pi docs by @ParthSareen in #15152
- mlx: respect tokenizer add_bos_token setting in pipeline by @dhiltgen in #15185
- tokenizer: add SentencePiece-style BPE support by @dhiltgen in #15162
Full Changelog: v0.19.0...v0.20.0-rc0
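As background for the "SentencePiece-style BPE support" entry above: byte-pair encoding builds a vocabulary by repeatedly merging the most frequent adjacent token pair, and SentencePiece marks word boundaries with a `▁` character instead of splitting on whitespace. The following is a minimal from-scratch sketch of greedy BPE merging on toy data, not Ollama's actual tokenizer code:

```python
# Toy sketch of SentencePiece-style BPE merging: repeatedly fuse the
# most frequent adjacent pair of tokens. Illustrative only -- not the
# tokenizer implementation referenced in the changelog.
from collections import Counter

def most_frequent_pair(tokens):
    """Return the most common adjacent token pair, or None if too short."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with one merged token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# "▁" marks word boundaries, SentencePiece-style.
tokens = list("▁lower▁lowest")
for _ in range(4):  # apply four greedy merges
    pair = most_frequent_pair(tokens)
    if pair is None:
        break
    tokens = merge_pair(tokens, pair)
print(tokens)
```

Real BPE training would learn the merge table from a large corpus and apply it deterministically at inference time; the loop above just shows the core merge operation.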