b8629
sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (#21283)

macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Intel (x64)
- iOS XCFramework

Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)

Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)

openEuler:
- openEuler x86 (310p)
- openEuler x86 (910b, ACL Graph)
- openEuler aarch64 (310p)
- openEuler aarch64 (910b, ACL Graph)
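For a rough sense of how a KV cache reaches the 5GB range mentioned in the fix: its size scales with layer count, context length, and KV-head width. A minimal back-of-the-envelope sketch in Python (the model dimensions below are illustrative assumptions, not taken from the PR):

```python
# Rough KV-cache size estimate: 2 tensors (K and V) per layer,
# each of shape [n_ctx, n_kv_heads * head_dim], stored at f16.
# All dimensions here are illustrative, not from any specific model.

def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Example: a 70B-class GQA model (80 layers, 8 KV heads of dim 128)
# at a 16k context already lands right at the reported size:
size = kv_cache_bytes(n_layers=80, n_ctx=16384, n_kv_heads=8, head_dim=128)
print(f"{size / 2**30:.1f} GiB")  # 5.0 GiB at f16
```

Doubling the context doubles the cache, which is why long-context runs are where allocations this large show up.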
Gemma 4 Complete Guide: Architecture, Models, and Deployment in 2026
Google DeepMind released Gemma 4 on April 3, 2026 under Apache 2.0, a significant licensing shift from previous Gemma releases that makes it genuinely usable in commercial products without legal ambiguity. This guide covers the full model family, the architecture decisions worth understanding, and practical deployment paths across cloud, local, and mobile.

The Four Models and When to Use Each

Gemma 4 ships in four sizes with meaningfully different architectures:

| Model   | Params | Active | Architecture | VRAM (4-bit) | Target          |
|---------|--------|--------|--------------|--------------|-----------------|
| E2B     | ~2.3B  | all    | Dense + PLE  | ~2GB         | Mobile / edge   |
| E4B     | ~4.5B  | all    | Dense + PLE  | ~3.6GB       | Laptop / tablet |
| 26B A4B | 25.2B  | 3.8B   | MoE          | ~16GB        | Consumer GPU    |
| 31B     | 30.7B  | all    | Dense        | ~18GB        | Workstation     |

The E2B result is the most surprising: multiple community benchmarks confirm it outperforms Gemma 3 27B on s
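The VRAM column follows from simple arithmetic: at 4-bit quantization each parameter costs roughly half a byte, plus overhead for dequantization buffers, activations, and a modest KV cache. A quick sketch of that estimate (the overhead factor is an assumption for illustration, not a figure from the article):

```python
# Back-of-the-envelope VRAM for a 4-bit quantized model:
# ~0.5 bytes/param for weights, times an assumed overhead factor
# for dequant buffers, activations, and a small KV cache.

def vram_4bit_gb(n_params_billion, overhead=1.15):
    weight_gb = n_params_billion * 0.5  # 4-bit ~= 0.5 bytes/param
    return weight_gb * overhead

for name, params in [("E2B", 2.3), ("E4B", 4.5), ("26B A4B", 25.2), ("31B", 30.7)]:
    print(f"{name}: ~{vram_4bit_gb(params):.1f} GB")
# E2B ~1.3, E4B ~2.6, 26B A4B ~14.5, 31B ~17.7 -- the same ballpark
# as the table's ~2 / ~3.6 / ~16 / ~18 GB figures.
```

The two small models land a bit above this estimate in the table, plausibly because PLE keeps sizeable per-layer embedding tables resident alongside the quantized weights.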

Hermes agent might be the best open source agent for local models right now
Been running Hermes agent by Nous Research for a bit now, and the local model support is genuinely better than anything else I've tried in this space.

The thing that sold me: it has per-model tool-call parsers built in, so it actually handles tool calling properly on 30B-class models, where openclaw and most other frameworks just fall apart (see the sketch after this excerpt). Multiple people on here have confirmed it's way less token-hungry too.

The self-improving skills thing is real, but Honcho (the learning engine) is off by default, which confused me for like 2 days before I figured it out. Once you enable it in config.yaml, the difference is noticeable within a few sessions.

Some stuff worth knowing: one-command install that handles Python, Node, and everything else; supports Ollama, vLLM, and SGLang out of the box; six terminal backends
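To make the per-model parser point concrete, here is a generic sketch of the idea in Python. This is not Hermes' actual implementation, and the registry entries are hypothetical; it just shows why dispatching on the model name helps when different model families wrap tool calls in different wire formats:

```python
# Generic sketch of per-model tool-call parsing (illustrative only;
# not Hermes' real code). Different model families emit tool calls
# in different formats, so one parser rarely fits all.
import json
import re

def parse_xml_style(text):
    # Models that wrap calls in tags: <tool_call>{...}</tool_call>
    return [json.loads(m)
            for m in re.findall(r"<tool_call>(.*?)</tool_call>", text, re.S)]

def parse_bare_json(text):
    # Models that emit a raw JSON object as the whole reply.
    try:
        obj = json.loads(text)
        return [obj] if "name" in obj else []
    except json.JSONDecodeError:
        return []

# Registry keyed on substrings of the model name (hypothetical entries).
PARSERS = [
    ("hermes", parse_xml_style),
    ("qwen", parse_xml_style),
    ("default", parse_bare_json),
]

def parse_tool_calls(model_name, text):
    for key, parser in PARSERS:
        if key == "default" or key in model_name.lower():
            return parser(text)

calls = parse_tool_calls(
    "Hermes-4-70B",
    '<tool_call>{"name": "search", "arguments": {"q": "weather"}}</tool_call>',
)
print(calls)  # [{'name': 'search', 'arguments': {'q': 'weather'}}]
```

The practical payoff is that a mid-size local model never gets judged by a parser written for some other family's output format, which is one plausible reason frameworks without this dispatch fall apart on 30B-class models.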

Intel Arc B70 Benchmarks/Comparison to Nvidia RTX 4070 Super
Good day everyone! You may remember me from such posts as Getting An Intel Arc B70 Running For LLM Inference on a Dell PowerEdge R730XD. Maybe not. Probably not... Anyway, I've had this card for about a week now. I ordered it on launch day and have been beating my head against a wall with drivers and other issues until finally getting it running properly! Since then, I've realized there's a significant lack of people actually testing this card and getting real benchmarks out into the community. Something something be the change you want to see in the world, something something... So I've done some testing, and while this certainly won't be the last of my tests and benchmarks, it'll certainly be the first. I know what's on the community's mind. I hear you ask, "How does the new Intel ca