b8661
llama: add custom newline split for Gemma 4 (#21406)

Release binaries:
macOS/iOS: macOS Apple Silicon (arm64); macOS Intel (x64); iOS XCFramework
Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu arm64 (Vulkan); Ubuntu x64 (ROCm 7.2); Ubuntu x64 (OpenVINO)
Windows: Windows x64 (CPU); Windows arm64 (CPU); Windows x64 (CUDA 12) - CUDA 12.4 DLLs; Windows x64 (CUDA 13) - CUDA 13.1 DLLs; Windows x64 (Vulkan); Windows x64 (SYCL); Windows x64 (HIP)
openEuler: openEuler x86 (310p); openEuler x86 (910b, ACL Graph); openEuler aarch64 (310p); openEuler aarch64 (910b, ACL Graph)
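For anyone grabbing one of these prebuilt bundles, the GitHub CLI can fetch release assets by tag. A hedged sketch: the repo and tag come from the release above, but the asset filename pattern is an assumption, so list the real asset names first and narrow the pattern to match.

```shell
# Download a b8661 prebuilt bundle with the GitHub CLI (gh).
# The --pattern value is a guess at the asset naming; run `gh release view`
# first to see the actual filenames, then adjust the pattern.
if command -v gh >/dev/null 2>&1; then
  gh release view b8661 --repo ggml-org/llama.cpp          # list the assets
  gh release download b8661 --repo ggml-org/llama.cpp \
    --pattern '*win*vulkan*' --dir llama-b8661              # fetch one bundle
fi
```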

More about llama
This is how Spain is already working to advance autonomous transport
Reaching the campus of the University of Vigo (Uvigo) requires a car or public transport. Only a few of its centers sit in the city center: most are in the university city that was built in the 90s on what until then were hills. Dependence on means of transport is therefore beyond question. Perhaps that is also why this is a space with the potential to test new vehicle models. Since the start of the year, the Vigo campus has been testing what life is like with an autonomous bus. The vehicle moves students (and anyone who wants to try it) along a line that runs through the campus, a university city that "brings together the ideal conditions: a complex environment, with diverse users and real transport needs of last

Best model for 4090 as AI Coding Agent
Good day. I am looking for the best local model for a coding agent. I might've missed something, or some model that is not that widely used, so I came here for help. Currently these are the models I have found useful in agentic coding, via Google's turbo quant applied on llama.cpp: GLM 4.7 Flash Q4_K_M -> 30B; Nemotron 3 Q4_K_M -> 30B; Qwen3 Coder Next Q4_K_M -> 80B. I really tried to get Qwen3 Coder Next to a decent t/s for input and output, as I thought it would be a killer, but to my surprise it sometimes makes such silly mistakes that I have to do lots of babysitting in the agentic flow. GLM 4.7 and Nemotron are the ones I really can't decide between: both have decent t/s for agentic coding, and I use both at a maxed-out context window. The thing is that I feel there might be some model that ju

Running a local LLM on Android with Termux and llama.cpp
What I used: a Samsung S21 Ultra, Termux, llama-cpp-cli, llama-cpp-server, and Qwen3.5-0.8B with Q5_K_M quantization from Hugging Face. (I also tried Bonsai-8B-GGUF-1bit from Hugging Face; it is a newer model and required a different setup, which I might write about at a later time, but it produced 2-3 TPS and I did not find that to be usable.)
Installation: I downloaded the "Termux" app from the Google Play store and installed the needed tools in Termux: pkg update, pkg upgrade -y, pkg install llama-cpp -y.
Downloading a model: I downloaded Qwen3.5-0.8B-Q5_K_M.gguf in my phone browser and saved it to my device. Then I opened the download folder shortcut in the browser, selected the GGUF file, and chose "open with: Termux". Now the file is accessible in Termux.
Running it in the terminal: After that, I loaded the
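The steps above condense into a short Termux session. A hedged sketch: the llama-cpp package name and pkg commands come from the article, but the model path and the llama-cli flags are assumptions to adjust for your own setup.

```shell
# Sketch of the Termux workflow described above. Guarded so the
# Termux-specific commands only run when `pkg` is actually present.
MODEL="$HOME/downloads/Qwen3.5-0.8B-Q5_K_M.gguf"   # assumed download location

if command -v pkg >/dev/null 2>&1; then
  pkg update && pkg upgrade -y
  pkg install llama-cpp -y          # provides llama-cli and llama-server
fi

if command -v llama-cli >/dev/null 2>&1; then
  # -m: model file, -p: prompt, -n: max tokens to generate
  llama-cli -m "$MODEL" -p "Hello from Termux" -n 64
fi
```

On a phone, expect the first load to take a while; the Q5_K_M file is small enough to fit comfortably in RAM, which is why the article leans on it rather than the 8B model.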
More in Models

MCP maintainers from Anthropic, AWS, Microsoft, and OpenAI lay out enterprise security roadmap at Dev Summit
In a roundtable panel at the MCP Dev Summit last week in New York, Model Context Protocol (MCP) maintainers from Anthropic, AWS, Microsoft, and OpenAI laid out an enterprise security roadmap. (The New Stack)

Why harness engineering is becoming the new AI moat
The recent leak of Anthropic's Claude Code reveals a hard truth: as LLMs become commoditized, the sophisticated engineering harness built around them is becoming the real moat. (TechTalks)

