b8606
<details open=""> <p>ggml-webgpu: port all AOT operators to JIT (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="4097310957" data-permission-text="Title is private" data-url="https://github.com/ggml-org/llama.cpp/issues/20728" data-hovercard-type="pull_request" data-hovercard-url="/ggml-org/llama.cpp/pull/20728/hovercard" href="https://github.com/ggml-org/llama.cpp/pull/20728">#20728</a>)</p> <ul> <li>port cpy pipeline to shader lib with JIT compilation</li> <li>port glu pipeline to shader lib with JIT compilation</li> <li>port rope pipeline to shader lib with JIT compilation</li> <li>port soft_max pipeline to shader lib with JIT compilation</li> <li>removed unused functions from embed_wgsl.py which were used for<br> old AOT template expansion</li> </ul>

Full-Stack E-Commerce App - Part 1: Project setup
Hey! Welcome to Part 1 of this series, where we build a complete, production-ready e-commerce app called ShopFlow — from an empty folder all the way to a live site on AWS. By the end of this series, ShopFlow will have:
- User authentication with JWT tokens
- A product catalogue with search powered by Elasticsearch
- A shopping cart (stored in Redis) and a full order system
- AI features — smart search, a chatbot, and product descriptions generated by AI
- Real payments via Stripe and PayPal
- Event-driven order processing with Apache Kafka
- Deployment on AWS with Kubernetes and a CI/CD pipeline
That sounds like a lot — and it is! But we are going to build it one piece at a time. Each part of this series focuses on one thing, explains why we are doing it, and by the end, you have working code. In this fi
More in Open Source AI
v4.3.1
Changes
- Gemma 4 support with full tool-calling in the API and UI.
- 🆕 ik_llama.cpp support: Add ik_llama.cpp as a new backend through new textgen-portable-ik portable builds and a new --ik flag for full installs. ik_llama.cpp is a fork by the author of the imatrix quants, including support for new quant types, significantly more accurate KV cache quantization (via Hadamard KV cache rotation, enabled by default), and optimizations for MoE models and CPU inference.
- API: Add echo + logprobs for /v1/completions. The completions endpoint now supports the echo and logprobs parameters, returning token-level log probabilities for both prompt and generated tokens. Token IDs are also included in the output via a new top_logprobs_ids field.
- Further optimize my custom gradio fork, saving up to 50 ms
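As a quick illustration of the new completions parameters, here is a minimal sketch of a request to /v1/completions with echo and logprobs enabled. The host, port, and model name are placeholders for a local install, and the top_logprobs_ids field is only described per the changelog above; treat this as a sketch, not the project's documented client code.

```python
# Minimal sketch: query /v1/completions with the echo and logprobs parameters
# described in the release notes. Host, port, and model name are placeholders.
import json
import urllib.request

payload = {
    "model": "my-local-model",        # placeholder model name
    "prompt": "The capital of France is",
    "max_tokens": 8,
    "echo": True,                     # include prompt tokens in the response
    "logprobs": 5,                    # return log probabilities for top tokens
}

req = urllib.request.Request(
    "http://127.0.0.1:5000/v1/completions",   # assumed local API address
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

choice = result["choices"][0]
print(choice["text"])
# Per the changelog, log probabilities cover both prompt and generated tokens,
# and token IDs are exposed through the new top_logprobs_ids field.
print(choice.get("logprobs", {}))
```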

From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents
arXiv:2604.01496v1 Announce Type: new Abstract: We introduce SWE-ZERO to SWE-HERO, a two-stage SFT recipe that achieves state-of-the-art results on SWE-bench by distilling open-weight frontier LLMs. Our pipeline replaces resource-heavy dependencies with an evolutionary refinement strategy: (1) SWE-ZERO utilizes large-scale, execution-free trajectories to master code semantics and repository-level reasoning, and (2) SWE-HERO applies targeted, execution-backed refinement to transition these semantic intuitions into rigorous engineering workflows. Our empirical results set a new benchmark for open-source models of comparable size. We release a dataset of 300k SWE-ZERO and 13k SWE-HERO trajectories distilled from Qwen3-Coder-480B, alongside a suite of agents based on the Qwen2.5-Coder series.

A Quick Note on Gemma 4 Image Settings in Llama.cpp
In my last post, I mentioned using --image-min-tokens to increase the quality of image responses from Qwen3.5. I went to load Gemma 4 the same way, and hit an error:
[58175] srv process_chun: processing image...
[58175] encoding image slice...
[58175] image slice encoded in 7490 ms
[58175] decoding image batch 1/2, n_tokens_batch = 2048
[58175] /Users/socg/llama.cpp-b8639/src/llama-context.cpp:1597: GGML_ASSERT((cparams.causal_attn || cparams.n_ubatch >= n_tokens_all) && "non-causal attention requires n_ubatch >= n_tokens") failed
[58175] WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
[58175] WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
[58175] See: https://github.com/ggml-org/llama.cpp/pull/17869
[58175] 0 libggml-base.0.9.11.dylib 0
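For context on what that assert is checking, here is a small illustrative sketch that mirrors its condition: with non-causal attention (as used when encoding an image batch), the whole batch must fit in a single micro-batch, i.e. n_ubatch >= n_tokens. The function name and the example numbers are hypothetical; only the condition itself comes from the log above.

```python
# Illustrative check mirroring the GGML_ASSERT in the log: with non-causal
# attention, the full token batch must fit in one micro-batch.
# The numbers below are hypothetical examples, not values read from llama.cpp.

def batch_is_valid(causal_attn: bool, n_ubatch: int, n_tokens_all: int) -> bool:
    """Mirror of: GGML_ASSERT((cparams.causal_attn || cparams.n_ubatch >= n_tokens_all) && ...)."""
    return causal_attn or n_ubatch >= n_tokens_all

# The log shows an image batch of 2048 tokens being decoded non-causally.
print(batch_is_valid(causal_attn=False, n_ubatch=512,  n_tokens_all=2048))  # False -> assert fires
print(batch_is_valid(causal_attn=False, n_ubatch=2048, n_tokens_all=2048))  # True  -> batch accepted
```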


