Gemma 4 released
Blog: https://deepmind.google/models/gemma/ Models: - Gemma4-2B: https://huggingface.co/google/gemma-4-E2B-it - Gemma4-4B: https://huggingface.co/google/gemma-4-E4B-it - Gemma4-26B-A4B: https://huggingface.co/google/gemma-4-26B-A4B-it - Gemma4-31B: https://huggingface.co/google/gemma-4-31B-it The GGUF versions can be found here: https://huggingface.co/collections/unsloth/gemma-4 https://preview.redd.it/j7c0107ewssg1.png?width=1552 format=png auto=webp s=1c47b1d9986c42a6cb1f81d73c142863586b1fd6 submitted by /u/garg-aayush [link] [comments]
Could not retrieve the full article text.
Read on Reddit r/LocalLLaMA →Reddit r/LocalLLaMA
https://www.reddit.com/r/LocalLLaMA/comments/1salijj/gemma_4_released/Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
modelreleaseversion
How to Build Production-Ready Agentic Systems with Z.AI GLM-5 Using Thinking Mode, Tool Calling, Streaming, and Multi-Turn Workflows
In this tutorial, we explore the full capabilities of Z.AI’s GLM-5 model and build a complete understanding of how to use it for real-world, agentic applications. We start from the fundamentals by setting up the environment using the Z.AI SDK and its OpenAI-compatible interface, and then progressively move on to advanced features such as streaming [ ] The post How to Build Production-Ready Agentic Systems with Z.AI GLM-5 Using Thinking Mode, Tool Calling, Streaming, and Multi-Turn Workflows appeared first on MarkTechPost .
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Open Source AI
v4.3.3 - Gemma 4 support!
Changes Gemma 4 support with tool-calling in the API and UI. 🆕 - v4.3.1. ik_llama.cpp support : Add ik_llama.cpp as a new backend through new textgen-portable-ik portable builds and a new --ik flag for full installs. ik_llama.cpp is a fork by the author of the imatrix quants, including support for new quant types, significantly more accurate KV cache quantization (via Hadamard KV cache rotation, enabled by default), and optimizations for MoE models and CPU inference. API: Add echo + logprobs for /v1/completions . The completions endpoint now supports the echo and logprobs parameters, returning token-level log probabilities for both prompt and generated tokens. Token IDs are also included in the output via a new top_logprobs_ids field. Further optimize my custom gradio fork, saving up to 5

B70: Quick and Early Benchmarks & Backend Comparison
llama.cpp: f1f793ad0 (8657) This is a quick attempt to just get it up and running. Lots of oneapi runtime still using "stable" from Intels repo. Kernel 6.19.8+deb13-amd64 with an updated xe firmware built. Vulkan is Debian but using latest Mesa compiled from source. Openvino is 2026.0. Feels like everything is "barely on the brink of working" (which is to be expected). sycl: $ build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -p 512,16384 -n 128,512 | model | size | params | backend | ngl | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: | | qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | SYCL | 99 | pp512 | 798.07 ± 2.72 | | qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | SYCL | 99 | pp16384



Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!