Gemma 4 has been released
https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
https://huggingface.co/unsloth/gemma-4-31B-it-GGUF
https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF
https://huggingface.co/unsloth/gemma-4-E2B-it-GGUF
https://huggingface.co/collections/google/gemma-4

What's new in Gemma 4: https://www.youtube.com/watch?v=jZVBoFOJK-Q

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal: they handle text and image input (with audio supported on the small models) and generate text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. Featuring both dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-s…

Beijing mandates internal AI ethics reviews to ensure ‘controllable’ tech
Chinese companies engaging in artificial intelligence activities are required to set up internal “AI ethics review committees” under new rules released by Beijing on Thursday, effective immediately. The notice comes as policymakers look to ensure that fast-paced AI progress can continue in a “healthy” manner amid growing consumer and enterprise adoption. Jointly released by 10 government bodies and institutions including the Ministry of Industry and Information Technology, National Development...
More in Open Source AI
v4.3.2

Changes
- Gemma 4 support, with full tool-calling in the API and UI.
- 🆕 ik_llama.cpp support: add ik_llama.cpp as a new backend through new textgen-portable-ik portable builds and a new --ik flag for full installs. ik_llama.cpp is a fork by the author of the imatrix quants, including support for new quant types, significantly more accurate KV cache quantization (via Hadamard KV cache rotation, enabled by default), and optimizations for MoE models and CPU inference.
- API: add echo + logprobs for /v1/completions. The completions endpoint now supports the echo and logprobs parameters, returning token-level log probabilities for both prompt and generated tokens. Token IDs are also included in the output via a new top_logprobs_ids field.
- Further optimize my custom Gradio fork, saving up to 50 ms
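As a sketch of how the new completions parameters might be used: the payload below follows the standard OpenAI-style /v1/completions shape, and the commented-out request assumes a local server whose host and port are placeholders, not something this release note specifies.

```python
# Sketch: requesting echo + logprobs from an OpenAI-compatible
# /v1/completions endpoint. Host/port below are assumptions;
# point them at wherever your server is actually listening.
import json

payload = {
    "prompt": "The capital of France is",
    "max_tokens": 4,
    "echo": True,    # include prompt tokens in the response
    "logprobs": 1,   # return top-1 log probabilities per token
}

# To actually send the request (requires a running server):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:5000/v1/completions",
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   resp = json.load(urllib.request.urlopen(req))
#   choice = resp["choices"][0]
#   print(choice["logprobs"]["token_logprobs"])        # per-token log probs
#   print(choice["logprobs"].get("top_logprobs_ids"))  # new field from this release
print(json.dumps(payload))
```

With echo enabled, the prompt tokens appear in the logprobs arrays alongside the generated tokens, which is what makes per-prompt-token scoring possible.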

How to Run Local AI Agents on Consumer-Grade Hardware: A Practical Guide
Want to run powerful AI agents without the endless API bills of cloud services? The good news is you don't need a data-center-grade workstation. A single modern consumer GPU is enough to host capable 9B-parameter models like qwen3.5:9b, giving you private, low-latency inference at a fraction of the cost. This article walks you through the exact hardware specs, VRAM needs, software installation steps, and budget-friendly upgrade paths so you can get a local agent up and running today, no PhD required.

Why a Consumer GPU Is Enough
It's a common myth that you must buy a professional-grade card (think RTX A6000 or multiple GPUs linked via NVLink) to run LLMs locally. In reality, for 9B-class models the sweet spot lies in t…
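The "consumer GPU is enough" claim comes down to simple arithmetic: model weights dominate VRAM use, and quantization shrinks them linearly with bit width. A rough back-of-the-envelope sketch (the flat overhead allowance for KV cache and activations is an assumption, not a measured figure):

```python
def vram_estimate_gb(params_b, bits_per_weight, overhead_gb=1.5):
    """Rough VRAM estimate: weights at the given quantization width,
    plus a flat allowance for KV cache and activations. The overhead
    figure is an illustrative assumption, not a benchmark."""
    weight_gb = params_b * 1e9 * (bits_per_weight / 8) / 1024**3
    return weight_gb + overhead_gb

# A 9B-parameter model at 4-bit quantization:
print(round(vram_estimate_gb(9, 4), 1))  # → 5.7
```

At roughly 5-6 GB, a 4-bit 9B model fits comfortably on an 8 GB consumer card, which is why the guide's sweet-spot claim holds without professional hardware.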

Show HN: The Comments Owl for HN browser extension now hides obvious "AI" items
If you want to give yourself a break from the flood of "AI" items on Hacker News until/unless you feel like reading them, the Comments Owl for Hacker News browser extension now adds a handy toggle to your right-click context menu on the main item list pages (or to the extension popup, for mobile browsers) that filters out the most obvious "AI" items by title and site. The filter uses (editable) regular expressions that have been tested on the contents of these pages over the last week or so. The extension's primary functionality is to make it easier to follow comment threads across repeat visits and catch up with recent comments, but it also offers other UI + UX tweaks, such as muting and noting users, and tweaks to the UI on mobile. Release notes and screenshots for new functionality: https://githu
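Title filtering of this kind boils down to matching each headline against a list of editable patterns. A minimal sketch in Python (these patterns are illustrative guesses, not the extension's actual defaults, and the extension itself runs in JavaScript):

```python
import re

# Hypothetical patterns in the spirit of the extension's editable
# regular expressions; the real defaults live in the extension.
TITLE_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"\bA\.?I\.?\b",   # "AI", "A.I."
    r"\bLLMs?\b",      # "LLM", "LLMs"
    r"\bGPT-?\d*\b",   # "GPT", "GPT-4", ...
)]

def is_ai_item(title):
    """True if the title matches any of the filter patterns."""
    return any(p.search(title) for p in TITLE_PATTERNS)

titles = [
    "Show HN: My AI-powered note taker",
    "Postgres 17 performance notes",
    "Running LLMs on a Raspberry Pi",
]
print([t for t in titles if not is_ai_item(t)])
```

Word boundaries (`\b`) keep the patterns from firing on substrings like "air" or "maid", which is why regex filters like this stay usable without constant tuning.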
