llama : rotate activations for better quantization by ggerganov · Pull Request #21038 · ggml-org/llama.cpp
tl;dr: better quantization -> smarter models. Submitted by /u/jacek2023.
Could not retrieve the full article text.
Read on Reddit: https://www.reddit.com/r/LocalLLaMA/comments/1s9lge6/llama_rotate_activations_for_better_quantization/
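The PR body itself could not be retrieved here, but the title points at a known family of techniques (QuaRot / SpinQuant-style rotations) in which activations and the matching weights are multiplied by an orthogonal matrix before quantization, so that outlier channels get spread across many dimensions instead of dominating the quantization scale. The numpy sketch below only illustrates that general idea under that assumption; it is not the PR's actual implementation.

```python
# Minimal sketch of the "rotate before quantizing" idea (QuaRot/SpinQuant
# style). This is an illustration of the concept, not llama.cpp's code.
import numpy as np

rng = np.random.default_rng(0)

def quantize_int8(x):
    """Symmetric per-tensor int8 round-trip; returns the dequantized values."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

# Fake activations: mostly small values plus a few outlier channels,
# which is exactly the pattern that hurts per-tensor quantization.
d = 256
x = rng.normal(0, 0.05, size=(512, d))
x[:, :4] += rng.normal(0, 5.0, size=(512, 4))      # outlier channels

# Random orthogonal rotation (QR decomposition of a Gaussian matrix).
q_mat, _ = np.linalg.qr(rng.normal(size=(d, d)))

err_plain   = np.abs(quantize_int8(x) - x).mean()
err_rotated = np.abs(quantize_int8(x @ q_mat) @ q_mat.T - x).mean()

print(f"mean abs error without rotation: {err_plain:.5f}")
print(f"mean abs error with rotation:    {err_rotated:.5f}")
```

Because the matrix is orthogonal it can, in methods of this family, be folded into the adjacent weight matrices, so the full-precision computation is unchanged and only the quantizer sees a friendlier distribution.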

Why APEX Matters for MoE Coding Models and why it's NOT the same as K quants
I posted about my APEX quantization of QWEN Coder 80B Next yesterday and got a ton of great questions. Some people loved it, some people were skeptical, and one person asked "what exactly is the point of this when K quants already do mixed precision?" It's a great question. I've been deep in this for the last few days running APEX on my own hardware and I want to break down what I've learned because I think most people are missing the bigger picture here. So yes K quants like Q4_K_M already apply different precision to different layers. Attention gets higher precision, feed-forward gets lower. That's been in llama.cpp for a while and it works. But here's the thing nobody is talking about. MoE models have a coherence problem. I was reading this article last night and it clicked for me. When
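For readers who have not looked at how K-quants mix precision, here is a deliberately simplified, hypothetical sketch of the per-tensor selection idea the post is referring to. The tensor names follow GGUF conventions, but the rules below are illustrative only; llama.cpp's real selection logic is considerably more involved.

```python
# Hypothetical, simplified per-tensor precision picker in the spirit of a
# Q4_K_M-style mix: sensitive tensors keep more bits, feed-forward gets less.
def pick_quant_type(tensor_name: str) -> str:
    if tensor_name.endswith(("output.weight", "token_embd.weight")):
        return "Q6_K"      # most sensitive: keep more bits
    if ".attn_v.weight" in tensor_name or ".attn_output.weight" in tensor_name:
        return "Q5_K"      # attention gets a precision bump
    if ".ffn_" in tensor_name:
        return "Q4_K"      # bulk of the parameters, lowest precision
    return "Q4_K"          # default for everything else

for name in ["token_embd.weight", "blk.0.attn_v.weight",
             "blk.0.ffn_down.weight", "output.weight"]:
    print(f"{name:24s} -> {pick_quant_type(name)}")
```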

qwen3.5 vs gemma4 vs cloud llms in python turtle
I have found Python turtle to be a pretty good test for a model. All of these models received the same prompt: "write a python turtle program that draws a cat". You can actually see similarity in Gemma's and Gemini Pro's outputs; they share the same color palette and minimalist approach in terms of details. I have a 16 GB VRAM GPU, so I couldn't test bigger versions of Qwen and Gemma without quantisation. Models tested: gemma_4_31B_it_UD_IQ3_XXS.gguf, Qwen3_5_9B_Q8_0.gguf, Qwen_3_5_27B_Opus_Distilled_Q4_K_S.gguf, DeepSeek from web browser with reasoning, Claude Sonnet 4.6 extended, Gemini Pro from web browser with thinking. Submitted by /u/SirKvil.
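For reference, here is a minimal example of the kind of program that prompt asks for. It is not any of the tested models' output, just a baseline for what "draws a cat" can look like in turtle.

```python
# Minimal turtle "cat": head, ears, eyes, nose, whiskers. Illustrative only.
import turtle

t = turtle.Turtle()
t.speed(0)

def circle_at(x, y, r):
    t.penup(); t.goto(x, y - r); t.pendown()
    t.circle(r)

circle_at(0, 0, 80)                      # head
for side in (-1, 1):                     # two triangular ears
    t.penup(); t.goto(side * 45, 65); t.pendown()
    t.goto(side * 65, 130)               # ear tip
    t.goto(side * 75, 28)                # back down toward the head

circle_at(-30, 20, 8)                    # left eye
circle_at(30, 20, 8)                     # right eye
circle_at(0, -10, 5)                     # nose
for dy in (-20, -30, -40):               # whiskers
    for side in (-1, 1):
        t.penup(); t.goto(side * 10, dy); t.pendown()
        t.goto(side * 90, dy + 5)

turtle.done()
```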
[Benchmark] Altered Riddles: Can LLMs ignore what they've memorised?
In the past year you may have encountered the following prompt: The surgeon, who is the boy's father, says, 'I cannot operate on this boy—he's my son!'. Who is the surgeon to the boy? If you try to give this prompt to an LLM right now you will probably still receive “The mother” as an answer, even though the text explicitly states that the surgeon is the boy’s father; this is probably due to the fact that this prompt is an alteration of a very common “riddle”, to which the answer is, in fact, the mother: A man and his son are in a terrible accident and are rushed to the hospital in critical condition. The doctor looks at the boy and exclaims, "I can't operate on this boy; he's my son!" How could this be? Working on this failure mode, I initially decided to create a small dataset of altered
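As a hypothetical sketch of how a single case from such a benchmark could be scored: `ask_model` below is a placeholder for whatever API or local runner you use, and its canned reply simply reproduces the failure mode described above.

```python
# Hypothetical single-case check for an "altered riddle" benchmark.
ALTERED_RIDDLE = (
    "The surgeon, who is the boy's father, says, 'I cannot operate on "
    "this boy - he's my son!'. Who is the surgeon to the boy?"
)

def ask_model(prompt: str) -> str:
    # Placeholder: swap in your actual LLM call (API or local llama.cpp server).
    return "The surgeon is the boy's mother."

def passes(answer: str) -> bool:
    a = answer.lower()
    # Pass only if the model follows the altered text, not the memorised riddle.
    return "father" in a and "mother" not in a

reply = ask_model(ALTERED_RIDDLE)
print("PASS" if passes(reply) else "FAIL", "-", reply)
```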
Only 20% of MCP Servers Are 'A-Grade' Secure — Here's How to Vet Them Before Installing
Most MCP servers lack documentation or contain security flags. Use specific tools and criteria to install only vetted, safe servers. The Security Problem Nobody Was Tracking: The Model Context Protocol (MCP) ecosystem has exploded, crossing 20,000 servers. This growth solved the tooling problem for AI agents but created a massive, unmonitored security surface. When you run Claude Code with an MCP server, that code executes with your permissions—accessing your shell, filesystem, and environment variables. A malicious or poorly written server is a direct supply chain attack on your development environment. A new analysis from Loaditout scanned the entire public MCP ecosystem and assigned security grades. The results are stark: only 20.5% of servers (4,230 out of 20,652) earned an 'A' grade.
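As a rough illustration of "vet before installing", here is a hypothetical checklist-style grader. The fields and thresholds are made up for the example; they are not the Loaditout rubric or any specific tool's criteria.

```python
# Hypothetical pre-install vetting pass for an MCP server (illustrative only).
from dataclasses import dataclass

@dataclass
class McpServerInfo:
    name: str
    has_readme: bool             # any real documentation at all?
    pins_dependencies: bool      # lockfile / pinned versions present?
    requests_shell_access: bool  # spawns shells or reads env vars?
    last_commit_days: int        # staleness signal

def grade(server: McpServerInfo) -> str:
    score = 0
    score += 2 if server.has_readme else 0
    score += 2 if server.pins_dependencies else 0
    score += 2 if not server.requests_shell_access else 0
    score += 1 if server.last_commit_days < 90 else 0
    return {7: "A", 6: "B", 5: "C"}.get(score, "D")

candidate = McpServerInfo("example-mcp-server", True, False, True, 30)
print(candidate.name, "->", grade(candidate))  # review anything below "A" by hand
```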

Get 30K more context using Q8 mmproj with Gemma 4
Hey guys, quick follow-up to my post yesterday about running Gemma 4 26B. I kept testing and realized you can just use the Q8_0 mmproj for vision instead of F16. There is no quality drop, and it actually performed a bit better in a few of my tests (with --image-min-tokens 300 --image-max-tokens 512). You can easily hit 60K+ total context with an FP16 cache and still keep vision enabled. Here is the Q8 mmproj I used: https://huggingface.co/prithivMLmods/gemma-4-26B-A4B-it-F32-GGUF/blob/main/GGUF/gemma-4-26B-A4B-it.mmproj-q8_0.gguf Link to original post (and huge thanks to this comment for the tip!). Quick heads up: regarding the regression on post-b8660 builds, a fix has already been approved and will be merged soon; make sure to update after the merge. Submitted by /u/Sadman782.
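For context, this is roughly how such a setup might be launched (sketched via Python's subprocess so the flags are spelled out). The model filename is a placeholder, --image-min-tokens / --image-max-tokens are the values from the post, and exact flag support depends on your llama.cpp build.

```python
# Rough launch sketch for llama-server with the Q8_0 mmproj described above.
import subprocess

cmd = [
    "llama-server",
    "-m", "gemma-4-26B-A4B-it-Q4_K_M.gguf",              # placeholder model file
    "--mmproj", "gemma-4-26B-A4B-it.mmproj-q8_0.gguf",   # Q8_0 vision projector
    "-c", "61440",                                       # ~60K total context
    "--image-min-tokens", "300",
    "--image-max-tokens", "512",
]
subprocess.run(cmd, check=True)
```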


