trunk/f997a74770624c85b137dc248b5e0f5817bda429: Use full path to GHA in FA3 stable workflow (#179187)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/179187
Approved by: https://github.com/huydhn
ghstack dependencies: #179183
PyTorch Releases
https://github.com/pytorch/pytorch/releases/tag/trunk%2Ff997a74770624c85b137dc248b5e0f5817bda429

TurboQuant seems to work very well on Gemma 4 — and separately, per-layer outlier-aware K quantization is beating current public fork results on Qwen PPL
I’ve been experimenting with TurboQuant KV-cache quantization in llama.cpp (CPU + Metal) on Gemma 4 26B A4B-it Q4_K_M, running on an Apple M4 Pro with 48 GB, and the results look surprisingly strong.

Gemma 4 findings

On Gemma 4, QJL seems to work well, and FWHT as a structured-rotation substitute also looks like a good fit for the large attention heads (dk=256/512). My benchmark results:

- tq3j/q4_0: 37/37 on quality tests, 8/8 on NIAH
- tq2j/q4_0: 36/37, the only miss being an empty response
- +34% faster than q4_0/q4_0 at 131K context
- TurboQuant overtakes q4_0 from 4K context onward

So on this setup, ~3.1 bits per K channel gets near-zero accuracy loss with a meaningful long-context speedup. What’s also interesting is that this looks better than the public Gemma 4 fork results I’ve seen so far. In the l…
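To make the FWHT idea concrete: the fast Walsh-Hadamard transform is an orthonormal rotation that can be applied in O(n log n) without materializing a rotation matrix, which is why it works as a structured-rotation substitute before quantizing K-cache vectors (it spreads per-channel outliers across all channels). The sketch below is a minimal NumPy implementation of the transform itself, not the actual TurboQuant or llama.cpp code; the function name `fwht` and the orthonormal scaling convention are my own choices for illustration.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform of a 1-D vector.

    Length must be a power of two (e.g. dk = 256 or 512, as in the
    large Gemma attention heads mentioned above). With the 1/sqrt(n)
    scaling the transform is its own inverse, so de-rotating quantized
    vectors is just applying it again.
    """
    x = np.asarray(x, dtype=np.float64).copy()
    n = x.shape[0]
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Butterfly step: combine pairs (j, j+h) within each block.
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)  # orthonormal scaling
```

Because the transform is orthonormal, it preserves vector norms, so attention scores computed after de-rotation are unchanged up to quantization error.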

Your LLM Passes Type Checks but Fails the "Vibe Check": How I Fixed AI Reliability
You validate your LLM outputs with Pydantic. The JSON is well-formed. The fields are correct. Life is good. Then your model returns a "polite decline" that says "I'd rather gouge my eyes out." It passes your type checks. It fails the vibe check.

This is the Semantic Gap: the space between structural correctness and actual meaning. Every team shipping LLM-powered features hits it eventually. I got tired of hitting it, so I built Semantix.

The Semantic Gap: Shape vs. Meaning

Here's what most validation looks like today:

class Response(BaseModel):
    message: str
    tone: Literal["polite", "neutral", "firm"]

This tells you the shape is right. It tells you nothing about whether the meaning is right.
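The gap is easy to demonstrate with real Pydantic: a model whose `tone` field contradicts its `message` validates without complaint, because Pydantic only checks shape. This is a self-contained illustration of the problem the article describes, not Semantix's own API (which the teaser does not show).

```python
from typing import Literal

from pydantic import BaseModel

class Response(BaseModel):
    message: str
    tone: Literal["polite", "neutral", "firm"]

# Structurally valid, semantically wrong: the output self-labels as
# "polite" while the message is anything but. Pydantic raises no error,
# because Literal only constrains which strings `tone` may contain.
bad = Response(message="I'd rather gouge my eyes out.", tone="polite")
```

Closing the gap requires a second layer of validation that inspects what `message` actually says, which is the role a semantic validator plays on top of the structural check.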
More in Products
b8664
server: Fix undefined timing measurement errors in server context (#21201)
Co-authored-by: Dan Hoffman [email protected]

macOS/iOS: macOS Apple Silicon (arm64), macOS Intel (x64), iOS XCFramework
Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU), Ubuntu x64 (Vulkan), Ubuntu arm64 (Vulkan), Ubuntu x64 (ROCm 7.2), Ubuntu x64 (OpenVINO)
Windows: Windows x64 (CPU), Windows arm64 (CPU), Windows x64 (CUDA 12) - CUDA 12.4 DLLs, Windows x64 (CUDA 13) - CUDA 13.1 DLLs, Windows x64 (Vulkan), Windows x64 (SYCL), Windows x64 (HIP)
openEuler: openEuler x86 (310p), openEuler x86 (910b, ACL Graph), openEuler aarch64 (310p), openEuler aarch64 (910b, ACL Graph)


