Why We Need to Stop Obsessing Over AI Models
Photo by Andrey Matveev on Unsplash

If you read the news about the big NVIDIA tech event a few weeks ago, you might think the future is all about faster computer chips. The tech world was going crazy over them. But I actually ignored the big speeches. Instead, I hung out in the hallways and grabbed coffee with the people actually trying to use this stuff. As a Cloud & AI advisor, I talked to engineers, founders, and managers.

I’m going to let you in on a secret: the real story had almost nothing to do with new hardware. Everyone is quietly freaking out because we forgot how to actually run these things. We bought the engines but forgot to build the car.

Here are four things I learned from the people doing the actual work.

1. Nobody cares which model you use anymore

For the last few years, it
(Full article text could not be retrieved. Read on Generative AI: https://generativeai.pub/why-we-need-to-stop-obsessing-over-ai-models-3fdd2b67a246)

More about: model training updates
Smallest.ai launches Lightning V3, a new text-to-speech model that beats OpenAI, Cartesia, and ElevenLabs on key voice quality benchmarks - The Tribune
langchain-core==1.2.26
Changes since langchain-core==1.2.25:
- release(core): 1.2.26 (#36511)
- fix(core): add init validator and serialization mappings for Bedrock models (#34510)
- feat(core): add ChatBaseten to serializable mapping (#36510)
- chore(core): drop gpt-3.5-turbo from docstrings (#36497)
- fix(core): correct parameter names in filter_messages docstring example (#36462)
[P] GPU friendly lossless 12-bit BF16 format with 0.03% escape rate and 1 integer ADD decode works for AMD & NVIDIA
Hi everyone, I am from Australia :) I just released a new research prototype. It’s a lossless BF16 compression format that stores weights in 12 bits by replacing the 8-bit exponent with a 4-bit group code. For 99.97% of weights, decoding is just one integer ADD. Byte-aligned split storage: true 12 bits per weight, no 16-bit padding waste, and zero HBM read amplification. Yes, 12 bits, not 11!

The main idea was not just “compress weights more”, but to make the format GPU-friendly enough to use directly during inference:

- sign + mantissa: exactly 1 byte per element
- group: two nibbles packed into exactly 1 byte

1.33x smaller than BF16. Fixed-rate 12-bit per weight, no
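The post stops short of code, so here is a minimal sketch of how a grouped-exponent scheme like this could work. Assumptions throughout: a per-tensor base exponent, group code 15 as the escape sentinel, and an in-order side table for escaped exponents; the function names and layout are my guesses, not details from the actual format.

```python
import numpy as np

ESCAPE = 0xF  # assumption: group code 15 marks the ~0.03% of exponents stored out of band

def encode(bf16_bits: np.ndarray, base_exp: int):
    """Split raw BF16 words (uint16: 1 sign | 8 exponent | 7 mantissa)
    into a sign+mantissa byte, a 4-bit group code, and an escape side table."""
    sign_mant = (((bf16_bits >> 8) & 0x80) | (bf16_bits & 0x7F)).astype(np.uint8)
    exp = ((bf16_bits >> 7) & 0xFF).astype(np.int32)
    off = exp - base_exp
    in_range = (off >= 0) & (off < ESCAPE)          # 15 codes cover the common exponents
    group = np.where(in_range, off, ESCAPE).astype(np.uint8)
    escapes = exp[~in_range].astype(np.uint16)      # rare full exponents, kept in order
    return sign_mant, group, escapes

def decode(sign_mant, group, escapes, base_exp: int):
    """Fast path is literally one integer ADD per weight: exp = base + group."""
    exp = base_exp + group.astype(np.int32)
    exp[group == ESCAPE] = escapes                  # patch the rare escaped exponents
    return (((sign_mant.astype(np.uint16) & 0x80) << 8)
            | (exp.astype(np.uint16) << 7)
            | (sign_mant & 0x7F))

def pack_nibbles(group: np.ndarray) -> np.ndarray:
    """Two 4-bit group codes per byte, matching the post's split-storage layout."""
    assert group.size % 2 == 0
    return ((group[0::2] << 4) | group[1::2]).astype(np.uint8)

# Round-trip check on BF16 bit patterns with exponents clustered near base_exp = 120.
rng = np.random.default_rng(0)
bits = ((rng.integers(0, 2, 8, dtype=np.uint16) << 15)
        | (rng.integers(118, 136, 8, dtype=np.uint16) << 7)
        | rng.integers(0, 128, 8, dtype=np.uint16))
sm, g, esc = encode(bits, base_exp=120)
assert np.array_equal(decode(sm, g, esc, base_exp=120), bits)
```

Reconstructing the exponent as an add against a per-tensor base, rather than a table lookup, keeps the fast path branch-free, which is presumably why the author stresses the single-ADD decode.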