Scaling Agentic Memory to 5 Billion Vectors via Binary Quantization and Dynamic Wavelet Matrices
In a study, a new “dynamic wavelet matrix” was used as a vector database whose memory is reported to grow only with log(σ), σ being the alphabet size, instead of with the number of stored items n. I considered building a KNN model on top of it with a huge memory, capable of holding, for example, 5 billion vectors. First, the words in the context window are converted into an embedding using deberta-v3-small. This is a fast encoder that also takes token positions into account (disentangled attention) and provides the context for the model. The embedding is then converted into a bit sequence using binary quantization: each dimension greater than 0 becomes 1, and every other dimension becomes 0. The advantage is that bit sequences are compressible, and they are inserted into the dynamic wavelet matrix, where the memory grows only with log(σ). A response token is
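The binary quantization step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the post's implementation: the embedding width of 768 is assumed, random vectors stand in for deberta-v3-small outputs, the helper names (`binarize`, `hamming_knn`, `raw_size_bytes`) are hypothetical, and a brute-force Hamming scan stands in for the dynamic wavelet matrix, which is not implemented here.

```python
import numpy as np

# Assumed embedding width; not stated in the post.
DIM = 768

# Lookup table: number of set bits in every possible byte value.
_POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Binary quantization: dimensions > 0 become 1, all others 0.

    The bits are packed 8 per byte, so a 768-dim vector takes 96 bytes.
    """
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)


def hamming_knn(query_code: np.ndarray, codes: np.ndarray, k: int):
    """Brute-force k-nearest neighbours by Hamming distance.

    A stand-in for the index structure; it scans every stored code.
    """
    xor = np.bitwise_xor(codes, query_code)
    dists = _POPCOUNT[xor].sum(axis=1, dtype=np.int64)
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


def raw_size_bytes(n_vectors: int, dim: int = DIM) -> int:
    """Uncompressed size of n_vectors binary codes, one bit per dimension."""
    return n_vectors * (dim // 8)


# Random vectors stand in for real encoder outputs.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, DIM)).astype(np.float32)
codes = binarize(vectors)

# Querying with a stored code should return that code at distance 0.
neighbours, distances = hamming_knn(codes[42], codes, k=3)
print(neighbours, distances)

# Size of 5 billion codes before any compression.
print(raw_size_bytes(5_000_000_000) / 1e9, "GB")
```

Under the 768-dimension assumption, 5 billion codes occupy 96 bytes each before compression, which is the baseline that any compressed index would have to improve on.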
https://discuss.huggingface.co/t/scaling-agentic-memory-to-5-billion-vectors-via-binary-quantization-and-dynamic-wavelet-matrices/174951
