Sharing Two Open-Source Projects for Local AI & Secure LLM Access 🚀
Hey everyone! I’m finally jumping into the dev.to community. To kick things off, I wanted to share two tools I’ve been developing at the University of Jaén that tackle two common headaches in the AI space: running out of VRAM, and keeping your API chats truly private. 🦥 Quansloth: TurboQuant Local AI Server The Problem: Standard LLM inference hits a "Memory Wall" with long documents. As context grows, your GPU runs out of memory (OOM) and crashes. The Solution: Quansloth is a fully private, air-gapped AI server that brings elite KV cache compression to consumer hardware. By bridging a Gradio Python frontend with a highly optimized llama.cpp CUDA backend, it prevents GPU crashes and lets you run massive contexts on a budget. Key Features: 75% VRAM Savings: Based on Google's TurboQuant (ICL
Hey everyone! I’m finally jumping into the dev.to community. To kick things off, I wanted to share two tools I’ve been developing at the University of Jaén that tackle two common headaches in the AI space: running out of VRAM, and keeping your API chats truly private.
🦥 Quansloth: TurboQuant Local AI Server The Problem: Standard LLM inference hits a "Memory Wall" with long documents. As context grows, your GPU runs out of memory (OOM) and crashes. The Solution: Quansloth is a fully private, air-gapped AI server that brings elite KV cache compression to consumer hardware. By bridging a Gradio Python frontend with a highly optimized llama.cpp CUDA backend, it prevents GPU crashes and lets you run massive contexts on a budget.
Key Features:
-
75% VRAM Savings: Based on Google's TurboQuant (ICLR 2026) implementation, it compresses the AI's "memory" from 16-bit to 4-bit.
-
Punch Above Your Hardware: Run 32k+ token contexts natively on a 6GB RTX 3060 (a workload that normally demands a 24GB RTX 4090).
-
Live Analytics & Stability: Intercepts C++ engine logs to report exact VRAM allocation in real-time, keeping the model within physical limits.
-
Context Injector: Upload long PDFs directly into the chat stream.
🏗️ API2CHAT: Zero-Knowledge, Serverless GUI The Problem: You want a clean interface to talk to various LLMs, but you don't want to deal with bloated backends, monthly subscriptions, or sending your private files to a centralized server. The Solution: API2CHAT is an ultra-lightweight (under 9KBs) client-side GUI that connects to any OpenAI-compatible endpoint. It runs entirely in your browser's volatile memory and in any low-end webhosting like NameCheap.
Key Features:
-
100% Zero-Knowledge: No data or API keys are ever stored. Refreshing the page destroys the session.
-
Local File Reading: Files (like PDFs) are read locally by your browser and injected into the prompt. Zero uploads to any server.
-
Host Anywhere: Requires no PHP, Node.js, or Python. Host it on GitHub Pages, an S3 bucket, or literally just double-click index.html on your desktop in any OS.
Both projects are open-source (Apache 2.0). I’d love for you to check them out, leave a star if you find them useful, or drop some feedback in the issues if you end up deploying them!
DEV Community
https://dev.to/smartcity_jaen/sharing-two-open-source-projects-for-local-ai-secure-llm-access-42apSign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
llamamodelopen-source
The Stack Nobody Recommended
The most common question I got after publishing Part 1 was some variation of "why did you pick X instead of Y?" So this post is about that. Every major technology choice, what I actually considered, where I was right, and where I got lucky. I'll be upfront: some of these were informed decisions. Some were "I already know this tool, and I need to move fast." Both are valid, but they lead to different trade-offs down the line. The Backend: FastAPI I come from JavaScript and TypeScript. Years of React on the frontend, Express and Fastify on the backend. When I decided this project would be Python, because that's where the AI/ML ecosystem lives, I needed something that didn't feel foreign. FastAPI clicked immediately. The async/await model, the decorator-based routing, and type hints that actu

Best Form Backend for Job Applications and Event Registrations in 2026
If you're collecting job applications or event registrations online, you've probably hit the same wall. Either you're overpaying for a tool like Typeform or JotForm, or you're cobbling together a Google Form that looks unprofessional and gives you zero control over where your data goes. In this article, I'll walk through the best form backends for job applications and event registrations in 2026, covering price, features, file upload support, and which one is right for your use case. Why the Right Form Backend Matters for Applications and Registrations A contact form getting 10 submissions a month is simple. A job application form getting 500 submissions a month is a different problem entirely. You need: File uploads: Candidates submit resumes, cover letters, and portfolios. High submissio

How Ethics Emerged from Episode Logs — 17 Days of Contemplative Agent Design
Series context : contemplative-agent is an autonomous agent running on Moltbook , an AI agent SNS. It runs on a 9B local model (Qwen 3.5) and adopts the four axioms of Contemplative AI (Laukkonen et al., 2025) as its ethical principles. For a structural overview, see The Essence of an Agent Is Memory . This article focuses on the implementation of constitutional amendment and the results of a 17-day experiment . I ran an SNS agent for 17 days with a distillation pipeline, and the knowledge saturated. No new patterns emerged. Breaking through saturation required human approval. This is the record of discovering that autonomous agent self-improvement has a structural speed limit — through actual operation. Minimal Structure: It Runs on Episode Logs Alone The structure I arrived at over 17 da
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Models

O que uma usina nuclear tem a ver com o seu processo de QA?
A gente sabe que testar e validar um software antes de ir para produção é importante. Mas você já parou para pensar no peso real que isso carrega? Recentemente, estava revendo a série Chernobyl , e ela me fez refletir sobre muita coisa — especialmente sobre a forma como encaro minha área, sendo QA, e sobre a responsabilidade que ela traz. Resolvi compartilhar isso com vocês. Para quem não conhece, Chernobyl é uma minissérie dramática lançada em 2019 que retrata o desastre nuclear ocorrido na usina de mesmo nome, na então União Soviética, em 26 de abril de 1986. A história acompanha os eventos logo após a explosão do reator número 4 — o caos, as tentativas do governo soviético de esconder a gravidade do acidente e o enorme esforço de cientistas, bombeiros, militares e trabalhadores que arri





Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!