Bankai (卍解) — the first post-training adaptation method for true 1-bit LLMs.
I've been experimenting with Bonsai 8B, PrismML's true 1-bit model (every weight is literally 0 or 1, not ternary like BitNet). I realized that since the weights are bits, the diff between two model behaviors is just an XOR mask. So I built a tool that searches for sparse XOR patches that modify model behavior.

The basic idea: flip a row of weights, check whether the model got better at the target task without breaking anything else, then keep or revert. The set of accepted flips is the patch.

What it does on held-out prompts the search never saw:

Without patch: d/dx [x^7 + x] = 0 ✗
With patch: d/dx [x^7 + x] = 7x^6 + 1 ✓

Without patch: Is 113 prime? No, 113 is not prime ✗
With patch: Is 113 prime? Yes, 113 is a prime number ✓

93 row flips. 0.007% of weights. ~1 KB. Zero inference overhead: the patched
Could not retrieve the full article text.
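
For readers who want to see the shape of the idea, here is a minimal sketch of the flip / check / keep-or-revert loop described in the post. It is not the Bankai code. Everything in it is an assumption made for illustration: the toy 4x8 bit matrix, the hypothetical `target_score` and `control_score` functions (stand-ins for "better at the target task" and "didn't break anything else"), the reading of "flip a row" as inverting every bit in that row, and the choice to sweep the rows once in order.

```python
def apply_patch(rows, patch):
    """Return a copy of the bit matrix with each patch mask XORed into its row."""
    out = list(rows)
    for idx, mask in patch.items():
        out[idx] ^= mask
    return out


def diff_patch(rows_a, rows_b):
    """Sparse XOR diff between two same-shaped bit matrices: {row index: mask}."""
    return {i: a ^ b for i, (a, b) in enumerate(zip(rows_a, rows_b)) if a != b}


def greedy_row_flip_search(rows, width, target_score, control_score):
    """Flip one row at a time; keep the flip only if the target score improves
    and the control score does not drop. Returns the accepted flips as a patch."""
    full = (1 << width) - 1          # all-ones mask: XOR with it inverts a row
    current = list(rows)
    best_target = target_score(current)
    base_control = control_score(current)
    for i in range(len(current)):
        current[i] ^= full           # try the flip
        t = target_score(current)
        c = control_score(current)
        if t > best_target and c >= base_control:
            best_target = t          # keep
        else:
            current[i] ^= full       # revert
    return diff_patch(rows, current)


# Toy setup (hypothetical): each row is an 8-bit integer.
WIDTH = 8
FULL = (1 << WIDTH) - 1
ideal = [0b10110010, 0b01101100, 0b11100001, 0b00011110]
rows = [ideal[0], ideal[1] ^ FULL, ideal[2], ideal[3]]  # row 1 starts inverted


def matching_bits(a, b):
    """Number of bit positions (out of WIDTH) where a and b agree."""
    return WIDTH - bin(a ^ b).count("1")


def target_score(m):
    # Stand-in for "target task": agreement with `ideal` on rows 0 and 1.
    return sum(matching_bits(m[i], ideal[i]) for i in (0, 1))


def control_score(m):
    # Stand-in for "everything else": agreement with `ideal` on rows 2 and 3.
    return sum(matching_bits(m[i], ideal[i]) for i in (2, 3))


patch = greedy_row_flip_search(rows, WIDTH, target_score, control_score)
patched = apply_patch(rows, patch)
```

Because XOR is its own inverse, applying the same patch a second time restores the original weights, and the patch only needs to store the rows that changed. That is consistent with the post's point that the whole diff is a small mask.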
Read on Reddit r/LocalLLaMA:
https://www.reddit.com/r/LocalLLaMA/comments/1sak9f6/bankai_卍解_the_first_posttraining_adaptation/
