v0.16.1

Axolotl Releasesby axolotl-ai-cloudApril 2, 20261 min read1 views

Axolotl v0.16.1 Release Notes Gemma 4 Support Example YAML: https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/gemma4/26b-a4b-moe-qlora.yaml gemma4 support by @winglian in #3574 Full Changelog : v0.16.0...v0.16.1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Original source

Axolotl Releases

https://github.com/axolotl-ai-cloud/axolotl/releases/tag/v0.16.1

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

releasegithubaxolotl

ModelsFresh

ROMAN: A Multiscale Routing Operator for Convolutional Time Series Models

arXiv:2604.02577v1 Announce Type: new Abstract: We introduce ROMAN (ROuting Multiscale representAtioN), a deterministic operator for time series that maps temporal scale and coarse temporal position into an explicit channel structure while reducing sequence length. ROMAN builds an anti-aliased multiscale pyramid, extracts fixed-length windows from each scale, and stacks them as pseudochannels, yielding a compact representation on which standard convolutional classifiers can operate. In this way, ROMAN provides a simple mechanism to control the inductive bias of downstream models: it can reduce temporal invariance, make temporal pooling implicitly coarse-position-aware, and expose multiscale interactions through channel mixing, while often improving computational efficiency by shortening th

arXiv cs.LG

2mabout 3 hours ago

ModelsFresh

DocShield: Towards AI Document Safety via Evidence-Grounded Agentic Reasoning

arXiv:2604.02694v1 Announce Type: new Abstract: The rapid progress of generative AI has enabled increasingly realistic text-centric image forgeries, posing major challenges to document safety. Existing forensic methods mainly rely on visual cues and lack evidence-based reasoning to reveal subtle text manipulations. Detection, localization, and explanation are often treated as isolated tasks, limiting reliability and interpretability. To tackle these challenges, we propose DocShield, the first unified framework formulating text-centric forgery analysis as a visual-logical co-reasoning problem. At its core, a novel Cross-Cues-aware Chain of Thought (CCT) mechanism enables implicit agentic reasoning, iteratively cross-validating visual anomalies with textual semantics to produce consistent, e

arXiv cs.CV

1mabout 3 hours ago

ReleasesFresh

Detecting and Correcting Reference Hallucinations in Commercial LLMs and Deep Research Agents

arXiv:2604.03173v1 Announce Type: new Abstract: Large language models and deep research agents supply citation URLs to support their claims, yet the reliability of these citations has not been systematically measured. We address six research questions about citation URL validity using 10 models and agents on DRBench (53,090 URLs) and 3 models on ExpertQA (168,021 URLs across 32 academic fields). We find that 3--13\% of citation URLs are hallucinated -- they have no record in the Wayback Machine and likely never existed -- while 5--18\% are non-resolving overall. Deep research agents generate substantially more citations per query than search-augmented LLMs but hallucinate URLs at higher rates. Domain effects are pronounced: non-resolving rates range from 5.4\% (Business) to 11.4\% (Theolog

arXiv cs.CL

2mabout 3 hours ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 257 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Open Source AI

Open Source AILive

OpenClaw Changed How We Use AI. KiloClaw Made It Effortless to Get Started

OpenClaw is a powerful open-source AI agent, but self-hosting it is a pain. KiloClaw is OpenClaw fully hosted and managed by Kilo — sign up, connect your chat apps, and your agent is running in about a minute. No Docker, no YAML, no server babysitting. People are using it for personalized morning briefs, inbox digests, auto-building CRMs, browser automation, GitHub triage, and more. Hosting is $8/month with a 7-day free trial, inference runs through Kilo Gateway at zero markup across 500+ models, and it's free for open-source maintainers. Read All

Hackernoon AI

1mabout 1 hour ago

Open Source AIFresh

WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models

arXiv:2604.02570v1 Announce Type: new Abstract: Singular Value Decomposition (SVD) has become an important technique for reducing the computational burden of Vision Language Models (VLMs), which play a central role in tasks such as image captioning and visual question answering. Although multiple prior works have proposed efficient SVD variants to enable low-rank operations, we find that in practice it remains difficult to achieve substantial latency reduction during model execution. To address this limitation, we introduce a new computational pattern and apply SVD at a finer granularity, enabling real and measurable improvements in execution latency. Furthermore, recognizing that weight elements differ in their relative importance, we adaptively allocate relative importance to each elemen

arXiv cs.CV

1mabout 3 hours ago

Open Source AIFresh

trunk/bac8607b42eebcd1173c3c8b6a6afa62ccb4c3b8: [vllm hash update] update the pinned vllm hash (#179439)

This PR is auto-generated nightly by this action . Update the pinned vllm hash. Pull Request resolved: #179439 Approved by: https://github.com/pytorchbot

PyTorch Releases

1mabout 2 hours ago

Open Source AIFresh

I tested speculative decoding on my home GPU cluster. Here's why it didn't help.

I spent Saturday night testing n-gram speculative decoding on consumer GPUs. The claim: speculative decoding can speed up LLM inference by 2-3x by predicting future tokens and verifying them in parallel. I wanted to see if that holds up on real hardware running diverse workloads. For the most part, it doesn't. But the journey was worth it, and I caught a benchmarking pitfall that I think a lot of people are falling into. The setup My home lab runs Kubernetes on a machine called Shadowstack. Two NVIDIA RTX 5060 Ti GPUs (16GB VRAM each, 32GB total). I use LLMKube, an open source K8s operator I built, to manage LLM inference workloads with llama.cpp. For this test I deployed two models: Gemma 4 26B-A4B : Google's Mixture of Experts model. 26B total params but only ~4B active per token. Runs a

DEV Community

7mabout 3 hours ago