
Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens

arXiv cs.LG · by Mohammed Suhail B Nadaf · April 6, 2026



Abstract: Function vectors (FVs) -- mean-difference directions extracted from in-context learning demonstrations -- can steer large language model behavior when added to the residual stream. We hypothesized that FV steering failures reflect an absence of task-relevant information: the logit lens would fail alongside steering. We were wrong. In the most comprehensive cross-template FV transfer study to date -- 4,032 pairs across 12 tasks, 6 models from 3 families (Llama-3.1-8B, Gemma-2-9B, Mistral-7B-v0.3; base and instruction-tuned), 8 templates per task -- we find the opposite dissociation: FV steering succeeds even when the logit lens cannot decode the correct answer at any layer. This steerability-without-decodability pattern is universal: steering exceeds logit lens accuracy for every task on every model, with gaps as large as -0.91. Only 3 of 72 task-model instances show the predicted decodable-without-steerable pattern, all in Mistral. FV vocabulary projection reveals that FVs achieving over 0.90 steering accuracy still project to incoherent token distributions, indicating FVs encode computational instructions rather than answer directions. FVs intervene optimally at early layers (L2-L8); the logit lens detects correct answers only at late layers (L28-L32). The previously reported negative cosine-transfer correlation (r=-0.572) dissolves at scale: pooled r ranges from -0.199 to +0.126, and cosine adds less than 0.011 in R-squared beyond task identity. Post-steering analysis reveals a model-family divergence: Mistral FVs rewrite intermediate representations; Llama/Gemma FVs produce near-zero changes despite successful steering. Activation patching confirms causal localization: easy tasks achieve perfect recovery at targeted layers; hard tasks show zero recovery everywhere.
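To make the abstract's three core operations concrete, here is a minimal toy sketch, not the paper's code. It shows a mean-difference direction, additive steering of a single residual-stream state, and a logit-lens readout. The function names (`mean_difference`, `steer`, `logit_lens`), the scaling factor `alpha`, the choice of baseline states, and the 3-dimensional toy data are all assumptions made for illustration. A real experiment would read hidden states from a model at a chosen layer, and a logit lens typically applies the model's final normalization before unembedding, which this sketch omits.

```python
from statistics import fmean


def mean_difference(icl_states, baseline_states):
    """Per-dimension mean of icl_states minus per-dimension mean of baseline_states.

    Both arguments are lists of equal-length vectors (hypothetical hidden states).
    """
    icl_mean = [fmean(col) for col in zip(*icl_states)]
    base_mean = [fmean(col) for col in zip(*baseline_states)]
    return [a - b for a, b in zip(icl_mean, base_mean)]


def steer(hidden, fv, alpha=1.0):
    """Add the scaled direction to one residual-stream state (alpha is an assumed knob)."""
    return [h + alpha * v for h, v in zip(hidden, fv)]


def logit_lens(hidden, unembed):
    """Project a hidden state through an unembedding matrix (vocab x d_model)
    and return the index of the highest-scoring token."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembed]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy data: two "demonstration" states and two baseline states, d_model = 3.
icl_states = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
baseline_states = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]
fv = mean_difference(icl_states, baseline_states)

# Toy unembedding: a 3-token vocabulary with an identity matrix.
unembed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

hidden = [0.5, -1.0, 2.0]
steered = steer(hidden, fv, alpha=2.0)

print(fv)                            # [1.0, 1.0, 0.0]
print(steered)                       # [2.5, 1.0, 2.0]
print(logit_lens(hidden, unembed))   # 2
print(logit_lens(steered, unembed))  # 0
```

In this toy the steering direction is trivially readable through the identity unembedding. The abstract reports the opposite for real models: directions that steer well can still project to incoherent token distributions.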

Comments: 30 pages, 7 figures

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2604.02608 [cs.LG]

(or arXiv:2604.02608v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2604.02608

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Mohammed Suhail B Nadaf
[v1] Fri, 3 Apr 2026 00:54:11 UTC (4,226 KB)
