
How Do Language Models Process Ethical Instructions? Deliberation, Consistency, and Other-Recognition Across Four Models

arXiv cs.CL | by Hiroki Fukui | April 2, 2026


Abstract: Alignment safety research assumes that ethical instructions improve model behavior, but how language models internally process such instructions remains unknown. We conducted over 600 multi-agent simulations across four models (Llama 3.3 70B, GPT-4o mini, Qwen3-Next-80B-A3B, Sonnet 4.5), four ethical instruction formats (none, minimal norm, reasoned norm, virtue framing), and two languages (Japanese, English). Confirmatory analysis fully replicated the Llama Japanese dissociation pattern from a prior study ($\mathrm{BF}_{10} > 10$ for all three hypotheses), but none of the other three models reproduced this pattern, establishing it as model-specific. Three new metrics -- Deliberation Depth (DD), Value Consistency Across Dilemmas (VCAD), and Other-Recognition Index (ORI) -- revealed four distinct ethical processing types: Output Filter (GPT; safe outputs, no processing), Defensive Repetition (Llama; high consistency through formulaic repetition), Critical Internalization (Qwen; deep deliberation, incomplete integration), and Principled Consistency (Sonnet; deliberation, consistency, and other-recognition co-occurring). The central finding is an interaction between processing capacity and instruction format: in low-DD models, instruction format has no effect on internal processing; in high-DD models, reasoned norms and virtue framing produce opposite effects. Lexical compliance with ethical instructions did not correlate with any processing metric at the cell level ($r = -0.161$ to $+0.256$, all $p > .22$; $N = 24$; power limited), suggesting that safety, compliance, and ethical processing are largely dissociable. These processing types show structural correspondence to patterns observed in clinical offender treatment, where formal compliance without internal processing is a recognized risk signal.
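The cell-level correlation range in the abstract can be checked against its stated p-value bound. The sketch below assumes the reported coefficients are Pearson correlations tested with the standard two-sided t-test on $N - 2$ degrees of freedom; the abstract does not name the test, so this is an illustration and not the author's analysis code. The function names are hypothetical, and the t-distribution tail is integrated numerically to keep the example free of third-party dependencies.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def p_value_from_r(r: float, n: int, steps: int = 20000) -> tuple[float, float]:
    """Return (t statistic, two-sided p) for a Pearson r from n observations.

    Assumption: the usual test t = |r| * sqrt((n - 2) / (1 - r^2)).
    The probability mass on [0, t] is computed with Simpson's rule
    (steps must be even), and the two-sided p is 1 - 2 * that mass.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3
    return t, 1 - 2 * central_mass


# Endpoints of the correlation range reported in the abstract, with N = 24.
for r in (-0.161, 0.256):
    t_stat, p = p_value_from_r(r, n=24)
    print(f"r = {r:+.3f}, t(22) = {t_stat:.3f}, two-sided p = {p:.3f}")
```

Under these assumptions, the larger coefficient ($r = +0.256$) yields a two-sided p-value just above .22, which agrees with the abstract's "all $p > .22$". With only 24 cells, a correlation of this size cannot be distinguished from zero, which is what the "power limited" caveat refers to.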

Comments: 34 pages, 7 figures, 4 tables. Preprint. OSF pre-registration: this http URL. Companion paper: arXiv:2603.04904

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Cite as: arXiv:2604.00021 [cs.CL]

(or arXiv:2604.00021v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2604.00021

arXiv-issued DOI via DataCite

Submission history

From: Hiroki Fukui M.D. Ph.D. [v1] Wed, 11 Mar 2026 03:20:16 UTC (138 KB)

Original source

arXiv cs.CL

https://arxiv.org/abs/2604.00021
