I Built a Governance Layer That Works Across Claude Code, Codex, and Gemini CLI
I run four AI coding assistants. Claude Code for architecture, Codex for implementation, Gemini CLI for review. Cursor sometimes. The problem isn't that any of them are bad. The problem is that none of them remember what the others did.
Every time I switched models, I was re-explaining context, re-establishing decisions, and discovering that the previous model had silently reverted something. On a real API migration last month, Codex deleted an endpoint that Claude had marked as "preserve for 6 months" two sessions earlier. There was no shared record. No handoff. Just vibes.
So I built Delimit to fix it.
## What actually breaks when you switch models

Three things, consistently:
- **Context amnesia.** Claude drafts a v2 schema with nested address objects. You close the session. Open Codex. Codex has no idea the schema exists. You paste it in manually, but the rationale behind the nesting decision is gone.
- **Ledger drift.** You track a task in one session. Switch models. The new model starts fresh, creates a duplicate task, or worse, skips the task entirely because it doesn't know it exists.
- **Decision reversal.** Model A decides to keep /v1/users alive for backward compatibility. Model B doesn't know this and removes it. No conflict resolution, no warning. Just a broken API shipped to production.
These aren't edge cases. This is the default experience when you use more than one AI coding assistant on the same codebase.
## The fix: a shared ledger and context filesystem

Delimit runs as an MCP server that every model connects to. It exposes three persistence layers:
- **Ledger** -- operational task tracking that survives across sessions and models
- **Memory** -- searchable store of decisions, rationale, and context
- **Context FS** -- artifacts (schemas, migration guides, reports) that any model can read and write

When Claude creates a task and drafts a schema, those are persisted. When Codex picks up the work, it loads the session handoff, reads the schema from the context filesystem, and searches memory for the design decisions. When Gemini does the governance review, it can see the full chain: who planned it, who implemented it, what decisions were made.
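To make the three layers concrete, here is a minimal in-memory sketch of shared state that two sessions could read and write. The class and field names are illustrative assumptions, not Delimit's actual API; the real layers live behind the MCP server and persist to disk.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of Delimit's three persistence layers.
@dataclass
class SharedState:
    ledger: dict = field(default_factory=dict)      # task_id -> status
    memory: list = field(default_factory=list)      # (topic, rationale) records
    context_fs: dict = field(default_factory=dict)  # path -> artifact text

    def search_memory(self, term: str) -> list:
        # Naive substring search; the real memory store is searchable too.
        return [r for r in self.memory if term in r[0] or term in r[1]]

# A "Claude" session writes; a later "Codex" session reads the same state.
state = SharedState()
state.ledger["users-v2-migration"] = "planned"
state.memory.append(("schema design", "nest address fields to group street/city/zip"))
state.context_fs["schemas/users_v2.json"] = '{"address": {"street": "...", "city": "...", "zip": "..."}}'

print(state.search_memory("address"))      # the rationale survives the handoff
print(state.ledger["users-v2-migration"])  # → planned
```

The point is that the task status, the rationale, and the artifact all live in one place that outlasts any single session.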
## Setting it up

```bash
npx delimit-cli setup
```

This writes MCP configuration into Claude Code, Codex, Gemini CLI, and Cursor. Each gets the same server, the same ledger, the same memory.
For CI, add the GitHub Action:
```yaml
name: API Contract Check
on: pull_request

jobs:
  delimit:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: delimit-ai/delimit-action@v1
        with:
          spec: api/openapi.yaml
```

The action auto-fetches the base branch spec, diffs it, and posts a PR comment with breaking changes, semver classification, and a migration guide. No API keys, no config.
## The handoff demo

There's a runnable demo that simulates a three-model workflow:

```bash
git clone https://github.com/delimit-ai/delimit-mcp-server
cd delimit-mcp-server
python3 demos/cross_model_handoff.py
```

It walks through a /users API migration:
1. **Claude session** -- creates the task in the ledger, drafts the v2 schema (nested address objects replacing flat fields), stores the design rationale in memory, saves a session handoff.
2. **Codex session** -- loads the handoff, reads the schema from context FS, searches memory for migration decisions, implements the endpoint, writes a migration guide, updates the ledger to `in_progress`.
3. **Gemini session** -- loads the full session chain, reads all artifacts, runs governance checks, classifies the change as MAJOR (3 removed fields, 1 added object), verifies the migration guide covers every breaking change, marks the task done.
Each model calls the same Delimit APIs. The ledger, memory, and context filesystem are the shared state. No copy-pasting context between sessions. No re-explaining decisions.
## What governance actually looks like

Delimit's diff engine detects 27 change types (17 breaking, 10 non-breaking) deterministically. Same input, same output, every time. No LLM inference in the classification path.
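"Deterministic" here just means pure set arithmetic over two spec versions. A toy classifier in that spirit, covering only two of the change types (the names `field_removed` and `field_added` mirror the policy examples but are my shorthand, not the engine's full taxonomy):

```python
# Toy deterministic diff classifier: no model inference, so the same
# input always produces the same classification.
def classify(old_fields: set, new_fields: set) -> dict:
    removed = sorted(old_fields - new_fields)   # breaking
    added = sorted(new_fields - old_fields)     # non-breaking
    changes = [("field_removed", f) for f in removed]
    changes += [("field_added", f) for f in added]
    semver = "MAJOR" if removed else ("MINOR" if added else "PATCH")
    return {"changes": changes, "semver": semver}

# The /users migration from the demo: flat fields replaced by one object.
v1 = {"street", "city", "zip"}
v2 = {"address"}
print(classify(v1, v2)["semver"])  # → MAJOR (3 removed fields, 1 added)
```

Because the classification is a pure function of the two specs, it can run in CI without API keys and never flip its answer between runs.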
You can enforce policies with YAML:
```yaml
# .delimit/policies.yml
rules:
  - id: freeze_v1
    name: Freeze V1 API
    change_types: [endpoint_removed, method_removed, field_removed]
    severity: error
    action: forbid
    conditions:
      path_pattern: "^/v1/."
    message: "V1 API is frozen. Changes must be made in V2."
```

This is the rule that would have caught Codex deleting that endpoint. The policy runs in CI, on every PR, regardless of which model wrote the code.
## Limitations

Delimit doesn't solve model quality differences. If Gemini writes worse code than Claude for your use case, Delimit won't fix that. It solves the continuity and governance problem.
The MCP protocol is still young. Some models have quirks with tool parameter naming (Gemini doesn't like `type` as a parameter name, for instance). The handoff protocol works, but it's not zero-friction yet.
The free tier gives you governance, ledger, and memory. Multi-model deliberation (where models actually debate a decision) requires Pro or BYOK API keys.
## Try it

```bash
npx delimit-cli demo   # governance demo, no setup
npx delimit-cli setup  # configure all your AI assistants
```

Or add the GitHub Action to any repo with an OpenAPI spec.
The code is MIT licensed: github.com/delimit-ai/delimit-mcp-server