I Built a Visual Spec-Driven Development Extension for VS Code That Works With Any LLM
The Problem
If you've tried GitHub's Spec Kit, you know the value of spec-driven development: define requirements before coding, let AI generate structured specs, plans, and tasks. It's a great workflow.
But there's a gap.
Spec Kit works through slash commands in chat. No visual UI, no progress tracking, no approval workflow. You type /speckit.specify, read the output, type /speckit.plan, and so on. It works, but it's not visual.
Kiro (Amazon's VS Code fork) offers a visual experience — but locks you into their specific LLM and requires leaving VS Code for a custom fork.
I wanted both: a visual workflow inside VS Code that works with any LLM I choose.
So I built Caramelo.
What Caramelo Does
Caramelo is a VS Code extension that gives you a complete visual UI for spec-driven development:
1. Connect Any LLM — Including Your Corporate Proxy
Click a preset, enter credentials, done. No CLI tools required.
Supported out of the box:
- GitHub Copilot — uses your existing subscription, no API key needed
- Local: Ollama, LM Studio (no API key needed)
- Cloud: Claude, OpenAI, Gemini, Groq (API key required)
- Custom: any OpenAI-compatible endpoint
- Corporate proxies: custom auth headers for Azure API Management, AWS API Gateway, etc.
You can have multiple providers of the same type — "Claude Personal" with your own API key and "Claude MyCompany" through your company's proxy, each with different endpoints and auth settings. Switch between them by clicking the dot indicator. Models are fetched from the API when available, or entered manually with automatic validation.
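The extension's internal schema isn't published in this post, so here's a hypothetical sketch of what two same-type providers with different endpoints and auth headers could look like. All names, URLs, and values below are illustrative, not Caramelo's actual configuration format:

```typescript
// Hypothetical provider entry; the extension's real schema may differ.
// The point: same provider type, different endpoint and auth header.
interface ProviderConfig {
  name: string;                // display label shown in the sidebar
  type: "anthropic" | "openai-compatible" | "copilot";
  baseUrl: string;
  authHeader: string;          // header name varies across API managers
  authValue: string;
}

const claudePersonal: ProviderConfig = {
  name: "Claude Personal",
  type: "anthropic",
  baseUrl: "https://api.anthropic.com/v1",
  authHeader: "x-api-key",     // Anthropic's standard header
  authValue: "sk-ant-<your-key>",
};

const claudeCorporate: ProviderConfig = {
  name: "Claude MyCompany",
  type: "anthropic",
  baseUrl: "https://apim.mycompany.example/llm/anthropic", // illustrative proxy URL
  authHeader: "Ocp-Apim-Subscription-Key",                 // Azure API Management style
  authValue: "<subscription-key>",
};
```

Making the header name itself configurable is what lets one provider type serve both the vendor's public API and a corporate gateway.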
2. Visual Workflow with Approval Gates
Instead of remembering which slash command to run next, Caramelo shows your workflow visually:
Each phase must be approved before the next unlocks:
- Requirements → generates spec.md
- Design → generates plan.md + research.md + data-model.md
- Tasks → generates tasks.md
You see the documents streaming in real time as the LLM writes them. Approve when satisfied, or edit manually first. If you regenerate an earlier phase, downstream phases are flagged as stale.
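The gating and staleness rules can be sketched as a small state machine. This is an illustration of the behavior described above, not Caramelo's actual implementation:

```typescript
// Minimal sketch of approval gating (not the extension's real code):
// a phase unlocks only when the previous phase is approved, and
// regenerating a phase marks every downstream phase stale.
type PhaseStatus = "pending" | "generated" | "approved" | "stale";

const PHASES = ["requirements", "design", "tasks"] as const;
type Phase = (typeof PHASES)[number];

class Workflow {
  status: Record<Phase, PhaseStatus> = {
    requirements: "pending",
    design: "pending",
    tasks: "pending",
  };

  canGenerate(phase: Phase): boolean {
    const i = PHASES.indexOf(phase);
    // first phase is always available; later phases need prior approval
    return i === 0 || this.status[PHASES[i - 1]] === "approved";
  }

  approve(phase: Phase): void {
    this.status[phase] = "approved";
  }

  regenerate(phase: Phase): void {
    this.status[phase] = "generated";
    // downstream documents no longer match their inputs
    for (const p of PHASES.slice(PHASES.indexOf(phase) + 1)) {
      if (this.status[p] !== "pending") this.status[p] = "stale";
    }
  }
}
```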
3. Constitution-Driven Generation
Before creating any specs, you define your project's constitution — the non-negotiable principles:
"All features must include error handling." "TDD mandatory." "No external dependencies without justification."
You can write them manually or click "Generate with AI" — describe your project, and the LLM suggests principles. These are automatically included as context in every generation.
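Injecting the constitution as context can be as simple as prepending the principles to every request. A minimal sketch, with an illustrative function name (not the extension's actual prompt assembly):

```typescript
// Hedged sketch: prepend constitution principles to a generation request.
// The header wording and function name are illustrative assumptions.
function buildPrompt(constitution: string[], userRequest: string): string {
  const header = constitution.length
    ? "Project constitution (non-negotiable):\n" +
      constitution.map((p, i) => `${i + 1}. ${p}`).join("\n") +
      "\n\n"
    : ""; // no constitution defined yet: send the request unchanged
  return header + userRequest;
}
```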
4. Import Specs from Jira
For teams that plan in Jira:
- Connect your Jira Cloud board (search by name for orgs with 2000+ boards)
- Click "From Jira" when creating a spec
- Search issues or type a key directly (e.g., PROJ-123)
- Title, description, acceptance criteria, and comments become your spec's input
The spec card shows a linked Jira badge — click to jump to the issue.
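Flattening a Jira issue into spec input might look like the sketch below. The field names here are simplifying assumptions, not Jira's actual REST payload shape, and the output layout is illustrative:

```typescript
// Illustrative only: turning an already-fetched Jira issue into the
// text that seeds a spec. Field names are assumptions for this sketch.
interface JiraIssue {
  key: string;
  summary: string;
  description: string;
  acceptanceCriteria: string[];
  comments: string[];
}

function issueToSpecInput(issue: JiraIssue): string {
  return [
    `# ${issue.key}: ${issue.summary}`,
    issue.description,
    "## Acceptance criteria",
    ...issue.acceptanceCriteria.map((c) => `- ${c}`),
    "## Comments",
    ...issue.comments.map((c) => `> ${c}`),
  ].join("\n");
}
```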
5. Task Execution from the Editor
Generated tasks aren't just a document — they're actionable:
- Run Task — click a button, the LLM generates the code
- Run All Tasks — execute everything, respecting parallel markers [P]
- Output Channel — watch the LLM reasoning in real time
- Progress tracking — completion percentage in the sidebar (100% only when all tasks are done)
- Inline checklist — toggle tasks directly in the sidebar
6. Quality Tools
Before moving forward, verify your work:
- Clarify — the LLM identifies ambiguities, presents questions as QuickPick dialogs
- Analyze — checks consistency across all artifacts, reports findings with severity levels
- Fix Issues — one-click auto-fix from the analysis report
- Checklists — generates content-specific verification items
All accessible from the Caramelo menu (cat icon in the editor toolbar) — a single grouped dropdown that keeps your toolbar clean.
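A finding report with severity levels might be shaped like this sketch. It is purely illustrative; the extension's actual report format isn't shown in this post:

```typescript
// Hypothetical shape for an analysis finding, sorted so errors surface
// first in the report. Field names are assumptions for this sketch.
type Severity = "error" | "warning" | "info";

interface Finding {
  severity: Severity;
  artifact: string; // e.g. "spec.md" or "plan.md"
  message: string;
}

const ORDER: Record<Severity, number> = { error: 0, warning: 1, info: 2 };

function sortFindings(findings: Finding[]): Finding[] {
  // copy before sorting so the original report order is preserved
  return [...findings].sort((a, b) => ORDER[a.severity] - ORDER[b.severity]);
}
```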
Architecture: How It Works
The extension is surprisingly simple (~170KB bundle):
- No LLM SDKs — native fetch with a shared SSE parser, plus vscode.lm for Copilot
- No React — native VS Code APIs (WebviewView, CodeLens, QuickPick)
- No external CLI — doesn't require the specify CLI or any other tool in PATH
- Spec Kit compatible — reads/writes specs/, syncs templates from GitHub releases
- State-driven UI — all inline editing uses a re-render pattern, no fragile DOM manipulation
What I Learned Building This
- VS Code's WebviewView API is powerful. A single webview panel replaced three separate TreeViews and gave me forms, progress rings, task checklists, and inline editing — all with plain HTML/CSS.
- SSE streaming is simple. Two LLM provider types (OpenAI-compatible and Anthropic) plus Copilot's vscode.lm API cover 95% of use cases with ~150 lines of streaming code.
- Corporate LLM access is messy. Different API managers use different auth header names and prefixes. Making these configurable per provider was essential for enterprise adoption.
- State-driven re-renders beat DOM manipulation. Early attempts to inject form elements via postMessage broke because refresh() destroyed event listeners. Storing editingState and re-rendering the full HTML with the editors baked in was the reliable solution.
- Spec-driven development works. Using Caramelo to build Caramelo proved the workflow: each feature went through specify → clarify → plan → tasks → implement.
Try It
- Install: search "Caramelo" in VS Code Extensions, or visit the Marketplace
- Source: github.com/fsilvaortiz/caramelo
- License: MIT
Contributions welcome! Check the Contributing Guide.
Built with spec-driven development, powered by any LLM you choose.