Docker Model Runner vs Ollama: Local AI Deployment Compared 2026
Docker entered the local AI space. If you are already running models with Ollama, you are now looking at a second option that speaks the same language — literally the same OpenAI-compatible API — but comes from the company that standardized how the world ships software.
Docker Model Runner (DMR) shipped with Docker Desktop 4.40 in mid-2025 and has been evolving fast. It uses llama.cpp under the hood, stores models as OCI artifacts on Docker Hub, and integrates directly into Docker Compose workflows. Ollama, meanwhile, remains the default choice for local LLM deployment with 52+ million monthly downloads, a broader model library, and an ecosystem that every AI coding tool already supports.
The question is not which tool is objectively better — it is which tool fits your workflow. This guide compares both hands-on: installation, model management, performance, GPU acceleration, IDE integration, and server deployment. We tested with Gemma 4 E4B as the reference model on both platforms.
If you are new to running models locally, start with our Ollama + Open WebUI setup guide first. If you already have Ollama running and want to know whether Docker Model Runner is worth adding to your stack, keep reading.
What Is Docker Model Runner
Docker Model Runner is Docker's native solution for running AI models locally. It is not a container that runs a model inside it — it runs models directly on the host using llama.cpp, with no container overhead.
How It Works
DMR treats AI models as first-class Docker primitives, similar to images and containers. Models are stored as OCI (Open Container Initiative) artifacts, the same standard that Docker uses for container images. This means models can be pushed to and pulled from Docker Hub, private registries, or any OCI-compliant registry.
When you pull a model with docker model pull, it downloads the GGUF weights and stores them locally. When you run inference, llama.cpp loads the model into memory, runs the computation on your CPU or GPU, and returns results through an OpenAI-compatible API on port 12434.
Key Features
- OpenAI-compatible API on localhost:12434. Any tool that speaks the OpenAI API can point at Docker Model Runner instead.
- Ollama-compatible API as well — existing Ollama integrations can switch endpoints without code changes.
- Docker Compose integration. Define models as services in your docker-compose.yml. Docker pulls and starts the model automatically during docker compose up.
- Multiple inference engines. llama.cpp (default, broad hardware support), vLLM (high-throughput production workloads), and Diffusers (image generation).
- GPU acceleration. Metal (Apple Silicon), CUDA (NVIDIA), and Vulkan (AMD, Intel, NVIDIA).
- Lazy loading. Models load into memory only when a request arrives and unload when idle, freeing resources automatically.
- Metrics endpoint at /metrics for monitoring performance and resource usage.
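Since both endpoints are plain OpenAI-style HTTP, any language's standard library is enough to talk to DMR. Here is a minimal Python sketch against the ports and model names used in this article; the helper names are ours, and the call returns None gracefully when no server is running:

```python
import json
import urllib.error
import urllib.request

# Endpoints from this article: DMR's OpenAI-compatible API lives under
# /engines/v1 on port 12434; Ollama's under /v1 on port 11434.
DMR_BASE = "http://localhost:12434/engines/v1"
OLLAMA_BASE = "http://localhost:11434/v1"

def build_chat_request(base_url, model, prompt):
    """Build an OpenAI-style chat completion request (no network I/O here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(base_url, model, prompt):
    """Send the request; return the reply text, or None if no server is up."""
    req = build_chat_request(base_url, model, prompt)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = json.loads(resp.read())
        return body["choices"][0]["message"]["content"]
    except (urllib.error.URLError, OSError):
        return None  # model runner not reachable

if __name__ == "__main__":
    print(chat(DMR_BASE, "ai/gemma4", "Explain Docker Model Runner in one sentence"))
```

Swap `DMR_BASE` for `OLLAMA_BASE` (and `ai/gemma4` for an Ollama tag) and the same code talks to Ollama, which is the whole point of the shared API shape.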
System Requirements
- Docker Desktop 4.40+ (macOS, Windows) or Docker Engine on Linux
- 8 GB RAM minimum, 16 GB recommended
- Optional: Apple Silicon (Metal), NVIDIA GPU (CUDA), or Vulkan-compatible GPU
Ollama Recap — The Current Standard
If you have been running models locally in the past two years, you probably started with Ollama. It launched in 2023 and quickly became the default tool for local LLM management.
Ollama provides a simple CLI (ollama pull, ollama run), an OpenAI-compatible API on port 11434, and a growing library of pre-configured models. It supports GGUF, Safetensors, and custom Modelfiles for fine-tuned configurations.
We covered Ollama setup in depth in our Ollama + Open WebUI guide and used it as the foundation for our Gemma 4 local setup guide. If you are new to local AI, those articles give you a working setup in under 10 minutes.
What Makes Ollama the Default
- 52+ million monthly downloads as of Q1 2026 (source)
- Broadest model library. Hundreds of models available through ollama.com/library, plus import support for GGUF and Safetensors formats
- Custom Modelfiles. Create model configurations with specific system prompts, parameters, and adapters
- Ecosystem integration. LangChain, LlamaIndex, Spring AI, Open WebUI, Continue.dev, Cursor, Aider — virtually every AI developer tool supports Ollama natively
- Cross-platform. macOS, Linux, Windows. Works on Apple Silicon, NVIDIA GPUs, and CPU-only setups
Installation and Setup Comparison
Installing Docker Model Runner
If you already have Docker Desktop installed, DMR may already be available. Check:
```shell
docker model version
```
If the command is not recognized, enable it in Docker Desktop:
1. Open Docker Desktop
2. Go to Settings → AI
3. Enable Docker Model Runner
4. Optionally enable GPU-backend inference if you have a supported GPU
On Linux with Docker Engine, DMR is included when installed from Docker's official repositories. The TCP endpoint is enabled by default on port 12434.
Pull your first model:
```shell
docker model pull ai/gemma4
```
Run it:
```shell
docker model run ai/gemma4 "Explain Docker Model Runner in one sentence"
```
The API is immediately available at http://localhost:12434.
Installing Ollama
Download from ollama.com or install via command line:
```shell
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Verify
ollama --version
```
Pull and run a model:
```shell
ollama pull gemma4:e4b
ollama run gemma4:e4b "Explain Ollama in one sentence"
```
The API is available at http://localhost:11434.
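With both default ports documented, a quick way to see which runner is actually up on a machine is a TCP probe. This is a small sketch (the helper is ours, not part of either CLI) that works whether zero, one, or both runners are listening:

```python
import socket

# Default ports from this article: Ollama on 11434, Docker Model Runner on 12434.
BACKENDS = {"ollama": 11434, "docker-model-runner": 12434}

def running_backends(host="localhost", timeout=0.25):
    """Return the names of local inference backends accepting TCP connections."""
    found = []
    for name, port in BACKENDS.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(name)
        except OSError:
            pass  # nothing listening on this port
    return found

if __name__ == "__main__":
    print("Detected:", running_backends() or "none")
```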
Setup Comparison Summary
| Aspect | Docker Model Runner | Ollama |
| --- | --- | --- |
| Install method | Included in Docker Desktop / Engine | Standalone installer or script |
| Prerequisite | Docker Desktop or Engine | None |
| Default port | 12434 | 11434 |
| Enable step | Settings → AI → Enable | None (runs on install) |
| Time to first model | ~2 minutes (if Docker installed) | ~2 minutes |
| Model format | GGUF (OCI artifacts) | GGUF, Safetensors, custom Modelfiles |
Bottom line: Ollama is faster to set up from scratch because it has no prerequisites. Docker Model Runner is faster if Docker is already part of your workflow — it is a toggle in settings, not a new tool to install.
Model Catalog and Availability
Docker Model Runner: Docker Hub AI Models
DMR pulls models from Docker Hub under the ai/ namespace. Available models include:
- ai/gemma4 — Google Gemma 4 (multiple sizes)
- ai/llama3.2 — Meta Llama 3.2
- ai/mistral — Mistral AI
- ai/phi4 — Microsoft Phi 4
- ai/qwen2.5 — Alibaba Qwen 2.5
- ai/deepseek-r1-distill-llama — DeepSeek R1 distilled
- ai/mistral-nemo — Mistral Nemo
- ai/qwq — QwQ reasoning model
Models are stored as OCI artifacts, meaning they follow the same distribution standard as Docker container images. You can also pull models from Hugging Face.
```shell
# List downloaded models
docker model ls

# Pull a specific quantization
docker model pull ai/gemma4:e4b-q4_K_M

# Remove a model
docker model rm ai/gemma4
```
Ollama: The Broader Library
Ollama's model library is significantly larger. Beyond the major model families, it includes:
- Community-uploaded models and fine-tunes
- Custom Modelfiles for configuring system prompts, temperature, and stop tokens
- Support for importing raw GGUF files and Safetensors models
- Quantization variants for most models
```shell
# List downloaded models
ollama list

# Pull a specific model
ollama pull gemma4:e4b

# Create a custom model from a Modelfile
ollama create my-assistant -f Modelfile

# Remove a model
ollama rm gemma4:e4b
```
Catalog Comparison
| Aspect | Docker Model Runner | Ollama |
| --- | --- | --- |
| Model source | Docker Hub (ai/ namespace), Hugging Face | ollama.com/library, Hugging Face, GGUF import |
| Number of models | Curated selection (~20+ families) | Hundreds of models + community uploads |
| Custom models | Import GGUF files | Modelfiles, GGUF import, Safetensors |
| Storage format | OCI artifacts | Proprietary blob format |
| Registry support | Any OCI registry | Ollama registry only |
Bottom line: Ollama wins on catalog breadth. Docker Model Runner wins on standardized distribution — OCI artifacts mean you can use existing container registry infrastructure for model management.
Performance: Startup, Inference, and Memory
Performance between Docker Model Runner and Ollama is largely comparable — both use llama.cpp as the default inference engine. The architectural differences are in how they manage model loading and memory, not in raw inference speed.
Inference Speed
Independent benchmarks show inference speed differences of 1.0–1.12x between the two tools, which is imperceptible in practice (source). Both tools use the same underlying llama.cpp engine for GGUF model inference, so token generation speed is essentially identical for the same model and quantization level.
Model Loading
Docker Model Runner uses lazy loading — models are loaded into memory only when the first request arrives and unloaded when idle. This is resource-efficient but means the first request after idle has higher latency.
Ollama keeps models loaded in memory by default (configurable with OLLAMA_KEEP_ALIVE). This gives faster first-response times but uses more memory when idle.
Memory Usage
Both tools have similar peak memory usage for the same model since they use the same inference engine. The difference is in idle behavior:
- DMR: Unloads models when idle → lower idle memory usage
- Ollama: Keeps models loaded (default 5 minutes) → faster responses, higher idle memory
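One way to make the lazy-loading difference visible is to time two back-to-back requests: under DMR, the first request after idle pays the model-load cost, while the second hits a resident model. A rough stdlib-only probe (helper name is ours, endpoint and model are this article's defaults; it returns None when no server is up):

```python
import json
import time
import urllib.error
import urllib.request

def time_request(base_url, model, prompt="Say hi", timeout=120):
    """Time one short chat completion; return seconds, or None if unreachable."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 8,  # keep generation short so load time dominates
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            pass
    except (urllib.error.URLError, OSError):
        return None
    return time.monotonic() - start

if __name__ == "__main__":
    # First call after idle includes model load under DMR's lazy loading;
    # the second call should be noticeably faster once the model is resident.
    for label in ("cold", "warm"):
        t = time_request("http://localhost:12434/engines/v1", "ai/gemma4")
        print(label, "unreachable" if t is None else f"{t:.2f}s")
```

Run the same probe against Ollama on port 11434 and, within its keep-alive window, both calls should be "warm".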
Inference Engine Options
Docker Model Runner offers a key advantage here: multiple inference engines.
| Engine | Best For | Supported Platforms |
| --- | --- | --- |
| llama.cpp (default) | General use, broad hardware | CPU, Metal, CUDA, Vulkan |
| vLLM | High-throughput production | Metal (macOS), CUDA (Linux/Windows) |
| Diffusers | Image generation | CPU, CUDA |
Ollama uses its own optimized fork of llama.cpp exclusively.
On Apple Silicon, llama.cpp throughput stays stable at approximately 333–345 tokens/second regardless of output length for models like Llama 3.2 1B. vLLM shows more variance (134–343 tokens/second) but excels at concurrent request handling (source).
Bottom line: For single-user local inference, performance is a tie. Docker Model Runner's vLLM engine gives it an edge for multi-user or production scenarios where throughput matters more than time-to-first-token.
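To see that throughput difference yourself, fan several prompts out concurrently: an engine with continuous batching (vLLM) sustains far higher aggregate tokens/second than sequential requests against llama.cpp. A sketch of the client side only, with our own helper names and graceful failure when the endpoint is down:

```python
import json
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def one_request(base_url, model, prompt):
    """Return the reply text for one prompt, or None on connection failure."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 32,
        }).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            body = json.loads(resp.read())
        return body["choices"][0]["message"]["content"]
    except (urllib.error.URLError, OSError):
        return None

def fan_out(base_url, model, prompts, workers=8):
    """Issue prompts concurrently; batching engines handle this far better."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: one_request(base_url, model, p), prompts))

if __name__ == "__main__":
    replies = fan_out(
        "http://localhost:12434/engines/v1",
        "ai/gemma4",
        [f"Summarize request {i} in five words" for i in range(8)],
    )
    print(sum(r is not None for r in replies), "of", len(replies), "succeeded")
```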
GPU Acceleration
Both tools support GPU acceleration, but the implementation and configuration differ.
Docker Model Runner GPU Support
DMR supports three GPU backends:
- Metal (Apple Silicon M1/M2/M3/M4) — enabled by default, no configuration needed
- CUDA (NVIDIA GPUs) — requires NVIDIA Container Runtime
- Vulkan (AMD, Intel, NVIDIA) — added in Docker Desktop 4.42, broadest hardware support
Enable GPU in Docker Desktop: Settings → AI → GPU-backend inference.
On Linux with NVIDIA GPUs:
```shell
# Verify GPU is detected
docker model status

# Pull and run with GPU acceleration
docker model pull ai/gemma4:e4b
docker model run ai/gemma4:e4b "Test GPU inference"
```
Ollama GPU Support
Ollama detects and uses available GPUs automatically:
- Metal (Apple Silicon) — automatic, no configuration
- CUDA (NVIDIA) — automatic if NVIDIA drivers are installed
- ROCm (AMD GPUs on Linux) — supported with ROCm drivers
```shell
# Check GPU detection
ollama ps

# Force CPU-only mode if needed
OLLAMA_NO_GPU=1 ollama serve
```
GPU Comparison
| Feature | Docker Model Runner | Ollama |
| --- | --- | --- |
| Apple Silicon Metal | Yes (automatic) | Yes (automatic) |
| NVIDIA CUDA | Yes (needs runtime) | Yes (automatic) |
| AMD ROCm | Via Vulkan | Yes (Linux) |
| Intel GPUs | Via Vulkan | No |
| Vulkan support | Yes | No |
| Configuration | Settings toggle | Automatic detection |
Bottom line: Ollama has simpler GPU setup — it just works. Docker Model Runner has broader GPU support through Vulkan, covering AMD and Intel GPUs that Ollama cannot use.
Integration Ecosystem
This is where the comparison gets practical. If your AI coding tools cannot connect to the model runner, the performance benchmarks do not matter.
IDE and Tool Compatibility
Both Docker Model Runner and Ollama provide OpenAI-compatible APIs, which means most tools can connect to either. Here is the integration status for popular AI developer tools:
| Tool | Docker Model Runner | Ollama |
| --- | --- | --- |
| Continue.dev | Yes (OpenAI provider) | Yes (native Ollama provider) |
| Cursor | Yes (OpenAI endpoint) | Yes (native) |
| Aider | Yes (via env vars) | Yes (native) |
| Open WebUI | Yes (OpenAI connection) | Yes (native, recommended) |
| LangChain | Yes (OpenAI SDK) | Yes (native Ollama SDK) |
| LlamaIndex | Yes (OpenAI SDK) | Yes (native Ollama SDK) |
| Spring AI | Yes (native support) | Yes (native support) |
Docker Model Runner: Continue.dev Configuration
Edit ~/.continue/config.json:
{ "models": [ { "title": "Gemma 4 (Docker)", "provider": "openai", "model": "ai/gemma4", "apiBase": "http://localhost:12434/engines/v1" } ] }{ "models": [ { "title": "Gemma 4 (Docker)", "provider": "openai", "model": "ai/gemma4", "apiBase": "http://localhost:12434/engines/v1" } ] }Enter fullscreen mode
Exit fullscreen mode
Docker Model Runner: Aider Configuration
```shell
export OPENAI_API_BASE=http://localhost:12434/engines/v1
export OPENAI_API_KEY=anything
aider --model ai/gemma4
```
Ollama: Continue.dev Configuration
{ "models": [ { "title": "Gemma 4 (Ollama)", "provider": "ollama", "model": "gemma4:e4b" } ] }{ "models": [ { "title": "Gemma 4 (Ollama)", "provider": "ollama", "model": "gemma4:e4b" } ] }Enter fullscreen mode
Exit fullscreen mode
Docker Compose Integration (DMR Exclusive)
This is Docker Model Runner's biggest ecosystem differentiator. You can define AI models as services in docker-compose.yml:
```yaml
services:
  model:
    provider:
      type: model
      options:
        model: ai/gemma4

  app:
    build: .
    environment:
      - MODEL_URL=${MODEL_MODEL_URL}
      - MODEL_NAME=${MODEL_MODEL_NAME}
    depends_on:
      - model
```
When you run docker compose up, Docker automatically pulls the model, starts inference, and injects connection details (MODEL_MODEL_URL, MODEL_MODEL_NAME) into your application container. No manual setup, no glue code.
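On the application side, consuming those injected variables is just an environment lookup. A sketch with fallbacks, so the same code also runs outside Compose (variable names follow the compose mapping above; the defaults are this article's local endpoints):

```python
import os

def model_config():
    """Read the model endpoint from Compose-injected env vars, with local
    defaults so the app also runs outside docker compose."""
    return {
        "base_url": os.environ.get("MODEL_URL", "http://localhost:12434/engines/v1"),
        "model": os.environ.get("MODEL_NAME", "ai/gemma4"),
    }

if __name__ == "__main__":
    cfg = model_config()
    print(f"Using {cfg['model']} at {cfg['base_url']}")
```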
For containers that need to reach Docker Model Runner directly, add:
```yaml
services:
  app:
    extra_hosts:
      - "model-runner.docker.internal:host-gateway"
```
Then access the API at http://model-runner.docker.internal:12434/.
Bottom line: Ollama has deeper native integrations — most tools have a dedicated Ollama provider. Docker Model Runner works through the OpenAI-compatible API, which is universal but requires manual endpoint configuration. The Docker Compose integration is the standout feature for teams building AI-powered applications.
GPU Server Deployment on Hetzner
For models that exceed your laptop's capabilities — like Gemma 4 26B MoE or 31B Dense — you need a GPU server. We covered Hetzner GPU setup in detail in our Hetzner Cloud GPU guide.
Ollama on Hetzner
This is the battle-tested path. SSH into your Hetzner GPU server and run:
```shell
curl -fsSL https://ollama.com/install.sh | sh

# Verify GPU detection
nvidia-smi
ollama ps

# Pull a large model
ollama pull gemma4:31b

# Expose API (bind to all interfaces)
OLLAMA_HOST=0.0.0.0 ollama serve
```
The API is available on port 11434. See our Ollama + Open WebUI guide for adding a browser interface and our Gemma 4 guide for running Gemma 4 specifically on Hetzner GPUs.
Docker Model Runner on Hetzner
Install Docker Engine on your Hetzner server:
```shell
# Install Docker Engine
curl -fsSL https://get.docker.com | sh

# Add NVIDIA Container Toolkit
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify DMR is available
docker model version
docker model pull ai/gemma4
```
The TCP endpoint on Linux is enabled by default on port 12434.
Server Deployment Comparison
| Aspect | Docker Model Runner | Ollama |
| --- | --- | --- |
| Install complexity (GPU) | Higher (Docker + NVIDIA toolkit) | Lower (single script) |
| Remote API exposure | Port 12434 (TCP by default on Linux) | Port 11434 (configurable) |
| Reverse proxy setup | Standard Docker networking | Standard Nginx/Caddy |
| Docker Compose apps | Native integration | Needs network configuration |
| Community deployment guides | Growing | Extensive |
Bottom line: Ollama is simpler to deploy on a GPU server. Docker Model Runner is better if your server already runs Docker-based services and you want models integrated into your Compose stack. For detailed Hetzner GPU setup, including cost breakdown and Open WebUI deployment, see our self-hosting guide.
When to Use Which: Decision Framework
After testing both tools, here is the practical decision framework.
Choose Docker Model Runner When
- You already use Docker in development. DMR is a natural extension of your existing workflow. No new tools to install or manage.
- You are building Docker Compose applications. The native model-as-a-service integration in Compose is unmatched. Define a model, run docker compose up, and your app gets inference automatically.
- You want OCI-standard model distribution. If you use private registries (Harbor, ECR, GCR) for container images, you can use the same infrastructure for AI models.
- You need Vulkan GPU support. AMD or Intel GPU users have no Ollama option — DMR's Vulkan backend is the answer.
- You want production inference with vLLM. DMR's vLLM engine handles concurrent requests better than llama.cpp for multi-user scenarios.
Choose Ollama When
- You want the simplest path to running local models. One install script, one command to pull, one command to run. No prerequisites.
- You need the broadest model library. Ollama's registry has more models, more quantization options, and community uploads.
- You use custom Modelfiles. Ollama's Modelfile system for creating customized model configurations has no DMR equivalent.
- Your tools have native Ollama support. Continue.dev, Cursor, Aider, Open WebUI — all have dedicated Ollama providers that are more polished than their OpenAI-compatible fallbacks.
- You deploy to GPU servers. Ollama's single-script install with automatic GPU detection is harder to beat for server deployment.
Choose Both When
Docker Model Runner and Ollama can coexist. They use different ports (12434 vs 11434), different model storage, and different processes. Running both is a valid strategy:
- Use Ollama for interactive development, Open WebUI chat, and quick model experimentation
- Use Docker Model Runner for application development where models are part of your Docker Compose stack
This is especially useful if you are transitioning from Ollama to Docker Model Runner — run both while you migrate your workflows.
Running Both Side by Side
Here is a practical setup that runs both tools concurrently with the same model:
Step 1: Install Both
```shell
# Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Docker Model Runner (enable in Docker Desktop: Settings → AI), then verify
docker model version
```
Step 2: Pull the Same Model on Both
```shell
ollama pull gemma4:e4b
docker model pull ai/gemma4:e4b
```
Step 3: Verify Both APIs
```shell
# Test Ollama API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma4:e4b",
    "messages": [{"role": "user", "content": "Hello from Ollama"}]
  }'

# Test Docker Model Runner API
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/gemma4:e4b",
    "messages": [{"role": "user", "content": "Hello from Docker"}]
  }'
```
Step 4: Configure Tools for Either
Point your AI coding tools at whichever backend you prefer:
- Continue.dev: Use the ollama provider for port 11434, or the openai provider with a custom apiBase for port 12434
- Aider: Set OPENAI_API_BASE to either endpoint
- Open WebUI: Add both as connections — Ollama native + OpenAI-compatible for DMR
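If you script your editor setup, note that the two Continue.dev configurations shown earlier differ only in a few provider fields, so generating either entry is trivial. A hypothetical helper mirroring those configs (function name is ours):

```python
import json

def continue_model_entry(backend):
    """Return a Continue.dev "models" entry for the chosen backend.

    Mirrors the two configs shown earlier: Ollama uses its native provider;
    Docker Model Runner goes through the openai provider with a custom apiBase.
    """
    if backend == "ollama":
        return {
            "title": "Gemma 4 (Ollama)",
            "provider": "ollama",
            "model": "gemma4:e4b",
        }
    if backend == "dmr":
        return {
            "title": "Gemma 4 (Docker)",
            "provider": "openai",
            "model": "ai/gemma4",
            "apiBase": "http://localhost:12434/engines/v1",
        }
    raise ValueError(f"unknown backend: {backend}")

if __name__ == "__main__":
    # Emit a config fragment for whichever backend you prefer today.
    print(json.dumps({"models": [continue_model_entry("dmr")]}, indent=2))
```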
Resource Considerations
Running both tools simultaneously doubles your disk usage for shared models (they store models separately). Memory usage depends on which models are actively loaded — DMR's lazy unloading helps here.
If disk space is a concern, pick one as your primary and use the other only for specific workflows.
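To check what that duplication actually costs, you can sum each store directory. Ollama keeps model blobs under ~/.ollama/models; DMR's store location varies by platform, so treat the second path below as an assumption and adjust for your install:

```python
import os
from pathlib import Path

# Assumed default store locations. The Ollama path is the documented default;
# the DMR path is a guess (assumption) -- adjust it for your platform.
STORES = {
    "ollama": Path.home() / ".ollama" / "models",
    "docker-model-runner": Path.home() / ".docker" / "models",  # assumption
}

def dir_size_gb(path):
    """Total size of all files under path, in GiB (0.0 if the path is absent)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or unreadable; skip it
    return total / 1024**3

if __name__ == "__main__":
    for name, path in STORES.items():
        print(f"{name}: {dir_size_gb(path):.1f} GiB in {path}")
```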
Quick Reference: Command Comparison
| Task | Docker Model Runner | Ollama |
| --- | --- | --- |
| Check version | docker model version | ollama --version |
| Pull a model | docker model pull ai/gemma4 | ollama pull gemma4:e4b |
| Run interactively | docker model run ai/gemma4 "prompt" | ollama run gemma4:e4b |
| List models | docker model ls | ollama list |
| Remove a model | docker model rm ai/gemma4 | ollama rm gemma4:e4b |
| Check status | docker model status | ollama ps |
| API endpoint | localhost:12434 | localhost:11434 |
| API format | OpenAI + Ollama compatible | OpenAI + Ollama native |
| Model source | ai/ on Docker Hub | ollama.com/library |
Final Verdict
Docker Model Runner is not an Ollama replacement — it is an Ollama alternative for Docker-native workflows. The tools solve the same problem (running LLMs locally) with different integration philosophies.
If Docker is already central to your development workflow, Docker Model Runner is the better choice. The Compose integration alone justifies the switch for teams building AI-powered applications. OCI-standard model distribution and multi-engine support (llama.cpp + vLLM + Diffusers) add long-term flexibility.
If you want the simplest, most broadly supported local AI tool, Ollama remains the default recommendation. The ecosystem is larger, the model library is deeper, and every AI developer tool treats Ollama as a first-class citizen.
If you are serious about local AI, run both. They coexist without conflict, serve different parts of your workflow, and together give you the widest possible compatibility with the AI tooling ecosystem.
Related Guides
Building a local AI stack involves more than choosing a model runner. Here are the guides that complete the picture:
- Ollama + Open WebUI Self-Hosting Guide — Set up Ollama with a browser-based chat interface from scratch
- Gemma 4 Local Setup Guide — Run all four Gemma 4 model sizes locally, including Docker Model Runner compatibility
- Hetzner Cloud GPU Server Guide — Deploy larger models on affordable GPU servers
- Self-Host Your Dev Stack Under $20/Month — The broader self-hosting strategy including AI infrastructure
- Free AI Coding Tools 2026 — Both Docker Model Runner and Ollama serve as free backends for AI coding tools