How to integrate VS Code with Ollama for local AI assistance
The post How to integrate VS Code with Ollama for local AI assistance appeared first on The New Stack.
If you’re starting your journey as a programmer and want to jump-start that process, you might be interested in taking advantage of AI to make the process of getting up to speed a bit simpler. After all, coding can be a tough business to break into, and every advantage you can give yourself should be considered.
Before I continue, I will say this: use AI to help you learn the language that you’re interested in and not as a substitute for actually learning the language. Consider this an assistant, not a replacement for skill.
When I need to turn to AI, I always go for locally installed options, for a couple of reasons. First, running a model on my own machine doesn’t lean on power-hungry cloud data centers. Second, I don’t have to worry about a third party getting a glimpse of my queries, so privacy is actually possible.
To that end, I depend on Ollama as my chosen locally-installed AI tool. Ollama is easy to use, flexible, and reliable.
If your IDE of choice is Visual Studio Code, you’re in luck, as you can integrate it with a locally installed instance of Ollama.
I’m going to show you how this is done.
What you’ll need
To make this work, you’ll need a desktop running Linux, macOS, or Windows. I’ll demonstrate the process on an Ubuntu-based Linux distribution (Pop!_OS). If you’re using either macOS or Windows, the only things you’ll need to change are the installations of Ollama and VS Code. Fortunately, in both cases, it’s just a matter of downloading the binary installer for each tool, double-clicking the downloaded file, and walking through the setup process.
On Linux, it’s a bit different.
Let me show you.
Installing Ollama
The first thing we’ll do is install Ollama. If you’re using macOS or Windows, download the .dmg for Mac or the .exe for Windows, double-click the file, and you’re off.
On Linux, open a terminal window and issue the command:
curl -fsSL https://ollama.com/install.sh | sh
You’ll be prompted for your sudo password before the installation begins.
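Before moving on, you can quickly confirm the install succeeded. On Linux, the install script registers Ollama as a systemd service; the unit name below is the one that script creates:

```shell
# Print the installed Ollama version
ollama --version

# On Linux, confirm the background service is running
# (prints "active" if the service is up)
systemctl is-active ollama
```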
After the installation is complete, you’ll then need to pull a specific LLM for Ollama. On macOS and Windows, open the Ollama GUI, go to the query field, click the downward-pointing arrow, type codellama, and click the entry to install the model.
On Linux, open a terminal app and pull the necessary LLM with:
ollama pull codellama
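Once the pull finishes, it’s worth confirming the model actually landed on disk. A quick check (the grep is just a convenience for scripting):

```shell
# Show every model Ollama has stored locally
ollama list

# Or check for codellama specifically; exits non-zero if it's missing
ollama list | grep codellama
```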
Install VS Code
Next, you’ll need to install VS Code.
The same thing holds true: with macOS or Windows, download the VS Code executable binary for your OS of choice, double-click the downloaded file, and walk through the installation wizard.
On Linux, you’ll also need to download the installer for your distribution of choice (.deb for Debian-based distributions, .rpm for Fedora-based distributions, or the Snap package).
To install VS Code on Linux, change into the directory housing the installer file you downloaded. Install the app with one of the following commands:
- For Ubuntu-based distributions: sudo dpkg -i code*.deb
- For Fedora-based distributions: sudo rpm -i code*.rpm
- For Snap packages (no download needed; Snap fetches it for you): sudo snap install code --classic
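Whichever route you took, a quick sanity check confirms the code command is on your PATH:

```shell
# Prints the VS Code version, commit hash, and architecture
code --version
```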
You now have the two primary pieces to get you started.
Setting up VS Code
The next step is to set up VS Code to work with Ollama. To do that, you’ll need to install an extension called Continue.
For that, hit Ctrl+P (on macOS, that’s Cmd+P).
In the resulting field, type:
ext install continue.continue
In the resulting page (Figure 1), click Install.
Figure 1: Installing the necessary extension on VS Code is simple.
Once the extension is installed, click on the Continue icon in the left sidebar. In the resulting window, click the Select Model drop-down and click Add Chat model (Figure 2).
Figure 2: You have to add a model before you can continue.
In the resulting window, select Ollama from the provider drop-down (Figure 3).
Figure 3: You can select any one of the available providers, but we’re going with Ollama.
Next, make sure to select Local from the tabs and then click the terminal icon to the right of each command. This will open the built-in terminal, where you’ll then need to hit Enter on your keyboard to execute the command (Figure 4).
Figure 4: This is where the meat of the configuration takes place.
When the first command (the Chat model command) completes, do the same for the second command (the Autocomplete model) and the third (the Embeddings model). This will take some time, so be patient. When each step is complete, you’ll see a green check by it.
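Behind the scenes, Continue records these choices in its config file (~/.continue/config.json in older releases; newer releases use a config.yaml instead). If you ever need to point it at Ollama by hand, a minimal sketch of the older JSON format looks like this; the titles are arbitrary labels, and the model names are assumptions based on the codellama pull above:

```json
{
  "models": [
    { "title": "CodeLlama", "provider": "ollama", "model": "codellama" }
  ],
  "tabAutocompleteModel": {
    "title": "CodeLlama Autocomplete",
    "provider": "ollama",
    "model": "codellama"
  }
}
```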
After that’s completed, click Connect.
If you click the Continue extension, you should now see a new chat window that is connected to your locally installed instance of Ollama (Figure 5).
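If the chat window ever misbehaves, you can confirm the connection outside of VS Code by querying the Ollama server directly. Ollama listens on port 11434 by default; this sketch assumes the codellama model pulled earlier:

```shell
# Ask the local Ollama server for a one-off completion.
# "stream": false returns a single JSON object instead of a token stream.
curl http://localhost:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write a shell one-liner that counts files in a directory.",
  "stream": false
}'
```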
You are all set up and ready to rock.