Part 5 in the “Local AI with Ollama and .NET” series: Part 1 – Ollama and .NET | Part 2 – Local RAG | Part 3 – AI Agents | Part 3.5 – MCP Server | Part 4 – Microsoft Agent Framework
Through the rest of the series, I ran Ollama by hand: installed locally on the machine, models fetched with a manual ollama pull, and the .NET app pointing at a hardcoded local Ollama address. It works, but it means repeating several manual steps on every machine. Aspire takes care of this configuration, and I wanted to see what that changes.
The complete code is available on GitHub.
The problem with hand-managed Ollama
Take a Blazor app talking to a model, the model running in Ollama, and you in the middle bridging the two. You start Ollama, you check it’s listening on the right port, you pull the model if it’s a fresh machine, and you drop the URL into appsettings.json. When a teammate clones the repo, they redo all of it their own way.
And none of it is visible. If a response takes 12 seconds, nothing tells you whether it’s the pull, the load into memory, or an oversized prompt.
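For reference, the hand-wired setup described above looks roughly like this in Program.cs. This is a sketch rather than the exact code from the earlier parts: it assumes the OllamaSharp package, Ollama listening on its default port 11434, a model already fetched with ollama pull llama3.2, and a hypothetical "Ollama:Url" key in appsettings.json.

```csharp
using Microsoft.Extensions.AI;
using OllamaSharp;

var builder = WebApplication.CreateBuilder(args);

// "Ollama:Url" is a hypothetical configuration key for this sketch.
// The fallback is Ollama's default local address.
var ollamaUrl = builder.Configuration["Ollama:Url"] ?? "http://localhost:11434";

// The address and the model tag are fixed here: every machine needs Ollama
// listening at this URL, with the model already pulled by hand.
builder.Services.AddSingleton<IChatClient>(
    new OllamaApiClient(new Uri(ollamaUrl), "llama3.2"));

var app = builder.Build();
app.Run();
```

Nothing in this code checks that the server is up or that the model exists, which is exactly the part each teammate ends up handling manually.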
Aspire lets you declare each piece as a resource and show them in a dashboard.
Ollama as a resource in the AppHost
We start from an Aspire AppHost project (dotnet new aspire-apphost, or the full template). We add the Community Toolkit Ollama hosting integration:
dotnet add package CommunityToolkit.Aspire.Hosting.Ollama
In AppHost.cs, Ollama becomes a resource and the model becomes a sub-resource:
var builder = DistributedApplication.CreateBuilder(args);

// The Ollama server runs in a container managed by Aspire
var ollama = builder.AddOllama("ollama")
    .WithDataVolume(); // persist downloaded models

// The model is a first-class resource: Aspire pulls it at startup
var chat = ollama.AddModel("chat", "llama3.2");

// The Blazor app gets the connection to the model and waits until it's ready
builder.AddProject<Projects.Web>("web")
    .WithReference(chat)
    .WaitFor(chat);

builder.Build().Run();
Look at AddModel("chat", "llama3.2"). The first argument is the resource name, the one we reference from the app. The second is the Ollama model tag. Aspire does the pull at startup, and WaitFor keeps the app from starting until the model has been pulled and the resource is ready.
Wiring the Blazor app
On the Blazor app side, we add the OllamaSharp client integration:
dotnet add package CommunityToolkit.Aspire.OllamaSharp
In Program.cs, one line registers the Ollama client and an IChatClient wired to the chat resource:
builder.AddOllamaApiClient("chat").AddChatClient();
The name "chat" matches the resource declared in the AppHost. No URL, no port, no hardcoded config: Aspire does service discovery and the app receives the right address at startup. These calls (AddOllamaApiClient(...).AddChatClient()) require CommunityToolkit.Aspire.OllamaSharp 13.4.0 or later; earlier versions used AddOllamaSharpChatClient, now deprecated. For the full Microsoft.Extensions.AI pipeline, we can stack the middlewares:
builder.AddOllamaApiClient("chat")
    .AddChatClient()
    .UseFunctionInvocation()
    .UseOpenTelemetry(configure: t => t.EnableSensitiveData = builder.Environment.IsDevelopment())
    .UseLogging();
EnableSensitiveData records prompts and responses in the traces. This flag helps with debugging, but you want it off in production, where it would end up storing personal data. So we limit it to development with builder.Environment.IsDevelopment().
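To see what service discovery actually hands the app, we can log the connection string that WithReference(chat) injects. Aspire passes it as the ConnectionStrings__chat environment variable, which .NET configuration exposes through GetConnectionString. The sketch below is for debugging only. The exact format of the value is an implementation detail of the Community Toolkit: I would expect something along the lines of Endpoint=http://localhost:<port>;Model=llama3.2, but treat that as an assumption and check your own output.

```csharp
var builder = WebApplication.CreateBuilder(args);

// ... the existing registrations, including AddOllamaApiClient("chat"), go here ...

var app = builder.Build();

// "chat" is the resource name declared in the AppHost.
// The value comes from the ConnectionStrings__chat environment variable.
var connection = app.Configuration.GetConnectionString("chat");
app.Logger.LogInformation("Ollama connection: {Connection}", connection ?? "(not set)");

app.Run();
```

The port usually changes from one run to the next, which is why the app should never hardcode it.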
In a Blazor component, we inject IChatClient and use it directly:
@inject IChatClient ChatClient

@code {
    private async Task<string> Ask(string question)
    {
        var response = await ChatClient.GetResponseAsync(question);
        return response.Text;
    }
}
The component doesn’t know about Ollama; it only knows IChatClient. That abstraction matters for production, and I come back to it further down.
The Aspire dashboard
We launch with aspire run (or F5) and the Aspire dashboard opens. We see each resource with its state: the Ollama container starting, the model downloading, the Blazor app waiting.
At startup: Ollama is running, the model is downloading, the web app is waiting.
Once the model finishes downloading, all three resources turn to Running.
The traces tab helps the most. Because the IChatClient goes through UseOpenTelemetry(), every call to the model becomes a trace with the total time, the load time, and the tokens. When a response takes 12 seconds, the trace shows where the time goes. The Ollama container logs are in the same dashboard, without having to dig through docker logs.
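If the model calls don’t show up in the traces tab, a likely cause is that the activity source used by UseOpenTelemetry() isn’t registered with the OpenTelemetry tracer. One way to make the link explicit is to name the source yourself and register that same name. This is a sketch under assumptions: it presumes the standard ServiceDefaults project from the Aspire template (AddServiceDefaults) and the OpenTelemetry.Extensions.Hosting package, and "Web.Chat" is a hypothetical source name chosen for the example.

```csharp
using Microsoft.Extensions.AI;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

// Assumed: the ServiceDefaults project generated by the Aspire template
builder.AddServiceDefaults();

// Hypothetical name, shared by the chat client and the OpenTelemetry setup
const string AiSourceName = "Web.Chat";

builder.AddOllamaApiClient("chat")
    .AddChatClient()
    .UseOpenTelemetry(sourceName: AiSourceName);

// Register the same name so the spans and metrics emitted by the chat
// client are collected and exported to the dashboard
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing.AddSource(AiSourceName))
    .WithMetrics(metrics => metrics.AddMeter(AiSourceName));

var app = builder.Build();
app.Run();
```

Naming the source explicitly avoids depending on the library’s default source name, which may change between versions.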
Setting up this telemetry by hand took more work, and it often got put off.
The slow first run
The first aspire run on a fresh machine is slow. Aspire downloads the Ollama container image, then the model on top. The llama3.2 model weighs around 2 GB, so the download takes a while.
.WithDataVolume() mounts a volume at /root/.ollama in the container, where Ollama keeps its models. Without a volume, the models disappear with the container and the next run redoes the full pull. With the volume, the model stays between restarts and the second run is almost instant, so add it from the start.
GPU support
On a machine with an Nvidia card, we turn on acceleration with one line:
var ollama = builder.AddOllama("ollama")
    .WithDataVolume()
    .WithGPUSupport(OllamaGpuVendor.Nvidia);
Aspire passes the GPU config to the container. It works well when the box has the Nvidia drivers and the nvidia-container-toolkit installed. On a laptop without a dedicated GPU, leave it off and Ollama runs on the CPU, slower but enough for development. .WithOpenWebUI() adds a web chat console alongside, useful for validating the model without going through your app.
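Since not every teammate has the same hardware, it can help to make the GPU opt-in per machine rather than committing it to the AppHost. A minimal sketch, assuming a hypothetical "Ollama:UseGpu" configuration key that each developer sets locally, for example with the environment variable Ollama__UseGpu=true:

```csharp
var builder = DistributedApplication.CreateBuilder(args);

var ollama = builder.AddOllama("ollama")
    .WithDataVolume();

// "Ollama:UseGpu" is a hypothetical key for this sketch, not an Aspire setting.
// When it is absent or not "true", the container runs on the CPU.
var useGpu = string.Equals(
    builder.Configuration["Ollama:UseGpu"], "true", StringComparison.OrdinalIgnoreCase);

if (useGpu)
{
    ollama.WithGPUSupport(OllamaGpuVendor.Nvidia);
}

var chat = ollama.AddModel("chat", "llama3.2");

builder.AddProject<Projects.Web>("web")
    .WithReference(chat)
    .WaitFor(chat);

builder.Build().Run();
```

The same AppHost then runs unchanged on a GPU workstation and on a laptop without a dedicated card.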
Going to production
Aspire is excellent in the development loop, and that’s what it’s built for first: the F5, the wiring between services, the dashboard, service discovery.
To deploy, aspire publish generates the artifacts (a manifest) you can target at Azure Container Apps or Docker Compose. The AppHost itself doesn’t run in production: it’s an orchestration model for development and a publish step. In production, the orchestrator is Container Apps, Kubernetes, or Compose.
In production, hosting Ollama in a container raises several questions. The ollama/ollama image is big. The model weighs several gigabytes and you have to decide where it lives: bake it into the image, which gets huge; put it on a persistent volume, which needs storage that follows the app; or download it at startup, with a slower cold start. Inference also wants a GPU, otherwise every response drags. Hosting your own LLM takes real infrastructure decisions.
The IChatClient abstraction helps with this switch. In development, the client points at local Ollama. In production, we replace it with a hosted endpoint, like Azure AI Foundry or another managed service, by changing the registration, without touching the Blazor components that call GetResponseAsync. The calling code stays the same, only the implementation behind the interface changes. For sensitive config, we go through Aspire parameters rather than keys in appsettings.json.
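The switch described above can be sketched in Program.cs as a registration that depends on the environment. This is one possible shape, not a recommendation for a specific provider: here an Azure OpenAI deployment stands in for the managed endpoint, which assumes the Azure.AI.OpenAI and Microsoft.Extensions.AI.OpenAI packages, and the "AI:Endpoint", "AI:ApiKey" and "AI:Deployment" configuration keys are hypothetical names.

```csharp
using System.ClientModel;
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;

var builder = WebApplication.CreateBuilder(args);

if (builder.Environment.IsDevelopment())
{
    // Local Ollama, wired by Aspire through the "chat" resource
    builder.AddOllamaApiClient("chat").AddChatClient();
}
else
{
    // Hypothetical configuration keys, supplied by the hosting environment
    string Required(string key) =>
        builder.Configuration[key]
        ?? throw new InvalidOperationException($"{key} is not configured.");

    IChatClient hosted = new AzureOpenAIClient(
            new Uri(Required("AI:Endpoint")),
            new ApiKeyCredential(Required("AI:ApiKey")))
        .GetChatClient(Required("AI:Deployment"))
        .AsIChatClient();

    // Same abstraction as in development: components still inject IChatClient
    builder.Services.AddChatClient(hosted);
}

var app = builder.Build();
app.Run();
```

On the AppHost side, the key can be declared with builder.AddParameter("ai-api-key", secret: true) and passed to the web project with WithEnvironment("AI__ApiKey", ...), so that it never lands in appsettings.json. The parameter name is again just an example.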
In practice, I use Aspire to orchestrate development and keep everything consistent, local Ollama to iterate without API costs, and a managed endpoint behind the same IChatClient once in production. The decision to host a model only comes up at deployment time.
Good luck with the deploy. If the first aspire run looks frozen, that’s just the model downloading in the background: give it a couple of minutes.
This post was written with AI assistance and edited by me.