Ollama integration and home context

This article is part of the series Voice Assistant on Raspberry Pi.

The hardcoded reply in article #2 had a single purpose: confirming that the audio pipeline works. Now we replace that line with a real HTTP call to Ollama on the pi-cerveau. We also add a system prompt to give the assistant an identity.

The complete code for this article is available on GitHub.

[GPIO button] → [arecord] → [Whisper] → [Ollama on pi-cerveau] → [Piper TTS] → [aplay]

Step 1: The Ollama service

Create Services/OllamaService.cs:

using System.Net.Http.Json;
using Microsoft.Extensions.Options;

namespace AudioAssistant.Services;

public interface ILlmService
{
    Task<string> GenerateAsync(string prompt, CancellationToken cancellationToken);
}

public class OllamaService : ILlmService
{
    private readonly HttpClient _http;
    private readonly AssistantOptions _options;
    private readonly ILogger<OllamaService> _logger;

    public OllamaService(HttpClient http, IOptions<AssistantOptions> options, ILogger<OllamaService> logger)
    {
        _http = http;
        _options = options.Value;
        _logger = logger;
    }

    public async Task<string> GenerateAsync(string prompt, CancellationToken cancellationToken)
    {
        _logger.LogInformation("Envoi au LLM : \"{Prompt}\"", prompt);

        var request = new
        {
            model = _options.OllamaModel,
            prompt = prompt,
            stream = false
        };

        var response = await _http.PostAsJsonAsync(
            $"{_options.OllamaBaseUrl}/api/generate",
            request,
            cancellationToken);

        response.EnsureSuccessStatusCode();

        var result = await response.Content.ReadFromJsonAsync<OllamaResponse>(
            cancellationToken: cancellationToken);

        var text = result?.Response?.Trim() ?? "Je n'ai pas de réponse.";
        _logger.LogInformation("Réponse du LLM : \"{Text}\"", text);
        return text;
    }
}

internal record OllamaResponse(string Response);
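To make the wire format concrete, here is a minimal sketch of what the service sends and what it reads back. It assumes the "web" JSON defaults that the System.Net.Http.Json helpers use, and the sample reply is made up and trimmed to a few fields. It runs offline and does not contact Ollama.

```csharp
using System;
using System.Text.Json;

// Same anonymous shape as in GenerateAsync; the values are placeholders.
var request = new { model = "llama3.2:3b", prompt = "Bonjour", stream = false };

// PostAsJsonAsync serializes with the "web" defaults, mirrored here.
var webOptions = new JsonSerializerOptions(JsonSerializerDefaults.Web);
var requestJson = JsonSerializer.Serialize(request, webOptions);
Console.WriteLine(requestJson);
// {"model":"llama3.2:3b","prompt":"Bonjour","stream":false}

// Made-up, trimmed reply in the shape /api/generate returns when stream is false.
var sampleReply = "{\"model\":\"llama3.2:3b\",\"response\":\" Bonjour! \",\"done\":true}";
using var doc = JsonDocument.Parse(sampleReply);

// Only the "response" field matters to the assistant; Trim() mirrors the service.
var text = doc.RootElement.GetProperty("response").GetString()?.Trim();
Console.WriteLine(text); // Bonjour!
```

In the service itself, the OllamaResponse record plays the role of the manual GetProperty call: the web defaults match property names case-insensitively, so the lowercase response field lands in Response.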

Step 2: The home context

The system prompt is what gives the assistant its personality. We store it in appsettings.json so it can be changed without recompiling.

Create Services/ContextService.cs:

using Microsoft.Extensions.Options;

namespace AudioAssistant.Services;

public interface IContextService
{
    string BuildPrompt(string userInput);
}

public class ContextService : IContextService
{
    private readonly AssistantOptions _options;

    public ContextService(IOptions<AssistantOptions> options)
    {
        _options = options.Value;
    }

    public string BuildPrompt(string userInput)
    {
        return _options.SystemPrompt
            + "\n\nQuestion de l'utilisateur : " + userInput
            + "\n\nRéponds en français, de façon concise (2-3 phrases maximum). Pas de mise en forme markdown.";
    }
}
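To see what actually reaches the model, the sketch below rebuilds the same concatenation outside the service. The system prompt is a shortened stand-in and the question is hypothetical; only the string layout is taken from BuildPrompt.

```csharp
using System;

// Shortened, hypothetical stand-in for AssistantOptions.SystemPrompt.
var systemPrompt = "Tu es un assistant vocal personnel qui s'appelle Alex.";

// Local copy of the concatenation done in ContextService.BuildPrompt.
string BuildPrompt(string userInput) =>
    systemPrompt
    + "\n\nQuestion de l'utilisateur : " + userInput
    + "\n\nRéponds en français, de façon concise (2-3 phrases maximum). Pas de mise en forme markdown.";

var prompt = BuildPrompt("Quelle heure est-il?");
Console.WriteLine(prompt);

// The prompt is made of three blocks separated by blank lines:
// the identity, the user's question, then the answer-format instruction.
var blocks = prompt.Split("\n\n");
Console.WriteLine(blocks.Length); // 3
```

Everything travels in the single prompt field here. Ollama's /api/generate also accepts a separate system field, which would be one alternative to this concatenation, but the article keeps the simpler single-string approach.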

Step 3: Update AssistantOptions

Add the new properties to AssistantOptions.cs:

namespace AudioAssistant;

public class AssistantOptions
{
    public int GpioButtonPin { get; set; } = 17;
    public string AudioDevice { get; set; } = "hw:1,0";
    public int RecordingDurationSeconds { get; set; } = 10;
    public string WhisperModel { get; set; } = "ggml-base.bin";
    public string PiperBinary { get; set; } = "/home/gabriel/piper/piper/piper";
    public string PiperVoice { get; set; } = "/home/gabriel/piper-voices/fr_FR-siwis-low.onnx";
    public string AudioOutputDevice { get; set; } = "hw:2,0";

    // Ollama
    public string OllamaBaseUrl { get; set; } = "http://pi-cerveau.local:11434";
    public string OllamaModel { get; set; } = "llama3.2:3b";

    // Context
    public string SystemPrompt { get; set; } = "";
}

Step 4: Update appsettings.json

{
  "Assistant": {
    "GpioButtonPin": 17,
    "AudioDevice": "hw:3,0",
    "RecordingDurationSeconds": 10,
    "WhisperModel": "ggml-base.bin",
    "PiperBinary": "/home/gabriel/piper/piper/piper",
    "PiperVoice": "/home/gabriel/piper-voices/fr_FR-siwis-low.onnx",
    "AudioOutputDevice": "hw:3,0",
    "OllamaBaseUrl": "http://pi-cerveau.local:11434",
    "OllamaModel": "llama3.2:3b",
    "SystemPrompt": "Tu es un assistant vocal personnel qui s'appelle Alex. Tu aides la famille Mongeon qui habite à Blainville, Québec, Canada. Tu réponds toujours en français québécois, de façon naturelle et chaleureuse. Tu connais les membres de la famille : Gabriel (le père, développeur .NET), et sa famille. La météo locale est celle de Blainville, dans les Laurentides. Tu es concis — tes réponses font 1 à 3 phrases maximum, car elles seront lues à voix haute."
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information"
    }
  }
}

Customize the SystemPrompt with your family's names, your preferences, your routine. The more precise the context, the smarter the assistant seems, without needing a bigger model.

Step 5: NuGet package and DI registration

AddHttpClient requires an additional package:

dotnet add package Microsoft.Extensions.Http

Update Program.cs:

using AudioAssistant;
using AudioAssistant.Services;

var builder = Host.CreateApplicationBuilder(args);

builder.Services.Configure<AssistantOptions>(
    builder.Configuration.GetSection("Assistant"));

builder.Services.AddSingleton<IGpioService, GpioService>();
builder.Services.AddSingleton<IAudioRecorderService, AudioRecorderService>();
builder.Services.AddSingleton<ITranscriptionService, WhisperTranscriptionService>();
builder.Services.AddSingleton<ISpeechService, PiperSpeechService>();
builder.Services.AddSingleton<IContextService, ContextService>();

// HttpClient for Ollama: AddHttpClient handles handler pooling and avoids stale DNS entries
builder.Services.AddHttpClient<ILlmService, OllamaService>(client =>
{
    client.Timeout = TimeSpan.FromSeconds(60);
});

builder.Services.AddHostedService<Worker>();

var host = builder.Build();
host.Run();

Step 6: Update Worker.cs

The only real difference from article #2: we replace the hardcoded reply with _context.BuildPrompt() + _llm.GenerateAsync().

using AudioAssistant.Services;

namespace AudioAssistant;

public class Worker : BackgroundService
{
    private readonly IGpioService _gpio;
    private readonly IAudioRecorderService _recorder;
    private readonly ITranscriptionService _transcription;
    private readonly ILlmService _llm;
    private readonly IContextService _context;
    private readonly ISpeechService _speech;
    private readonly ILogger<Worker> _logger;

    public Worker(
        IGpioService gpio,
        IAudioRecorderService recorder,
        ITranscriptionService transcription,
        ILlmService llm,
        IContextService context,
        ISpeechService speech,
        ILogger<Worker> logger)
    {
        _gpio = gpio;
        _recorder = recorder;
        _transcription = transcription;
        _llm = llm;
        _context = context;
        _speech = speech;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        _logger.LogInformation("Assistant démarré. Appuie sur le bouton pour parler.");

        while (!stoppingToken.IsCancellationRequested)
        {
            _gpio.WaitForButtonPress(stoppingToken);
            if (stoppingToken.IsCancellationRequested) break;

            string? audioFile = null;
            try
            {
                audioFile = await _recorder.RecordAsync(stoppingToken);
                var texte = await _transcription.TranscribeAsync(audioFile, stoppingToken);

                if (string.IsNullOrWhiteSpace(texte))
                {
                    await _speech.SpeakAsync("Je n'ai pas bien entendu. Peux-tu répéter?", stoppingToken);
                }
                else
                {
                    var prompt = _context.BuildPrompt(texte);
                    var reponse = await _llm.GenerateAsync(prompt, stoppingToken);
                    await _speech.SpeakAsync(reponse, stoppingToken);
                }
            }
            catch (Exception ex) when (!stoppingToken.IsCancellationRequested)
            {
                _logger.LogError(ex, "Erreur dans le pipeline");
                await _speech.SpeakAsync("Une erreur s'est produite.", stoppingToken);
            }
            finally
            {
                // Delete the temporary recording even when a pipeline step throws
                if (audioFile is not null && File.Exists(audioFile))
                    File.Delete(audioFile);
            }
        }
    }
}

Two practical adjustments discovered during testing

Problem 1: the microphone records in 48 kHz stereo, but Whisper wants 16 kHz mono.

The command arecord -f cd requests 44100 Hz stereo, but some USB adapters force 48000 Hz stereo. Whisper.net expects 16000 Hz mono, so the recording has to be converted with ffmpeg afterwards.

Replace Services/AudioRecorderService.cs:

using System.Diagnostics;
using Microsoft.Extensions.Options;

namespace AudioAssistant.Services;

public interface IAudioRecorderService
{
    Task<string> RecordAsync(CancellationToken cancellationToken);
}

public class AudioRecorderService : IAudioRecorderService
{
    private readonly AssistantOptions _options;
    private readonly ILogger<AudioRecorderService> _logger;

    public AudioRecorderService(IOptions<AssistantOptions> options, ILogger<AudioRecorderService> logger)
    {
        _options = options.Value;
        _logger = logger;
    }

    public async Task<string> RecordAsync(CancellationToken cancellationToken)
    {
        var rawFile = Path.Combine(Path.GetTempPath(), $"audio_raw_{Guid.NewGuid()}.wav");
        var outputFile = Path.Combine(Path.GetTempPath(), $"audio_{Guid.NewGuid()}.wav");

        _logger.LogInformation("Enregistrement démarré ({Duration}s)...", _options.RecordingDurationSeconds);

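        // Record in the USB adapter's native format (48000 Hz stereo).
        // Without -r, arecord falls back to its 8000 Hz default, so the rate is set explicitly.
        var recordPsi = new ProcessStartInfo
        {
            FileName = "arecord",
            Arguments = $"-D {_options.AudioDevice} -f S16_LE -r 48000 -c 2 -t wav -d {_options.RecordingDurationSeconds} {rawFile}",
            RedirectStandardError = true,
            UseShellExecute = false
        };

        using var recordProcess = Process.Start(recordPsi)!;
        await recordProcess.WaitForExitAsync(cancellationToken);

        // Fail early with arecord's own message instead of handing ffmpeg a missing file
        if (recordProcess.ExitCode != 0)
            throw new InvalidOperationException(
                $"arecord failed: {await recordProcess.StandardError.ReadToEndAsync(cancellationToken)}");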

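        // Convert to 16000 Hz mono for Whisper
        var ffmpegPsi = new ProcessStartInfo
        {
            FileName = "ffmpeg",
            // -loglevel error keeps stderr small: nothing drains the redirected pipe while ffmpeg runs
            Arguments = $"-y -loglevel error -i {rawFile} -ar 16000 -ac 1 {outputFile}",
            RedirectStandardError = true,
            UseShellExecute = false
        };

        using var ffmpegProcess = Process.Start(ffmpegPsi)!;
        await ffmpegProcess.WaitForExitAsync(cancellationToken);

        // Surface conversion failures instead of returning a file that was never written
        if (ffmpegProcess.ExitCode != 0)
            throw new InvalidOperationException(
                $"ffmpeg failed: {await ffmpegProcess.StandardError.ReadToEndAsync(cancellationToken)}");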

        if (File.Exists(rawFile))
            File.Delete(rawFile);

        _logger.LogInformation("Enregistrement terminé → {File}", outputFile);
        return outputFile;
    }
}

ffmpeg is already installed from article #1. The raw file is deleted after conversion.
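The service builds its command lines by string interpolation, which works here because the temp paths contain no spaces. As a hedged alternative, ProcessStartInfo.ArgumentList passes each entry as exactly one argument, so no manual quoting is needed. The sketch below only builds the ffmpeg start info and never launches the process; the file names are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical paths; the first one contains a space on purpose.
var rawFile = Path.Combine(Path.GetTempPath(), "audio raw.wav");
var outputFile = Path.Combine(Path.GetTempPath(), "audio.wav");

var ffmpegPsi = new ProcessStartInfo
{
    FileName = "ffmpeg",
    RedirectStandardError = true,
    UseShellExecute = false
};

// Same conversion as in the service: resample to 16000 Hz, downmix to mono.
foreach (var arg in new[] { "-y", "-i", rawFile, "-ar", "16000", "-ac", "1", outputFile })
    ffmpegPsi.ArgumentList.Add(arg);

Console.WriteLine(ffmpegPsi.ArgumentList.Count);         // 8
Console.WriteLine(ffmpegPsi.ArgumentList[2] == rawFile); // True
```

Note that Arguments and ArgumentList cannot both be set on the same ProcessStartInfo, so switching means moving every flag into the list.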

Problem 2: aplay fails if the USB device expects stereo.

Update the ProcessStartInfo definitions in Services/PiperSpeechService.cs so that the aplay arguments include -c 2:

var piperPsi = new ProcessStartInfo
{
    FileName = _options.PiperBinary,
    Arguments = $"--model {_options.PiperVoice} --output_raw",
    RedirectStandardInput = true,
    RedirectStandardOutput = true,
    UseShellExecute = false
};

var aplayPsi = new ProcessStartInfo
{
    FileName = "aplay",
    Arguments = $"-D {_options.AudioOutputDevice} -r 22050 -f S16_LE -c 2 -t raw -",
    RedirectStandardInput = true,
    UseShellExecute = false
};

If you use the 3.5 mm jack instead of a USB adapter, remove -c 2 and keep mono.

Validate Ollama before testing the full pipeline

Before running dotnet run, confirm that the pi-cerveau responds:

# Test connectivity
curl http://pi-cerveau.local:11434/api/tags

# Test a generation
curl -X POST http://pi-cerveau.local:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2:3b","prompt":"Dis bonjour en français québécois en une phrase.","stream":false}'

If you see JSON with a response field, Ollama is ready. Launch the project:

cd ~/projects/AudioAssistant
dotnet build
dotnet run
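If the assistant starts but never answers, one thing worth checking is whether the configured model is actually installed on the pi-cerveau. The sketch below parses an /api/tags payload and looks for a model name. It assumes the reply is an object with a models array whose entries carry a name field; the sample JSON and the HasModel helper are made up for illustration, and nothing here touches the network.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical helper: true when the /api/tags payload lists the given model name.
bool HasModel(string tagsJson, string modelName)
{
    using var doc = JsonDocument.Parse(tagsJson);
    return doc.RootElement.TryGetProperty("models", out var models)
        && models.ValueKind == JsonValueKind.Array
        && models.EnumerateArray().Any(m =>
            m.TryGetProperty("name", out var name) && name.GetString() == modelName);
}

// Made-up sample, reduced to the only field this check reads.
var sample = "{\"models\":[{\"name\":\"llama3.2:3b\"},{\"name\":\"mistral:7b\"}]}";

Console.WriteLine(HasModel(sample, "llama3.2:3b")); // True
Console.WriteLine(HasModel(sample, "phi3:mini"));   // False
```

In the real application, the JSON string would come from a GET on the OllamaBaseUrl followed by /api/tags, and the model name from OllamaModel.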

Press the button and speak. The first response is a bit slow: Ollama has to load Llama 3.2 3B into memory, which takes a few seconds, and later requests are faster while the model stays loaded.

No cloud calls, no data leaving the network. The Pi Client handles audio and transcription; the Pi Cerveau runs the LLM. Everything runs on the local network.


Articles in the series

Setup of the two Raspberry Pis
.NET 10 Worker Service and audio pipeline
Ollama integration and home context (this article)
Memory, silence detection and systemd
Real-time weather and Claude API swap
Function Calling: teaching the assistant tools
Wrap-up, lessons learned and v2 outlook

In article #4, we add conversational memory and automatic silence detection, and configure systemd so the assistant starts at boot.

This article was written with the help of AI and reviewed by me.