Memory, silence detection, and systemd

This article is part of the Voice Assistant on Raspberry Pi series.

The assistant from article #3 works, but every exchange starts from scratch. We fix that in three steps: conversational memory, automatic silence detection, and startup at boot with systemd.

The complete code for this article is available on GitHub.

Part 1: Conversational memory

How it works

Ollama supports a chat format with message history, much like the OpenAI or Claude APIs. Instead of a single prompt, we send a list of messages [{role, content}]. The endpoint itself is stateless: the client resends the full history on every turn, and the model uses it to generate replies that stay consistent with the conversation.

Tour 1 : [system] + [user: "Comment tu t'appelles?"]
Tour 2 : [system] + [user: "Comment tu t'appelles?"] + [assistant: "Je m'appelle Alex."] + [user: "Quel âge as-tu?"]
Tour 3 : ...
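
To make the growing history concrete, here is a minimal sketch of the JSON body sent on each turn. ChatTurn and ChatPayloadSketch are hypothetical names used only for this illustration (ChatTurn mirrors the ConversationMessage record defined in the next step); the model, messages, and stream fields are the ones the service code in this article sends to /api/chat.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical stand-in for the article's ConversationMessage record.
public record ChatTurn(string Role, string Content);

public static class ChatPayloadSketch
{
    // Serializes the whole history into the request body for one chat call.
    // The full list is sent every time; nothing is remembered server-side.
    public static string Build(string model, IEnumerable<ChatTurn> history) =>
        JsonSerializer.Serialize(new
        {
            model,
            messages = history
                .Select(m => new { role = m.Role, content = m.Content })
                .ToArray(),
            stream = false
        });
}
```

Calling Build again after appending the assistant reply and the next user message produces the "Tour 2" payload shown above: same system message, plus every earlier turn.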

Step 1.1: Update OllamaService

We switch from /api/generate to /api/chat, the Ollama endpoint that supports message history. Replace Services/OllamaService.cs:

using System.Net.Http.Json;
using Microsoft.Extensions.Options;

namespace AudioAssistant.Services;

public interface ILlmService
{
    Task<string> ChatAsync(List<ConversationMessage> history, CancellationToken cancellationToken);
}

public record ConversationMessage(string Role, string Content);

public class OllamaService : ILlmService
{
    private readonly HttpClient _http;
    private readonly AssistantOptions _options;
    private readonly ILogger<OllamaService> _logger;

    public OllamaService(HttpClient http, IOptions<AssistantOptions> options, ILogger<OllamaService> logger)
    {
        _http = http;
        _options = options.Value;
        _logger = logger;
    }

    public async Task<string> ChatAsync(List<ConversationMessage> history, CancellationToken cancellationToken)
    {
        _logger.LogInformation("Sending to the LLM ({Count} messages)...", history.Count);

        var request = new
        {
            model = _options.OllamaModel,
            messages = history.Select(m => new { role = m.Role, content = m.Content }),
            stream = false
        };

        var response = await _http.PostAsJsonAsync(
            $"{_options.OllamaBaseUrl}/api/chat",
            request,
            cancellationToken);

        response.EnsureSuccessStatusCode();

        var result = await response.Content.ReadFromJsonAsync<OllamaChatResponse>(
            cancellationToken: cancellationToken);

        var text = result?.Message?.Content?.Trim() ?? "Je n'ai pas de réponse.";
        _logger.LogInformation("LLM response: \"{Text}\"", text);
        return text;
    }
}

internal record OllamaChatMessage(string Role, string Content);
internal record OllamaChatResponse(OllamaChatMessage Message);

Step 1.2: Update ContextService

ContextService becomes a conversation manager that maintains the history. Replace Services/ContextService.cs:

using Microsoft.Extensions.Options;

namespace AudioAssistant.Services;

public interface IContextService
{
    List<ConversationMessage> AddUserMessage(string userInput);
    void AddAssistantMessage(string response);
    void Reset();
}

public class ContextService : IContextService
{
    private readonly AssistantOptions _options;
    private readonly List<ConversationMessage> _history = new();
    private readonly ILogger<ContextService> _logger;

    public ContextService(IOptions<AssistantOptions> options, ILogger<ContextService> logger)
    {
        _options = options.Value;
        _logger = logger;
        _history.Add(new ConversationMessage("system", _options.SystemPrompt));
    }

    public List<ConversationMessage> AddUserMessage(string userInput)
    {
        _history.Add(new ConversationMessage("user", userInput));
        _logger.LogInformation("History: {Count} messages", _history.Count);
        return _history;
    }

    public void AddAssistantMessage(string response)
    {
        _history.Add(new ConversationMessage("assistant", response));
    }

    public void Reset()
    {
        _history.Clear();
        _history.Add(new ConversationMessage("system", _options.SystemPrompt));
        _logger.LogInformation("History reset.");
    }
}

The history grows with every exchange, and so does the prompt sent to the model. If you notice inconsistencies after several turns, call Reset() manually, or add a check against MaxConversationTurns in AddAssistantMessage.
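
One possible shape for that MaxConversationTurns check is sketched below. It is not part of the article's code: Turn and HistoryTrimSketch are hypothetical names, and the sketch assumes the system prompt always sits at index 0 and that one "turn" means one user message plus one assistant reply.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's ConversationMessage record.
public record Turn(string Role, string Content);

public static class HistoryTrimSketch
{
    // Keeps the system prompt (index 0) plus at most maxTurns user/assistant pairs.
    public static void Trim(List<Turn> history, int maxTurns)
    {
        var maxMessages = 1 + Math.Max(0, maxTurns) * 2;
        var excess = history.Count - maxMessages;
        if (excess > 0)
        {
            // Drop the oldest messages, never the system prompt.
            history.RemoveRange(1, excess);
        }
    }
}
```

In ContextService, the equivalent call would go at the end of AddAssistantMessage, with _options.MaxConversationTurns as the limit. Trimming after the assistant reply keeps user and assistant messages paired.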

Step 1.3: Add MaxConversationTurns to AssistantOptions

namespace AudioAssistant;

public class AssistantOptions
{
    public int GpioButtonPin { get; set; } = 17;
    public string AudioDevice { get; set; } = "hw:3,0";
    public int RecordingDurationSeconds { get; set; } = 10;
    public string WhisperModel { get; set; } = "ggml-base.bin";
    public string PiperBinary { get; set; } = "/home/gabriel/piper/piper/piper";
    public string PiperVoice { get; set; } = "/home/gabriel/piper-voices/fr_FR-siwis-low.onnx";
    public string AudioOutputDevice { get; set; } = "hw:3,0";
    public string OllamaBaseUrl { get; set; } = "http://pi-cerveau.local:11434";
    public string OllamaModel { get; set; } = "llama3.2:3b";
    public string SystemPrompt { get; set; } = "";
    public int MaxConversationTurns { get; set; } = 10;
}

Step 1.4: Update Worker.cs

using AudioAssistant.Services;

namespace AudioAssistant;

public class Worker : BackgroundService
{
    private readonly IGpioService _gpio;
    private readonly IAudioRecorderService _recorder;
    private readonly ITranscriptionService _transcription;
    private readonly ILlmService _llm;
    private readonly IContextService _context;
    private readonly ISpeechService _speech;
    private readonly ILogger<Worker> _logger;

    public Worker(
        IGpioService gpio,
        IAudioRecorderService recorder,
        ITranscriptionService transcription,
        ILlmService llm,
        IContextService context,
        ISpeechService speech,
        ILogger<Worker> logger)
    {
        _gpio = gpio;
        _recorder = recorder;
        _transcription = transcription;
        _llm = llm;
        _context = context;
        _speech = speech;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        _logger.LogInformation("Assistant started. Press the button to talk.");

        while (!stoppingToken.IsCancellationRequested)
        {
            _gpio.WaitForButtonPress(stoppingToken);
            if (stoppingToken.IsCancellationRequested) break;

            string? audioFile = null;
            try
            {
                audioFile = await _recorder.RecordAsync(stoppingToken);
                var texte = await _transcription.TranscribeAsync(audioFile, stoppingToken);

                if (string.IsNullOrWhiteSpace(texte))
                {
                    await _speech.SpeakAsync("Je n'ai pas bien entendu. Peux-tu répéter?", stoppingToken);
                }
                else
                {
                    // Send the full history, then store the reply for the next turn
                    var history = _context.AddUserMessage(texte);
                    var reponse = await _llm.ChatAsync(history, stoppingToken);
                    _context.AddAssistantMessage(reponse);
                    await _speech.SpeakAsync(reponse, stoppingToken);
                }
            }
            catch (Exception ex) when (!stoppingToken.IsCancellationRequested)
            {
                _logger.LogError(ex, "Pipeline error");
                await _speech.SpeakAsync("Une erreur s'est produite.", stoppingToken);
            }
            finally
            {
                // Delete the recording even when a step above throws
                if (audioFile is not null && File.Exists(audioFile))
                    File.Delete(audioFile);
            }
        }
    }
}

Step 1.5: Update PiperSpeechService

Piping Piper directly into aplay caused a sample-rate mismatch with the AB13X USB adapter. We go through intermediate files instead: Piper generates a 22050 Hz WAV, ffmpeg resamples it to 48000 Hz stereo, and aplay plays the result. The finally block ensures the temporary files are always cleaned up.

Replace Services/PiperSpeechService.cs:

using System.Diagnostics;
using Microsoft.Extensions.Options;

namespace AudioAssistant.Services;

public interface ISpeechService
{
    Task SpeakAsync(string text, CancellationToken cancellationToken);
}

public class PiperSpeechService : ISpeechService
{
    private readonly AssistantOptions _options;
    private readonly ILogger<PiperSpeechService> _logger;

    public PiperSpeechService(IOptions<AssistantOptions> options, ILogger<PiperSpeechService> logger)
    {
        _options = options.Value;
        _logger = logger;
    }

    public async Task SpeakAsync(string text, CancellationToken cancellationToken)
    {
        _logger.LogInformation("Speech synthesis: \"{Text}\"", text);

        var piperFile = Path.Combine(Path.GetTempPath(), $"tts_{Guid.NewGuid()}.wav");
        var resampledFile = Path.Combine(Path.GetTempPath(), $"tts_resampled_{Guid.NewGuid()}.wav");

        try
        {
            // 1. Piper generates a 22050 Hz mono WAV (paths are quoted in case they contain spaces)
            var piperPsi = new ProcessStartInfo
            {
                FileName = _options.PiperBinary,
                Arguments = $"--model \"{_options.PiperVoice}\" --output_file \"{piperFile}\"",
                RedirectStandardInput = true,
                UseShellExecute = false
            };

            using var piper = Process.Start(piperPsi)!;
            await piper.StandardInput.WriteLineAsync(text);
            piper.StandardInput.Close();
            await piper.WaitForExitAsync(cancellationToken);

            // 2. ffmpeg resamples to 48000 Hz stereo for the USB adapter.
            // stderr is redirected but never read, so keep ffmpeg quiet to avoid filling the pipe.
            var ffmpegPsi = new ProcessStartInfo
            {
                FileName = "ffmpeg",
                Arguments = $"-hide_banner -loglevel error -y -i \"{piperFile}\" -ar 48000 -ac 2 \"{resampledFile}\"",
                RedirectStandardError = true,
                UseShellExecute = false
            };

            using var ffmpeg = Process.Start(ffmpegPsi)!;
            await ffmpeg.WaitForExitAsync(cancellationToken);

            // 3. aplay plays the resampled file
            var aplayPsi = new ProcessStartInfo
            {
                FileName = "aplay",
                Arguments = $"-D {_options.AudioOutputDevice} \"{resampledFile}\"",
                UseShellExecute = false
            };

            using var aplay = Process.Start(aplayPsi)!;
            await aplay.WaitForExitAsync(cancellationToken);
        }
        finally
        {
            if (File.Exists(piperFile)) File.Delete(piperFile);
            if (File.Exists(resampledFile)) File.Delete(resampledFile);
        }
    }
}
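
The commands above are built as single argument strings, which only work if paths with spaces are quoted correctly. As an alternative, here is a minimal sketch using ProcessStartInfo.ArgumentList, where each argument is passed separately and .NET handles the escaping. ProcessArgsSketch and BuildResample are hypothetical names for illustration; the ffmpeg flags are the ones already used in the resampling step.

```csharp
using System.Diagnostics;

public static class ProcessArgsSketch
{
    // Builds the ffmpeg resampling command with one ArgumentList entry per argument,
    // so file paths never need manual quoting.
    public static ProcessStartInfo BuildResample(string inputFile, string outputFile)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "ffmpeg",
            UseShellExecute = false
        };

        foreach (var arg in new[] { "-y", "-i", inputFile, "-ar", "48000", "-ac", "2", outputFile })
        {
            psi.ArgumentList.Add(arg);
        }

        return psi;
    }
}
```

Note that ArgumentList and Arguments are mutually exclusive on a given ProcessStartInfo: set one or the other, not both.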

Testing the memory

Run dotnet run and try this sequence:

You  : "Comment tu t'appelles?"
Alex : "Je m'appelle Alex."

You  : "Répète ton nom."
Alex : "Mon nom est Alex." ← it remembers!

Part 2: Automatic silence detection

The fixed 10-second timer forces you to wait before getting a response. Silence detection stops the recording as soon as you stop talking.

Step 2.1: New settings in AssistantOptions

public int SilenceDurationMs { get; set; } = 1500;   // 1.5 s of silence before stopping
public string SilenceThreshold { get; set; } = "-40dB"; // Detection threshold

Step 2.2: Update AudioRecorderService

We replace the fixed timer with ffmpeg's silencedetect filter. Replace Services/AudioRecorderService.cs:

using System.Diagnostics;
using Microsoft.Extensions.Options;

namespace AudioAssistant.Services;

public interface IAudioRecorderService
{
    Task<string> RecordAsync(CancellationToken cancellationToken);
}

public class AudioRecorderService : IAudioRecorderService
{
    private readonly AssistantOptions _options;
    private readonly ILogger<AudioRecorderService> _logger;

    public AudioRecorderService(IOptions<AssistantOptions> options, ILogger<AudioRecorderService> logger)
    {
        _options = options.Value;
        _logger = logger;
    }

    public async Task<string> RecordAsync(CancellationToken cancellationToken)
    {
        var rawFile = Path.Combine(Path.GetTempPath(), $"audio_raw_{Guid.NewGuid()}.wav");
        var outputFile = Path.Combine(Path.GetTempPath(), $"audio_{Guid.NewGuid()}.wav");

        _logger.LogInformation("Recording started (automatic silence detection)...");

        // Format the duration with the invariant culture: under a French locale,
        // 1.5 would be rendered as "1,5" and break the ffmpeg filter string.
        var silenceSeconds = (_options.SilenceDurationMs / 1000.0)
            .ToString(System.Globalization.CultureInfo.InvariantCulture);

        // ffmpeg records from ALSA; silencedetect reports silences on stderr.
        // RecordingDurationSeconds is the maximum safety duration.
        var ffmpegPsi = new ProcessStartInfo
        {
            FileName = "ffmpeg",
            Arguments = string.Join(" ",
                "-f alsa",
                $"-i {_options.AudioDevice}",
                "-af",
                $"silencedetect=noise={_options.SilenceThreshold}:d={silenceSeconds}",
                $"-t {_options.RecordingDurationSeconds}",
                "-ar 16000 -ac 1",
                $"-y {rawFile}"),
            RedirectStandardError = true,
            RedirectStandardInput = true,
            UseShellExecute = false
        };

        using var ffmpegProcess = Process.Start(ffmpegPsi)!;

        // Read stderr to detect when a silence begins
        _ = Task.Run(async () =>
        {
            var stopRequested = false;
            string? line;
            while ((line = await ffmpegProcess.StandardError.ReadLineAsync()) != null)
            {
                // silencedetect logs "silence_start" once the input has been quiet for d seconds;
                // "silence_end" only appears when sound resumes, which is too late to stop on.
                // Limitation: a pause of d seconds right after the button press also triggers this.
                if (!stopRequested && line.Contains("silence_start"))
                {
                    _logger.LogInformation("Silence detected, stopping the recording.");
                    // Write 'q' rather than Kill(): ffmpeg finishes writing the file before exiting
                    ffmpegProcess.StandardInput.Write("q");
                    stopRequested = true; // keep reading so ffmpeg never blocks on a full stderr pipe
                }
            }
        }, cancellationToken);

        await ffmpegProcess.WaitForExitAsync(cancellationToken);

        // Normalization pass to 16 kHz mono; stderr is redirected but never read,
        // so keep ffmpeg quiet to avoid filling the pipe.
        var convertPsi = new ProcessStartInfo
        {
            FileName = "ffmpeg",
            Arguments = $"-hide_banner -loglevel error -y -i {rawFile} -ar 16000 -ac 1 {outputFile}",
            RedirectStandardError = true,
            UseShellExecute = false
        };

        using var convertProcess = Process.Start(convertPsi)!;
        await convertProcess.WaitForExitAsync(cancellationToken);

        if (File.Exists(rawFile))
            File.Delete(rawFile);

        _logger.LogInformation("Recording finished → {File}", outputFile);
        return outputFile;
    }
}
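
The stderr parsing can be made more robust against a pause right after the button press. The sketch below is one possible approach, not the article's code: SilenceStopDetector is a hypothetical helper, and it assumes silencedetect log lines of the form "[silencedetect @ ...] silence_start: <seconds>" and "[silencedetect @ ...] silence_end: <seconds> | silence_duration: <seconds>". It only asks to stop on a silence that follows actual sound.

```csharp
using System;
using System.Globalization;

public sealed class SilenceStopDetector
{
    private readonly double _minSpeechSeconds;
    private bool _speechSeen;

    // minSpeechSeconds: a silence starting earlier than this is treated as leading silence.
    public SilenceStopDetector(double minSpeechSeconds = 0.3) => _minSpeechSeconds = minSpeechSeconds;

    // Returns true when the line reports a silence that comes after speech.
    public bool ShouldStop(string line)
    {
        if (line.Contains("silence_end"))
        {
            _speechSeen = true; // sound resumed after a silence, so speech has begun
            return false;
        }

        const string marker = "silence_start:";
        var index = line.IndexOf(marker, StringComparison.Ordinal);
        if (index < 0) return false;

        // Parse the timestamp with the invariant culture ("4.6", never "4,6").
        var value = line[(index + marker.Length)..].Trim();
        var parsed = double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out var start);

        return _speechSeen || (parsed && start >= _minSpeechSeconds);
    }
}
```

In RecordAsync, the Contains("silence_start") check would become a call to ShouldStop(line) on a detector created at the start of each recording.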

Update appsettings.json

{
  "Assistant": {
    "GpioButtonPin": 17,
    "AudioDevice": "hw:3,0",
    "RecordingDurationSeconds": 15,
    "SilenceDurationMs": 1500,
    "SilenceThreshold": "-40dB",
    "WhisperModel": "ggml-base.bin",
    "PiperBinary": "/home/gabriel/piper/piper/piper",
    "PiperVoice": "/home/gabriel/piper-voices/fr_FR-siwis-low.onnx",
    "AudioOutputDevice": "hw:3,0",
    "OllamaBaseUrl": "http://pi-cerveau.local:11434",
    "OllamaModel": "llama3.2:1b",
    "MaxConversationTurns": 10,
    "SystemPrompt": "Tu es un assistant vocal personnel qui s'appelle Alex. Tu aides la famille Mongeon qui habite à Blainville, Québec, Canada. Tu réponds toujours en français québécois, de façon naturelle et chaleureuse. Tu es concis : tes réponses font 1 à 3 phrases maximum, car elles seront lues à voix haute."
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information"
    }
  }
}

RecordingDurationSeconds is now a maximum safety duration. If no silence is detected, the recording still stops after this delay; set it to 15-20 seconds.

On a 4 GB Pi 4 with about 800 MB available, llama3.2:3b can exceed the 60-second timeout. llama3.2:1b gives a latency of 10-15 seconds, which is plenty for short answers.

In Program.cs, also adjust the HTTP client timeout:

builder.Services.AddHttpClient<ILlmService, OllamaService>(client =>
{
    client.Timeout = TimeSpan.FromSeconds(120);
});

Part 3: Automatic startup at boot

Step 3.1: Publish the binary

cd ~/projects/AudioAssistant
dotnet publish -c Release -r linux-arm64 --self-contained false -o ~/assistant-publish

Step 3.2: Create the systemd service

sudo nano /etc/systemd/system/assistant.service

[Unit]
Description=Assistant Vocal Franco-Québécois
After=network.target sound.target

[Service]
Type=simple
User=gabriel
WorkingDirectory=/home/gabriel/assistant-publish
ExecStart=/home/gabriel/.dotnet/dotnet /home/gabriel/assistant-publish/AudioAssistant.dll
Restart=always
RestartSec=5
Environment=DOTNET_ROOT=/home/gabriel/.dotnet
Environment=HOME=/home/gabriel

[Install]
WantedBy=multi-user.target

After=sound.target makes sure the audio subsystem is ready before the assistant starts. Without it, Piper can fail on the first boot.

Step 3.3: Enable and start

sudo systemctl daemon-reload
sudo systemctl enable assistant
sudo systemctl start assistant
sudo systemctl status assistant

# Live logs
journalctl -u assistant -f

Update workflow

cd ~/projects/AudioAssistant
dotnet publish -c Release -r linux-arm64 --self-contained false -o ~/assistant-publish
sudo systemctl restart assistant

From here on, the assistant starts on its own at boot, keeps track of the conversation, and stops recording as soon as you stop talking. No more running dotnet run by hand.