R03 · Pilot system

Real-Time Fact Correction in Generated Speech

Detecting and correcting factual errors without breaking flow

factuality · rag · correction · latency

Abstract

Large language models hallucinate. In spoken dialogue, the cost of an error is high and the window for correction is short. We are researching how to detect likely factual errors in real-time generated speech and issue corrections that feel like natural repair sequences rather than interruptions.

Problem Statement

When an AI system speaks a factual error, the standard approaches are: (1) ignore it, which damages trust; (2) stop and correct explicitly, which breaks conversational flow; or (3) prevent errors through heavy retrieval augmentation, which adds latency. We want a fourth option: detect, correct, and continue, as a human speaker would.

Approach

We use a speculative execution model. While the LLM generates tokens, a fact-checking module queries a knowledge base in parallel. Facts that would take too long to retrieve are predicted heuristically instead. When a likely error is detected mid-utterance, the system inserts a repair sequence ('actually...', 'correction...') rather than aborting the utterance.

Speculative fact checking

The system maintains a buffer of the last N generated tokens. For each entity mention, it initiates a knowledge base query in parallel with continued generation. Entity linking runs on partial text using fast heuristics. Queries are batched and cached aggressively.
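The buffering and parallel lookup described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the project's implementation: the `SpeculativeChecker` class, the in-memory `KNOWLEDGE_BASE`, the substring-based entity linker, and the simulated 10 ms lookup delay are all hypothetical stand-ins, and query batching is left out for brevity.

```python
import asyncio
from collections import deque

# Hypothetical in-memory knowledge base; a real system would query a remote store.
KNOWLEDGE_BASE = {
    "Eiffel Tower": {"city": "Paris"},
    "Canberra": {"country": "Australia"},
}


class SpeculativeChecker:
    def __init__(self, buffer_size=8):
        self.buffer = deque(maxlen=buffer_size)  # last N generated tokens
        self.cache = {}     # entity -> knowledge base record
        self.pending = {}   # entity -> in-flight lookup task
        self.kb_calls = 0   # counts real lookups, to show the cache working

    async def _query_kb(self, entity):
        self.kb_calls += 1
        await asyncio.sleep(0.01)  # stand-in for retrieval latency
        record = KNOWLEDGE_BASE.get(entity)
        self.cache[entity] = record
        return record

    def _link_entities(self):
        # Assumed heuristic: substring match of known names on the partial text.
        text = " ".join(self.buffer)
        return [name for name in KNOWLEDGE_BASE if name in text]

    def on_token(self, token):
        """Record a token and start lookups for newly seen entities."""
        self.buffer.append(token)
        for entity in self._link_entities():
            if entity not in self.cache and entity not in self.pending:
                self.pending[entity] = asyncio.create_task(self._query_kb(entity))

    async def drain(self):
        """Wait for all in-flight lookups and return the cached facts."""
        if self.pending:
            await asyncio.gather(*self.pending.values())
        self.pending.clear()
        return dict(self.cache)


async def run_demo(tokens):
    checker = SpeculativeChecker()
    for token in tokens:
        checker.on_token(token)
        await asyncio.sleep(0)  # yield so lookups progress while generation continues
    facts = await checker.drain()
    return facts, checker.kb_calls
```

Calling `asyncio.run(run_demo(text.split()))` on an utterance that mentions the same entity twice issues a single lookup, because the second mention finds the entity already pending or cached.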

Confidence modeling

Not all errors need correction. We model correction necessity as a function of: error severity (factual vs nuanced), user expertise (novice vs expert), and conversational context (instructional vs casual). The model learns from human feedback which errors warrant interruption.
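One way to picture correction necessity as a function of these three factors is a logistic score. The sketch below is illustrative only: the weights, the bias, the 0.5 threshold, and the sign given to user expertise are hand-set placeholders, whereas the model described above is learned from human feedback.

```python
import math
from dataclasses import dataclass


@dataclass
class CorrectionContext:
    severity: float        # 0.0 = nuance, 1.0 = clear factual error
    user_expertise: float  # 0.0 = novice, 1.0 = expert
    instructional: bool    # True for instructional, False for casual talk


# Hypothetical weights; a trained model would estimate these from feedback.
WEIGHTS = {"severity": 3.0, "user_expertise": -1.0, "instructional": 1.5}
BIAS = -2.0


def correction_necessity(ctx: CorrectionContext) -> float:
    """Return a score in (0, 1); higher means a correction is more warranted."""
    z = (
        BIAS
        + WEIGHTS["severity"] * ctx.severity
        + WEIGHTS["user_expertise"] * ctx.user_expertise
        + WEIGHTS["instructional"] * float(ctx.instructional)
    )
    return 1.0 / (1.0 + math.exp(-z))


def should_correct(ctx: CorrectionContext, threshold: float = 0.5) -> bool:
    """Apply an assumed decision threshold to the necessity score."""
    return correction_necessity(ctx) >= threshold
```

With these placeholder weights, a clear factual error told to a novice in an instructional setting is corrected, while a nuance told to an expert in casual talk is let through.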

Repair strategies

Human dialogue uses specific repair patterns: same-turn self-correction ('I mean...'), next-turn other-correction, and embedded corrections. We implement these as learned strategies, generating the repair in the same prosodic contour as the surrounding speech to minimize disruption.
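At the text level, two of these patterns can be illustrated with simple string operations. This is a rough sketch with hypothetical function names and fixed templates; it does not attempt the learned strategy selection or the prosodic matching described above, and it leaves next-turn other-correction out.

```python
def same_turn_repair(spoken: str, wrong: str, right: str, marker: str = "I mean") -> str:
    """Append a self-correction to what has already been spoken in this turn."""
    if wrong not in spoken:
        return spoken  # nothing to repair
    return f"{spoken}, {marker}, {right}"


def embedded_repair(planned_rest: str, wrong: str, right: str) -> str:
    """Use the corrected value in the not-yet-spoken text, with no explicit marker."""
    return planned_rest.replace(wrong, right)
```

For example, `same_turn_repair("The Eiffel Tower is in Lyon", "Lyon", "Paris")` yields "The Eiffel Tower is in Lyon, I mean, Paris", whereas `embedded_repair` quietly swaps the value in the remainder of the utterance.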

Latency engineering

The critical path is entity linking and knowledge retrieval. We use approximate nearest neighbor search over entity embeddings (10 ms), speculative caching of likely follow-up facts, and tiered verification (fast check vs deep check). Current end-to-end detection latency is 240 ms.
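Tiered verification can be sketched as a fast check that falls through to a deep check only when the result is inconclusive and the latency budget allows it. Everything named here is an assumption made for illustration: the two dictionaries standing in for the fast and deep stores, the `budget_ms` default, and the estimated `deep_cost_ms`.

```python
import time

# Hypothetical stores: a small hot cache and a larger, slower source.
FAST_FACTS = {("Eiffel Tower", "city"): "Paris"}
DEEP_FACTS = {
    ("Eiffel Tower", "city"): "Paris",
    ("Canberra", "country"): "Australia",
}


def _check(store, entity, attribute, claimed):
    """Return True/False when the store knows the fact, None when it does not."""
    known = store.get((entity, attribute))
    if known is None:
        return None
    return known == claimed


def verify(entity, attribute, claimed, budget_ms=240.0, deep_cost_ms=150.0):
    """Return (verdict, tier); verdict is None when no conclusion was reached."""
    start = time.perf_counter()
    verdict = _check(FAST_FACTS, entity, attribute, claimed)
    if verdict is not None:
        return verdict, "fast"
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Skip the deep check when its estimated cost would exceed the budget.
    if elapsed_ms + deep_cost_ms > budget_ms:
        return None, "skipped"
    return _check(DEEP_FACTS, entity, attribute, claimed), "deep"
```

The design choice illustrated is that an inconclusive result is preferable to a late one: when the budget is too tight, the claim is simply left unverified.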

Evaluation challenges

Standard NLP benchmarks do not capture the conversational repair dynamic. We use simulated dialogue where confederates introduce errors and measure: detection rate, correction acceptance, flow disruption (measured by user turn latency), and trust restoration. Human evaluators rate correction naturalness on a 5-point scale.
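The per-trial measurements listed above could be aggregated along these lines. The `Trial` record and its field names are hypothetical, every trial is assumed to contain one introduced error, and trust restoration is omitted because the text does not say how it is scored.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Trial:
    detected: bool                # did the system flag the introduced error?
    accepted: Optional[bool]      # did the user accept the correction? None if none issued
    user_turn_latency_ms: float   # proxy for flow disruption
    naturalness: Optional[int]    # 1-5 human rating, None if no correction


def summarize(trials: List[Trial]) -> dict:
    """Aggregate detection, acceptance, latency, and naturalness over trials."""
    corrected = [t for t in trials if t.detected]
    ratings = [t.naturalness for t in corrected if t.naturalness is not None]
    return {
        "detection_rate": len(corrected) / len(trials),
        "acceptance_rate": (
            mean(1.0 if t.accepted else 0.0 for t in corrected) if corrected else None
        ),
        "mean_user_turn_latency_ms": mean(t.user_turn_latency_ms for t in trials),
        "mean_naturalness": mean(ratings) if ratings else None,
    }
```

A comparison against a no-correction baseline would be needed to interpret the latency figure as disruption; the sketch only reports the raw mean.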