Research Division
Active Research
Six directions we are pursuing to move beyond the chat box. Each area includes technical documentation, current results, and open questions.
Voice Identity Preservation in Neural Translation
Preserving speaker characteristics across language boundaries
Most neural translation systems optimize for semantic accuracy and discard paralinguistic features: accent, prosody, rhythm, and vocal identity. We are researching how to preserve these characteristics when translating speech across languages, enabling real-time communication that sounds like the speaker rather than a generic synthetic voice.
Implicit Turn-Taking in Conversational AI
Inferring speech readiness without explicit wake words
Wake words and push-to-talk create friction in human-AI interaction. Humans, by contrast, detect turn-taking cues from prosody, breathing patterns, and conversational context. We are building models that predict when a user has finished speaking or is inviting the AI to respond, enabling interfaces that feel genuinely conversational.
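To make the prediction task concrete, here is a toy end-of-turn scorer that combines a few cues of the kind named above with a logistic function. It is a minimal sketch under stated assumptions, not our model: the feature names (`pause_ms`, `final_pitch_slope`, `is_question`), the hand-set weights, and the 0.7 threshold are all hypothetical, and a learned system would replace them with trained parameters over far richer features.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnFeatures:
    """Hypothetical per-moment cues; names and units are illustrative only."""
    pause_ms: float           # silence since the last voiced frame
    final_pitch_slope: float  # pitch slope near the utterance end; negative = falling
    is_question: bool         # assumed output of a separate text-side classifier


# Hypothetical hand-set weights; a real system would learn these from labeled turns.
WEIGHTS = {"bias": -4.0, "pause_per_100ms": 0.8, "falling_pitch": 1.5, "question": 2.0}


def end_of_turn_probability(f: TurnFeatures) -> float:
    """Logistic combination of prosodic and contextual cues."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["pause_per_100ms"] * (f.pause_ms / 100.0)
        + WEIGHTS["falling_pitch"] * (1.0 if f.final_pitch_slope < 0 else 0.0)
        + WEIGHTS["question"] * (1.0 if f.is_question else 0.0)
    )
    return 1.0 / (1.0 + math.exp(-z))


def should_take_turn(f: TurnFeatures, threshold: float = 0.7) -> bool:
    """Respond only when the estimated end-of-turn probability clears the threshold."""
    return end_of_turn_probability(f) >= threshold
```

With these illustrative weights, a short pause with rising pitch keeps the floor with the user, while a long pause ending in a falling-pitch question hands the turn to the AI.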
Real-Time Fact Correction in Generated Speech
Detecting and correcting factual errors without breaking flow
Large language models hallucinate. In spoken dialogue, the cost of an error is high and the window for correction is short. We are researching how to detect likely factual errors in real-time generated speech and issue corrections that feel like natural repair sequences rather than interruptions.
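One way to picture "correction without breaking flow" is a speaker that never blocks on verification: each sentence is spoken immediately, a checker runs a fixed number of sentences behind, and any correction is delivered at the next sentence boundary as a short repair phrase. The sketch below assumes a hypothetical `verify` callable that returns corrected text or `None`; the `lag` parameter and the repair wording are illustrative choices, not a description of our detector.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, Optional

# Hypothetical checker: returns a corrected sentence, or None if the claim looks fine.
Verifier = Callable[[str], Optional[str]]

REPAIR_PREFIX = "Actually, let me correct that: "  # illustrative repair wording


def speak_with_repairs(sentences: Iterable[str], verify: Verifier, lag: int = 1) -> Iterator[str]:
    """Yield sentences as spoken, inserting repairs once verification catches up.

    `lag` models checker latency: a sentence is verified only after `lag`
    further sentences have been spoken, so speech never waits on the checker.
    """
    pending: Deque[str] = deque()  # spoken but not yet verified
    for sentence in sentences:
        yield sentence  # speak immediately
        pending.append(sentence)
        if len(pending) > lag:
            fix = verify(pending.popleft())
            if fix is not None:
                yield REPAIR_PREFIX + fix  # repair lands at a sentence boundary
    # End of turn: finish verifying whatever is still outstanding.
    while pending:
        fix = verify(pending.popleft())
        if fix is not None:
            yield REPAIR_PREFIX + fix
```

Using a dictionary's `get` method as a stand-in verifier shows the trade-off: with `lag=1` the repair arrives one sentence after the error, and with `lag=0` it follows immediately at the cost of waiting on each check.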
Multimodal Room Reading for Contextual Response
Integrating facial expression, gesture, and vocal tone
Most current AI systems process text or speech alone and ignore the rich contextual signals humans rely on: facial expressions, micro-gestures, gaze direction, and vocal tone. We are researching how to integrate these modalities to infer emotional state, engagement level, and conversational context, and how to use those inferences to shape more appropriate AI responses.
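As a simple reference point for "integrating modalities", the sketch below uses confidence-weighted late fusion: each modality reports its own engagement estimate with a confidence, and the fused value is their weighted average. This is one standard baseline, not necessarily the approach we pursue; the modality names and the (score, confidence) interface are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


def fuse_engagement(estimates: Dict[str, Tuple[float, float]]) -> Optional[float]:
    """Confidence-weighted late fusion of per-modality engagement scores.

    `estimates` maps a modality name (e.g. "face", "voice", "gaze") to a
    (score, confidence) pair, both in [0, 1]. Missing or zero-confidence
    modalities contribute nothing; returns None if no modality is usable.
    """
    total_conf = sum(conf for _, conf in estimates.values())
    if total_conf == 0:
        return None  # no usable signal, so defer to other context
    return sum(score * conf for score, conf in estimates.values()) / total_conf
```

Late fusion tolerates a camera being off or a noisy room by letting that modality's confidence drop; richer approaches fuse earlier, at the feature level, to capture interactions such as a smile paired with a flat voice.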
Parallel Context Execution for Multi-Task AI
Independent time-anchored instruction streams
Most conversational AI handles one thing at a time. Human cognition maintains multiple parallel threads: tracking background tasks, monitoring ongoing processes, and handling interruptions. We are formalizing a model where AI can maintain multiple independent context threads, executing them in parallel without cross-contamination.
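A minimal way to illustrate "independent context threads without cross-contamination" is to give each thread its own private history and drive them concurrently. The sketch below makes several assumptions: `ContextThread`, `step`, and `run_parallel` are hypothetical names, `step` stands in for a model call that sees only one thread's state, and "time-anchored" is read as each thread recording timestamps relative to its own start time. It uses `asyncio`, so threads interleave cooperatively on one event loop rather than running in true parallel.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ContextThread:
    name: str
    instruction: str
    # Assumed meaning of "time-anchored": each thread keeps its own start time.
    anchor: float = field(default_factory=time.monotonic)
    # Private history of (seconds since this thread's anchor, event); never shared.
    history: List[Tuple[float, str]] = field(default_factory=list)


async def step(thread: ContextThread, event: str) -> str:
    """Hypothetical stand-in for a model call that sees only this thread's state."""
    await asyncio.sleep(0)  # yield control so other threads can interleave
    thread.history.append((time.monotonic() - thread.anchor, event))
    return f"[{thread.name}] saw {len(thread.history)} event(s)"


async def run_parallel(
    threads: List[ContextThread], events: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    async def drive(t: ContextThread) -> List[str]:
        # Events within one thread stay ordered; threads run concurrently.
        return [await step(t, e) for e in events.get(t.name, [])]

    results = await asyncio.gather(*(drive(t) for t in threads))
    return {t.name: r for t, r in zip(threads, results)}
```

For example, a "timer" thread and an "email" thread can receive events in the same run, yet each thread's history contains only its own events, which is the isolation property the paragraph describes.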
Conversational Computer Use and Task Execution
Bridging natural language dialogue with system action
The gap between conversational AI and computer use is large: one answers questions, the other performs actions. We are researching how to bridge this gap, enabling AI to understand natural-language task descriptions, formulate multi-step plans, execute actions on live systems, and handle failures through dialogue.
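The plan, execute, and recover-through-dialogue cycle can be sketched as a loop over plan steps in which a failure becomes a question to the user rather than a silent abort. In this minimal sketch, `run` (a hypothetical action executor returning success or failure), `ask_user` (a hypothetical dialogue callback), and the retry/skip/abort vocabulary are all illustrative assumptions, not an existing API.

```python
from typing import Callable, List


def execute_plan(
    plan: List[str],
    run: Callable[[str], bool],       # hypothetical executor: True on success
    ask_user: Callable[[str], str],   # hypothetical dialogue turn: "retry" | "skip" | "abort"
    max_attempts: int = 3,
) -> List[str]:
    """Run plan steps in order, turning each failure into a question for the user."""
    log: List[str] = []
    for step in plan:
        for attempt in range(1, max_attempts + 1):
            if run(step):
                log.append(f"done: {step}")
                break
            choice = ask_user(f"Step '{step}' failed (attempt {attempt}). Retry, skip, or abort?")
            if choice == "skip":
                log.append(f"skipped: {step}")
                break
            if choice == "abort":
                log.append(f"aborted at: {step}")
                return log
            # Any other answer is treated as "retry".
        else:
            # Attempts exhausted without success, skip, or abort.
            log.append(f"gave up: {step}")
    return log
```

Bounding retries with `max_attempts` keeps a stubborn failure from trapping the user in a loop, while letting the user choose between retrying, skipping, and aborting keeps them in control of actions on a live system.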
Research conducted at Sylica AI Labs. For collaboration inquiries, contact our research team.