

A cutting-edge Mixture-of-Experts (MoE) model that achieves state-of-the-art (SOTA) performance across text, image, audio, and video within a single model. It uses a Thinker–Talker architecture to deliver low-latency, real-time streaming responses.
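This page does not describe how this model routes tokens to its experts, so the snippet below is only a generic illustration of the MoE idea: a learned gate scores every expert, only the top-k experts run, and their outputs are mixed by renormalized gate weights. The function names, the linear gate, the value of `k`, and the toy experts are all hypothetical and are not this model's implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, k=2):
    """Hypothetical top-k MoE layer: route vector x to k experts and mix outputs.

    gate_weights: one row per expert; the gate logit is the dot product with x.
    experts: callables mapping a vector to a vector of the same length.
    Returns (combined_output, indices_of_selected_experts).
    """
    # One gate logit per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep only the k highest-scoring experts (sparse activation).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize gate scores over the selected experts only.
    weights = softmax([logits[i] for i in top])
    # Only the selected experts are evaluated; combine by gate weight.
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top
```

Because only k experts execute per input, compute per token stays roughly constant as more experts are added, which is the usual motivation for MoE designs.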

Key Features

  • Natively Omni-Modal: Unifies processing of text, image, audio, and video, ensuring high performance across all modalities.
  • Real-Time Speed: Features ultra-low-latency streaming and natural speech output, enabling fluid audio-visual dialogue.
  • SOTA Audio: Achieves state-of-the-art results on audio benchmarks, excelling at speech recognition and sound analysis.
  • Flexible Control: Supports customization via system prompts and function calling for seamless integration with external tools.