Speech Interaction

What is Intelligent Speech Interaction?

Intelligent Speech Interaction is built on speech recognition, speech synthesis and natural language understanding technologies. Companies can integrate it into their products so that those products can hear, understand and talk to users, providing an immersive human-computer interaction experience. Intelligent Speech Interaction currently supports Mandarin, Cantonese, English, Japanese, Korean, French and Indonesian, with more languages to come.

Intelligent Speech Interaction is suitable for a variety of scenarios, including intelligent Q&A, intelligent quality inspection, real-time subtitling for speeches and transcription of audio recordings. It has been applied in many industries, such as finance, insurance, e-commerce and smart home. It also provides a self-learning platform for improving speech recognition accuracy, a comprehensive management console and easy-to-use SDKs. You are welcome to enable Intelligent Speech Interaction.

Intelligent Speech Interaction provides several functions, including:

Short sentence recognition: Recognizes short speech that lasts up to 1 minute. The service applies to short speech interaction scenarios, such as voice search, voice command control and short voice messages. It can also be integrated into mobile apps, smart home appliances and smart voice assistants.

Real-time speech recognition: Recognizes audio streams of any length in real time, so that text is output while the speaker is still talking. The integrated smart sentence segmentation feature detects the start and end times of each sentence. Real-time speech recognition applies to scenarios such as real-time subtitles for live video, real-time meeting records, real-time recording of court proceedings and intelligent voice assistants.

Recording file recognition: Recognizes the recording files that you upload. This service applies to scenarios such as quality assurance in call centers, archiving of court judgments in databases, summarizing meeting minutes and filing medical records.

Speech synthesis: Uses deep learning technology to convert text into fluent, natural-sounding speech. The service offers multiple speakers and allows you to adjust the speed, intonation and volume of the generated speech. Speech synthesis applies to scenarios such as intelligent customer service, speech interaction, audiobook narration and accessible broadcasting.

Speech synthesis speaker customization: Based on deep learning technology, the speaker customization service allows you to quickly create custom text-to-speech (TTS) speakers from a small amount of training data. You can use the custom speakers for speech synthesis on the Intelligent Speech Interaction console or on your smart device.

Self-learning platform: Provides hot word training and customized language models to help you improve speech recognition performance.
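
The three recognition functions described above differ mainly in the kind of audio they accept. As a rough illustration only, the choice between them could be sketched as follows. The names `RecognitionService` and `choose_service` are hypothetical and are not part of any Intelligent Speech Interaction SDK. The 60-second threshold comes from the "up to 1 minute" limit stated for short sentence recognition, and the sketch assumes that an uploaded file is always handled by recording file recognition.

```python
from enum import Enum
from typing import Optional


class RecognitionService(Enum):
    """Hypothetical labels for the three recognition functions described above."""
    SHORT_SENTENCE = "short sentence recognition"
    REAL_TIME = "real-time speech recognition"
    RECORDING_FILE = "recording file recognition"


# "up to 1 minute", as stated for short sentence recognition
SHORT_SENTENCE_LIMIT_SECONDS = 60.0


def choose_service(is_uploaded_file: bool,
                   duration_seconds: Optional[float] = None) -> RecognitionService:
    """Pick a recognition function for a piece of audio.

    Assumption: uploaded recordings always go to recording file recognition.
    Live audio with a known length of at most 1 minute goes to short sentence
    recognition. Anything else (longer or of unknown length) is treated as a
    stream for real-time speech recognition.
    """
    if is_uploaded_file:
        return RecognitionService.RECORDING_FILE
    if duration_seconds is not None and duration_seconds <= SHORT_SENTENCE_LIMIT_SECONDS:
        return RecognitionService.SHORT_SENTENCE
    return RecognitionService.REAL_TIME
```

For example, a 5-second voice command would map to short sentence recognition, a live broadcast of unknown length to real-time speech recognition, and an uploaded call center recording to recording file recognition.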