ASR: the core technology of telephone robots

What is ASR?

Speech recognition technology, also known as Automatic Speech Recognition (ASR), aims to convert the lexical content of human speech into computer-readable input, such as keystrokes, binary codes, or character sequences. It differs from speaker recognition and speaker verification, which attempt to identify or confirm who is speaking rather than what is being said.

Domestic development

Research on speech recognition in China began in the 1950s and has developed rapidly in recent years, gradually moving from the laboratory to practical use. Since the National 863 Program began implementation in 1987, its Intelligent Computer Expert Group has run a dedicated project for speech recognition technology, renewed on a rolling basis every two years. China's speech recognition research is now broadly on par with work abroad; in Chinese speech recognition it has its own characteristics and advantages and has reached an internationally advanced level. The Institute of Automation and the Institute of Acoustics of the Chinese Academy of Sciences, Tsinghua University, Peking University, Harbin Institute of Technology, Shanghai Jiaotong University, the University of Science and Technology of China, Beijing University of Posts and Telecommunications, and Huazhong University of Science and Technology all have laboratories conducting speech recognition research. Representative research units are the Department of Electronic Engineering at Tsinghua University and the National Key Laboratory for Pattern Recognition at the Institute of Automation, Chinese Academy of Sciences.

The Speech Technology and Dedicated Chip Design group in the Department of Electronic Engineering at Tsinghua University developed a speaker-independent continuous speech recognition system for Chinese digit strings. Its recognition accuracy is 94.8% (variable-length digit strings) and 96.8% (fixed-length digit strings). With a rejection rate of 5%, the recognition rate reaches 96.9% (variable-length digit strings) and 98.7% (fixed-length digit strings). This is among the best recognition results reported internationally, and the performance is close to a practical level. The group's 5,000-word speaker-independent continuous speech recognition system for postal parcel verification reached a recognition rate of 98.73%, with a top-three recognition rate of 99.96%. It recognizes both Putonghua and the Sichuan dialect and meets practical requirements.

Three parts of a speech recognition system

Speech signal preprocessing and feature extraction: A fundamental problem in speech recognition is choosing features sensibly. Feature extraction analyzes and processes the speech signal to discard redundant information that is irrelevant to recognition, keep the information that matters for it, and compress the signal in the process.
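
To make the idea of compressing a signal into features concrete, here is a minimal sketch in Python. It is an illustration only, not the method of any system named in this article. The function names, the 16 kHz sample rate, the 25 ms frame with a 10 ms hop, and the choice of log energy and zero-crossing rate as features are all assumptions; practical systems typically use richer features such as MFCCs.

```python
import math


def pre_emphasis(samples, coeff=0.97):
    """Boost high frequencies: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[i] - coeff * samples[i - 1] for i in range(1, len(samples))
    ]


def frame_signal(samples, frame_len=400, hop=160):
    """Split the signal into overlapping frames.

    The defaults assume 16 kHz audio: 400 samples = 25 ms, 160 samples = 10 ms.
    """
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def frame_features(frame):
    """Return (log energy, zero-crossing rate) for one frame."""
    n = len(frame)
    # A Hamming window tapers the frame edges before measuring energy.
    windowed = [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    log_energy = math.log(sum(x * x for x in windowed) + 1e-10)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return log_energy, crossings / (n - 1)


def extract_features(samples):
    """Turn raw samples into one small feature tuple per frame."""
    return [frame_features(f) for f in frame_signal(pre_emphasis(samples))]


if __name__ == "__main__":
    # One second of a synthetic 440 Hz tone stands in for recorded speech.
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    features = extract_features(tone)
    print(len(tone), "samples ->", len(features), "frames of 2 features")
```

Each frame of 400 samples is reduced to two numbers, which is the kind of compression the paragraph above describes.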

Acoustic model and pattern matching: The acoustic model is usually produced by running a training algorithm over extracted speech features. During recognition, the input speech features are matched against the acoustic model (the pattern) to obtain the best recognition result.
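
The matching step can be sketched with template matching by dynamic time warping (DTW). This is a deliberately simple stand-in chosen for illustration: the article does not say which matching technique the systems above use, and modern acoustic models are usually statistical (for example HMMs or neural networks). The names `dtw_distance` and `recognize`, and the toy one-dimensional templates, are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(
                cost[i - 1][j],      # input frame repeats a template frame
                cost[i][j - 1],      # template frame repeats an input frame
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def recognize(features, templates):
    """Return the label whose stored template is closest to the input."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))


if __name__ == "__main__":
    # "Training" here is just storing one reference feature sequence per word.
    templates = {
        "yes": [(0.0,), (1.0,), (2.0,), (1.0,)],
        "no": [(2.0,), (2.0,), (0.0,), (0.0,)],
    }
    # The same "yes" pattern spoken more slowly (some frames repeated).
    spoken = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
    print(recognize(spoken, templates))
```

DTW is used here because it tolerates differences in speaking rate: the slower input still aligns with the "yes" template at zero cost, so `recognize` picks "yes".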

Language model and language processing: The language model is either a grammar network built from the speech commands to be recognized or a model built with statistical methods. Language processing can then perform grammatical and semantic analysis.
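
As a rough illustration of the statistical kind of language model, the sketch below trains a bigram model with add-one smoothing and uses it to score candidate word sequences. The function names, the tiny training corpus, and the choice of bigrams with add-one smoothing are assumptions made for this example, not details taken from the systems described above.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count word contexts and word pairs over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
        vocab.update(tokens[1:])
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


if __name__ == "__main__":
    model = train_bigram([
        "call the customer",
        "call the manager",
        "transfer the call",
    ])
    # Two hypotheses an acoustic matcher might confuse; the model ranks them.
    for hypothesis in ("call the customer", "customer the call"):
        print(hypothesis, round(log_prob(hypothesis, model), 3))
```

In a recognizer, a score like this would be combined with the acoustic match score so that word sequences that are plausible in the language are preferred over acoustically similar but unlikely ones.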
