Development of Language Models for Continuous Uzbek Speech Recognition System.

Abdinabi Mukhamadiyev, Mukhriddin Mukhiddinov, Ilyos Khujayarov, Mannon Ochilov, Jinsoo Cho

Sensors (Basel)

Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Republic of Korea.

Published: January 2023

Large-vocabulary automatic speech recognition systems and other natural language processing applications cannot operate without a language model. Most studies on pre-trained language models have focused on more popular languages such as English, Chinese, and various European languages, and there is no publicly available Uzbek speech dataset. Therefore, language models for low-resource languages need to be studied and created. The objective of this study is to address this limitation by developing a language model for the low-resource Uzbek language and understanding linguistic occurrences. We propose an Uzbek language model named UzLM by examining the performance of statistical and neural-network-based language models that account for the unique features of the Uzbek language. Our Uzbek-specific linguistic representation allows us to construct a more robust UzLM, utilizing 80 million words from various sources while using the same number of training words as previous studies, or fewer. Roughly 68,000 distinct words and 15 million sentences were collected to create this corpus. The experimental results of our tests on continuous Uzbek speech recognition show that, compared with manual encoding, the use of neural-network-based language models reduced the character error rate to 5.26%.
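The abstract reports its result as a character error rate (CER) but does not say how that rate was computed. The sketch below assumes the common definition: the character-level Levenshtein (edit) distance between the reference transcript and the recognizer output, divided by the reference length. The function names and the sample strings are illustrative and are not taken from the paper.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp; start with the empty prefix of ref.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical transcript pair with a single substituted character.
    print(character_error_rate("salom dunyo", "salom dunya"))
```

Under this definition a CER of 5.26% corresponds to roughly one character edit per nineteen reference characters. Note that the value can exceed 100% when the hypothesis is much longer than the reference.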

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919949
DOI: http://dx.doi.org/10.3390/s23031145

