A comprehensive voice dataset for Hindko digit recognition.

Tanveer Ahmed, Maqbool Khan, Khalil Khan, Ikram Syed, Syed Sajid Ullah

Data Brief

Department of Information & Communication Technology, University of Agder (UiA), Norway.

Published: February 2025

Hindko is a language spoken primarily in the northwestern areas of Pakistan by approximately eight million people. According to its native speakers, it is the seventh-largest language of Pakistan and the second-largest language of Khyber Pakhtunkhwa. The Hazara region is the cultural hub of the Hindko language: about 80% of the population in districts such as Haripur, Abbottabad, and Mansehra speak Hindko. Spoken Hindko content covers a wide range of subjects, including religion, education, poetry, politics, and theater. Despite this, Hindko lacks a voice recognition system that could enhance accessibility, help preserve the language, and promote digital inclusion for its speakers. This paper presents a voice recognition dataset of 17,597 voice samples that is publicly accessible for academic and research purposes. The dataset covers the 20 Hindko digits from 1 to 20, and all voice samples were recorded from students, staff, and faculty of the Pak-Austria Fachhochschule Institute of Applied Science and Technology.

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730949	PMC
http://dx.doi.org/10.1016/j.dib.2024.111220	DOI Listing

