A comprehensive voice dataset for Hindko digit recognition.

Data Brief

Department of Information & Communication Technology, University of Agder (UiA), Norway.

Published: February 2025

Hindko is a language spoken primarily in the northwestern areas of Pakistan by approximately eight million people. According to its native speakers, it is the seventh-largest language of Pakistan and the second-largest language of Khyber Pakhtunkhwa. The Hazara region is the cultural hub of the Hindko language: about 80% of the population in districts such as Haripur, Abbottabad, and Mansehra speak Hindko. Spoken Hindko content covers a wide range of subjects, including religion, education, poetry, politics, and theater. Despite this, Hindko lacks a voice recognition system that could enhance accessibility, preserve the language, and promote digital inclusion for its speakers. This paper presents a voice recognition dataset of 17,597 voice samples that is publicly accessible for academic and research purposes. The dataset covers the 20 Hindko digits from 1 to 20, and all voice samples were recorded from students, staff, and faculty of the Pak-Austria Fachhochschule Institute of Applied Science and Technology.
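
For readers who want to work with a labeled voice dataset of this kind, the following is a minimal sketch of how the clips could be indexed and checked for class balance. The abstract does not describe the file layout or audio format, so everything here is an assumption: one sub-folder per digit (named "1" to "20"), PCM WAV files inside each folder, and the hypothetical helper names index_dataset, clip_duration_seconds, and samples_per_label. It uses only the Python standard library.

```python
import wave
from collections import Counter
from pathlib import Path


def index_dataset(root):
    """Return (path, label) pairs.

    Assumes a hypothetical layout of one sub-folder per digit,
    e.g. root/1/*.wav ... root/20/*.wav; the real dataset may differ.
    """
    samples = []
    for wav_path in sorted(Path(root).glob("*/*.wav")):
        label = wav_path.parent.name  # folder name is used as the class label
        samples.append((wav_path, label))
    return samples


def clip_duration_seconds(wav_path):
    """Duration of a PCM WAV clip, computed from its header."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def samples_per_label(samples):
    """Count clips per label to check how balanced the classes are."""
    return Counter(label for _, label in samples)
```

With 17,597 samples spread over 20 digits, a perfectly balanced split would give roughly 880 clips per digit; comparing the counts from samples_per_label against that figure is a quick way to spot under-represented classes before training a recognizer.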

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730949
DOI: http://dx.doi.org/10.1016/j.dib.2024.111220
