Despite significant research on the Bangla language in Natural Language Processing (NLP), there remains a notable resource deficit for its diverse regional dialects, such as those spoken in Chittagong, Sylhet, and Barisal. These dialects are often considered unintelligible to speakers of Standard Bangla and pose challenges because of their unique grammatical structures and phonetic variations; some linguists categorize them as distinct languages. To address this, we present ONUBAD, a large and freely available dataset for the automatic translation of the Chittagong, Sylhet, and Barisal dialects into Standard Bangla using a Neural Machine Translation (NMT) system. For each regional dialect, ONUBAD provides a parallel corpus of 1540 words, 130 clauses, and 980 sentences paired with their Standard Bangla counterparts and English translations. The dataset includes metadata on phonetic variations and grammatical features, aiming to bridge the gap between standard and non-standard forms of Bangla. It serves as a resource for researchers in NLP, dialect studies, and linguistic preservation, supporting the development of more accurate and contextually relevant translation models. The dataset was collected between July and September 2024 from diverse sources such as books, websites, and local speakers, with the help of regional dialect specialists. It is hosted by the Department of Computer Science and Engineering, Jahangirnagar University, and is freely accessible at https://data.mendeley.com/datasets/6ft99kf89b/2.
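As a rough illustration of how a parallel corpus of this shape might be organized for NMT training, the sketch below models one aligned entry and tallies the per-dialect counts stated above. The `ParallelEntry` class, its field names, and the helper functions are hypothetical; the actual file layout of ONUBAD on Mendeley Data is not described here, so this is an assumption-laden sketch rather than the dataset's schema.

```python
from dataclasses import dataclass

# Hypothetical record layout: field names are assumptions, not ONUBAD's schema.
@dataclass(frozen=True)
class ParallelEntry:
    dialect: str          # "Chittagong", "Sylhet", or "Barisal"
    unit: str             # "word", "clause", or "sentence"
    dialect_text: str     # source side: regional dialect
    standard_bangla: str  # target side: Standard Bangla
    english: str          # accompanying English translation

DIALECTS = ("Chittagong", "Sylhet", "Barisal")

# Per-dialect counts as reported in the abstract.
UNITS_PER_DIALECT = {"word": 1540, "clause": 130, "sentence": 980}

def entries_per_dialect() -> int:
    """Number of aligned entries for a single dialect."""
    return sum(UNITS_PER_DIALECT.values())

def total_entries() -> int:
    """Number of aligned entries across all three dialects."""
    return entries_per_dialect() * len(DIALECTS)

def to_training_pair(entry: ParallelEntry) -> tuple[str, str]:
    """Return a (source, target) pair for dialect-to-standard translation."""
    return entry.dialect_text, entry.standard_bangla

# Placeholder strings stand in for real corpus text.
example = ParallelEntry(
    dialect="Sylhet",
    unit="sentence",
    dialect_text="<dialect sentence>",
    standard_bangla="<standard sentence>",
    english="<english sentence>",
)
source, target = to_training_pair(example)
```

Keeping the English translation on the same record, rather than in a separate table, would let the same entries serve dialect-to-English experiments without re-alignment.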


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11787450
DOI: http://dx.doi.org/10.1016/j.dib.2025.111276


