Deep learning-based optical character recognition (OCR) is an active area of research in which deep neural networks are trained to extract text from images. Although the field as a whole is advancing rapidly, Arabic OCR notably lacks a dataset of ancient manuscripts. Here, we fill this gap by providing both page images and textual ground truth for a collection of ancient Arabic manuscripts. This rare material was collected from the central library of the Islamic University of Madinah, and it encompasses rich text spanning different geographies across centuries. Specifically, the dataset contains eight ancient books totaling forty pages, each provided as an image together with a text transcription produced by experts. Because no comparable data are publicly available, the dataset is of significant value to researchers and practitioners for developing, augmenting, validating, and testing deep learning models and for assessing their generalization, in both Arabic OCR and Arabic text correction.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11381460 | PMC |
http://dx.doi.org/10.1016/j.dib.2024.110813 | DOI Listing |