Automatic Detection and Extraction of Key Resources from Tables in Biomedical Papers.

bioRxiv

FDI Lab, University of California, San Diego, 9500 Gilman Drive, M/C 0608, La Jolla, CA 92093-0608, USA.

Published: October 2024

Tables are useful information artifacts that make data "missingness" easy for humans to spot. Several publishers have adopted them to increase the information reported about key resources and reagents, such as antibodies, cell lines, and other tools that constitute the inputs to a study. STAR*Methods tables in particular have increased the "findability" of these key resources, but they are rarely available outside the Cell Press journal family. To make such tables available across the broader biomedical literature, we attempted to automatically process bioRxiv preprints, either creating tables from text or recognizing tables already created by authors, and to structure them for later use by publishers and search systems. The goal is to improve the "findability" of resources in a larger share of the scientific literature. Extracting key resource tables from PDF files with best-in-class tools yielded a Grid Table Similarity (GriTS) score of only 0.12. We therefore created several multimodal pipelines that combine machine learning approaches for identifying key resource table pages, Table Transformer models for table detection and table structure recognition, and a new table-specific language model that corrects row over-segmentation. These pipelines raise the quality of text extraction from tables created by biomedical authors and published on bioRxiv to a GriTS score of around 0.90, enabling automated research resource extraction tools to be deployed on bioRxiv.
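The abstract reports GriTS scores (0.12 versus about 0.90) without defining the metric. As a rough aid to reading those numbers, here is a minimal, hypothetical Python sketch of a GriTS-style content score. It assumes tables are rectangular grids of cell strings, scores cell pairs with a normalized longest-common-subsequence similarity, and aligns rows and columns independently with 1D dynamic programming (a factored approximation of the two-dimensional matching). All function names and the toy data are illustrative assumptions; this is not the authors' evaluation code, and the reference GriTS implementation may differ in details.

```python
from typing import Callable, List, Tuple

Grid = List[List[str]]  # assumption: rectangular grid, one string per cell


def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def cell_sim(a: str, b: str) -> float:
    """Normalized LCS similarity in [0, 1]; two empty cells count as identical."""
    if not a and not b:
        return 1.0
    return 2 * lcs_len(a, b) / (len(a) + len(b))


def align_1d(n: int, m: int,
             score: Callable[[int, int], float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Order-preserving alignment of n items to m items maximizing sum of score(i, j)."""
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1],
                           dp[i - 1][j - 1] + score(i - 1, j - 1))
    # Trace back through the table to recover the matched index pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j]:
            i -= 1
        elif dp[i][j] == dp[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
    return dp[n][m], pairs[::-1]


def grits_con(true: Grid, pred: Grid) -> float:
    """GriTS-style content score: 2 * matched similarity / (|true| + |pred|)."""
    n_true = len(true) * len(true[0]) if true else 0
    n_pred = len(pred) * len(pred[0]) if pred else 0
    if n_true + n_pred == 0:
        return 1.0
    if n_true == 0 or n_pred == 0:
        return 0.0
    # Row alignment: a row pair is scored by the best 1D alignment of its cells.
    _, row_pairs = align_1d(
        len(true), len(pred),
        lambda i, j: align_1d(len(true[i]), len(pred[j]),
                              lambda a, b: cell_sim(true[i][a], pred[j][b]))[0])
    # Column alignment, computed the same way on the transposed grids.
    t_cols, p_cols = list(zip(*true)), list(zip(*pred))
    _, col_pairs = align_1d(
        len(t_cols), len(p_cols),
        lambda i, j: align_1d(len(t_cols[i]), len(p_cols[j]),
                              lambda a, b: cell_sim(t_cols[i][a], p_cols[j][b]))[0])
    # Sum cell similarities over the aligned row x column substructure.
    matched = sum(cell_sim(true[r][c], pred[r2][c2])
                  for r, r2 in row_pairs for c, c2 in col_pairs)
    return 2 * matched / (n_true + n_pred)


# Hypothetical toy key resource table and a prediction with row over-segmentation.
gold = [["Antibody", "Source", "Identifier"],
        ["anti-GFP", "Vendor A", "Cat# 123"]]
pred = [["Antibody", "Source", "Identifier"],
        ["anti-", "Vendor A", "Cat# 123"],
        ["GFP", "", ""]]
print(grits_con(gold, gold))  # 1.0
print(grits_con(gold, pred) < 1.0)  # True
```

Because the score divides by the total number of cells in both grids, a prediction that splits one logical row into two is penalized even when every character survives; this is the kind of error the abstract's row over-segmentation model targets.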

Source: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11507667 (PMC)
DOI: http://dx.doi.org/10.1101/2024.10.15.618379
