A PHP Error was encountered

Severity: Warning

Message: file_get_contents(https://...@pubfacts.com&api_key=b8daa3ad693db53b1410957c26c9a51b4908&a=1): Failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests

Filename: helpers/my_audit_helper.php

Line Number: 176

Backtrace:

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 176
Function: file_get_contents

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 250
Function: simplexml_load_file_from_url

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 3122
Function: getPubMedXML

File: /var/www/html/application/controllers/Detail.php
Line: 575
Function: pubMedSearch_Global

File: /var/www/html/application/controllers/Detail.php
Line: 489
Function: pubMedGetRelatedKeyword

File: /var/www/html/index.php
Line: 316
Function: require_once

ENTRANT: A Large Financial Dataset for Table Understanding. | LitMetric

Tabular data is a way to structure, organize, and present information conveniently and effectively. Real-world tables present data in two dimensions by arranging cells in matrices that summarize information and facilitate side-by-side comparisons. Recent research efforts aim to train large models to understand structured tables, a process that enables knowledge transfer in various downstream tasks. Model pre-training, though, requires large datasets, conveniently formatted to reflect cell and table characteristics. This paper presents ENTRANT, a financial dataset that comprises millions of tables, which are transformed to reflect cell attributes, as well as positional and hierarchical information. Hence, they facilitate, among other things, pre-training tasks for table understanding with deep learning methods. The dataset provides table and cell information along with the corresponding metadata in a machine-readable format. We have automated all data processing and curation and technically validated the dataset through unit testing of high code coverage. Finally, we demonstrate the use of the dataset in a pre-training task of a state-of-the-art model, which we use for downstream cell classification.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11322654PMC
http://dx.doi.org/10.1038/s41597-024-03605-5DOI Listing

Publication Analysis

Top Keywords

financial dataset
8
dataset table
8
table understanding
8
reflect cell
8
dataset
5
entrant large
4
large financial
4
table
4
understanding tabular
4
tabular data
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!