Information extraction from historical well records using a large language model.

Sci Rep

Earth & Environmental Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, 87544, USA.

Published: December 2024

To reduce environmental risks and impacts from orphaned wells (abandoned oil and gas wells), it is essential to first locate and then plug these wells. Manually reading and digitizing the information in historical documents is not feasible, given the large number of wells. Here, we propose a new computational approach for rapidly and cost-effectively characterizing these wells. Specifically, we leverage the advanced capabilities of large language models (LLMs) to extract vital information, including well location and depth, from historical records of orphaned wells. In this paper, we present an information extraction workflow based on open-source Llama 2 models and test it on a dataset of 160 well documents. When applied to clean, PDF-based reports, the developed workflow achieves an overall accuracy of 100%, accounting for both the text-conversion and LLM-analysis stages. However, it struggles with unstructured, image-based well records, where accuracy drops to 70%. The workflow provides significant benefits over manual digitization because it reduces labor and increases automation. Additionally, more detailed prompting leads to improved information extraction, and LLMs with more parameters typically perform better. Given that a vast amount of geoscientific information is locked up in old documents, this work demonstrates that recent breakthroughs in LLMs allow us to access and utilize this information more effectively.
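The abstract describes a two-stage workflow: convert each record to text, then prompt an LLM to pull out fields such as location and depth. Below is a minimal sketch of the second stage only, assuming the record text is already available. The field names, prompt wording, JSON reply format, the `fake_llm` stand-in, and the per-field exact-match accuracy metric are all hypothetical illustrations, not the authors' prompts or evaluation code. In a real pipeline, the `llm` callable would wrap a call to a Llama 2 model.

```python
import json
import re
from typing import Callable, Dict, List, Optional

# Hypothetical schema: the paper extracts "well location and depth",
# but these exact keys are an assumption made for illustration.
FIELDS = ["well_name", "latitude", "longitude", "total_depth_ft"]


def build_prompt(record_text: str, fields: List[str] = FIELDS) -> str:
    """Build a detailed extraction prompt for one well record (assumed wording)."""
    keys = ", ".join(f'"{f}"' for f in fields)
    return (
        "You are reading a historical oil and gas well record.\n"
        f"Return a single JSON object with exactly these keys: {keys}.\n"
        "Use null for any value that does not appear in the record.\n\n"
        f"Record:\n{record_text}\n"
    )


def parse_reply(reply: str, fields: List[str] = FIELDS) -> Dict[str, Optional[str]]:
    """Take the outermost {...} span from the model reply and keep only known keys."""
    empty: Dict[str, Optional[str]] = {f: None for f in fields}
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    # Normalize every value to a string so comparisons are type-independent.
    return {f: (None if data.get(f) is None else str(data[f])) for f in fields}


def extract(record_text: str, llm: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Run one record through the (hypothetical) LLM callable and parse the reply."""
    return parse_reply(llm(build_prompt(record_text)))


def field_accuracy(
    predicted: List[Dict[str, Optional[str]]],
    truth: List[Dict[str, Optional[str]]],
) -> float:
    """Fraction of ground-truth fields reproduced exactly (assumed metric)."""
    pairs = [(p.get(f), t.get(f)) for p, t in zip(predicted, truth) for f in t]
    correct = sum(1 for p, t in pairs if p == t)
    return correct / len(pairs) if pairs else 0.0


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a Llama 2 call; returns a canned toy answer."""
    return (
        'Sure. {"well_name": "Example Well No. 1", "latitude": 32.5, '
        '"longitude": -103.2, "total_depth_ft": 4250}'
    )


if __name__ == "__main__":
    # Toy record and toy ground truth, invented for this demo only.
    pred = extract("Example Well No. 1 ... total depth 4250 ft", fake_llm)
    truth = {"well_name": "Example Well No. 1", "latitude": "32.5",
             "longitude": "-103.2", "total_depth_ft": "4250"}
    print(pred, field_accuracy([pred], [truth]))
```

Asking for a JSON object with explicit `null` values is one way to make the "more detailed prompting" mentioned in the abstract easy to score automatically. The exact string comparison in `field_accuracy` is stricter than a human reviewer would be (for example, `4250` vs. `4,250 ft` counts as wrong), so a real evaluation would likely normalize units and number formats first.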


DOI: http://dx.doi.org/10.1038/s41598-024-81846-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11685759
