A roadmap to artificial intelligence (AI): Methods for designing and building AI ready data to promote fairness.

Farah Kidwai-Khan, Rixin Wang, Melissa Skanderson, Cynthia A Brandt, Samah Fodeh, Julie A Womack

J Biomed Inform

VA Connecticut Healthcare System, West Haven, CT, USA; Yale School of Nursing, New Haven, CT, USA.

Published: June 2024

Objectives: We evaluated methods for preparing electronic health record data to reduce bias before applying artificial intelligence (AI).

Methods: We created methods for transforming raw data into a data framework for applying machine learning and natural language processing techniques to predict falls and fractures. To promote a reduction in bias, we applied several strategies to the raw data: inclusion and reporting of multiple races; use of mixed data sources such as outpatient and inpatient records, structured codes, and unstructured notes; and explicit handling of missingness. The raw data were curated using validated definitions to create data variables such as age, race, gender, and healthcare utilization. Clinical, statistical, and data expertise were used to form these variables. The research team included experts with varied professional and demographic backgrounds to bring diverse perspectives.
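The abstract does not specify how missingness was addressed. As one illustration of combining mixed data sources, the minimal sketch below fills a demographic variable from a second source when the first is missing and compares the missing rate before and after. The field names, the set of values treated as missing, and the toy records are all hypothetical and are not taken from the study.

```python
# Hypothetical sketch: these names and rules are assumptions, not the study's method.
MISSING_LABELS = {"", "unknown", "declined"}  # assumed codes that count as missing


def is_missing(value):
    """Treat None and the assumed placeholder labels as missing."""
    return value is None or value.strip().lower() in MISSING_LABELS


def coalesce(*values):
    """Return the first non-missing value in priority order, or None."""
    for value in values:
        if not is_missing(value):
            return value
    return None


def missing_rate(values):
    """Fraction of entries that are missing."""
    values = list(values)
    return sum(is_missing(v) for v in values) / len(values)


# Toy records (invented): race as recorded in two hypothetical sources.
records = [
    {"id": 1, "outpatient_race": "Black", "inpatient_race": None},
    {"id": 2, "outpatient_race": "Unknown", "inpatient_race": "Asian"},
    {"id": 3, "outpatient_race": None, "inpatient_race": None},
    {"id": 4, "outpatient_race": "White", "inpatient_race": "White"},
]

before = missing_rate(r["outpatient_race"] for r in records)
combined = [coalesce(r["outpatient_race"], r["inpatient_race"]) for r in records]
after = missing_rate(combined)
```

Reporting the rate before and after, rather than only the filled values, keeps the remaining gap visible so that records still missing the variable are counted rather than silently dropped.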

Results: For the prediction of falls, information extracted from radiology reports was converted to a matrix for applying machine learning. The processing of the data resulted in an input of 5,377,673 reports to the machine learning algorithm, out of which 45,304 were flagged as positive and 5,332,369 as negative for falls. Processed data resulted in lower missingness and a better representation of race and diagnosis codes. For fractures, specialized algorithms extracted snippets of text around the keyword "femoral" from dual-energy X-ray absorptiometry (DXA) scans to identify femoral neck T-scores, which are important for predicting fracture risk. The natural language processing algorithms yielded 98% accuracy and a 2% error rate. The methods to prepare data for input to artificial intelligence processes are reproducible and can be applied to other studies.
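The snippet-extraction step for fractures can be sketched roughly as follows. This is not the authors' algorithm: the window sizes, the regular expression for a T-score, and the sample report text are assumptions made for illustration. The sketch takes a window of text around each occurrence of "femoral" and reads the first T-score that follows the keyword inside that window.

```python
import re

# Assumed pattern: "T-score" (or "T score"/"Tscore"), up to 20 non-numeric
# characters, then a signed decimal. \u2212 is the Unicode minus sign.
T_SCORE = re.compile(
    r"T-?\s?score[^-+\u2212\d]{0,20}([-+\u2212]?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def snippets_around(text, keyword="femoral", before=40, after=80):
    """Return (snippet, keyword_offset) pairs for each keyword occurrence.

    `before` and `after` are hypothetical window sizes in characters.
    """
    results = []
    for match in re.finditer(re.escape(keyword), text, re.IGNORECASE):
        start = max(0, match.start() - before)
        end = min(len(text), match.end() + after)
        results.append((text[start:end], match.start() - start))
    return results


def femoral_neck_t_scores(text):
    """Return the first T-score found after each keyword occurrence."""
    scores = []
    for snippet, offset in snippets_around(text):
        # Search from the keyword onward so an earlier site's T-score
        # (for example, lumbar spine) is not picked up by mistake.
        found = T_SCORE.search(snippet, offset)
        if found:
            scores.append(float(found.group(1).replace("\u2212", "-")))
    return scores


# Invented report text, for illustration only.
sample = (
    "Lumbar spine L1-L4: BMD 0.95 g/cm2, T-score -1.2. "
    "Left femoral neck: BMD 0.61 g/cm2, T-score -2.6. "
    "Total hip T-score -2.1."
)
scores = femoral_neck_t_scores(sample)
```

Restricting the search to the text after the keyword is a design choice of this sketch; a production pipeline would need validation against manually reviewed reports, as the reported accuracy figure implies was done in the study.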

Conclusion: The life cycle of data from raw to analytic form includes data governance, cleaning, management, and analysis. When applying artificial intelligence methods, input data must be prepared optimally to reduce algorithmic bias, as biased output is harmful. Building AI-ready data frameworks that improve efficiency can contribute to transparency and reproducibility. The roadmap for the application of AI involves applying specialized techniques to input data, some of which are suggested here. This study highlights data curation aspects to be considered when preparing data for the application of artificial intelligence to reduce bias.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11144439
DOI: http://dx.doi.org/10.1016/j.jbi.2024.104654
