Beware the Grizzlyman: A comparison of job- and industry-based noise exposure estimates using manual coding and the NIOSH NIOCCS machine learning algorithm.

Benjamin Roberts, Abas Shkembi, Lauren M. Smith, Richard L. Neitzel

J Occup Environ Hyg

Department of Environmental Health Sciences, University of Michigan School of Public Health, Ann Arbor, Michigan.

Published: July 2022

NIOSH has updated its Industry and Occupation Computerized Coding System (NIOCCS), which uses machine learning to assign industry and occupation codes from free-text inputs, but the quality of those codes had not been tested with inputs of varying quality.
This study tested the robustness of NIOCCS using more than 700,000 noise measurements, compared three levels of input refinement, and assessed how each affected job-based noise exposure estimates.
Less refined inputs led to higher misclassification rates. Refining industry descriptions produced the most accurate noise exposure estimates, while further refining job titles added little, indicating that input data quality is crucial for effective machine learning classification.

Recently, the National Institute for Occupational Safety and Health (NIOSH) released an updated version of the NIOSH Industry and Occupation Computerized Coding System (NIOCCS), which uses supervised machine learning to assign industry and occupation codes based on provided free-text information. However, no efforts have been made to externally verify the quality of the industry and occupation codes it assigns when the algorithm is provided with inputs of varying quality. This study sought to evaluate whether the NIOCCS algorithm was sufficiently robust to low-quality inputs and how variable input quality could affect estimated job exposures in a large job-exposure matrix for noise (NoiseJEM). Using free-text industry and job descriptions from >700,000 noise measurements in the NoiseJEM, three files were created and input into NIOCCS: (1) N1, "raw" industries and job titles; (2) N2, "refined" industries and "raw" job titles; and (3) N3, "refined" industries and job titles. NIOCCS output standardized industry and occupation codes. Descriptive statistics of performance metrics (e.g., misclassification/discordance of occupation codes) were evaluated for each input relative to the original NoiseJEM dataset (N0). Across major Standard Occupational Classification (SOC) groups, total discordance rates for N1, N2, and N3 compared to N0 were 53.6%, 42.3%, and 5.0%, respectively. The impact of discordance varied by major SOC group and included both over- and underestimates of average noise exposure compared to N0. N2 had the most accurate noise exposure estimates (i.e., smallest bias) across major SOC groups compared to N1 and N3. Further refinement of job titles in N3 showed little improvement. Some variation in classification efficacy was seen over time, particularly prior to 1985. Machine learning algorithms can systematically and consistently classify data but are highly dependent on the quality and amount of input data.
The greatest benefit for an end-user may come from cleaning industry information before applying this method for job classification. Our results highlight the need for standardized classification methods that remain constant over time.
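The two comparisons described in the abstract (discordance of occupation codes relative to N0, and bias in average noise exposure per major SOC group) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes each measurement has a detailed SOC code from N0 and from one NIOCCS run (N1, N2, or N3), compares codes at the major-group level (the first two digits, e.g. `47-2061` → `47-0000`), and averages noise levels arithmetically in dBA. An energy (logarithmic) average is a common alternative. The function names and the example records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def major_group(soc_code: str) -> str:
    """Map a detailed SOC code such as '47-2061' to its major group '47-0000'."""
    return soc_code[:2] + "-0000"


def discordance_rate(reference: list[str], candidate: list[str]) -> float:
    """Percent of records whose major SOC group differs between the
    reference coding (e.g. N0) and a candidate coding (e.g. N1)."""
    if len(reference) != len(candidate):
        raise ValueError("code lists must be aligned record by record")
    mismatches = sum(
        major_group(r) != major_group(c) for r, c in zip(reference, candidate)
    )
    return 100.0 * mismatches / len(reference)


def mean_level_by_group(codes: list[str], levels_dba: list[float]) -> dict[str, float]:
    """Arithmetic mean noise level (dBA) per major SOC group (an assumption)."""
    groups: dict[str, list[float]] = defaultdict(list)
    for code, level in zip(codes, levels_dba):
        groups[major_group(code)].append(level)
    return {group: mean(values) for group, values in groups.items()}


def bias_by_group(
    reference: list[str], candidate: list[str], levels_dba: list[float]
) -> dict[str, float]:
    """Candidate-minus-reference mean level for groups present in both codings;
    positive values are overestimates relative to the reference."""
    ref = mean_level_by_group(reference, levels_dba)
    cand = mean_level_by_group(candidate, levels_dba)
    return {group: cand[group] - ref[group] for group in ref if group in cand}


# Hypothetical records: the same four measurements coded two ways.
n0_codes = ["47-2061", "47-2111", "53-7062", "29-1141"]
n1_codes = ["47-2061", "53-7062", "53-7062", "29-1141"]
levels = [92.0, 88.0, 85.0, 70.0]

print(discordance_rate(n0_codes, n1_codes))
print(bias_by_group(n0_codes, n1_codes, levels))
```

In this toy example, one of four records moves from major group 47 to 53, giving a 25% discordance rate. That single reassignment raises the group 47 mean by 2.0 dB and the group 53 mean by 1.5 dB, which shows how misclassification can bias group-level estimates even when most codes agree.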

DOI: http://dx.doi.org/10.1080/15459624.2022.2076860
