Objectives: To determine inter-lab reliability in sleep stage scoring using the 2014 American Academy of Sleep Medicine (AASM) manual, to examine in depth the reasons for disagreement, and to suggest improvements.
Methods: This study included 40 all-night polysomnograms (PSGs) from different samples. The PSGs were segmented into 37,642 30-s epochs. Five doctors from China and two doctors from America scored the epochs following the 2014 AASM standard. Scoring agreement between the two centers was evaluated using Cohen's kappa (κ). Epochs with deviating scores were visually inspected, and the potential reasons for disagreement were analyzed.
Results: Inter-lab agreement was substantial (κ = 0.75 ± 0.01). Scoring of stages W (κ = 0.89) and R (κ = 0.87) achieved the highest agreement, while stage N1 (κ = 0.45) showed the lowest. In terms of the relative disagreement ratio, N2-N3 (22.09%), W-N1 (19.68%), and N1-N2 (18.75%) were the most frequent discrepancy combinations. American and Chinese doctors showed distinct scoring tendencies for the discrepancy combinations W-N1, N1-N2, and N2-N3. Seven reasons for disagreement were identified: "on-threshold characteristic" (29.21%), "context influence" (18.06%), "characteristic identification difficulty" (8.81%), "arousal-wake confusion" (7.57%), "derivation inconsistence" (2.15%), "on-borderline characteristic" (0.92%), and "misrecognition" (33.27%).
Conclusions: This study quantified sleep stage scoring agreement under the 2014 AASM manual and explored potential sources of labeling ambiguity. Improvement measures were suggested accordingly to help remove ambiguity for scorers and improve scoring reliability at the international level.
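For readers who want to reproduce this kind of analysis on their own epoch labels, the two quantities reported above (Cohen's κ and the relative share of each discrepancy combination) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the stage ordering in `STAGES`, and the toy label sequences are all assumptions, and the abstract does not say how the two centers' scorings were consolidated before comparison.

```python
from collections import Counter

# Assumed stage ordering, used only to print pairs consistently (e.g. "W-N1").
STAGES = ["W", "N1", "N2", "N3", "R"]


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of epoch labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: fraction of epochs with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two scorers' marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[s] * count_b[s] for s in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        # Both scorers used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def disagreement_ratios(a, b):
    """Share of all disagreeing epochs that falls on each unordered stage pair."""
    pairs = Counter(frozenset((x, y)) for x, y in zip(a, b) if x != y)
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {
        "-".join(sorted(pair, key=STAGES.index)): count / total
        for pair, count in pairs.items()
    }


# Hypothetical toy data: eight epochs scored by two centers.
center_a = ["W", "W", "N1", "N2", "N2", "N3", "R", "R"]
center_b = ["W", "N1", "N1", "N2", "N3", "N3", "R", "R"]

print(round(cohens_kappa(center_a, center_b), 3))
print(disagreement_ratios(center_a, center_b))
```

The disagreement ratio here is normalized over disagreeing epochs only, which is one plausible reading of "relative disagreement ratio"; the per-stage κ values in the Results would need an additional convention (for example, one-vs-rest binarization of each stage) that the abstract does not specify.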
DOI: http://dx.doi.org/10.1007/s11325-019-01801-x