Recent object detection models show promising advances in architecture and performance, expanding potential applications that benefit persons with blindness or low vision (pBLV). However, these models are usually trained on generic data rather than on datasets that focus on the needs of pBLV. Applications that locate objects of interest to pBLV therefore require models trained specifically for this purpose. Informed by prior interviews, questionnaires, and Microsoft's ORBIT research, we identified thirty-five objects pertinent to pBLV. We used this user-centric feedback to gather images of these objects from the Google Open Images V6 dataset and then trained a YOLOv5x model on this dataset to recognize them. We demonstrate that the model can identify objects that previous generic models could not, such as those related to tasks of daily functioning (e.g., coffee mug, knife, fork, and glass). Crucially, we show that careful pruning of a dataset with severe class imbalances leads to a rapid, noticeable, two-fold improvement in the overall performance of the model, as measured by the mean average precision over intersection over union thresholds from 0.5 to 0.95 (mAP50-95). Specifically, mAP50-95 improved from 0.14 to 0.36 on the seven least prevalent classes in the training dataset. Overall, we show that careful curation of training data can improve training speed and object detection outcomes, and we provide clear directions for effectively customizing training data to create models that focus on the desires and needs of pBLV.

Clinical Relevance: This work demonstrates the benefits of developing assistive AI technology customized to individual users or the wider BLV community.
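
To make the mAP50-95 metric named in the abstract more concrete, the following is a minimal sketch of its building block, the intersection over union (IoU) between a predicted and a ground-truth box, together with the list of IoU thresholds over which average precision is averaged. The abstract gives only the range 0.5 to 0.95; the step of 0.05 used here is an assumption based on the common COCO-style convention, and the function names and the `(x1, y1, x2, y2)` box format are illustrative, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed thresholds: 0.50, 0.55, ..., 0.95 (step of 0.05 is an assumption).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, true_box):
    """Thresholds at which this prediction would count as a correct localization."""
    value = iou(pred_box, true_box)
    return [t for t in IOU_THRESHOLDS if value >= t]


if __name__ == "__main__":
    predicted = (0.0, 0.0, 10.0, 7.7)
    ground_truth = (0.0, 0.0, 10.0, 10.0)
    print(iou(predicted, ground_truth))
    print(thresholds_passed(predicted, ground_truth))
```

A detection that overlaps its target loosely counts as correct only at the lower thresholds, so averaging over the whole range rewards tight localization more than a single-threshold metric such as mAP50 does.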
DOI: http://dx.doi.org/10.1109/EMBC40787.2023.10340454