In pouring tasks performed by service robots, robust and accurate estimation of the liquid height is a crucial step. However, neither vision nor audio alone estimates the liquid height accurately enough. We therefore propose a visual-audio information fusion network that gives robots good pouring skills, using visual and audio information as its two sources. First, visual features are extracted by a residual network with an attention model. Second, the Fourier characteristic matrix of the audio signal is obtained by fast Fourier transform (FFT), and the audio features are then extracted by a long short-term memory (LSTM) network. Third, the visual and audio features are fused by a fully connected network, which outputs the liquid height and the state of the cup. Finally, a sinusoidal and transient fusion control method is proposed; it takes the liquid height and cup state as inputs and outputs the gripper angle, providing an implementation method for the pouring task. Experiments evaluate the performance of the multimodal information fusion method and verify the effectiveness of the algorithm for pouring tasks of service robots.
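As an illustration of the audio step, the sketch below builds a frame-by-frequency magnitude matrix from a 1-D signal with NumPy's FFT. It is a minimal sketch, not the authors' implementation: the function name `fourier_feature_matrix`, the frame length, the hop size, the Hann window, and the 16 kHz sample rate are all assumptions, since the abstract does not give these details. A matrix of this shape, one row per time step, is the kind of sequence an LSTM could consume.

```python
import numpy as np


def fourier_feature_matrix(signal, frame_len=512, hop=256):
    """Return per-frame FFT magnitudes, shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper: frame_len, hop, and the Hann window are
    illustrative assumptions, not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim != 1 or signal.size < frame_len:
        raise ValueError("signal must be 1-D with at least frame_len samples")

    # Number of complete frames that fit in the signal.
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)

    # Slice overlapping frames and taper each one to reduce spectral leakage.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    # Real-input FFT along the sample axis; keep only the magnitudes.
    return np.abs(np.fft.rfft(frames, axis=1))


# Synthetic check with a 1 kHz tone sampled at an assumed 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

features = fourier_feature_matrix(tone)
peak_bin = int(features.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 512  # bin index to frequency in Hz
```

For the synthetic tone, the strongest frequency bin maps back to 1 kHz, which is a quick sanity check that the rows are ordered by time and the columns by frequency.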
DOI: http://dx.doi.org/10.1016/j.isatra.2022.09.022