Deep neural networks have proven effective at classifying human interactions into emotions, especially when they encode multiple input modalities. In this work, we assess the robustness of a transformer-based multimodal audio-text classifier for emotion recognition by perturbing its input at inference time with attacks designed specifically to corrupt information deemed important for emotion recognition. To measure the impact of the attacks, we compare the classifier's accuracy on the perturbed input with its accuracy on the original, unperturbed input.
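The evaluation described above, accuracy on perturbed input versus accuracy on the original input, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `(audio, transcript)` sample representation, the keyword-based `toy_classify` stand-in for the transformer classifier, and the `mask_emotion_words` perturbation are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical sample representation: (audio placeholder, transcript).
Sample = Tuple[str, str]


def accuracy(classify: Callable[[Sample], str],
             inputs: Sequence[Sample],
             labels: Sequence[str]) -> float:
    """Fraction of inputs whose predicted label matches the reference label."""
    if len(inputs) != len(labels) or not labels:
        raise ValueError("inputs and labels must be non-empty and the same length")
    correct = sum(1 for x, y in zip(inputs, labels) if classify(x) == y)
    return correct / len(labels)


def accuracy_drop(classify: Callable[[Sample], str],
                  perturb: Callable[[Sample], Sample],
                  inputs: Sequence[Sample],
                  labels: Sequence[str]) -> Tuple[float, float, float]:
    """Return (clean accuracy, perturbed accuracy, clean minus perturbed)."""
    clean = accuracy(classify, inputs, labels)
    perturbed = accuracy(classify, [perturb(x) for x in inputs], labels)
    return clean, perturbed, clean - perturbed


# Toy stand-in for the classifier (assumption): looks only at the transcript.
def toy_classify(sample: Sample) -> str:
    return "joy" if "happy" in sample[1] else "neutral"


# Toy inference-time attack (assumption): masks an emotion-bearing word.
def mask_emotion_words(sample: Sample) -> Sample:
    audio, text = sample
    return audio, text.replace("happy", "[MASK]")


inputs: List[Sample] = [
    ("a1", "i am happy"),
    ("a2", "it is monday"),
    ("a3", "so happy today"),
]
labels = ["joy", "neutral", "joy"]

clean_acc, perturbed_acc, drop = accuracy_drop(
    toy_classify, mask_emotion_words, inputs, labels
)
```

A larger drop indicates that the classifier relied more heavily on the information the attack removed; a real evaluation would substitute the trained model and the designed attacks for the toy functions.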
Childhood sexual abuse (CSA) is a worldwide phenomenon that has negative long-term consequences for the victims and their families, and inflicts a considerable economic toll on society. One of the main difficulties in treating CSA is victims' reluctance to disclose their abuse, and the failure of professionals to detect it when there is no forensic evidence (Bottoms et al., 2014; McElvaney, 2013).