In our everyday environment, objects and events usually stimulate several sensory modalities at once. The brain fuses these signals into a multisensory percept, allowing faster and more accurate perception of the world and, in turn, appropriate action. It is therefore important to understand the conditions under which the brain integrates information from different senses to guide attention, and the mechanisms that support this integration.
The proposed project aims to determine how sound facilitates visual perception by comparing the performance of machine learning models for image recognition with that of human observers. Specifically, the ability of each to recognise ambiguous images in the presence or absence of sound will be examined under four conditions: with a sound that is semantically related to the image category, with a semantically unrelated sound, with random noise, and with no sound at all.
To start the project, a dataset of images and audio stimuli will be prepared; ambiguous images will be created by adding a controlled degree of visual noise. A human observer study will then be conducted in which audio-image pairs are presented to participants, who are asked to indicate the category of each image. Performance is predicted to be best when the accompanying audio stimulus is semantically related to the image category, to drop with unrelated sounds, and to drop further with random noise or no sound. In parallel, a machine image classifier will be adapted to use both audio and visual information: separate analyses of the audio and visual signals will be combined to generate a final prediction. The same performance pattern is expected from the machine model as from human observers. The accuracy advantage of each audio condition over the no-sound condition will then be compared between humans and the model; if the patterns are consistent, the model can be dissected to give insight into possible mechanisms by which sound facilitates visual perception. The results of the project will be reported in a written report and a poster presentation. Because the entire project relies on online materials (if necessary, the human observer component will be delivered online), COVID-19 restrictions would not affect it.
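As an illustration of the two processing steps described above, the sketch below shows one simple way they could be implemented: images degraded by blending in visual noise, and a late-fusion classifier that combines per-category scores from separate visual and audio branches. The function names, the weighted-average fusion rule, and the toy three-category example are all assumptions for illustration, not the project's committed design.

```python
import numpy as np

def degrade_image(image, noise_level, rng):
    """Blend an image with uniform random noise; noise_level in [0, 1].

    noise_level = 0 returns the original image; 1 returns pure noise.
    """
    noise = rng.random(image.shape)
    return (1.0 - noise_level) * image + noise_level * noise

def softmax(logits):
    """Convert raw per-category scores into probabilities."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def late_fusion(visual_logits, audio_logits, audio_weight=0.5):
    """Combine independent visual and audio predictions (hypothetical rule:
    a weighted average of the two branches' probability distributions)."""
    p_visual = softmax(visual_logits)
    p_audio = softmax(audio_logits)
    return (1.0 - audio_weight) * p_visual + audio_weight * p_audio

# Toy example with three categories, e.g. ("dog", "car", "bird"):
visual_logits = np.array([1.2, 1.0, 0.9])  # ambiguous visual evidence
audio_logits = np.array([3.0, 0.1, 0.2])   # a bark: semantically related sound
fused = late_fusion(visual_logits, audio_logits)
predicted_category = int(np.argmax(fused))  # 0, i.e. "dog"
```

In this toy case the visual branch alone barely favours category 0, and the semantically related sound sharpens that preference, mirroring the facilitation effect the project predicts for human observers.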
This project may have implications for health and wellbeing. Understanding how auditory information interacts with vision could help create more sophisticated prostheses that enrich visually impaired people's experience of the world. Furthermore, whereas natural spaces tend to have a positive impact on mental health, urban spaces tend to have a negative one; altering the perception of these spaces through another sensory modality could modulate these effects. For example, listening to sounds from nature may reduce the negative impact of urban spaces on mental health. Finally, an understanding of how sound facilitates visual perception could be used to improve multisensory experiences in virtual reality, which is increasingly used in rehabilitation after stroke.