Abstract: Speech enhancement aims to improve the perceived quality and intelligibility of speech in the presence of noise. Classical speech enhancement methods are mainly based on audio-only processing, which often performs poorly in adverse conditions where overwhelming noise is present. This paper presents an interactive prototype demo of a disruptive, cognitively inspired multimodal hearing aid being researched and developed at Stirling under an EPSRC-funded project (COGAVHEAR). The proposed technology contextually utilizes and integrates multimodal cues, such as lip-reading, facial expressions, gestures, and noisy audio, to further enhance the quality and intelligibility of the noise-filtered speech signal. However, the preliminary work presented in this paper uses only lip-reading and noisy audio. Lip-reading-driven deep learning algorithms are exploited to learn noisy audio-visual to clean audio mappings, leading to enhanced Wiener filtering for more effective noise cancellation. The term context-aware signifies the device's learning and adaptation capabilities, which could be exploited in a wide range of real-world applications, ranging from hearing aids, listening devices, cochlear implants, and telecommunications to the need for ear defenders in extremely noisy environments. Hearing-impaired users could experience more intelligible speech by contextually learning and switching between audio and visual cues. The preliminary interactive demo employs randomly selected, real noisy speech videos from YouTube to qualitatively benchmark the performance of the proposed contextual audio-visual approach against a state-of-the-art deep learning based audio-only speech enhancement method.
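To make the Wiener-filtering step concrete, below is a minimal, generic sketch of spectral-domain Wiener filtering with numpy. It is not the authors' implementation: here the clean-speech power spectrum is estimated by simple spectral subtraction, whereas in the paper's approach that estimate would instead be driven by the lip-reading deep learning model. All function names are illustrative.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, eps=1e-12):
    """Classic Wiener gain G = P_speech / (P_speech + P_noise).
    Here the speech PSD is estimated by spectral subtraction; in an
    audio-visual system it could come from a lip-reading network."""
    speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    return speech_psd / (noisy_psd + eps)

def enhance(noisy_stft, noise_psd):
    """Apply a per-bin Wiener gain to a noisy STFT.
    noisy_stft: complex array of shape (frames, bins)
    noise_psd:  real array of shape (bins,), e.g. averaged over
                speech-free frames."""
    noisy_psd = np.abs(noisy_stft) ** 2
    gain = wiener_gain(noisy_psd, noise_psd[None, :])
    return gain * noisy_stft  # attenuates low-SNR bins toward zero
```

The gain lies in [0, 1] per time-frequency bin, so enhancement only ever attenuates; bins dominated by noise are suppressed while high-SNR bins pass nearly unchanged.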

PDF: http://mandargogate.github.io/papers/CHAT2017-Lip-Reading-Driven-Hearing-Aids.pdf


  title={Towards Next-Generation Lip-Reading Driven Hearing-Aids: A Preliminary Prototype Demo},
  author={Adeel, Ahsan and Gogate, Mandar and Hussain, Amir}