Towards robust speech acquisition using sensor arrays
Maganti, Hari Krishna
Faculties: Fakultät für Ingenieurwissenschaften und Informatik (Faculty of Engineering and Computer Science)
License: Standard (version of 03.05.2003)
An integrated system approach was developed to address the problem of distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array processing techniques have presented a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination. These techniques rely on accurate speaker locations for optimal performance. Tracking speaker locations based solely on audio was not successful, owing to its discreteness and its vulnerability to noise sources and reverberation. Multi-modal approaches using audio-visual sensors provided the required robust and accurate speaker locations by exploiting the complementary advantages of the two modalities. In the proposed approach, an audio-visual multi-person tracker was used to track active speakers continuously with high accuracy. The speech processing system provided microphone-array-based speech enhancement and automatic speech/non-speech segmentation, whose output served as input to speech recognition. The approach was evaluated on data recorded in a real meeting room for stationary-speaker, moving-speaker and overlapping-speech scenarios. The results showed that the speech enhancement and recognition performance achieved by tracking the active speaker and then applying microphone array processing was significantly better than that of a single table-top microphone and comparable to that of a lapel microphone in all three scenarios. Overall, the integrated system was shown to be an appropriate means for robust distant speech acquisition.
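To illustrate why spatial filtering depends on an accurate speaker location, the following minimal sketch shows a delay-and-sum beamformer, one of the simplest microphone array techniques. It is not taken from the thesis, and the abstract does not state which beamformer was used. The function names, the integer-sample delays and the speed-of-sound constant are assumptions made for illustration only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_positions, source_position, sample_rate):
    """Integer sample delays that align every microphone with the farthest one.

    Hypothetical helper: positions are coordinate tuples in metres.
    """
    distances = [math.dist(m, source_position) for m in mic_positions]
    farthest = max(distances)
    # A microphone closer to the source hears the sound earlier,
    # so its channel has to be delayed by the difference in travel time.
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate)
            for d in distances]


def delay_and_sum(signals, delays):
    """Delay each channel by its steering delay and average the channels."""
    length = min(len(s) for s in signals)
    output = [0.0] * length
    for signal, delay in zip(signals, delays):
        for n in range(delay, length):
            output[n] += signal[n - delay]
    return [value / len(signals) for value in output]
```

When the assumed source position matches the true speaker position, the speech components add coherently while noise arriving from other directions is averaged down. A wrong position misaligns the channels and weakens the enhancement, which is why the location estimate from the tracker matters.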
Subject Headings: Sprachverarbeitung [GND]
Speech perception [LCSH]
Speech processing systems [LCSH]
Tracking (Engineering) [LCSH]
Keywords: Audio visual sensors; Speech acquisition; Speech enhancement; Speech non-speech separation
Dewey Decimal Group: DDC 620 / Engineering & allied operations