Towards robust speech acquisition using sensor arrays
Also available in print in the library: Z: J-H 11.374 ; W: W-H 9.483
Maganti, Hari Krishna
Faculty: Fakultät für Ingenieurwissenschaften und Informatik
Resource / media type: Dissertation, Text
Date of release: 2007-02-01
An integrated system approach was developed to address the problem of distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array processing techniques present a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination. These techniques rely on accurate speaker locations for optimal performance. Tracking speaker locations based solely on audio was not successful, owing to its discreteness and vulnerability to noise sources and reverberation. Multi-modal approaches using audio-visual sensors provided the required accuracy: robust and accurate speaker locations were achieved by exploiting the complementary advantages of the two modalities. In the proposed approach, an audio-visual multi-person tracker was used to track active speakers continuously with high accuracy. The speech processing system provided microphone-array-based speech enhancement and automatic speech/non-speech segmentation, whose output served as input to the speech recognizer. The approach was evaluated on data recorded in a real meeting room for stationary-speaker, moving-speaker and overlapping-speech scenarios. The results revealed that the speech enhancement and recognition performance achieved by tracking the active speaker and then applying microphone array processing was significantly better than that of a single table-top microphone and comparable to that of a lapel microphone in all three scenarios. Overall, the envisaged integrated system was shown to be an appropriate means for robust distant speech acquisition.
License: Standard (version of 03.05.2003)
Speech processing systems
Free keywords: Audio visual sensors
Speech non-speech separation
DDC subject group: DDC 620 / Engineering & allied operations