Towards robust speech acquisition using sensor arrays
Maganti, Hari Krishna
Faculties: Faculty of Engineering and Computer Science
License: Standard (version of 03.05.2003)
An integrated system approach was developed to address the problem of distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array processing techniques have presented a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination. These techniques relied on accurate speaker locations for optimal performance. Tracking speaker locations accurately on the basis of audio alone was not successful, owing to the discreteness of audio-based estimates and their vulnerability to noise sources and reverberation. Multi-modal approaches using audio-visual sensors provided the required accuracy: robust and accurate speaker locations were achieved by exploiting the complementary advantages of the two modalities. In the proposed approach, an audio-visual multi-person tracker was used to track active speakers continuously with high accuracy. The speech processing system provided microphone-array-based speech enhancement and automatic speech/non-speech segmentation, whose output served as input to the speech recognizer. The approach was evaluated on data recorded in a real meeting room for stationary-speaker, moving-speaker and overlapping-speech scenarios. The results revealed that the speech enhancement and recognition performance achieved by tracking the active speaker and then applying microphone array processing was significantly better than that of a single table-top microphone and comparable to that of a lapel microphone in all three scenarios. Overall, the envisaged integrated system was shown to be an appropriate means for robust distant speech acquisition.
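The dependence of spatial filtering on an accurate speaker location can be illustrated with a delay-and-sum beamformer, the simplest form of microphone array processing. The abstract does not state which beamformer the thesis uses, so this is only a minimal sketch under stated assumptions: the function names `steering_delays` and `delay_and_sum`, the speed of sound of 343 m/s, and the rounding of delays to whole samples are all illustrative choices, not taken from the work itself.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_positions, source_position, sample_rate):
    """Return per-microphone arrival delays in whole samples.

    Delays are measured relative to the microphone closest to the
    (assumed known) speaker position, e.g. as supplied by a tracker.
    """
    distances = [math.dist(mic, source_position) for mic in mic_positions]
    nearest = min(distances)
    return [
        round((d - nearest) / SPEED_OF_SOUND * sample_rate)
        for d in distances
    ]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average the aligned samples."""
    # Only keep samples for which every advanced channel still has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]
```

Sound from the assumed speaker position adds coherently after alignment, while sound from other directions is averaged out of phase. If the location estimate is wrong, the delays are wrong and the target speech itself is attenuated, which is why the tracker accuracy matters.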
Created / Completed
Controlled subject headings: Sprachverarbeitung [GND]
Speech perception [LCSH]
Speech processing systems [LCSH]
Tracking (Engineering) [LCSH]
Keywords: Audio visual sensors; Speech acquisition; Speech enhancement; Speech non-speech separation
DDC subject group: DDC 620 / Engineering & allied operations