
Author (dc.contributor.author): Maganti, Hari Krishna
Date of record creation (dc.date.accessioned): 2016-03-14T13:38:47Z
Available in OPARU since (dc.date.available): 2016-03-14T13:38:47Z
Year of creation (dc.date.created): 2006
Abstract (dc.description.abstract): An integrated system approach was developed to address distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array processing techniques offer a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination, but they rely on accurate speaker locations for optimal performance. Tracking speaker locations from audio alone was not successful, owing to the discrete nature of speech and its vulnerability to noise sources and reverberation. A multi-modal approach using audio-visual sensors provided the required accuracy: robust speaker locations were obtained by exploiting the complementary advantages of the respective modalities. In the proposed approach, an audio-visual multi-person tracker followed active speakers continuously and with high accuracy. The speech processing system provided microphone-array-based speech enhancement and automatic speech/non-speech segmentation, which served as input for speech recognition. The approach was evaluated on data recorded in a real meeting room for stationary-speaker, moving-speaker and overlapping-speech scenarios. The results revealed that the speech enhancement and recognition performance achieved by tracking the active speaker, followed by microphone array processing, was significantly better than that of a single table-top microphone and comparable to that of a lapel microphone in all three scenarios. Overall, the envisaged integrated system was shown to be an appropriate means for robust distant speech acquisition.
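The spatial filtering the abstract refers to can be illustrated with a minimal delay-and-sum beamformer: given a speaker location (e.g. from the audio-visual tracker), each microphone channel is advanced by its extra propagation delay and the channels are averaged. This is a hedged sketch under simplifying assumptions (nearest-sample delays, free-field propagation); the function name and parameters are illustrative, not the author's implementation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(signals, mic_positions, source_position, fs):
    """Steer a microphone array toward a known source location.

    signals:         (n_mics, n_samples) array of synchronized recordings
    mic_positions:   (n_mics, 3) microphone coordinates in metres
    source_position: (3,) estimated speaker location (e.g. from a tracker)
    fs:              sampling rate in Hz
    """
    # Propagation distance from the source to each microphone.
    distances = np.linalg.norm(mic_positions - source_position, axis=1)
    # Extra delay of each channel relative to the closest microphone.
    delays = (distances - distances.min()) / SPEED_OF_SOUND
    shifts = np.round(delays * fs).astype(int)  # nearest-sample approximation
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for sig, shift in zip(signals, shifts):
        # Advance the channel so all copies of the speech line up.
        out[:n_samples - shift] += sig[shift:]
    return out / n_mics
```

With the channels time-aligned, the target speech adds coherently while noise and reverberation from other directions add incoherently, which is the directional discrimination the abstract describes. Real systems use fractional (interpolated) delays and more advanced filter-and-sum designs.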
Language (dc.language.iso): en
Publisher (dc.publisher): Universität Ulm
License (dc.rights): Standard (version of 03.05.2003)
License text (dc.rights.uri): https://oparu.uni-ulm.de/xmlui/license_v1
Keyword (dc.subject): Audio visual sensors
Keyword (dc.subject): Speech acquisition
Keyword (dc.subject): Speech enhancement
Keyword (dc.subject): Speech non-speech separation
DDC class (dc.subject.ddc): DDC 620 / Engineering & allied operations
LCSH (dc.subject.lcsh): Speech perception
LCSH (dc.subject.lcsh): Speech processing systems
LCSH (dc.subject.lcsh): Tracking (Engineering)
Title (dc.title): Towards robust speech acquisition using sensor arrays
Resource type (dc.type): Dissertation
DOI (dc.identifier.doi): http://dx.doi.org/10.18725/OPARU-370
URN (dc.identifier.urn): http://nbn-resolving.de/urn:nbn:de:bsz:289-vts-58371
GND (dc.subject.gnd): Sprachverarbeitung
Faculty (uulm.affiliationGeneral): Fakultät für Ingenieurwissenschaften und Informatik
Date of release (uulm.freischaltungVTS): 2007-02-01T22:58:29Z
Peer review (uulm.peerReview): no
Shelfmark of printed copy (uulm.shelfmark): Z: J-H 11.374 ; W: W-H 9.483
DCMI media type (uulm.typeDCMI): Text
VTS ID (uulm.vtsID): 5837
Category (uulm.category): Publikationen

