Show simple item record

Author (dc.contributor.author): Maganti, Hari Krishna
Date of accession (dc.date.accessioned): 2016-03-14T13:38:47Z
Available in OPARU since (dc.date.available): 2016-03-14T13:38:47Z
Year of creation (dc.date.created): 2006
Abstract (dc.description.abstract): An integrated system approach was developed to address the problem of distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array processing techniques offer a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination, but they rely on accurate speaker locations for optimal performance. Tracking speaker locations from audio alone was not successful because audio-only estimates are intermittent and vulnerable to noise sources and reverberation. A multi-modal approach using audio-visual sensors provided the required speaker locations; robust and accurate localization was achieved by exploiting the complementary advantages of the respective modalities. In the proposed approach, an audio-visual multi-person tracker followed active speakers continuously and with high accuracy. The speech processing system provided microphone-array-based speech enhancement and automatic speech/non-speech segmentation, which served as input to speech recognition. The approach was evaluated on data recorded in a real meeting room for stationary-speaker, moving-speaker and overlapping-speech scenarios. The results revealed that the speech enhancement and recognition performance achieved by tracking the active speaker and then applying microphone array processing was significantly better than that of a single table-top microphone and comparable to that of a lapel microphone in all three scenarios. Overall, the envisaged integrated system was shown to be an appropriate means for robust distant speech acquisition.
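The abstract describes microphone array speech enhancement steered by tracked speaker locations. As a rough illustration of that idea only (not the dissertation's actual implementation), the following minimal delay-and-sum beamforming sketch steers an array toward a known speaker position; the geometry, sample rate, and function names are illustrative assumptions.

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound in air

def delay_and_sum(mic_signals, mic_positions, speaker_position, sample_rate):
    """Steer a microphone array toward a known speaker location.

    mic_signals      : (num_mics, num_samples) time-aligned channel recordings
    mic_positions    : (num_mics, 3) microphone coordinates in metres
    speaker_position : (3,) tracked speaker coordinates in metres
    sample_rate      : sampling frequency in Hz
    """
    mic_signals = np.asarray(mic_signals, dtype=float)
    mic_positions = np.asarray(mic_positions, dtype=float)
    speaker_position = np.asarray(speaker_position, dtype=float)
    num_samples = mic_signals.shape[1]

    # Direct-path propagation delay from the speaker to each microphone.
    distances = np.linalg.norm(mic_positions - speaker_position, axis=1)
    delays = distances / SPEED_OF_SOUND

    # Advance each channel by its delay relative to the closest microphone so
    # the direct-path speech adds coherently; fractional delays are applied in
    # the frequency domain as a linear phase shift.
    relative_delays = delays - delays.min()
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / sample_rate)
    spectra = np.fft.rfft(mic_signals, axis=1)
    aligned = np.fft.irfft(
        spectra * np.exp(2j * np.pi * freqs[None, :] * relative_delays[:, None]),
        n=num_samples, axis=1)

    # Averaging the aligned channels preserves the coherent speech while
    # attenuating uncorrelated noise and reverberation.
    return aligned.mean(axis=0)

In a setting like the one the abstract describes, the speaker position passed to such a beamformer would be updated continuously from the audio-visual tracker before each processing block.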
Language (dc.language.iso): en
Publisher (dc.publisher): Universität Ulm
License (dc.rights): Standard (version of 3 May 2003)
Link to license text (dc.rights.uri): https://oparu.uni-ulm.de/xmlui/license_v1
Keyword (dc.subject): Audio visual sensors
Keyword (dc.subject): Speech acquisition
Keyword (dc.subject): Speech enhancement
Keyword (dc.subject): Speech non-speech separation
Dewey Decimal Group (dc.subject.ddc): DDC 620 / Engineering & allied operations
LCSH (dc.subject.lcsh): Speech perception
LCSH (dc.subject.lcsh): Speech processing systems
LCSH (dc.subject.lcsh): Tracking (Engineering)
Title (dc.title): Towards robust speech acquisition using sensor arrays
Resource type (dc.type): Dissertation
DOI (dc.identifier.doi): http://dx.doi.org/10.18725/OPARU-370
URN (dc.identifier.urn): http://nbn-resolving.de/urn:nbn:de:bsz:289-vts-58371
GND (dc.subject.gnd): Sprachverarbeitung
Faculty (uulm.affiliationGeneral): Fakultät für Ingenieurwissenschaften und Informatik
Date of activation (uulm.freischaltungVTS): 2007-02-01T22:58:29Z
Peer review (uulm.peerReview): no
Shelfmark print version (uulm.shelfmark): Z: J-H 11.374 ; W: W-H 9.483
DCMI Type (uulm.typeDCMI): Text
VTS-ID (uulm.vtsID): 5837
Category (uulm.category): Publications

