Inter-Dataset Variability Modeling for Speaker Recognition

Dr. Hagai Aronowitz

Abstract

Speaker recognition in a mismatched domain was the main focus of the recent NIST speaker recognition evaluation. In this talk I will introduce a novel approach to this challenge. The main principle is to learn the inter-dataset variability from the development data and generalize to unseen conditions. The first method learns the subspace of the high-level feature space that is most sensitive to dataset mismatch and removes it. The second method optimizes the recognition model to directly minimize the error in the scores (log-likelihood ratios) of target trials when dataset-dependent models are replaced by a dataset-independent model. The result of the second method is a correction term for the commonly estimated within-speaker variability covariance matrix. The correction term is proportional to the normalized inter-dataset variability of the within-speaker variability covariance matrices.
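As a rough illustration of the first method, the sketch below estimates a subspace from the means of several development datasets and projects it out of the feature vectors. It is a minimal sketch under stated assumptions, not the implementation presented in the talk: the function names `inter_dataset_subspace` and `remove_subspace`, the use of dataset means as the only source of inter-dataset variability, and the subspace dimension `k` are all hypothetical choices made for illustration.

```python
import numpy as np


def inter_dataset_subspace(vectors, dataset_ids, k):
    """Estimate an orthonormal basis (dim x k) for the directions along
    which the per-dataset mean vectors differ the most.

    Assumption: inter-dataset variability is summarized by dataset means.
    """
    vectors = np.asarray(vectors, dtype=float)
    dataset_ids = np.asarray(dataset_ids)
    # One mean vector per development dataset.
    means = np.stack(
        [vectors[dataset_ids == d].mean(axis=0) for d in np.unique(dataset_ids)]
    )
    # Center the means so the subspace captures differences between datasets.
    centered = means - means.mean(axis=0)
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def remove_subspace(vectors, basis):
    """Project the vectors onto the orthogonal complement of the basis."""
    vectors = np.asarray(vectors, dtype=float)
    return vectors - (vectors @ basis) @ basis.T


# Synthetic usage example: three datasets with different offsets.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 10))
labels = np.repeat([0, 1, 2], 100)
features[labels == 1] += 3.0 * np.eye(10)[0]
features[labels == 2] += 2.0 * np.eye(10)[1]

basis = inter_dataset_subspace(features, labels, k=2)
compensated = remove_subspace(features, basis)
```

With three datasets the centered means span at most two directions, so `k=2` removes all differences between the dataset means in this toy setting. On real data the dimension would have to be tuned on development data.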

H. Aronowitz, "Inter Dataset Variability Modeling for Speaker Recognition", in Proc. ICASSP, 2017.
H. Aronowitz, "Compensating Inter-Dataset Variability in PLDA Hyper-Parameters for Robust Speaker Recognition", in Proc. Speaker Odyssey, 2014.
H. Aronowitz, "Inter Dataset Variability Compensation for Speaker Recognition", in Proc. ICASSP, 2014.

Speaker


Dr. Hagai Aronowitz received the B.Sc. degree in Computer Science, Mathematics and Physics from the Hebrew University, Jerusalem, Israel, in 1994, and the M.Sc. degree (summa cum laude) and the Ph.D. degree, both in Computer Science, from Bar-Ilan University, Ramat-Gan, Israel, in 2000 and 2006, respectively. From 2006 to 2007 he was a postdoctoral fellow in the advanced LVCSR group at the IBM T. J. Watson Research Center, Yorktown Heights, NY. He currently works at the IBM Haifa Research Lab, where he leads the multi-modal biometrics research team. His research interests include speaker identification, speaker diarization, face identification, audiovisual processing, and spoken language identification. Dr. Aronowitz is an author of 50 peer-reviewed publications in major conferences and journals.

Lecture languages

English, Hebrew

Topics

AI / Automation

Duration options

1 hour

Travel/delivery options

In-country
Outside of country: Open for discussion
Remote via video conference

Country

Israel
