Deep Neural Network Embedding For Text-Independent Speaker Verification
AI Models 2020. 12. 28. 17:36
These are study notes on danielpovey.com/files/2017_interspeech_embeddings.pdf

Overview
- Replace i-vectors with embeddings extracted from a feed-forward deep neural network for text-independent speaker verification
- Use a temporal pooling layer to aggregate speech segments of variable length into a fixed-size representation
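As a minimal sketch of the temporal pooling idea above: frame-level vectors from a segment of any length are reduced to one fixed-size vector by concatenating the per-dimension mean and standard deviation. The function name `stats_pooling` and the toy 2-dimensional frames are illustrative assumptions, and the population (divide-by-T) standard deviation is also an assumption, since these notes do not specify the estimator.

```python
import math


def stats_pooling(frames):
    """Pool T frame-level vectors (each of size D) into one vector of size 2*D.

    The output is [mean_1..mean_D, std_1..std_D]. Using the population
    standard deviation (divide by T) is an assumption made for this sketch.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    # Per-dimension mean over all frames of the segment.
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    # Per-dimension standard deviation over all frames of the segment.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds


# Two toy segments of different length map to vectors of the same size.
short_segment = [[1.0, 2.0], [3.0, 2.0]]
long_segment = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]
pooled_short = stats_pooling(short_segment)
pooled_long = stats_pooling(long_segment)
```

Both `pooled_short` and `pooled_long` have 4 entries even though the segments contain 2 and 4 frames, which is what lets the later layers work on whole segments.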
DNN embedding system
- Features
  - 20-dimensional MFCCs with a 25 ms frame length
  - Energy-based VAD is applied
- Model: Time-delay neural network (TDNN)
  - The first 5 layers operate at the frame level
  - The statistics pooling layer aggregates over the input segment and computes the mean and standard deviation
  - Two fully connected layers reduce the dimension to 512 and then 300, followed by a softmax output over the training speakers
  - Total parameters: 4.4M
  - Embeddings a and b are extracted from the two fully connected layers
- PLDA backend
  - Embedding dimension is reduced using LDA
  - Embedding length normalization is applied
  - PLDA scores are normalized using adaptive s-norm
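The last two backend steps can be sketched as follows. This is a hedged illustration, not the paper's implementation: `length_normalize` scales an embedding to unit Euclidean norm, and `adaptive_s_norm` normalizes a raw trial score with the mean and standard deviation of the `top_n` highest cohort scores on the enrollment side and on the test side, then averages the two. The function names, the `top_n` parameter, and the cohort score lists are assumptions; these notes do not describe the cohort or its size. LDA and PLDA scoring are omitted.

```python
import math
import statistics


def length_normalize(embedding):
    """Scale an embedding to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in embedding))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in embedding]


def _top_n_stats(cohort_scores, top_n):
    """Mean and population std of the top_n highest cohort scores."""
    top = sorted(cohort_scores, reverse=True)[:top_n]
    mean = statistics.mean(top)
    std = statistics.pstdev(top)
    if std == 0.0:
        raise ValueError("cohort scores have zero spread")
    return mean, std


def adaptive_s_norm(raw_score, enroll_cohort_scores, test_cohort_scores, top_n):
    """Symmetric score normalization using only the closest cohort entries.

    enroll_cohort_scores: scores of the enrollment embedding against a cohort.
    test_cohort_scores: scores of the test embedding against the same cohort.
    """
    mu_e, sd_e = _top_n_stats(enroll_cohort_scores, top_n)
    mu_t, sd_t = _top_n_stats(test_cohort_scores, top_n)
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
unit = length_normalize([3.0, 4.0])
normalized_score = adaptive_s_norm(
    5.0, [1.0, 3.0, -2.0], [2.0, 4.0, 0.0], top_n=2
)
```

Restricting the statistics to the highest-scoring cohort entries is what makes the normalization "adaptive": each trial is normalized against the cohort speakers that look most similar to it rather than against the whole cohort.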
Experiments
- Training data
  - SWBD
    - 65,000 recordings
    - 6,500 speakers
- Evaluation
  - Assess performance using NIST SRE 2010 (SRE10) and SRE 2016 (SRE16)
    - SRE10
      - Consists of English telephone speech
      - Enrollment utterances are full-length
      - Test utterances are truncated to the first T seconds of speech
    - SRE16
      - Consists of Tagalog and Cantonese telephone speech
      - Enrollment utterances contain about 60 seconds of speech
      - Test utterances range from 10 to 60 seconds
- Results
  - For short test utterances, the proposed model outperforms the i-vector baseline
  - The fusion system shows the best results
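One simple way to build a fusion system is to combine the scores of two systems for each trial. The sketch below assumes an equal-weight sum of i-vector and embedding scores; these notes do not state the fusion rule, so the weights, the function name `fuse_scores`, and the example scores are all illustrative assumptions.

```python
def fuse_scores(scores_a, scores_b, weight_a=0.5, weight_b=0.5):
    """Weighted sum of two systems' scores for the same list of trials."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    return [weight_a * a + weight_b * b for a, b in zip(scores_a, scores_b)]


# Hypothetical per-trial scores from the two systems.
ivector_scores = [2.0, -1.0, 0.5]
embedding_scores = [1.0, -3.0, 1.5]
fused = fuse_scores(ivector_scores, embedding_scores)
```

A sum like this is only meaningful when both systems produce scores on comparable scales, which is one reason score normalization is usually applied before fusing.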
Conclusions
- Propose DNN-based speaker embeddings for text-independent speaker verification
- Overall, the embeddings appear competitive with traditional i-vectors
- For short utterances, the DNN embeddings perform better