Acoustic features: ivector (xmdxcsj's blog)

1.UBM

universal background model[1]
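The UBM is modeled as a GMM and trained with the EM algorithm. There are two approaches:
Train a single UBM on all the data, which requires the training data to be balanced.
Train several UBMs and merge them, for example one per gender. This makes better use of unbalanced data and gives more control over the final UBM.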
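As a rough illustration of the EM training mentioned above, here is a minimal sketch for a 1-D GMM in plain Python. The function names (`em_step`, `train_ubm`), the toy data and the variance floor are all assumptions made for this example; a real UBM is trained on multi-dimensional features with a toolkit such as Kaldi.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances, var_floor=1e-3):
    """One EM iteration for a 1-D GMM; returns updated (weights, means, variances)."""
    num_comp = len(weights)
    occ = [0.0] * num_comp     # zeroth-order statistics (soft counts)
    first = [0.0] * num_comp   # first-order statistics
    second = [0.0] * num_comp  # second-order statistics
    for x in data:
        # E-step: posterior of each component given the frame
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        for c in range(num_comp):
            post = likes[c] / total
            occ[c] += post
            first[c] += post * x
            second[c] += post * x * x
    # M-step: re-estimate the parameters from the accumulated statistics
    new_weights = [n / len(data) for n in occ]
    new_means = [f / n for f, n in zip(first, occ)]
    new_vars = [max(s / n - m * m, var_floor)
                for s, n, m in zip(second, occ, new_means)]
    return new_weights, new_means, new_vars


def train_ubm(data, means, num_iters=20):
    """Run EM from the given initial means, with uniform weights and unit variances."""
    num_comp = len(means)
    weights = [1.0 / num_comp] * num_comp
    variances = [1.0] * num_comp
    for _ in range(num_iters):
        weights, means, variances = em_step(data, weights, means, variances)
    return weights, means, variances


# Toy data (hypothetical): two balanced clusters standing in for two speaker groups.
random.seed(0)
toy_data = ([random.gauss(-2.0, 0.5) for _ in range(200)]
            + [random.gauss(3.0, 0.5) for _ in range(200)])
ubm_weights, ubm_means, ubm_vars = train_ubm(toy_data, means=[-1.0, 1.0])
```

With balanced toy data the two component weights come out close to equal, which is the reason the single-UBM approach needs balanced training data.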
2.supervector

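MAP adaptation linearly interpolates between the UBM Gaussians and the speaker's data statistics to obtain a speaker-dependent GMM; the stacked means of that model form the supervector [2]. See [1] for the detailed training procedure.
If the UBM has C components and the feature dimension is F, the resulting supervector has dimension C*F.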
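The mean-only MAP update and the stacking into a supervector can be sketched as follows. This is a minimal example, assuming the relevance-factor form of the update described in [1]; the function names, the relevance value of 16 and the toy statistics are assumptions chosen for illustration.

```python
def map_adapt_means(ubm_means, occupancies, first_order, relevance=16.0):
    """Mean-only MAP adaptation.

    ubm_means:   C lists of F floats (UBM component means)
    occupancies: C soft counts n_c for the speaker's data
    first_order: C lists of F floats, sum over frames of posterior * feature
    relevance:   relevance factor r (assumed value)
    """
    adapted = []
    for mu, n_c, f_c in zip(ubm_means, occupancies, first_order):
        alpha = n_c / (n_c + relevance)  # data-dependent interpolation weight
        # With no data for this component, fall back to the UBM mean.
        data_mean = [f / n_c for f in f_c] if n_c > 0 else list(mu)
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(data_mean, mu)])
    return adapted


def supervector(means):
    """Stack C mean vectors of dimension F into one vector of dimension C*F."""
    return [value for mean in means for value in mean]


# Toy example with C = 2 components and F = 3 dimensions (hypothetical numbers).
toy_ubm_means = [[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]]
toy_occupancies = [16.0, 0.0]
toy_first_order = [[32.0, 32.0, 32.0], [0.0, 0.0, 0.0]]

toy_supervector = supervector(
    map_adapt_means(toy_ubm_means, toy_occupancies, toy_first_order))
```

The component that saw data moves halfway toward the data mean (alpha = 0.5), the unseen component keeps its UBM mean, and the supervector has C*F = 6 entries.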
3.ivector

identity vector
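For orientation, the ivector model is usually written in the total-variability form below. The symbols are the conventional ones and are assumed here to match the s, m, T and w used later in this post: s is the speaker- and session-dependent supervector, m is the UBM mean supervector, T is the low-rank total variability matrix, and w is the ivector.

```latex
\boldsymbol{s} = \boldsymbol{m} + \boldsymbol{T}\,\boldsymbol{w},
\qquad \boldsymbol{w} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}),
\qquad \boldsymbol{s}, \boldsymbol{m} \in \mathbb{R}^{CF},\;
\boldsymbol{T} \in \mathbb{R}^{CF \times R},\;
\boldsymbol{w} \in \mathbb{R}^{R}
```

Here C is the number of UBM components, F the feature dimension, and R the ivector dimension, so w is a much lower-dimensional summary of the supervector.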
1.UBM

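The universal background model is represented by a GMM.
UBM training flow, which produces final.dubm in the end:
steps/online/nnet2/train_diag_ubm.sh
#gmm-global-init-from-feats: initialize the GMM from all the features
#gmm-gselect, gmm-global-acc-stats: accumulate the statistics for GMM training
#gmm-global-est: re-estimate the GMM from the statistics
#gmm-global-copy: convert final.dubm to text format
Assume 40-dimensional features and 512 Gaussians.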
2.extractor
 
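The ivector extractor is used to extract 100-dimensional ivector features, which are combined with the MFCC features as the DNN input. The resulting model is final.ie, and the training flow is as follows: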
steps/online/nnet2/train_ivector_extractor.sh
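#ivector-extractor-init: initialize the ivector extractor from final.dubm
#gmm-global-get-post: compute the Gaussian posteriors of the CMVN-normalized features under final.dubm
#ivector-extractor-sum-accs: obtain the accumulated statistics
#ivector-extractor-est: estimate the final ivector extractor final.ie from the statistics
ivector-extractor-init --binary=false --ivector-dim=100 --use-weights=false "gmm-global-to-fgmm final.dubm -|" txt #view the extractor in text form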
3.Extracting ivectors
 
An ivector can be extracted once per utterance; in online mode it can be extracted once every 10 frames. The required files include:
--cmvn-config=run/run_chain_1000h_pitch/exp/ivectors/train_max2/conf/online_cmvn.conf
--ivector-period=10
--splice-config=run/run_chain_1000h_pitch/exp/ivectors/train_max2/conf/splice.conf
--lda-matrix=run/run_chain_1000h_pitch/exp/extractor/final.mat
--global-cmvn-stats=run/run_chain_1000h_pitch/exp/extractor/global_cmvn.stats
--diag-ubm=run/run_chain_1000h_pitch/exp/extractor/final.dubm
--ivector-extractor=run/run_chain_1000h_pitch/exp/extractor/final.ie
--num-gselect=5
--min-post=0.025
--posterior-scale=0.1
--max-remembered-frames=1000
--max-count=0
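To make --num-gselect, --min-post, --posterior-scale and --ivector-period more concrete, here is a minimal sketch of how such options might act on one frame. The function names and the order of operations (select the top Gaussians, normalize, prune, renormalize, scale) are assumptions for illustration and are not copied from the Kaldi source; the frame-count formula assumes one ivector is emitted at the start of each period.

```python
import math


def prune_posteriors(loglikes, num_gselect=5, min_post=0.025, posterior_scale=0.1):
    """Return {gaussian_index: scaled posterior} for one frame.

    loglikes: per-Gaussian log-likelihoods of the frame (weights included).
    """
    # Keep only the num_gselect best-scoring Gaussians.
    top = sorted(range(len(loglikes)), key=lambda i: loglikes[i],
                 reverse=True)[:num_gselect]
    # Softmax over the selected Gaussians (subtract the max for stability).
    max_ll = max(loglikes[i] for i in top)
    exps = {i: math.exp(loglikes[i] - max_ll) for i in top}
    total = sum(exps.values())
    post = {i: v / total for i, v in exps.items()}
    # Drop posteriors below min_post; always keep the best one.
    best = top[0]
    kept = {i: p for i, p in post.items() if p >= min_post or i == best}
    # Renormalize so the kept posteriors sum to one, then down-weight them.
    total = sum(kept.values())
    return {i: posterior_scale * p / total for i, p in kept.items()}


def num_online_ivectors(num_frames, ivector_period=10):
    """Number of ivectors when one is emitted every ivector_period frames."""
    return (num_frames + ivector_period - 1) // ivector_period


# Hypothetical log-likelihoods for a 7-component model.
toy_loglikes = [0.0, -1.0, -2.0, -10.0, -10.0, -10.0, -50.0]
toy_post = prune_posteriors(toy_loglikes)
```

In this toy frame only the three best Gaussians survive pruning, and their posteriors sum to the scale 0.1 rather than 1, so each frame contributes a tenth of a count to the statistics.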
The ivector extraction flow is as follows:
steps/online/nnet2/extract_ivectors_online.sh
#1. Feature processing: cmvn+splice+lda
#2. From the features and m (final.dubm), obtain s for each speaker
#3. From s, m (final.dubm) and T (final.ie), obtain w
#View the ivector features
copy-feats --binary=false --compress=false ark:ivector_online.1.ark ark,t:ivector_online.1.ark.txt
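Steps #2 and #3 above can be read as collecting statistics against the UBM and then solving for w. The textbook point estimate is sketched below under the standard total-variability formulation; Kaldi's implementation may differ in details, so treat this as an assumption rather than a description of the script. Here γ_t(c) is the posterior of Gaussian c for frame x_t, m_c is the mean of UBM component c, N is the block-diagonal matrix with blocks N_c I_F, Σ is the block-diagonal UBM covariance, and F stacks the F_c.

```latex
N_c = \sum_t \gamma_t(c), \qquad
\boldsymbol{F}_c = \sum_t \gamma_t(c)\,(\boldsymbol{x}_t - \boldsymbol{m}_c)

\boldsymbol{w} = \left(\boldsymbol{I}
  + \boldsymbol{T}^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{N}\,\boldsymbol{T}\right)^{-1}
  \boldsymbol{T}^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{F}
```

The identity term comes from the standard normal prior on w, so with very little data the estimate stays close to zero.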
The files used for training and decoding must be kept identical; otherwise the results will differ considerably.
[1]. Speaker Verification Using Adapted Gaussian Mixture Models
[2]. Support Vector Machines using GMM Supervectors for Speaker Verification
[3]. Implementation of the Standard I-vector System for the Kaldi Speech Recognition Toolkit