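Applications of Matlab in Speech Recognition

1. A GUI-Based Audio Acquisition and Processing System

Note:

This experiment recognizes the isolated Chinese characters 东, 北, 大, 学, 中, 荷, 学, 院.

First build the GUI: drag the required controls onto the layout, then double-click each control to edit its properties. The main ones are String and Tag (the Tag determines the name of the callback function). Properties such as Value and Style also matter and should not be overlooked in practice.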

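One clarification is needed here:

The buttons shown in the figure all sit inside one button group and are child controls of that group.

Callbacks are therefore added on the button group rather than on the individual buttons: right-click the border around the three buttons and choose View Callbacks > SelectionChangeFcn. The main file then shows the callback for the group:

function uipanel1_SelectionChangeFcn(hObject, eventdata, handles)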

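The code is explained using the first button, "Record", as the example;

Below is the code for "Play" and "Save":

That is all the code for audio acquisition.

When the program runs, the following interface appears:

Click the Record button; when recording ends, the corresponding waveform is displayed:

Click Save to store the sound; the file format is .wav.

This completes the audio acquisition.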
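The Save callback itself is not reproduced in this copy. As a minimal sketch of the save step, assuming the built-in audiowrite and audioread functions are used, a signal can be written to a .wav file and read back as follows. The sampling rate, the synthetic tone standing in for a recording, and the temporary file name are all assumptions for illustration.

```matlab
% Sketch: write a signal to a .wav file and read it back.
fs = 8000;                              % assumed sampling rate in Hz
t = (0:fs-1)' / fs;                     % one second of sample times
y = 0.5 * sin(2*pi*440*t);              % synthetic tone standing in for a recording
wavFile = [tempname '.wav'];            % hypothetical output path
audiowrite(wavFile, y, fs);             % WAV output defaults to 16-bit PCM
[yLoaded, fsLoaded] = audioread(wavFile);
maxErr = max(abs(yLoaded - y));         % difference is quantization error only
delete(wavFile);                        % clean-up for this demonstration only
```

In a GUI callback the path would more likely come from a file dialog than from tempname.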

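2. Sound Processing and Recognition

2.1 Opening a File

Speech processing starts by opening a file with the .wav extension. This part uses standalone buttons rather than a button group. The callback for the "Open" button is:

function pushbutton1_Callback(hObject, eventdata, handles)

Here pushbutton1 is the Tag of the "Open" button.

Add the following code under the callback:

The result is shown in the figure: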

2.2 Preprocessing

The callback is:

function pushbutton2_Callback(hObject, eventdata, handles)

The result is shown in the figure:
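The body of the preprocessing callback is not included in this copy, so the exact steps are unknown. The sketch below assumes a common choice: remove the DC offset, normalize the peak amplitude, and apply the same pre-emphasis filter that mfcc.m in the appendix uses. The sampling rate and the synthetic input are assumptions.

```matlab
% Sketch of a preprocessing step: remove DC offset, normalize, pre-emphasize.
fs = 8000;                                  % assumed sampling rate in Hz
t = (0:fs-1)' / fs;
x = 0.2 + 0.3*sin(2*pi*200*t);              % synthetic signal with a DC offset
x = x - mean(x);                            % remove the DC offset
x = x / max(abs(x));                        % scale the peak amplitude to 1
xPre = filter([1, -0.9375], 1, x);          % pre-emphasis: y(n) = x(n) - 0.9375*x(n-1)
```

Pre-emphasis boosts the high-frequency part of the spectrum, which tends to help later spectral features.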

2.3 Short-Time Energy

The callback for short-time energy is:

function pushbutton3_Callback(hObject, eventdata, handles)

The code under this callback is:
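The original listing is missing from this copy. As a rough sketch of what a short-time energy computation looks like, the signal is cut into overlapping frames and the squared samples of each frame are summed. The frame length, frame shift, sampling rate, and test signal are assumptions.

```matlab
% Sketch: short-time energy of a signal, one value per frame.
fs = 8000;                                   % assumed sampling rate in Hz
x = [zeros(800,1); sin(2*pi*300*(0:1599)'/fs); zeros(800,1)];  % silence, tone, silence
frameLen = 200;                              % assumed frame length (25 ms at 8 kHz)
frameInc = 80;                               % assumed frame shift (10 ms at 8 kHz)
numFrames = fix((length(x) - frameLen) / frameInc) + 1;
energy = zeros(numFrames, 1);
for k = 1:numFrames
    idx = (k-1)*frameInc + (1:frameLen);     % sample indices of frame k
    energy(k) = sum(x(idx).^2);              % sum of squared samples in the frame
end
```

Plotting energy against the frame index shows low values in the silent parts and a plateau over the tone.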

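2.4 Endpoint Detection

One point should be stated first: so that later callbacks are not left without variables defined in earlier ones, each later function repeats the earlier steps.

This obviously makes the program long-winded, and it is something worth revising later.

function pushbutton4_Callback(hObject, eventdata, handles)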
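The callback body is not reproduced in this copy, and getpoint.m in the appendix is truncated, so the author's exact method is not known. The sketch below illustrates one simple approach, a single threshold on short-time energy; the threshold of 10% of the peak energy, the frame parameters, and the synthetic signal are assumptions.

```matlab
% Sketch: energy-threshold endpoint detection on a synthetic signal.
fs = 8000;                                   % assumed sampling rate in Hz
x = [zeros(800,1); sin(2*pi*300*(0:1599)'/fs); zeros(800,1)];
x = x + 0.001*cos(2*pi*50*(0:length(x)-1)'/fs);   % small deterministic background hum
frameLen = 200;  frameInc = 80;              % assumed frame length and shift
numFrames = fix((length(x) - frameLen) / frameInc) + 1;
energy = zeros(numFrames, 1);
for k = 1:numFrames
    idx = (k-1)*frameInc + (1:frameLen);
    energy(k) = sum(x(idx).^2);
end
threshold = 0.1 * max(energy);               % assumed threshold: 10% of the peak energy
active = find(energy > threshold);           % frames classified as speech
startSample = (active(1) - 1)*frameInc + 1;            % first sample of the first active frame
endSample   = (active(end) - 1)*frameInc + frameLen;   % last sample of the last active frame
```

The detected range is accurate only to within one frame; practical detectors often add a second threshold or zero-crossing rate to catch weak consonants.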

2.5 Generating Templates

The parts of this function that repeat the steps above are omitted; only the added code is shown:

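2.6 Speech Recognition

The opened speech file is matched against a speech library recorded in advance, using the DTW (dynamic time warping) algorithm.

After recognition, the recognized character is displayed in the corresponding text box.

The code is as follows: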
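The original listing is not reproduced in this copy. What follows is a minimal sketch of DTW matching, not the author's code: it computes the accumulated warping cost between a test sequence and each template and picks the smallest. The toy feature matrices are assumptions; in the experiment each row would be an MFCC vector.

```matlab
% Sketch: DTW distance between two feature sequences (rows are frames).
test = [0 0; 1 1; 2 2; 3 3; 3 3];           % hypothetical test utterance features
templates = {[0 0; 1 1; 1 1; 2 2; 3 3], ... % hypothetical template 1 (same shape, warped)
             [3 3; 2 2; 1 1; 0 0]};         % hypothetical template 2 (reversed)
dist = zeros(1, numel(templates));
for t = 1:numel(templates)
    ref = templates{t};
    n = size(test, 1);  m = size(ref, 1);
    D = inf(n + 1, m + 1);                  % accumulated cost, padded with a border
    D(1, 1) = 0;
    for i = 1:n
        for j = 1:m
            d = norm(test(i, :) - ref(j, :));            % local Euclidean distance
            D(i+1, j+1) = d + min([D(i, j+1), D(i+1, j), D(i, j)]);
        end
    end
    dist(t) = D(n + 1, m + 1);              % total cost of the best warping path
end
[~, best] = min(dist);                      % index of the closest template
```

Because DTW allows frames to be repeated or skipped, the first template matches despite its different timing, while the reversed one does not.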

Comparison before and after running the program:

Overall view of the GUI:

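Summary

The experiment recognizes the characters 东, 北, 大, 学, 中, 荷, 学, 院. When the template recordings themselves are used as test samples against the speech library, accuracy is 100%, which shows that the algorithm is correct and only needs optimization.

When live recordings are matched against the templates, however, high accuracy cannot be guaranteed, which indicates that feature extraction is not yet adequate.

The guiding principle of feature extraction is that within-class distances should be as small as possible and between-class distances as large as possible; this needs further work.

The GUI also needs optimization: generate the template library once, then match the test speech against it, so that the library is built as a separate step and does not have to be regenerated for every test. This would speed up computation.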
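One way to separate the template library from each test run, sketched here under assumptions, is to cache it in a MAT-file and only rebuild it when the file is absent. The file name, labels, and placeholder feature matrices below are hypothetical.

```matlab
% Sketch: build the template library once, cache it in a MAT-file, reuse it later.
libFile = [tempname '.mat'];                 % hypothetical cache file
if ~exist(libFile, 'file')
    labels = {'dong', 'bei', 'da'};          % hypothetical labels for three templates
    features = cell(size(labels));
    for k = 1:numel(labels)
        features{k} = k * ones(4, 2);        % placeholder for real MFCC matrices
    end
    save(libFile, 'labels', 'features');     % written only on the first run
end
lib = load(libFile);                         % later runs only need this load
numTemplates = numel(lib.features);
delete(libFile);                             % clean-up for this demonstration only
```

In the real program the cache would live at a fixed path and would not be deleted after loading.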

Given the opportunity, continuous speech recognition could be implemented in the future.

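Appendix

These are all the code files:

The file mfcc.mat is generated while the program runs;

The test folder holds the recorded templates:

There are 6 .m files, as follows:

1. WienerScalart96.m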

function output = WienerScalart96(signal, fs, IS)
% output = WIENERSCALART96(signal, fs, IS)
% Wiener filter based on tracking a priori SNR using the Decision-Directed
% method, proposed by Scalart et al. 96. In this method it is assumed that
% SNRpost = SNRprior + 1. Based on this, the Wiener filter can be adapted to a
% model like Ephraim's model, in which we have a gain function which is a
% function of a priori SNR, and a priori SNR is tracked using the
% Decision-Directed method.
% Author: Esfandiar Zavarehei
% Created: MAR-05

if (nargin < 3 || isstruct(IS))
    IS = .25;  % Initial silence or noise-only part, in seconds
end
W = fix(.025*fs);  % Window length is 25 ms
SP = .4;  % Shift percentage is 40% (10 ms); overlap-add works well with this value (.4)
wnd = hamming(W);

% IGNORE FROM HERE ...............................
if (nargin >= 3 && isstruct(IS))  % This option is for compatibility with another programme
    W = IS.windowsize;
    SP = IS.shiftsize/W;
    % nfft = IS.nfft;
    wnd = IS.window;
    if isfield(IS, 'IS')
        IS = IS.IS;
    else
        IS = .25;
    end
end
% ...................................... UP TO HERE

pre_emph = 0;
signal = filter([1, -pre_emph], 1, signal);

NIS = fix((IS*fs - W)/(SP*W) + 1);  % number of initial silence segments

y = segment(signal, W, SP, wnd);  % This function chops the signal into frames
Y = fft(y);
YPhase = angle(Y(1:fix(end/2)+1, :));  % Noisy speech phase
Y = abs(Y(1:fix(end/2)+1, :));  % Spectrogram
numberOfFrames = size(Y, 2);
FreqResol = size(Y, 1);

N = mean(Y(:, 1:NIS)')';  % initial noise power spectrum mean
LambdaD = mean((Y(:, 1:NIS)').^2)';  % initial noise power spectrum variance
alpha = .99;  % used in smoothing xi (Decision-Directed estimation of a priori SNR)
NoiseCounter = 0;
NoiseLength = 9;  % This is a smoothing factor for the noise updating

G = ones(size(N));  % Initial gain used in calculation of the new xi
Gamma = G;

X = zeros(size(Y));  % Initialize X (memory allocation)

h = waitbar(0, 'Wait...');

for i = 1:numberOfFrames
    %%%%%%%%%%%%%%%% VAD and Noise Estimation START
    if i <= NIS  % If initial silence, ignore VAD
        SpeechFlag = 0;
        NoiseCounter = 100;
    else  % Else do VAD
        [NoiseFlag, SpeechFlag, NoiseCounter, Dist] = vad(Y(:, i), N, NoiseCounter);  % Magnitude spectrum distance VAD
    end

    if SpeechFlag == 0  % If not speech, update noise parameters
        N = (NoiseLength*N + Y(:, i))/(NoiseLength + 1);  % Update and smooth noise mean
        LambdaD = (NoiseLength*LambdaD + (Y(:, i).^2))./(1 + NoiseLength);  % Update and smooth noise variance
    end
    %%%%%%%%%%%%%%%% VAD and Noise Estimation END

    gammaNew = (Y(:, i).^2)./LambdaD;  % A posteriori SNR
    xi = alpha*(G.^2).*Gamma + (1 - alpha).*max(gammaNew - 1, 0);  % Decision-Directed method for a priori SNR

    Gamma = gammaNew;
    G = (xi./(xi + 1));

    X(:, i) = G.*Y(:, i);  % Obtain the new cleaned value

    waitbar(i/numberOfFrames, h, num2str(fix(100*i/numberOfFrames)));
end

close(h);
output = OverlapAdd2(X, YPhase, W, SP*W);  % Overlap-add synthesis of speech
output = filter(1, [1, -pre_emph], output);  % Undo the effect of pre-emphasis
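The per-frame update inside the loop above can be followed more easily on a single frequency bin. The sketch below applies the same three formulas (a posteriori SNR, decision-directed a priori SNR, Wiener gain) to a short sequence of magnitudes; the noise power and the magnitude values are assumptions chosen to show a transition from noise to speech.

```matlab
% Sketch: decision-directed a priori SNR tracking and Wiener gain for one frequency bin.
alpha = 0.99;                    % smoothing factor, as in the listing
lambdaD = 1;                     % assumed noise power for this bin
Ymag = [1 1 1 10 10 10 10 10];   % assumed magnitudes: noise-level frames, then speech
G = 1;  gammaPrev = 1;           % initial gain and a posteriori SNR, as in the listing
gains = zeros(size(Ymag));
for i = 1:numel(Ymag)
    gammaNew = Ymag(i)^2 / lambdaD;                                    % a posteriori SNR
    xi = alpha * G^2 * gammaPrev + (1 - alpha) * max(gammaNew - 1, 0); % a priori SNR
    gammaPrev = gammaNew;
    G = xi / (xi + 1);                                                 % Wiener gain
    gains(i) = G;
end
```

The gain drops toward zero while the bin stays at noise level and rises toward one within a few frames once the magnitude jumps, which is the smoothing behaviour the large alpha is meant to give.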

function ReconstructedSignal = OverlapAdd2(XNEW, yphase, windowLen, ShiftLen)
% Y = OverlapAdd(X, A, W, S);
% Y is the signal reconstructed from its spectrogram. X is a matrix
% with each column being the fft of a segment of signal. A is the phase
% angle of the spectrum, which should have the same dimension as X. If it is
% not given, the phase angle of X is used, which in the case of real values is
% zero (assuming that it is the magnitude). W is the window length of the time
% domain segments; if not given, the length is assumed to be twice as long as
% the fft window length. S is the shift length of the segmentation process (for
% example, in the case of non-overlapping signals it is equal to W, and in the
% case of 50% overlap it is equal to W/2). If not given, W/2 is used. Y is the
% reconstructed time domain signal.
% Sep-04
% Esfandiar Zavarehei

if nargin < 2
    yphase = angle(XNEW);
end
if nargin < 3
    windowLen = size(XNEW, 1)*2;
end
if nargin < 4
    ShiftLen = windowLen/2;
end
if fix(ShiftLen) ~= ShiftLen
    ShiftLen = fix(ShiftLen);
    disp('The shift length has to be an integer as it is the number of samples.')
    disp(['shift length is fixed to ' num2str(ShiftLen)])
end

[FreqRes, FrameNum] = size(XNEW);

Spec = XNEW.*exp(1j*yphase);

if mod(windowLen, 2)  % if windowLen is odd
    Spec = [Spec; flipud(conj(Spec(2:end, :)))];
else
    Spec = [Spec; flipud(conj(Spec(2:end-1, :)))];
end
sig = zeros((FrameNum - 1)*ShiftLen + windowLen, 1);
weight = sig;
for i = 1:FrameNum
    start = (i - 1)*ShiftLen + 1;
    spec = Spec(:, i);
    sig(start:start+windowLen-1) = sig(start:start+windowLen-1) + real(ifft(spec, windowLen));
end
ReconstructedSignal = sig;
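The step in OverlapAdd2 that mirrors the half spectrum relies on the conjugate symmetry of the FFT of a real signal. The sketch below checks that step in isolation for one frame of even length; the frame length and the frame contents are assumptions.

```matlab
% Sketch: rebuild a real frame from the non-redundant half of its spectrum.
W = 8;                                            % assumed even frame length
frame = (1:W)' .* cos(0.7*(1:W)');                % arbitrary real-valued frame
F = fft(frame);
half = F(1:fix(W/2) + 1);                         % bins 0..W/2, as kept in the listing
mag = abs(half);  ph = angle(half);               % magnitude and phase, stored separately
spec = mag .* exp(1j * ph);                       % recombine magnitude and phase
fullSpec = [spec; flipud(conj(spec(2:end-1)))];   % mirror the interior bins (even W)
rebuilt = real(ifft(fullSpec, W));
err = max(abs(rebuilt - frame));                  % should be at rounding-error level
```

For an odd frame length there is no Nyquist bin, which is why the listing mirrors `Spec(2:end, :)` in that branch instead.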

function Seg = segment(signal, W, SP, Window)
% SEGMENT chops a signal into overlapping windowed segments
% A = SEGMENT(X, W, SP, WIN) returns a matrix whose columns are segmented
% and windowed frames of the input one-dimensional signal, X. W is the
% number of samples per window, default value W = 256. SP is the shift
% percentage, default value SP = 0.4. WIN is the window that is multiplied by
% each segment and its length should be W. The default window is a Hamming
% window.
% 06-Sep-04
% Esfandiar Zavarehei

if nargin < 3
    SP = .4;
end
if nargin < 2
    W = 256;
end
if nargin < 4
    Window = hamming(W);
end
Window = Window(:);  % make it a column vector

L = length(signal);
SP = fix(W.*SP);
N = fix((L - W)/SP + 1);  % number of segments

Index = (repmat(1:W, N, 1) + repmat((0:(N-1))'*SP, 1, W))';
hw = repmat(Window, 1, N);
Seg = signal(Index).*hw;
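The index matrix built with repmat in segment is easier to see on a tiny example. The sketch below uses a signal whose values equal their own sample indices, so the resulting frames show directly which samples each column picks up; the frame length and shift are assumptions, and the window is left out.

```matlab
% Sketch: the index matrix used for framing, on a tiny signal.
signal = (1:10)';                 % toy signal whose values equal their sample indices
W = 4;                            % assumed samples per frame
shift = 2;                        % assumed shift in samples (fix(W*SP) in the listing)
N = fix((length(signal) - W) / shift + 1);                         % number of frames
Index = (repmat(1:W, N, 1) + repmat((0:(N-1))' * shift, 1, W))';   % W-by-N indices
frames = signal(Index);           % column k holds frame k
```

Each column starts `shift` samples after the previous one, and samples beyond the last full frame are dropped.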

function [NoiseFlag, SpeechFlag, NoiseCounter, Dist] = vad(signal, noise, NoiseCounter, NoiseMargin, Hangover)
% [NOISEFLAG, SPEECHFLAG, NOISECOUNTER, DIST] = vad(SIGNAL, NOISE, NOISECOUNTER, NOISEMARGIN, HANGOVER)
% Spectral Distance Voice Activity Detector
% SIGNAL is the current frame's magnitude spectrum, which is to be labelled as
% noise or speech. NOISE is the noise magnitude spectrum template (estimation).
% NOISECOUNTER is the number of immediately previous noise frames. NOISEMARGIN
% (default 3) is the spectral distance threshold. HANGOVER (default 8) is
% the number of noise segments after which the SPEECHFLAG is reset (goes to
% zero). NOISEFLAG is set to one if the segment is labelled as noise.
% NOISECOUNTER returns the number of previous noise segments; this value is
% reset (to zero) whenever a speech segment is detected. DIST is the
% spectral distance.
% Saeed Vaseghi
% edited by Esfandiar Zavarehei
% Sep-04

if nargin < 4
    NoiseMargin = 3;
end
if nargin < 5
    Hangover = 8;
end
if nargin < 3
    NoiseCounter = 0;
end

FreqResol = length(signal);

SpectralDist = 20*(log10(signal) - log10(noise));
SpectralDist(SpectralDist < 0) = 0;

Dist = mean(SpectralDist);
if (Dist < NoiseMargin)
    NoiseFlag = 1;
    NoiseCounter = NoiseCounter + 1;
else
    NoiseFlag = 0;
    NoiseCounter = 0;
end

% Detect noise-only periods and attenuate the signal
if (NoiseCounter > Hangover)
    SpeechFlag = 0;
else
    SpeechFlag = 1;
end
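The spectral-distance test at the core of vad can be tried on two hand-made frames. In the sketch below, the noise template and both frames are assumptions: one frame sits just above the noise template and the other is 20 dB above it.

```matlab
% Sketch: the spectral-distance test applied to two hand-made frames.
noise = ones(5, 1);                           % assumed noise magnitude template
quietFrame = 1.1 * ones(5, 1);                % close to the noise template
loudFrame  = 10  * ones(5, 1);                % 20 dB above the noise template
noiseMargin = 3;                              % threshold in dB (default in the listing)
frames = [quietFrame, loudFrame];
dist = zeros(1, 2);
for k = 1:2
    d = 20 * (log10(frames(:, k)) - log10(noise));   % per-bin log-spectral difference
    d(d < 0) = 0;                                    % only count bins above the noise
    dist(k) = mean(d);
end
isNoise = dist < noiseMargin;                 % true where the frame is labelled noise
```

The hangover counter in the full function then delays the switch back to "not speech" until several consecutive frames have been labelled noise.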

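2. mfcc.m

function cc = mfcc(k)
%------------------------------
% cc = mfcc(k) computes the MFCC coefficients of speech signal k
%------------------------------
% M is the number of filters, N is the number of samples per frame
M = 24; N = 256;
% Normalized mel filter bank coefficients
bank = melbankm(M, N, 22050, 0, 0.5, 'm');
figure;
plot(linspace(0, N/2, 129), bank);
title('Mel-Spaced Filterbank');
xlabel('Frequency [Hz]');
bank = full(bank);
bank = bank/max(bank(:));
% DCT coefficients, 12*24
for i = 1:12
    j = 0:23;
    dctcoef(i, :) = cos((2*j + 1)*i*pi/(2*24));
end
% Normalized cepstral liftering window
w = 1 + 6*sin(pi*[1:12]./12);
w = w/max(w);
% Pre-emphasis
AggrK = double(k);
AggrK = filter([1, -0.9375], 1, AggrK);
% Framing
FrameK = enframe(AggrK, N, 80);
% Windowing
for i = 1:size(FrameK, 1)
    FrameK(i, :) = (FrameK(i, :))'.*hamming(N);
end
FrameK = FrameK';
% Compute the power spectrum
S = (abs(fft(FrameK))).^2;
disp('Displaying the power spectrum...')
figure;
plot(S);
axis([1, size(S, 1), 0, 2]);
title('Power Spectrum (M=24, N=256)');
xlabel('Frame');
ylabel('Frequency [Hz]');
colorbar;
% Pass the power spectrum through the filter bank
P = bank*S(1:129, :);
% Take the logarithm, then apply the discrete cosine transform
D = dctcoef*log(P);
% Cepstral liftering window
for i = 1:size(D, 2)
    m(i, :) = (D(:, i).*w')';
end
% Difference (delta) coefficients
dtm = zeros(size(m));
for i = 3:size(m, 1)-2
    dtm(i, :) = -2*m(i-2, :) - m(i-1, :) + m(i+1, :) + 2*m(i+2, :);
end
dtm = dtm/3;
% Combine the MFCC parameters and the first-order difference MFCC parameters
cc = [m, dtm];
% Remove the first two and last two frames, because their first-order difference parameters are 0
cc = cc(3:size(m, 1)-2, :);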
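The difference step at the end of mfcc can be checked on a toy matrix. The sketch below applies the same weights (-2, -1, +1, +2, divided by 3) to two linear ramps standing in for MFCC trajectories; the matrix is an assumption for illustration.

```matlab
% Sketch: first-order difference (delta) features, using the same weights as the listing.
m = [(1:8)', 2*(1:8)'];                 % toy "MFCC" matrix: 8 frames, 2 coefficients
dtm = zeros(size(m));
for i = 3:size(m, 1) - 2
    dtm(i, :) = -2*m(i-2, :) - m(i-1, :) + m(i+1, :) + 2*m(i+2, :);
end
dtm = dtm / 3;
cc = [m, dtm];                          % static and delta features side by side
cc = cc(3:size(m, 1) - 2, :);           % drop the frames whose delta was left at zero
```

For a ramp with slope s the weighted sum is 10*s, so after the division by 3 each delta column is constant at 10*s/3, a scaled estimate of the slope.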

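3. getpoint.m

function [StartPoint, EndPoint] = getpoint(k, fs)
% UNTITLED Summary of this function goes here
%   Detailed explanation goes here
signal = WienerScalart96(k, fs);
sigLength = length(signal);  % signal length
t = (0:sigLength-1)/fs;  % time axis of the signal
FrameLen = round((0.012/max(t))*sigLength);  % frame length (about 12 ms of samples)
FrameInc = round(FrameLen/3);  % frame increment, set to 1/3 of the frame length (1/3 to 1/2 is the suggested range)
tmp = enframe(signal(1:end), FrameLen, FrameInc);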
