Phonetic figures are sound.docx


Phonetic figures are sound-related figures of speech. They encompass various stylistic means, namely alliteration, assonance, cacophony, paronomasia (pun) and onomatopoeia. All of these concrete realisations of phonetic figures relate to sounds, as they represent repetition of sounds and vowels (alliteration and assonance), clashes of sounds (cacophony), "play upon the sounds and meanings of words" (pun) and imitation of sounds (onomatopoeia).

3. Phonetics and Theory of Speech Production

Speech processing and language technology contain many special concepts and terms. To understand how different speech synthesis and analysis methods work, we must have some knowledge of speech production, articulatory phonetics, and some other related terminology. The basic theory of these topics will be discussed briefly in this chapter. For more detailed information, see for example Fant (1970), Flanagan (1972), Witten (1982), O'Saughnessy (1987), or Kleijn et al. (1998).

3.1 Representation and Analysis of Speech Signals

Continuous speech is a set of complicated audio signals, which makes producing them artificially difficult. Speech signals are usually considered as voiced or unvoiced, but in some cases they are something between these two. Voiced sounds consist of the fundamental frequency (F0) and its harmonic components produced by the vocal cords (vocal folds). The vocal tract modifies this excitation signal, causing formant (pole) and sometimes antiformant (zero) frequencies (Witten 1982). Each formant frequency also has an amplitude and bandwidth, and it may sometimes be difficult to define some of these parameters correctly. The fundamental frequency and formant frequencies are probably the most important concepts in speech synthesis and also in speech processing in general.

With purely unvoiced sounds, there is no fundamental frequency in the excitation signal and therefore no harmonic structure either, and the excitation can be considered as white noise. The airflow is forced through a vocal tract constriction, which can occur in several places between the glottis and the mouth. Some sounds are produced with complete stoppage of airflow followed by a sudden release, producing an impulsive turbulent excitation often followed by a more protracted turbulent excitation (Kleijn et al. 1998). Unvoiced sounds are also usually quieter and less steady than voiced ones. The differences between these are easy to see in Figure 3.2, where the second and last sounds are voiced and the others unvoiced. Whispering is a special case of speech. When whispering a voiced sound there is no fundamental frequency in the excitation, and the first formant frequencies produced by the vocal tract are perceived.

Speech signals of the three vowels (/a/ /i/ /u/) are presented in the time and frequency domains in Figure 3.1. The fundamental frequency is about 100 Hz in all cases, and the formant frequencies F1, F2, and F3 with vowel /a/ are approximately 600 Hz, 1000 Hz, and 2500 Hz respectively. With vowel /i/ the first three formants are 200 Hz, 2300 Hz, and 3000 Hz, and with /u/ 300 Hz, 600 Hz, and 2300 Hz. The harmonic structure of the excitation is also easy to perceive from the frequency-domain presentation.

Fig. 3.1. The time- and frequency-domain presentation of vowels /a/, /i/, and /u/.

It can be seen that the first three formants are inside the normal telephone channel (from 300 Hz to 3400 Hz), so the bandwidth needed for intelligible speech is not very wide. For higher quality, up to 10 kHz bandwidth may be used, which leads to a 20 kHz sampling frequency. Although the fundamental frequency is outside the telephone channel, the human hearing system is capable of reconstructing it from its harmonic components.
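The two numerical points in this paragraph can be checked with a short sketch. It assumes the usual sampling-theorem rule that the sampling frequency is at least twice the signal bandwidth, and it idealizes a voiced sound as exact integer multiples of F0. The function names are illustrative only and do not come from the source.

```python
def nyquist_rate(bandwidth_hz):
    """Minimum sampling frequency for a given bandwidth (sampling theorem)."""
    return 2.0 * bandwidth_hz


def harmonics_in_band(f0_hz, low_hz, high_hz):
    """List the harmonics k * f0 that fall inside [low_hz, high_hz]."""
    harmonics = []
    k = 1
    while k * f0_hz <= high_hz:
        if k * f0_hz >= low_hz:
            harmonics.append(k * f0_hz)
        k += 1
    return harmonics


def implied_fundamental(harmonics):
    """Spacing between adjacent harmonics; for an ideal harmonic series
    this equals F0 even when F0 itself is not in the list."""
    spacings = [b - a for a, b in zip(harmonics, harmonics[1:])]
    return min(spacings)


# Values quoted in the text: 10 kHz bandwidth, 100 Hz F0, 300-3400 Hz channel.
sampling_frequency = nyquist_rate(10_000.0)
surviving = harmonics_in_band(100.0, 300.0, 3400.0)
recovered_f0 = implied_fundamental(surviving)
```

With a 100 Hz fundamental, the harmonics from 300 Hz to 3400 Hz survive the telephone channel, and their 100 Hz spacing is the cue from which the missing fundamental can be inferred.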

Another commonly used method to describe a speech signal is the spectrogram, which is a time-frequency-amplitude presentation of a signal. The spectrogram and the time-domain waveform of the Finnish word kaksi (two) are presented in Figure 3.2. Higher amplitudes are presented with darker gray levels, so the formant frequencies and trajectories are easy to perceive. Spectral differences between vowels and consonants are also easy to comprehend. Therefore, the spectrogram is perhaps the most useful presentation for speech research. From Figure 3.2 it is easy to see that vowels have more energy and that it is focused at lower frequencies. Unvoiced consonants have considerably less energy, and it is usually focused at higher frequencies. With voiced consonants the situation is somewhere between these two. In Figure 3.2 the frequency axis is in kilohertz, but it is also quite common to use an auditory spectrogram where the frequency axis is replaced with a Bark or Mel scale, which is normalized for hearing properties.

Fig. 3.2. Spectrogram and time-domain presentation of the Finnish word kaksi (two).
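A spectrogram of this kind can be sketched with a short-time Fourier transform. The following is a minimal NumPy illustration and not the procedure used to produce Figure 3.2: the frame length, hop size, Hann window, and the two-tone test signal are all assumptions chosen for the example, and the function name is hypothetical.

```python
import numpy as np


def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Return (times, freqs, level_db) from a short-time Fourier analysis.

    level_db has one row per frame and one column per frequency bin.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)
    # Small constant avoids log(0) in silent bins.
    level_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return times, freqs, level_db


# Assumed test signal: 500 Hz for the first half second, then 3000 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
test_signal = np.where(t < 0.5,
                       np.sin(2 * np.pi * 500 * t),
                       np.sin(2 * np.pi * 3000 * t))
times, freqs, level_db = spectrogram(test_signal, sample_rate)
```

Plotting level_db with time on one axis and frequency on the other, using darker shades for higher levels, gives the gray-level picture described above. An auditory spectrogram would additionally warp the frequency axis to a Bark or Mel scale.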

For determining the fundamental frequency or pitch of speech, a method called cepstral analysis may be used, for example (Cawley 1996, Kleijn et al. 1998). The cepstrum is obtained by first windowing the signal and computing its Discrete Fourier Transform (DFT), then taking the logarithm of the power spectrum, and finally transforming it back to the time domain with the Inverse Discrete Fourier Transform (IDFT). The procedure is shown in Figure 3.3.

Fig. 3.3. Cepstral analysis.

Cepstral analysis provides a method for separating the vocal tract information from the excitation. Thus the reverse transformation can be carried out to provide a smoother power spectrum, a technique known as homomorphic filtering.
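The window, DFT, logarithm, IDFT chain described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the Hamming window, the F0 search range, the number of retained cepstral coefficients, and the impulse-train test signal are assumptions, and the function names are hypothetical rather than taken from the cited sources.

```python
import numpy as np


def real_cepstrum(frame):
    """Window -> DFT -> log power spectrum -> inverse DFT."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.fft(windowed)
    # Small constant avoids log(0) for empty bins.
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)
    return np.fft.ifft(log_power).real


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Pick the strongest cepstral peak in the plausible pitch-period range.

    The frame must be longer than 2 * sample_rate / f0_min samples.
    """
    cep = real_cepstrum(frame)
    q_min = int(sample_rate / f0_max)  # shortest period in samples
    q_max = int(sample_rate / f0_min)  # longest period in samples
    peak = q_min + int(np.argmax(cep[q_min:q_max + 1]))
    return sample_rate / peak


def smoothed_log_spectrum(frame, n_keep=20):
    """Homomorphic smoothing: keep only the low-quefrency (vocal tract)
    part of the cepstrum and transform back to the frequency domain."""
    cep = real_cepstrum(frame)
    lifter = np.zeros_like(cep)
    lifter[:n_keep] = 1.0
    lifter[-n_keep + 1:] = 1.0  # mirror-image half of the symmetric cepstrum
    return np.fft.fft(cep * lifter).real


# Assumed test excitation: an impulse train with a 100 Hz repetition rate.
sample_rate = 8000
frame = np.zeros(1024)
frame[::80] = 1.0  # one pulse every 80 samples
f0_estimate = estimate_f0(frame, sample_rate)
envelope = smoothed_log_spectrum(frame)
```

The pitch shows up as a peak at the quefrency equal to the pitch period, while the slowly varying vocal tract envelope is concentrated in the first few cepstral coefficients, which is why discarding the higher coefficients yields the smoother spectrum.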

The fundamental frequency or intonation contour over the sentence is important for correct prosody and natural-sounding speech. The different contours are usually analyzed from natural speech in specific situations and with specific speaker characteristics and then applied to rules to generate the synthetic speech. The fundamental frequency contour can be viewed as the composite set of hierarchical patterns shown in Figure 3.4. The overall contour is generated by the superposition of these patterns (Sagisaga 1990). Methods for controlling the fundamental frequency contours are described later in Chapter 5.

Fig. 3.4. Hierarchical levels of fundamental frequency (Sagisaga 1990).

3.2 Speech Production

Human speech is produced by the vocal organs presented in Figure 3.5. The main energy source is the lungs with the diaphragm. When speaking, the airflow is forced through the glottis between the vocal cords and the larynx to the three main cavities of the vocal tract: the pharynx and the oral and nasal cavities. From the oral and nasal cavities the airflow exits through the mouth and nose, respectively. The V-shaped opening between the vocal cords, called the glottis, is the most important sound source in the vocal system. The vocal cords may act in several different ways during speech. Their most important function is to modulate the airflow by rapidly opening and closing, causing a buzzing sound from which vowels and voiced consonants are produced. The fundamental frequency of vibration depends on the mass and tension of the vocal cords and is about 110 Hz, 200 Hz, and 300 Hz with men, women, and children, respectively. With stop consonants the vocal cords may move suddenly from a completely closed position, in which they cut the airflow completely, to a totally open position, producing a light cough or a glottal stop. On the other hand, with unvoiced consonants, such as /s/ or /f/, they may be completely open. An intermediate position may also occur with, for example, phonemes like /h/.

Fig. 3.5. The human vocal organs.

(1) Nasal cavity, (2) Hard palate, (3) Alveolar ridge, (4) Soft palate (Velum), (5) Tip of the tongue (Apex), (6) Dorsum, (7) Uvula, (8) Radix, (9) Pharynx, (10) Epiglottis, (11) False vocal cords, (12) Vocal cords, (13) Larynx, (14) Esophagus, and (15) Trachea.

The pharynx connects the larynx to the oral cavity. It has almost fixed dimensions, but its length may be changed slightly by raising or lowering the larynx at one end and the soft palate at the other end. The soft palate also isolates or connects the route from the nasal cavity to the pharynx. At the bottom of the pharynx are the epiglottis and false vocal cords, which prevent food from reaching the larynx and isolate the esophagus acoustically from the vocal tract. The epiglottis, the false vocal cords and the vocal cords are closed during swallowing and open during normal breathing.

The oral cavity is one of the most important parts of the vocal tract. Its size, shape and acoustics can be varied by the movements of the palate, the tongue, the lips, the cheeks and the teeth. The tongue in particular is very flexible: the tip and the edges can be moved independently, and the entire tongue can move forward, backward, up and down. The lips control the size and shape of the mouth opening through which speech sound is radiated. Unlike the oral cavity, the nasal cavity has fixed dimensions and shape. Its length is about 12 cm and its volume 60 cm³. The airstream to the nasal cavity is controlled by the soft palate.

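From a technical point of view, the vocal system may be considered as a single acoustic tube between the glottis and the mouth. A glottally excited vocal tract may then be approximated as a straight pipe closed at the vocal cords, where the acoustical impedance Zg = ∞, and open at the mouth (Zm = 0). In this case the volume-velocity transfer function of the vocal tract is (Flanagan 1972, O'Saughnessy 1987)

$$V(\omega) = \frac{1}{\cos(\omega l / c)} \qquad (3.1)$$

where l is the length of the tube, ω is the radian frequency and c is the sound velocity. The denominator is zero at frequencies Fi = ωi/2π (i = 1, 2, 3, ...), where

$$\frac{\omega_i l}{c} = (2i - 1)\,\frac{\pi}{2}$$

and

$$F_i = \frac{(2i - 1)\,c}{4l} \qquad (3.2)$$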

If l = 17 cm, V(ω) is infinite at frequencies Fi = 500, 1500, 2500, ... Hz, which means resonances every 1 kHz starting at 500 Hz. If the length l is other than 17 cm, the frequencies Fi will be scaled by the factor 17/l, so the vocal tract may be approximated with two or three sections of tube where the areas of adjacent sections are quite different...
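The resonance figures quoted for the 17 cm tube can be reproduced with a short calculation. The sketch below assumes the quarter-wavelength relation Fi = (2i - 1)c / (4l) for a uniform tube closed at one end and open at the other, and a sound velocity of 340 m/s. That velocity is not stated in the text; it is inferred here because it yields the quoted 500 Hz first resonance. The function name is illustrative.

```python
def tube_resonances(length_m, n=3, sound_speed=340.0):
    """First n resonances (Hz) of a uniform tube closed at one end.

    sound_speed is an assumed value in m/s, chosen to match the
    500 Hz first resonance quoted for a 17 cm tube.
    """
    return [(2 * i - 1) * sound_speed / (4.0 * length_m)
            for i in range(1, n + 1)]


adult = tube_resonances(0.17)    # about 500, 1500, 2500 Hz
shorter = tube_resonances(0.15)  # every resonance scaled by 17/15
scale = shorter[0] / adult[0]
```

A shorter tube raises every resonance by the same factor, which is the 17/l scaling mentioned above.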
