Phonetic figures are sound.docx
Phonetic figures are sound-related figures of speech.
They encompass various stylistic means, namely alliteration, assonance, cacophony, paronomasia (pun), and onomatopoeia.
All of these concrete realisations of phonetic figures relate to sounds, as they represent repetition of consonant and vowel sounds (alliteration and assonance), clashes of sounds (cacophony), "play upon the sounds and meanings of words" (pun), and imitation of sounds (onomatopoeia).
3. Phonetics and Theory of Speech Production
Speech processing and language technology contain many special concepts and terms. To understand how different speech synthesis and analysis methods work, we must have some knowledge of speech production, articulatory phonetics, and some other related terminology. The basic theory of these topics is discussed briefly in this chapter. For more detailed information, see for example Fant (1970), Flanagan (1972), Witten (1982), O'Shaughnessy (1987), or Kleijn et al. (1998).
3.1 Representation and Analysis of Speech Signals
Continuous speech is a set of complicated audio signals, which makes producing them artificially difficult. Speech signals are usually considered voiced or unvoiced, but in some cases they are something between the two. Voiced sounds consist of the fundamental frequency (F0) and its harmonic components produced by the vocal cords (vocal folds). The vocal tract modifies this excitation signal, causing formant (pole) and sometimes antiformant (zero) frequencies (Witten 1982). Each formant frequency also has an amplitude and a bandwidth, and it may sometimes be difficult to define some of these parameters correctly. The fundamental frequency and formant frequencies are probably the most important concepts in speech synthesis and in speech processing in general.
With purely unvoiced sounds, there is no fundamental frequency in the excitation signal and therefore no harmonic structure either, and the excitation can be considered white noise. The airflow is forced through a vocal tract constriction, which can occur in several places between the glottis and the mouth. Some sounds are produced with a complete stoppage of airflow followed by a sudden release, producing an impulsive turbulent excitation often followed by a more protracted turbulent excitation (Kleijn et al. 1998). Unvoiced sounds are also usually quieter and less steady than voiced ones. The differences between these are easy to see in Figure 3.2, where the second and last sounds are voiced and the others unvoiced. Whispering is a special case of speech. When whispering a voiced sound, there is no fundamental frequency in the excitation, and the first formant frequencies produced by the vocal tract are perceived.
Speech signals of the three vowels (/a/, /i/, /u/) are presented in the time and frequency domains in Figure 3.1. The fundamental frequency is about 100 Hz in all cases, and the formant frequencies F1, F2, and F3 of the vowel /a/ are approximately 600 Hz, 1000 Hz, and 2500 Hz, respectively. With the vowel /i/ the first three formants are 200 Hz, 2300 Hz, and 3000 Hz, and with /u/ 300 Hz, 600 Hz, and 2300 Hz. The harmonic structure of the excitation is also easy to perceive in the frequency-domain presentation.
Fig. 3.1. The time- and frequency-domain presentation of vowels /a/, /i/, and /u/.
It can be seen that the first three formants are inside the normal telephone channel (from 300 Hz to 3400 Hz), so the bandwidth needed for intelligible speech is not very wide. For higher quality, a bandwidth of up to 10 kHz may be used, which leads to a 20 kHz sampling frequency. Although the fundamental frequency is outside the telephone channel, the human hearing system is capable of reconstructing it from its harmonic components.
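The arithmetic behind these two remarks can be sketched in a few lines. The function names below are hypothetical. Taking the greatest common divisor of the harmonic frequencies is only an idealized stand-in for what the hearing system does, not a model of it, and the factor of two between bandwidth and sampling frequency is the usual sampling-theorem minimum.

```python
from functools import reduce
from math import gcd


def harmonics_in_band(f0_hz, low_hz=300, high_hz=3400):
    """Integer harmonics of f0_hz that fall inside the telephone pass band."""
    return [k * f0_hz for k in range(1, high_hz // f0_hz + 1)
            if low_hz <= k * f0_hz <= high_hz]


def implied_fundamental(harmonics_hz):
    """Largest frequency of which every listed harmonic is an integer multiple.

    Idealized: assumes exact integer frequencies in Hz (an assumption of this sketch).
    """
    return reduce(gcd, harmonics_hz)


def min_sampling_rate(bandwidth_hz):
    """Sampling frequency needed to represent a signal of the given bandwidth."""
    return 2 * bandwidth_hz


# A 100 Hz fundamental lies below the 300 Hz band edge, so it is filtered out,
# but the harmonics that survive are still spaced 100 Hz apart.
band = harmonics_in_band(100)
print(band[:3], implied_fundamental(band), min_sampling_rate(10_000))
```

The surviving harmonics start at 300 Hz, yet their common spacing still points back to the missing 100 Hz fundamental.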
Another commonly used method to describe a speech signal is the spectrogram, which is a time-frequency-amplitude presentation of a signal. The spectrogram and the time-domain waveform of the Finnish word kaksi (two) are presented in Figure 3.2. Higher amplitudes are presented with darker gray levels, so the formant frequencies and trajectories are easy to perceive. Spectral differences between vowels and consonants are also easy to comprehend. Therefore, the spectrogram is perhaps the most useful presentation for speech research. From Figure 3.2 it is easy to see that vowels have more energy and that it is focused at lower frequencies. Unvoiced consonants have considerably less energy, and it is usually focused at higher frequencies. With voiced consonants the situation is somewhere between these two. In Figure 3.2 the frequency axis is in kilohertz, but it is also quite common to use an auditory spectrogram, where the frequency axis is replaced with a Bark or Mel scale, which is normalized for hearing properties.
Fig. 3.2. Spectrogram and time-domain presentation of the Finnish word kaksi (two).
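As a rough illustration of how such a time-frequency-amplitude presentation is computed, the following sketch builds a small spectrogram using only the Python standard library. The frame length, hop size, Hann window, and 8 kHz sampling rate are assumptions chosen for the example, and a pure 1 kHz tone stands in for real speech. A practical implementation would use an FFT library rather than the direct DFT shown here.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return one row per frame; each row holds magnitudes in dB for bins 0..frame_len/2."""
    # Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of a single bin; slow, but adequate for a short frame.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(20 * math.log10(abs(acc) + 1e-12))
        rows.append(row)
    return rows


fs = 8000  # assumed sampling frequency in Hz
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]
spec = spectrogram(tone)

# Bin spacing is fs / frame_len = 125 Hz, so the 1 kHz tone peaks in bin 8.
peak_bins = [row.index(max(row)) for row in spec]
print(len(spec), "frames, peak bins:", sorted(set(peak_bins)))
```

Plotting the rows as gray levels, with time on one axis and bin index on the other, gives a picture of the kind shown in Figure 3.2.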
To determine the fundamental frequency, or pitch, of speech, a method called cepstral analysis may be used, for example (Cawley 1996, Kleijn et al. 1998). The cepstrum is obtained by first windowing the signal and computing its Discrete Fourier Transform (DFT), then taking the logarithm of the power spectrum, and finally transforming it back to the time domain with the Inverse Discrete Fourier Transform (IDFT). The procedure is shown in Figure 3.3.
Fig. 3.3. Cepstral analysis.
Cepstral analysis provides a method for separating the vocal tract information from the excitation. The reverse transformation can thus be carried out to provide a smoother power spectrum, a procedure known as homomorphic filtering.
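The window, DFT, logarithm, and IDFT chain described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production pitch tracker. The Hamming window, the 60 to 400 Hz search range, the 8 kHz sampling rate, and the synthetic test signal are all choices made for the example. The small radix-2 FFT is included only to keep the sketch self-contained.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(x):
    """Inverse FFT computed via the conjugation identity."""
    n = len(x)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in x])]


def real_cepstrum(frame):
    """Window -> DFT -> log power spectrum -> IDFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]  # Hamming window
    log_power = [math.log(abs(v) ** 2 + 1e-12) for v in fft(windowed)]
    return [v.real for v in ifft(log_power)]


def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Pick the strongest cepstral peak inside the plausible pitch-period range."""
    ceps = real_cepstrum(frame)
    q_lo, q_hi = int(fs / fmax), int(fs / fmin)
    peak = max(range(q_lo, q_hi + 1), key=lambda q: ceps[q])
    return fs / peak


# Synthetic voiced-like test signal: harmonics of 100 Hz with 1/k amplitudes.
fs, f0_true = 8000, 100.0
frame = [sum(math.sin(2 * math.pi * k * f0_true * n / fs) / k for k in range(1, 40))
         for n in range(1024)]
f0 = estimate_f0(frame, fs)
print(f"estimated F0: {f0:.1f} Hz")
```

The excitation shows up as a peak at a lag equal to the pitch period, while the slowly varying vocal tract envelope is concentrated at small lags. This is why the two can be separated.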
The fundamental frequency or intonation contour over the sentence is important for correct prosody and natural-sounding speech. The different contours are usually analyzed from natural speech in specific situations and with specific speaker characteristics and then applied to rules that generate the synthetic speech. The fundamental frequency contour can be viewed as a composite set of the hierarchical patterns shown in Figure 3.4. The overall contour is generated by the superposition of these patterns (Sagisaka 1990). Methods for controlling the fundamental frequency contours are described later in Chapter 5.
Fig. 3.4. Hierarchical levels of fundamental frequency (Sagisaka 1990).
3.2 Speech Production
Human speech is produced by the vocal organs presented in Figure 3.5. The main energy source is the lungs with the diaphragm. When speaking, the airflow is forced through the glottis between the vocal cords and the larynx to the three main cavities of the vocal tract: the pharynx and the oral and nasal cavities. From the oral and nasal cavities the airflow exits through the mouth and nose, respectively. The V-shaped opening between the vocal cords, called the glottis, is the most important sound source in the vocal system. The vocal cords may act in several different ways during speech. Their most important function is to modulate the airflow by rapidly opening and closing, causing a buzzing sound from which vowels and voiced consonants are produced. The fundamental frequency of vibration depends on the mass and tension of the vocal cords and is about 110 Hz, 200 Hz, and 300 Hz for men, women, and children, respectively. With stop consonants the vocal cords may move suddenly from a completely closed position, in which they cut the airflow completely, to a totally open position, producing a light cough or a glottal stop. On the other hand, with unvoiced consonants, such as /s/ or /f/, they may be completely open. An intermediate position may also occur with, for example, phonemes like /h/.
Fig. 3.5. The human vocal organs.
(1) Nasal cavity, (2) Hard palate, (3) Alveolar ridge, (4) Soft palate (Velum), (5) Tip of the tongue (Apex), (6) Dorsum, (7) Uvula, (8) Radix, (9) Pharynx, (10) Epiglottis, (11) False vocal cords, (12) Vocal cords, (13) Larynx, (14) Esophagus, and (15) Trachea.
The pharynx connects the larynx to the oral cavity. It has almost fixed dimensions, but its length may be changed slightly by raising or lowering the larynx at one end and the soft palate at the other end. The soft palate also isolates or connects the route from the nasal cavity to the pharynx. At the bottom of the pharynx are the epiglottis and false vocal cords, which prevent food from reaching the larynx and isolate the esophagus acoustically from the vocal tract. The epiglottis, the false vocal cords, and the vocal cords are closed during swallowing and open during normal breathing.
The oral cavity is one of the most important parts of the vocal tract. Its size, shape, and acoustics can be varied by the movements of the palate, the tongue, the lips, the cheeks, and the teeth. The tongue in particular is very flexible: the tip and the edges can be moved independently, and the entire tongue can move forward, backward, up, and down. The lips control the size and shape of the mouth opening through which speech sound is radiated. Unlike the oral cavity, the nasal cavity has fixed dimensions and shape. Its length is about 12 cm and its volume about 60 cm³. The air stream to the nasal cavity is controlled by the soft palate.
From a technical point of view, the vocal system may be considered a single acoustic tube between the glottis and the mouth. The glottally excited vocal tract may then be approximated as a straight pipe closed at the vocal cords, where the acoustical impedance Zg = ∞, and open at the mouth (Zm = 0). In this case the volume-velocity transfer function of the vocal tract is (Flanagan 1972, O'Shaughnessy 1987)
$$V(\omega) = \frac{1}{\cos(\omega l / c)} \qquad (3.1)$$
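where l is the length of the tube, ω is the radian frequency, and c is the sound velocity. The denominator is zero at frequencies Fi = ωi/2π (i = 1, 2, 3, ...), where
$$\frac{\omega_i l}{c} = \frac{(2i-1)\pi}{2}$$
and
$$F_i = \frac{(2i-1)\,c}{4l} \qquad (3.2)$$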
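The resonances of such a closed-open tube are easy to evaluate numerically. The sketch below uses the standard quarter-wavelength result Fi = (2i − 1)c / (4l) as the form of Eq. 3.2. The function name is hypothetical, and the sound velocity c = 340 m/s is an assumed value not stated in the text.

```python
def tube_resonances(length_m, count=3, c=340.0):
    """First `count` resonances (Hz) of a uniform tube closed at one end and open at the other.

    c is the sound velocity in m/s; 340 m/s is an assumed value.
    """
    return [(2 * i - 1) * c / (4 * length_m) for i in range(1, count + 1)]


# A 17 cm tube, and a shorter 14 cm tube for comparison.
long_tract = tube_resonances(0.17)
short_tract = tube_resonances(0.14)
print([round(f) for f in long_tract])
print([round(f) for f in short_tract])
```

Shortening the tube raises every resonance by the same ratio, since each Fi is inversely proportional to l.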
If l = 17 cm, V(ω) is infinite at frequencies Fi = 500, 1500, 2500, ... Hz, which means resonances every 1 kHz starting at 500 Hz. If the length l is other than 17 cm, the frequencies Fi will be scaled by a factor of 17/l, so the vocal tract may be approximated with two or three sections of tube where the areas of adjacent sections are quite differen