英文文献翻译.docx

English Literature Translation

Original English Text

Speech synthesis

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.

The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1990s.

Overview of text processing

A text-to-speech system (or "engine") is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech.
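The text normalization step described above can be illustrated with a minimal sketch. It is not taken from any particular TTS engine: the abbreviation table, the function names `number_to_words` and `normalize`, and the restriction to integers below 1000 are all assumptions made for illustration. A real front-end would need context to decide, for example, whether "St." means "Street" or "Saint".

```python
import re

# Hypothetical abbreviation table; a real front-end would use a much larger,
# context-sensitive lexicon.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text):
    """Replace known abbreviations and small integers with written-out words."""
    # Naive substring replacement: no attempt is made to resolve ambiguity.
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)

    def spell(match):
        value = int(match.group())
        # Numbers outside the supported range are left untouched.
        return number_to_words(value) if value < 1000 else match.group()

    return re.sub(r"\d+", spell, text)
```

Calling `normalize("Dr. Smith lives at 42 Main St.")` yields "Doctor Smith lives at forty-two Main Street". Years, currency amounts, ordinals, and dates would each need their own rules, which is one reason normalization is usually treated as a separate stage before phonetic transcription.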

History

Long before electronic signal processing was invented, there were those who tried to build machines to create human speech. Some early legends of the existence of "speaking heads" involved Gerbert of Aurillac (d. 1003 AD), Albertus Magnus (1198–1280), and Roger Bacon (1214–1294).

In 1779, the Danish scientist Christian Kratzenstein, working at the Russian Academy of Sciences, built models of the human vocal tract that could produce the five long vowel sounds (in International Phonetic Alphabet notation, they are [aː], [eː], [iː], [oː] and [uː]).[5] This was followed by the bellows-operated "acoustic-mechanical speech machine" by Wolfgang von Kempelen of Pressburg, Hungary, described in a 1791 paper.[6] This machine added models of the tongue and lips, enabling it to produce consonants as well as vowels. In 1837, Charles Wheatstone produced a "speaking machine" based on von Kempelen's design, and in 1857, M. Faber built the "Euphonia". Wheatstone's design was resurrected in 1923 by Paget.

In the 1930s, Bell Labs developed the vocoder, which automatically analyzed speech into its fundamental tone and resonances. From his work on the vocoder, Homer Dudley developed a manually keyboard-operated voice synthesizer called The Voder (Voice Demonstrator), which he exhibited at the 1939 New York World's Fair.

The Pattern playback was built by Dr. Franklin S. Cooper and his colleagues at Haskins Laboratories in the late 1940s and completed in 1950. There were several different versions of this hardware device but only one currently survives. The machine converts pictures of the acoustic patterns of speech in the form of a spectrogram back into sound. Using this device, Alvin Liberman and colleagues were able to discover acoustic cues for the perception of phonetic segments (consonants and vowels).

Dominant systems in the 1980s and 1990s were the MITalk system, based largely on the work of Dennis Klatt at MIT, and the Bell Labs system;[8] the latter was one of the first multilingual language-independent systems, making extensive use of natural language processing methods.

Early electronic speech synthesizers sounded robotic and were often barely intelligible. The quality of synthesized speech has steadily improved, but output from contemporary speech synthesis systems is still clearly distinguishable from actual human speech.

As improving cost-performance ratios make speech synthesizers cheaper and more accessible, more people will benefit from the use of text-to-speech programs.

Electronic devices

The first computer-based speech synthesis systems were created in the late 1950s. The first general English text-to-speech system was developed by Noriko Umeda et al. in 1968 at the Electrotechnical Laboratory, Japan.[10] In 1961, physicist John Larry Kelly, Jr and colleague Louis Gerstman[11] used an IBM 704 computer to synthesize speech, an event among the most prominent in the history of Bell Labs. Kelly's voice recorder synthesizer (vocoder) recreated the song "Daisy Bell", with musical accompaniment from Max Mathews. Coincidentally, Arthur C. Clarke was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility. Clarke was so impressed by the demonstration that he used it in the climactic scene of his screenplay for his novel 2001: A Space Odyssey (Arthur C. Clarke Biography at the Wayback Machine, archived December 11, 1997), where the HAL 9000 computer sings the same song as it is being put to sleep by astronaut Dave Bowman ("Where "HAL" First Spoke (Bell Labs Speech Synthesis website)". Bell Labs. http://www.bell- Retrieved 2010-02-17). Despite the success of purely electronic speech synthesis, research is still being conducted into mechanical speech synthesizers (Anthropomorphic Talking Robot Waseda-Talker Series).

Handheld electronics featuring speech synthesis began emerging in the 1970s. One of the first was the Telesensory Systems Inc. (TSI) Speech+ portable calculator for the blind in 1976 (TSI Speech+ & other speaking calculators; Gevaryahu, Jonathan, "TSI S14001A Speech Synthesizer LSI Integrated Circuit Guide"). Other devices were produced primarily for educational purposes, such as Speak & Spell, produced by Texas Instruments in 1978 (Breslow, et al. United States Patent 4326710: "Talking electronic game", April 27, 1982). Fidelity released a speaking version of its electronic chess computer in 1979 (Voice Chess Challenger). The first video game to feature speech synthesis was the 1980 shoot 'em up arcade game, Stratovox, from Sun Electronics (Gaming's Most Important Evolutions, GamesRadar). Another early example was the arcade version of Berzerk, released that same year. The first multi-player electronic game using voice synthesis was Milton from Milton Bradley Company, which produced the device in 1980.

Synthesizer technologies

The most important qualities of a speech synthesis system are naturalness and intelligibility. Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood. The ideal speech synthesizer is both natural and intelligible. Speech synthesis systems usually try to maximize both characteristics.

The two primary technologies for generating synthetic speech waveforms are concatenative synthesis and formant synthesis. Each technology has strengths and weaknesses, and the intended uses of a synthesis system will typically determine which approach is used.

Concatenative synthesis

Concatenative synthesis is based on the concatenation (or stringing together) of segments of recorded speech. Generally, concatenative synthesis produces the most natural-sounding synthesized speech. However, differences between natural variations in speech and the nature of the automated techniques for segmenting the waveforms sometimes result in audible glitches in the output. There are three main sub-types of concatenative synthesis.
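To make the idea of joining recorded segments concrete, the sketch below concatenates lists of audio samples and blends a few samples at each boundary with a linear cross-fade, one simple way to soften an abrupt join. This is an illustration under stated assumptions, not a description of how any specific synthesizer works: the function name `crossfade_concatenate`, the `overlap` parameter, and the choice of a linear fade are all hypothetical.

```python
def crossfade_concatenate(segments, overlap):
    """Join lists of samples, blending up to `overlap` samples at each boundary."""
    if not segments:
        return []
    output = list(segments[0])
    for segment in segments[1:]:
        # Never blend more samples than either side actually has.
        n = min(overlap, len(output), len(segment))
        # Remove the tail of the audio so far; it is re-added after blending.
        tail = output[len(output) - n:]
        del output[len(output) - n:]
        for i in range(n):
            # The weight rises across the overlap, fading the old segment out
            # and the new segment in.
            weight = (i + 1) / (n + 1)
            output.append((1 - weight) * tail[i] + weight * segment[i])
        output.extend(segment[n:])
    return output
```

With `overlap` set to 0 the segments are simply placed end to end, which is where a mismatch between neighboring segments would be heard as a click. Each blended boundary shortens the result by the number of overlapped samples.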

Unit selection synthesis

Unit selection synthesis uses large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, diphones, half-phones, syllables, morphemes, words, phrases, and sentences. Typically, the division into segments is done using a specially modified speech recognizer set to a "forced alignment" mode with some manual correction afterward, using visual representations such as the waveform and spectrogram.[12] An index of the units in the speech database is then created based on the segmentation and acoustic parameters like the fundamental frequency (pitch), duration, position in the syllable, and neighboring phones. At run time, the desired target utterance is created by determining the best chain of candidate units from the database (unit selection). This process is typically achieved using a specially weighted decision tree.
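Finding the "best chain of candidate units" can be sketched as a shortest-path search. The example below does not use the weighted decision tree mentioned above; it swaps in a Viterbi-style dynamic program over a target cost (how well a unit matches the requested pitch and duration) and a join cost (how large the pitch jump is between neighboring units). The `Unit` class, both cost functions, and their weights are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str        # hypothetical identifier of a recorded segment
    pitch: float     # fundamental frequency in Hz
    duration: float  # length in milliseconds


def target_cost(target, unit):
    """How poorly a candidate matches the requested pitch and duration."""
    return abs(target.pitch - unit.pitch) + 0.5 * abs(target.duration - unit.duration)


def join_cost(left, right):
    """Penalty for a pitch jump at the boundary between two units."""
    return abs(left.pitch - right.pitch)


def select_units(targets, candidates):
    """Return the lowest-cost chain, taking one candidate per target position."""
    if not targets:
        return []
    # best[i][j] = (cost of the cheapest chain ending in candidates[i][j],
    #               index of the previous unit on that chain)
    best = [[(target_cost(targets[0], unit), None) for unit in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(previous, unit), k)
                for k, previous in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], unit), back))
        best.append(row)

    # Trace back from the cheapest final unit to recover the whole chain.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(chain))
```

As a usage example, with targets at 100 Hz and 110 Hz and two candidates per position, the search prefers the pair whose pitches are close both to the targets and to each other. The sketch assumes every position has at least one candidate; a production system would also weigh features such as position in the syllable and neighboring phones.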

Unit selection provides the greatest naturalness, because it applies only a small amount of digital signal processing (DSP) to the recorded speech. DSP often makes recorded speech sound less natural, although some systems use a small amount of signal processing at the point of concatenation to smooth the waveform. The output from the best unit-selection systems is often indistinguishable from real human voices, especially in contexts for which the TTS system has been tuned. However, maximum naturalness typically requires unit-selection speech databases to be very large, in some systems ranging into the gigabytes of recorded data, representing dozens of hours of speech.[13] Also, unit selection algorithms have been kn
