[PDF]Advancesinnaturallanguageprocessing.pdf


REVIEW

Advances in natural language processing

Julia Hirschberg1* and Christopher D. Manning2,3

Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area.

Over the past 20 years, computational linguistics has grown into both an exciting area of scientific research and a practical technology that is increasingly being incorporated into consumer products (for example, in applications such as Apple's Siri and Skype Translator). Four key factors enabled these developments:

(i) a vast increase in computing power, (ii) the availability of very large amounts of linguistic data, (iii) the development of highly successful machine learning (ML) methods, and (iv) a much richer understanding of the structure of human language and its deployment in social contexts. In this Review, we describe some current application areas of interest in language research. These efforts illustrate computational approaches to big data, based on current cutting-edge methodologies that combine statistical analysis and ML with knowledge of language.

Computational linguistics, also known as natural language processing (NLP), is the subfield of computer science concerned with using computational techniques to learn, understand, and produce human language content. Computational linguistic systems can have multiple purposes:

The goal can be aiding human-human communication, such as in machine translation (MT); aiding human-machine communication, such as with conversational agents; or benefiting both humans and machines by analyzing and learning from the enormous quantity of human language content that is now available online.

During the first several decades of work in computational linguistics, scientists attempted to write down for computers the vocabularies and rules of human languages. This proved a difficult task, owing to the variability, ambiguity, and context-dependent interpretation of human languages. For instance, a star can be either an astronomical object or a person, and “star” can be a noun or a verb. In another example, two interpretations are possible for the headline “Teacher strikes idle kids,” depending on the noun, verb, and adjective assignments of the words in the sentence, as well as grammatical structure.

Beginning in the 1980s, but more widely in the 1990s, NLP was transformed by researchers starting to build models over large quantities of empirical language data. Statistical or corpus (“body of words”) based NLP was one of the first notable successes of the use of big data, long before the power of ML was more generally recognized or the term “big data” even introduced.

A central finding of this statistical approach to NLP has been that simple methods using words, part-of-speech (POS) sequences (such as whether a word is a noun, verb, or preposition), or simple templates can often achieve notable results when trained on large quantities of data. Many text and sentiment classifiers are still based solely on the different sets of words (“bag of words”) that documents contain, without regard to sentence and discourse structure or meaning. Achieving improvements over these simple baselines can be quite difficult. Nevertheless, the best-performing systems now use sophisticated ML approaches and a rich understanding of linguistic structure. High-performance tools that identify syntactic and semantic information as well as information about discourse context are now available. One example is Stanford CoreNLP (1), which provides a standard NLP preprocessing pipeline that includes POS tagging (with tags such as noun, verb, and preposition); identification of named entities, such as people, places, and organizations; parsing of sentences into their grammatical structures; and identifying co-references between noun phrase mentions (Fig. 1).

Historically, two developments enabled the initial transformation of NLP into a big data field. The first was the early availability to researchers of linguistic data in digital form, particularly through the Linguistic Data Consortium (LDC) (2), established in 1992. Today, large amounts of digital text can easily be downloaded from the Web. Available as linguistically annotated data are large speech and text corpora annotated with POS tags, syntactic parses, semantic labels, annotations of named entities (persons, places, organizations), dialogue acts (statement, question, request), emotions and positive or negative sentiment, and discourse structure (topic or rhetorical structure). Second, performance improvements in NLP were spurred on by shared task competitions. Originally, these competitions were largely funded and organized by the U.S. Department of Defense, but they were later organized by the research community itself, such as the CoNLL Shared Tasks (3). These tasks were a precursor of modern ML predictive modeling and analytics competitions, such as on Kaggle (4), in which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models.

A major limitation of NLP today is the fact that most NLP resources and systems are available only for high-resource languages (HRLs), such as English, French, Spanish, German, and Chinese. In contrast, many low-resource languages (LRLs), such as Bengali, Indonesian, Punjabi, Cebuano, and Swahili, spoken and written by millions of people, have no such resources or systems available. A future challenge for the language community is how to develop resources and tools for hundreds or thousands of languages, not just a few.

Machine translation

Proficiency in languages was traditionally a hallmark of a learned person. Although the social standing of this human skill has declined in the modern age of science and machines, translation between human languages remains crucially important, and MT is perhaps the most substantial way in which computers could aid human-human communication. Moreover, the ability of computers to translate between human languages remains a consummate test of machine intelligence:

Correct translation requires not only the ability to analyze and generate sentences in human languages but also a humanlike understanding of world knowledge and context, despite the ambiguities of languages. For example, the French word “bordel” straightforwardly means “brothel”; but if someone says “My room is un bordel,” then a translating machine has to know enough to suspect that this person is probably not running a brothel in his or her room but rather is saying “My room is a complete mess.”

Machine translation was one of the first non-numeric applications of computers and was studied intensively starting in the late 1950s. However, the hand-built grammar-based systems of early decades achieved very limited success. The field was transformed in the early 1990s when researchers at IBM acquired a large quantity of English and French sentences that were translations of each other (known as parallel text), produced as the proceedings of the bilingual Canadian Parliament. These data allowed them to collect statistics of word translations and word sequences and to build a probabilistic model of MT (5).

Following a quiet period in the late 1990s, the new millennium brought the potent combination of ample online text, including considerable quantities of parallel text, much more abundant and inexpensive computing, and a new idea for building statistical phrase-based MT systems (6). Rather than translating word by word, the key advance is to notice that small word groups often have distinctive translations. The Japanese “mizu iro” is literally the sequence of two words (“water color”), but this is not the correct meaning (nor does it mean a type of painting); rather, it indicates a light, sky-blue color. Such phrase-based MT was used by Franz Och in the development of Google Translate.

This technology enabled the services we have today, which allow free and instant translation between many language pairs, but it still produces translations that are only just serviceable for determining the gist of a passage. However, very promising work continues to push MT forward. Much subsequent research has aimed to better exploit the structure of human language sentences (i.e., their syntax) in translation systems (7, 8), and researchers are actively building deeper meaning representations of language (9) to enable a new level of semantic MT.

1 Department of Computer Science, Columbia University, New York, NY 10027, USA. 2 Department of Linguistics, Stanford University, Stanford, CA 94305-2150, USA. 3 Department of Computer Science, Stanford University, Stanford, CA 94305-9020, USA.
* Corresponding author. E-mail: julia@cs.columbia.edu
SCIENCE sciencemag.org, 17 JULY 2015, VOL 349, ISSUE 6245, 261
Downloaded from www.sciencemag.org on July 16, 2015

Finally, just in the past year, we have seen the development of an extremely promising approach to MT through the use of deep-learning-based sequence models. The central idea of deep learning is that if we can train a model with several representational levels to optimize a final objective, such as translation quality, then the model can itself learn intermediate representations that are useful for the task at hand. This idea has been explored particularly for neural network models in which information is stored in real-valued vectors, with the mapping between vectors consisting of a matrix multiplication followed by a nonlinearity, such as a sigmoid function that maps the output values of the matrix multiplication onto [−1, 1]. Building large models of this form is much more practical with
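The vector-to-vector mapping described above (a matrix multiplication followed by a nonlinearity) can be illustrated with a short sketch. This is not code from the Review: the function names `layer` and `forward`, the hand-picked weight matrices `W1` and `W2`, and the choice of `tanh` as the sigmoid-shaped nonlinearity with outputs between −1 and 1 are all assumptions made for illustration. In a real system the weights would be learned by optimizing a final objective rather than written by hand.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def layer(weights: Matrix, inputs: Vector) -> Vector:
    """Apply one layer: matrix-vector product, then an elementwise tanh."""
    outputs = []
    for row in weights:
        if len(row) != len(inputs):
            raise ValueError("each weight row must match the input length")
        pre_activation = sum(w * x for w, x in zip(row, inputs))
        # tanh squashes any real number into the open interval (-1, 1).
        outputs.append(math.tanh(pre_activation))
    return outputs


def forward(layers: List[Matrix], inputs: Vector) -> Vector:
    """Stack several layers so each output vector feeds the next layer."""
    vector = inputs
    for weights in layers:
        vector = layer(weights, vector)
    return vector


# Hypothetical weights, chosen by hand purely for illustration.
W1 = [[0.5, -1.0, 0.25], [1.0, 0.0, -0.25]]  # maps 3 inputs to 2 hidden values
W2 = [[1.0, -1.0]]                           # maps 2 hidden values to 1 output

hidden = layer(W1, [1.0, 2.0, 4.0])
output = forward([W1, W2], [1.0, 2.0, 4.0])
```

Here `hidden` plays the role of an intermediate representation: it is the real-valued vector produced by the first layer and consumed by the second. Stacking layers in `forward` corresponds to the "several representational levels" mentioned in the text.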
