
REVIEW

Advances in natural language processing

Julia Hirschberg¹* and Christopher D. Manning²,³

Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines
3、,mining socialmedia for information about health or finance,and identifying sentiment and emotion towardproducts and services.We describe successes and challenges in this rapidly advancing area.Over the past 20 years,computational lin-guistics has grown into both an excitingarea of scientific resear

4、ch and a practicaltechnology that is increasingly being in-corporated into consumer products(forexample,in applications such as Apples Siri andSkypeTranslator).Fourkeyfactorsenabledthesedevelopments:(i)a vast increase in computingpower,(ii)the availability of very large amountsof linguistic data,(ii

5、i)the development of highlysuccessful machine learning(ML)methods,and(iv)amuchricherunderstandingofthestructureof human language and its deployment in socialcontexts.In this Review,we describe some cur-rent application areas of interest in languageresearch.These efforts illustrate computationalappro

6、achestobigdata,basedoncurrentcutting-edge methodologies that combine statistical anal-ysis and ML with knowledge of language.Computationallinguistics,alsoknownasnat-ural language processing(NLP),is the subfieldof computer science concerned with using com-putational techniques tolearn,understand,andp

7、roducehumanlanguagecontent.Computation-al linguistic systems can have multiple purposes:The goal can be aiding human-human commu-nication,such as in machine translation(MT);aiding human-machine communication,such aswith conversational agents;or benefiting bothhumans and machines by analyzing and lea

8、rn-ing from the enormous quantity of human lan-guage content that is now available online.During the first several decades of work incomputational linguistics,scientists attemptedto write down for computers the vocabulariesand rules of human languages.This proved adifficult task,owing to the variabi

9、lity,ambiguity,and context-dependent interpretation of humanlanguages.For instance,a star can be either anastronomical object or a person,and“star”canbe a noun or a verb.In another example,two in-terpretationsarepossiblefortheheadline“Teacherstrikesidlekids,”dependingonthenoun,verb,andadjectiveassig

10、nmentsofthewordsinthesentence,aswellasgrammaticalstructure.Beginning in the1980s,but more widely in the 1990s,NLP wastransformedbyresearchersstartingtobuildmod-els over large quantities of empirical languagedata.Statisticalorcorpus(“bodyofwords”)basedNLP was one of the first notable successes ofthe

use of big data, long before the power of ML was more generally recognized or the term "big data" even introduced.

A central finding of this statistical approach to NLP has been that simple methods using words, part-of-speech (POS) sequences (such as whether a word is a noun, verb, or preposition), or simple templates
can often achieve notable results when trained on large quantities of data. Many text and sentiment classifiers are still based solely on the different sets of words ("bag of words") that documents contain, without regard to sentence and discourse structure or meaning. Achieving improvements over these simple baselines can be quite difficult. Nevertheless, the best-performing systems now use sophisticated ML approaches and a rich understanding of linguistic structure. High-performance tools that identify syntactic and semantic information as well as information about discourse context are now available. One example is Stanford CoreNLP (1), which provides a standard NLP preprocessing pipeline that includes POS tagging (with tags such as noun, verb, and preposition); identification of named entities, such as people, places, and organizations; parsing of sentences into their grammatical structures; and identifying co-references between noun phrase mentions (Fig. 1).

Historically, two developments enabled the initial transformation of NLP into a big data field. The first was the early availability to researchers of linguistic data in digital form, particularly through the Linguistic Data Consortium (LDC) (2), established in 1992. Today, large amounts of digital text can easily be downloaded from the Web. Available as linguistically annotated data are large speech and text corpora annotated with POS tags, syntactic parses, semantic labels, annotations of named entities (persons, places, organizations), dialogue acts (statement, question, request), emotions and
positive or negative sentiment, and discourse structure (topic or rhetorical structure). Second, performance improvements in NLP were spurred on by shared task competitions. Originally, these competitions were largely funded and organized by the U.S. Department of Defense, but they were later organized by the
research community itself, such as the CoNLL Shared Tasks (3). These tasks were a precursor of modern ML predictive modeling and analytics competitions, such as on Kaggle (4), in which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models.

A major limitation of NLP today is the fact that most NLP resources and systems are available only for high-resource languages (HRLs), such as English, French, Spanish, German, and Chinese. In contrast, many low-resource languages (LRLs), such as Bengali, Indonesian, Punjabi, Cebuano, and Swahili, spoken and written by millions of people, have no such resources or systems available. A future challenge for the language community is how to develop resources and tools for hundreds or thousands of languages, not just a few.

Machine translation

Proficiency in languages was traditionally a hallmark of a learned person. Although the social standing
of this human skill has declined in the modern age of science and machines, translation between human languages remains crucially important, and MT is perhaps the most substantial way in which computers could aid human-human communication. Moreover, the ability of computers to translate between human languages remains a consummate test of machine intelligence: Correct translation requires not only the ability to analyze and generate sentences in human languages but also a humanlike understanding of world knowledge and context, despite the ambiguities of languages. For example, the French word "bordel" straightforwardly means "brothel"; but if someone says "My room is un bordel," then a translating machine has to know enough to suspect that this person is probably not running a brothel in his or her room but rather is saying "My room is a complete mess."

Machine translation was one of the first non-numeric applications of computers and was studied intensively starting in the late 1950s. However, the hand-built grammar-based systems of early decades achieved very limited success. The field was transformed in the early 1990s when researchers at IBM acquired a large quantity of English and French sentences that were translations of each other (known as parallel text), produced as the proceedings of the bilingual Canadian Parliament. These data allowed them to collect statistics of word translations and word sequences and to build a probabilistic model of MT (5).

Following a quiet period in the late 1990s, the new millennium brought the potent combination of ample online text, including considerable quantities of parallel text, much more abundant and inexpensive computing, and a new idea for building statistical phrase-based MT systems (6). Rather than translating word by word, the key advance is to notice that small word groups often have distinctive translations. The Japanese "mizu iro" is literally the sequence of two words ("water color"), but this is not the correct meaning (nor does it mean a type of painting); rather, it indicates a light, sky-blue color. Such phrase-based MT was used by Franz Och in the development of Google Translate.

This technology enabled the services we have today, which allow free and instant translation between many language pairs, but it still produces translations that are only just serviceable for determining the gist of a passage. However, very promising work continues to push MT forward. Much subsequent research has aimed to better exploit the structure of human language sentences (i.e., their syntax) in translation systems (7, 8), and researchers are actively building deeper meaning representations of language (9) to enable a new level of semantic MT.

Finally, just in the past year, we have seen the development of an extremely promising approach to MT through the use of deep-learning-based sequence models. The central idea of deep learning is that if we can train a model with several representational levels to optimize a final objective, such as translation quality, then the model can itself learn intermediate representations that are useful for the task at hand. This idea has been explored particularly for neural network models in which information is stored in real-valued vectors, with the mapping between vectors consisting of a matrix multiplication followed by a nonlinearity, such as a sigmoid function that maps the output values of the matrix multiplication onto [−1, 1]. Building large models of this form is much more practical with

¹Department of Computer Science, Columbia University, New York, NY 10027, USA. ²Department of Linguistics, Stanford University, Stanford, CA 94305-2150, USA. ³Department of Computer Science, Stanford University, Stanford, CA 94305-9020, USA. *Corresponding author. E-mail: julia@cs.columbia.edu

SCIENCE sciencemag.org, 17 JULY 2015, VOL 349, ISSUE 6245, p. 261. Downloaded from www.sciencemag.org on July 16, 2015.
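The building block described in the closing paragraph, a matrix multiplication followed by a squashing nonlinearity that maps each output value into (−1, 1), can be sketched in a few lines of Python. This is an illustrative sketch only; the toy weights and input, and the choice of tanh as the sigmoid-shaped function, are our own assumptions rather than details given in the Review:

```python
import math

def layer(W, b, x):
    """One neural-network layer: a matrix-vector multiplication plus bias,
    followed by a squashing nonlinearity (tanh maps each value into (-1, 1))."""
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# Toy 2x3 weight matrix and bias vector: maps a 3-d input to a 2-d output.
W = [[0.5, -1.0, 0.25],
     [1.5,  0.0, -0.5]]
b = [0.1, -0.2]
x = [1.0, 2.0, 3.0]

h = layer(W, b, x)
print(h)  # two values, each strictly between -1 and 1
```

Stacking several such layers, with the weights trained end-to-end to optimize a final objective such as translation quality, is the deep-learning recipe the paragraph above describes.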

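Similarly, the bag-of-words representation discussed earlier in the Review (a document reduced to its word counts, ignoring sentence and discourse structure) and a simple baseline classifier built on it can be sketched as follows. The tiny sentiment lexicon and the scoring rule are invented purely for illustration:

```python
from collections import Counter

def bag_of_words(text):
    """Represent a document purely by its word counts, discarding word order."""
    return Counter(text.lower().split())

# Hypothetical mini-lexicon for a toy sentiment baseline (illustration only).
POSITIVE = {"great", "excellent", "love"}
NEGATIVE = {"poor", "terrible", "hate"}

def sentiment(text):
    """Score a document by counting lexicon hits in its bag of words."""
    bow = bag_of_words(text)
    score = (sum(c for w, c in bow.items() if w in POSITIVE)
             - sum(c for w, c in bow.items() if w in NEGATIVE))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this phone and the screen is excellent"))  # positive
print(sentiment("terrible battery and poor support"))              # negative
```

Baselines of this kind are what the Review means by "simple methods using words": crude, yet often hard to beat without sophisticated ML and a richer model of linguistic structure.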