1 Introduction to Corpus Linguistics

1.1 What is a corpus?

In the language sciences a corpus is a body of written text or transcribed speech which can serve as a basis for linguistic analysis and description. In many respects it is the use to which the body of textual material is put, rather than its design features, which defines what a corpus is.

A corpus constitutes an empirical basis not only for identifying the elements and structural patterns which make up the systems we use in a language, but also for mapping out our use of these systems. A corpus can be analyzed and compared with other corpora or parts of corpora to study variation. Most importantly, it can be analyzed distributionally to show how often particular phonological, lexical, grammatical, discoursal or pragmatic features occur, and also where they occur.
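As a rough illustration of distributional analysis, the sketch below counts how often one word occurs in each part of a small corpus and normalizes the count per 1,000 tokens, so that parts of different sizes can be compared. The two-part corpus, the function names and the crude tokenizer are all invented for illustration; a real study would use a proper tokenizer and far more text.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def distribution(sections, target):
    """For each named section, return (raw count, rate per 1,000 tokens) of `target`."""
    result = {}
    for name, text in sections.items():
        tokens = tokenize(text)
        count = Counter(tokens)[target]
        # Normalizing by section length makes sections of unequal size comparable.
        rate = 1000 * count / len(tokens) if tokens else 0.0
        result[name] = (count, rate)
    return result

# Hypothetical two-part corpus, invented for this example.
toy_corpus = {
    "news": "The minister said the budget was final. The vote is on Monday.",
    "fiction": "She ran. He followed her into the dark and silent wood.",
}

the_by_section = distribution(toy_corpus, "the")
for section, (count, rate) in the_by_section.items():
    print(f"{section}: {count} occurrences, {rate:.1f} per 1,000 tokens")
```

The raw count answers "how often", and the per-section breakdown answers "where". The same pattern extends to any feature that can be counted, provided the corpus is annotated for it.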

By the 1990s there were many corpus-making projects in various parts of the world. Lancashire (1991) shows the huge range of corpora, archives and other electronic databases available or being compiled for a wide variety of purposes. Some of the largest corpus projects have been undertaken for commercial purposes, by dictionary publishers. Other projects in corpus compilation or analysis are on a smaller scale, and do not necessarily become well known. Undertaken as part of graduate theses or undergraduate projects, they enable students to gain original insights into the structure and use of language.

1.2 Categorization of corpora

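Computerized corpora consist of:

Raw corpora: collections of real-world spoken and written language gathered in text form and classified and compiled according to certain principles (register, style, diachronic or synchronic coverage, and so on).

Annotated corpora: corpora in which the raw data have been tagged with part-of-speech, grammatical, phonetic, semantic, discourse or even pragmatic annotation.

Parallel corpora: corpora in which two or more languages are aligned with their translations at the sentence level, or even the word and phrase level, such as CRATER, a parallel corpus of English, French, German, Spanish and other languages (McEnery & Oakes 1996), and the English-Chinese bilingual parallel corpus (China Foreign Language Teaching Research Centre 2000).

Learner corpora: corpora of the spoken and written language of non-native learners, including corpora annotated with learners' spelling and grammatical errors and with hints for correction. Examples are ICLE (the International Corpus of Learner English, a written corpus), LINDSEI (the Louvain International Database of Spoken English Interlanguage) (Granger 2000) and CLEC (the Chinese Learner English Corpus, a written corpus) (Gui Shichun 2001).

Lattice corpora: corpora generated by applying automatic speech recognition and handwriting recognition to natural language, both spoken and written (Atwell 1996).

In general, corpora divide into raw corpora and annotated corpora.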
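The difference between a raw and an annotated corpus can be made concrete with a small sketch. It assumes one common plain-text convention, in which each token is written as word_TAG; the tag names used here (DET, NOUN, VERB) and the function name are illustrative assumptions, not the scheme of any particular corpus named above.

```python
def parse_annotated(line):
    """Split a 'word_TAG word_TAG ...' line into (word, tag) pairs.

    Assumes every token carries exactly one tag after its last underscore.
    """
    pairs = []
    for item in line.split():
        word, _, tag = item.rpartition("_")
        pairs.append((word, tag))
    return pairs

# The same sentence as raw text and as part-of-speech annotated text.
raw_line = "The corpus shows variation"
annotated_line = "The_DET corpus_NOUN shows_VERB variation_NOUN"  # hypothetical tagset

pairs = parse_annotated(annotated_line)

# Annotation lets us query by category, which raw text alone cannot support.
nouns = [word for word, tag in pairs if tag == "NOUN"]

# Stripping the tags recovers the raw text.
recovered_raw = " ".join(word for word, _ in pairs)

print(nouns)
print(recovered_raw == raw_line)
```

The raw text is always recoverable from the annotated version, whereas the reverse step requires a tagger or a human annotator.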

1.3 What a corpus can do

Strictly speaking, a corpus by itself can do nothing at all, being nothing other than a store of used language. Corpus access software, however, can rearrange that store so that observations of various kinds can be made. If a corpus represents, very roughly and partially, a speaker’s experience of language, the access software re-orders that experience so that it can be re-examined in ways that are usually impossible. A corpus does not contain new information about language, but the software packages process data from a corpus in three ways: showing frequency, phraseology and collocation.
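The three kinds of processing named above can be sketched in a few lines. This is a minimal illustration rather than a description of any actual corpus access package: frequency is shown as a word count, phraseology as keyword-in-context (KWIC) concordance lines, and collocation as a count of words found within a small window around the node word. The sample text, function names and window sizes are assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, node, width=3):
    """Return concordance lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def collocates(tokens, node, span=2):
    """Count tokens occurring within `span` positions of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    return counts

# Invented sample text standing in for a corpus.
text = ("Strong tea is popular. She made strong tea again. "
        "He has a strong accent, and strong tea keeps him awake.")
tokens = tokenize(text)

frequency = Counter(tokens)                # 1. frequency list
lines = kwic(tokens, "strong")             # 2. phraseology via concordance lines
neighbours = collocates(tokens, "strong")  # 3. collocation via co-occurrence counts

print(frequency.most_common(3))
print("\n".join(lines))
print(neighbours.most_common(3))
```

Raw co-occurrence counts favour very common words, so published collocation studies usually apply an association measure on top of counts like these.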

2. What is corpus linguistics?

2.1 The definition of corpus linguistics

Over the last three decades the compilation and analysis of corpora stored in computerized databases has led to a new scholarly enterprise known as corpus linguistics. It brings together some of the findings of corpus-based studies of English, the language which has so far received the most attention from corpus linguists, and shows how quantitative analysis can contribute to linguistic description.

2.2 The history of corpus linguistics

The use of corpora for linguistic studies dates back to the end of the nineteenth century, when only cards and manual retrieval could be used as a means of research.

As we have seen, corpus linguistics goes beyond the use of corpora as a source of evidence in linguistic description. It also revives and carries on a concern of some linguists with the statistical distribution of linguistic items in the context of use. From the 1920s there was, especially in the United States and the United Kingdom, a tradition of word counting in texts in order to discover the most frequent, and arguably therefore the most pedagogically useful, words and grammatical structures for language teaching purposes.

From the 1930s, Prague School linguists undertook quantitative studies (mainly of Czech, English and Russian) of different parts of speech, the location and distribution of information in the sentence, and the statistical distribution of syllable types and structures. Different varieties of English have been studied.

The earliest computerized corpora compiled for linguistic research from the 1960s required the use of mainframe computers, and researchers frequently had to design their own software for analysis. Initial interest was often in lexis, including word counts, but it was quickly apparent that a computer facilitated the study of permissible or likely word sequences or collocations (are we more likely to write different from, different to or different than?) and grammatical and stylistic characteristics of particular authors and genres. There was a particular interest in what characterized ‘scientific style’, ‘newspaper style’ and ‘literary or imaginative style’.

The renowned British scholar R. Quirk, later joined by S. Greenbaum, began establishing a corpus, the Survey of English Usage (SEU), in the 1950s and 1960s, first on paper and then computerized at the beginning of the 1980s, which marks the transition from the traditional corpus to the computerized corpus. The Brown University Standard Corpus of Present-Day American English (BROWN) was established in the 1960s and 1970s. The London-Lund Corpus of Spoken English (LLC), completed in the 1980s, was the first corpus of its kind, including formal and informal speeches, commentaries, dialogues, discussions, interviews and so on. These three classic corpora laid a solid foundation for present-day corpus linguistics, for they are based on systematic, comprehensive, authentic and reliable language data and are easy to store and retrieve.
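The collocation question raised earlier in this section (different from, different to or different than?) is the kind of query that became easy once corpora were computerized. The sketch below counts the three variants in a short invented sample; the function name and the sample sentences are assumptions for illustration, and the resulting counts say nothing about real usage.

```python
import re
from collections import Counter

def count_variants(text, head="different", options=("from", "to", "than")):
    """Count how often `head` is immediately followed by each word in `options`."""
    alternatives = "|".join(re.escape(option) for option in options)
    pattern = re.compile(
        r"\b" + re.escape(head) + r"\s+(" + alternatives + r")\b",
        re.IGNORECASE,
    )
    return Counter(match.group(1).lower() for match in pattern.finditer(text))

# Invented sample standing in for a corpus.
sample = ("This is different from that. Hers is different to mine. "
          "It was different from what we expected, and different than before.")

variants = count_variants(sample)
total = sum(variants.values())
for word, count in variants.most_common():
    print(f"different {word}: {count} ({count / total:.0%})")
```

Run over a large corpus, the proportions give an estimate of which variant writers are more likely to choose. Comparing the proportions across corpora of different varieties of English is one way to study variation.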

2.3 The scope of corpus linguistics

Corpus linguistics is based on bodies of text as the domain of study and as the source of evidence for linguistic description and argumentation. It has also come to embody methodologies for linguistic description in which quantification of the linguistic items is part of the research activity. As Leech (1992: 107) has noted, the focus of study is on performance rather than on competence, and on observation of language in use leading to theory rather than vice versa.

Corpus linguists are typically concerned not only with what words, structures or uses are possible in a language but also with what is probable, that is, what is likely to occur in language use. The use of a corpus as a source of evidence, however, is not necessarily incompatible with any linguistic theory, and progress in the language sciences as a whole is likely to benefit from a judicious use of evidence from various sources: texts, introspection, elicitation or other types of experimentation as appropriate. Any scientific enterprise must be empirical in the sense that it has to be supported or falsified on the basis of evidence and, in the final analysis, statements made about language have to stand up to the evidence of language use. The evidence can be based on the introspective judgment of speakers of the language or on a corpus of text. The difference lies in the richness of the evidence and the confidence we can have in the generalizability of that evidence, and in its validity and reliability.

2.4 Applications of corpus linguistics

Corpus linguistics can be widely exploited in a variety of domains, most centrally in the design of syllabi and materials for language teaching, but also in dictionary work, the study of ideology and culture, translation, stylistics, forensic linguistics, and the provision of on-line assistance for writers in well-defined technical domains.

3. Types of corpus researchers

Work in corpus linguistics is currently associated with several quite different activities. Scholars working in the field tend to be identified with one or more of them.

The first group of researchers consists of corpus makers or compilers. These scholars are concerned with the design and compilation of corpora, the collection of texts and their preparation and storage for later analysis.

A second group of researchers has been concerned with developing tools for the analysis of corpora. This is the main task of researchers in computational linguistics.

A third group of researchers consists of descriptive linguists whose main concern has been to make use of computerized corpora to describe reliably the lexicon and grammar of languages, both of the linguistic systems we use and our likely use of those systems. It is the probabilistic aspect of corpus-based descriptive linguistic studies which especially distinguishes them from conventional descriptive fieldwork in linguistics or lexicography.

A fourth area of activity, which has been among the most innovative outcomes of the corpus revolution, has been the exploitation of corpus-based linguistic description for use in a variety of applications such as language learning and teaching, and natural language processing by machine, including speech recognition and translation.

4. The objective of offering this course

It is my hope that this course will whet the appetites of the growing body of teachers and students with access to corpora to discover more for themselves about how language works in all its variety.

There is no doubt that corpus linguistics is not an end in itself but is one source of evidence for improving descriptions of the structure and use of languages, and for various applications, including the processing of natural language by machine and understanding how to learn and teach a language.

It should be made clear that corpus linguistics is not a mindless process of automatic language description. Linguists use corpora to answer questions and solve problems. Some of the most revealing insights on language and language use have come from a blend of manual and computer analysis. It is now possible for researchers with access to a personal computer and off-the-shelf software
