1 Introduction to Corpus Linguistics

1.1 What is a corpus?

In the language sciences a corpus is a body of written text or transcribed speech which can serve as a basis for linguistic analysis and description. In many respects it is the use to which the body of textual material is put, rather than its design features, which defines what a corpus is. A corpus constitutes an empirical basis not only for identifying the elements and structural patterns which make up the systems we use in a language, but also for mapping out our use of these systems. A corpus can be analyzed and compared with other corpora or parts of corpora to study variation. Most importantly, it can be analyzed distributionally to show how often particular phonological, lexical, grammatical, discoursal or pragmatic features occur, and also where they occur.

By the 1990s there were many corpus-making projects in various parts of the world. Lancashire (1991) shows the huge range of corpora, archives and other electronic databases available or being compiled for a wide variety of purposes. Some of the largest corpus projects have been undertaken for commercial purposes by dictionary publishers. Other projects in corpus compilation or analysis are on a smaller scale and do not necessarily become well known. Undertaken as part of graduate theses or undergraduate projects, they enable students to gain original insights into the structure and use of language.

1.2 Categorization of corpora

Computerized corpora include the following types:

Raw corpora: collections of real-life spoken and written language, recorded in written form and classified and compiled according to particular principles (register, style, diachronic or synchronic coverage, etc.).

Annotated corpora: corpora in which the raw material has been tagged for part of speech, grammar, pronunciation, semantics, discourse or even pragmatics.

Parallel corpora: corpora in which texts in two or more languages are aligned as translations of one another at the sentence level or even at the level of words and phrases, for example CRATER, a parallel corpus of European languages such as English, French and Spanish (McEnery & Oakes 1996), and the English-Chinese parallel corpus (China Foreign Language Teaching Research Center 2000).

Learner corpora: corpora of the spoken and written language of non-native learners, including corpora in which learners' spelling and grammatical errors are tagged and corrections suggested, for example ICLE (International Corpus of Learner English, written), LINDSEI (a spoken corpus of international learners of English) (Granger 2000) and CLEC (Chinese Learner English Corpus, written) (Gui Shichun 2001).

Lattice corpora: corpora generated by applying automatic speech recognition and handwriting recognition to natural language, both spoken and written (Atwell 1996).

Broadly speaking, corpora fall into two types: raw corpora and annotated corpora.

1.3 What a corpus can do

Strictly speaking, a corpus by itself can do nothing at all, being nothing other than a store of used language. Corpus access software, however, can rearrange that store so that observations of various kinds can be made. If a corpus represents, very roughly and partially, a speaker's experience of language, the access software re-orders that experience so that it can be re-examined in ways that are otherwise impossible. A corpus does not contain new information about language, but software packages can process the data in a corpus in three ways that offer a fresh view of it: showing frequency, phraseology and collocation.

2. What is corpus linguistics?

2.1 The definition of corpus linguistics

Over the last three decades the compilation and analysis of corpora stored in computerized databases has led to a new scholarly enterprise known as corpus linguistics. It brings together findings from corpus-based studies of English, the language which has so far received the most attention from corpus linguists, and shows how quantitative analysis can contribute to linguistic description.

2.2 The history of corpus linguistics

The use of corpora for linguistic study dates back to the end of the nineteenth century, when cards and manual retrieval were the only available means of research. As we have seen, corpus linguistics goes beyond the use of corpora as a source of evidence in linguistic description. It also revives and carries on a concern of some linguists with the statistical distribution of linguistic items in the context of use. From the 1920s there was, especially in the United States and the United Kingdom, a tradition of counting words in texts in order to discover the most frequent, and arguably therefore the most pedagogically useful, words and grammatical structures for language teaching. From the 1930s, Prague School linguists undertook quantitative studies (mainly of Czech, English and Russian) of different parts of speech, the location and distribution of information in the sentence, and the statistical distribution of syllable types and structures. Different varieties of English have also been studied.

The earliest computerized corpora compiled for linguistic research from the 1960s required the use of mainframe computers, and researchers frequently had to design their own software for analysis. Initial interest was often in lexis, including word counts, but it quickly became apparent that a computer facilitated the study of permissible or likely word sequences or collocations (are we more likely to write different from, different to or different than?) and of the grammatical and stylistic characteristics of particular authors and genres. There was particular interest in what characterized scientific style, newspaper style and literary or imaginative style.

The renowned British scholar Randolph Quirk launched the Survey of English Usage (SEU) corpus in the 1950s and 1960s; it was compiled first on paper and then computerized at the beginning of the 1980s, which marks the transition from the traditional corpus to the computerized corpus. The Brown University Standard Corpus of Present-Day American English (the Brown Corpus) was established in the 1960s and 1970s. The London-Lund Corpus of Spoken English (LLC), completed in the 1980s, was the first corpus of its kind, covering formal and informal speeches, commentaries, dialogues, discussions, interviews and so on. These three classic corpora laid a solid foundation for present-day corpus linguistics, because they are systematically compiled, comprehensive, authentic and reliable, and easy to store and retrieve.

2.3 The scope of corpus linguistics

Corpus linguistics takes bodies of text as the domain of study and as the source of evidence for linguistic description and argumentation. It has also come to embody methodologies for linguistic description in which the quantification of linguistic items is part of the research activity. As Leech (1992: 107) has noted, the focus of study is on performance rather than on competence, and on observation of language in use leading to theory rather than vice versa.

Corpus linguists are typically concerned not only with what words, structures or uses are possible in a language but also with what is probable, that is, what is likely to occur in language use. The use of a corpus as a source of evidence, however, is not necessarily incompatible with any linguistic theory, and progress in the language sciences as a whole is likely to benefit from a judicious use of evidence from various sources: texts, introspection, elicitation or other types of experimentation as appropriate. Any scientific enterprise must be empirical in the sense that its claims have to be supported or falsified by evidence and, in the final analysis, statements made about language have to stand up to the evidence of language use. The evidence can be based on the introspective judgments of speakers of the language or on a corpus of text. The difference lies in the richness of the evidence, the confidence we can have in its generalizability, and its validity and reliability.

2.4 Applications of corpus linguistics

Corpus linguistics can be exploited in a wide variety of domains: most centrally in the design of syllabi and materials for language teaching, but also in dictionary work, the study of ideology and culture, translation, stylistics, forensic linguistics, and the provision of online assistance for writers in well-defined technical domains.

3. Types of corpus researchers

Work in corpus linguistics is currently associated with several quite different activities, and scholars working in the field tend to be identified with one or more of them.

The first group of researchers consists of corpus makers or compilers. These scholars are concerned with the design and compilation of corpora, the collection of texts, and their preparation and storage for later analysis.

A second group of researchers has been concerned with developing tools for the analysis of corpora. This is the main task of researchers in computational linguistics.

A third group consists of descriptive linguists whose main concern has been to use computerized corpora to describe reliably the lexicon and grammar of languages, covering both the linguistic systems we use and our likely use of those systems. It is the probabilistic aspect of corpus-based descriptive studies which especially distinguishes them from conventional descriptive fieldwork in linguistics or lexicography.

A fourth area of activity, which has been among the most innovative outcomes of the corpus revolution, has been the exploitation of corpus-based linguistic description in a variety of applications such as language learning and teaching, and natural language processing by machine, including speech recognition and translation.

4. The objective of this course

It is my hope that this course will whet the appetites of the growing body of teachers and students with access to corpora to discover more for themselves about how language works in all its variety. There is no doubt that corpus linguistics is not an end in itself but one source of evidence for improving descriptions of the structure and use of languages, and for various applications, including the processing of natural language by machine and understanding how to learn and teach a language.

It should be made clear that corpus linguistics is not a mindless process of automatic language description. Linguists use corpora to answer questions and solve problems. Some of the most revealing insights into language and language use have come from a blend of manual and computer analysis. It is now possible for researchers with access to a personal computer and off-the-shelf software…
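Section 1.3 names three things corpus access software does with a corpus: showing frequency, phraseology and collocation. The Python sketch below illustrates each one on a toy text. It also counts the "different from / different to / different than" sequences mentioned in 2.2 and normalizes raw counts per million words so that corpora of different sizes can be compared (1.1). It is a minimal sketch under stated assumptions: the lower-case regex tokenizer is deliberately naive, collocates are raw co-occurrence counts within a window, and all function names are hypothetical illustrations, not the API of any particular concordancer.

```python
import re
from collections import Counter

def tokenize(text):
    """Naive tokenizer (assumption): lower-case alphabetic word forms only."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def frequency_list(tokens, n=10):
    """Frequency: the n most common word forms with their raw counts."""
    return Counter(tokens).most_common(n)

def per_million(count, corpus_size):
    """Normalize a raw count so corpora of different sizes can be compared."""
    return count * 1_000_000 / corpus_size

def kwic(tokens, node, width=3):
    """Phraseology: keyword-in-context lines for every occurrence of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

def collocates(tokens, node, span=2):
    """Collocation: raw counts of words within +/- span tokens of node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

def bigram_counts(tokens, first, seconds):
    """Count two-word sequences such as 'different from' or 'different to'."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return {w: pairs[(first, w)] for w in seconds}

# Toy text invented for illustration; it is not drawn from any real corpus.
sample = ("This result is different from the last one. "
          "Her view is different to mine, and his is different from both. "
          "Is it really different than before?")
tokens = tokenize(sample)
print(frequency_list(tokens, 3))
print(bigram_counts(tokens, "different", ["from", "to", "than"]))
for line in kwic(tokens, "different"):
    print(line)
print(collocates(tokens, "different", span=1).most_common(3))
print(per_million(5, 250_000))
```

Because the collocate counts are raw, a function word such as "is" comes out on top. Concordancers usually rank collocates with an association measure such as mutual information or a t-score instead, which this sketch leaves out for brevity.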
