Chinese Learner English Corpus (CLEC)
Gui Shichun and Yang Huizhong
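CLEC contains more than one million words of English produced by five types of Chinese students (secondary school students, College English Band 4 and Band 6 students, and lower-year and upper-year English majors), with language errors annotated throughout.
The aim is to observe the characteristics of each group's English and the errors each group makes, in the hope that quantitative and qualitative methods can yield a fairly precise description of Chinese learner English and provide useful feedback for English teaching in China.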
Table 1. Distribution of the CLEC data

Type      Tokens
ST2       208088
ST3       209043
ST4       212855
ST5       214510
ST6       226106
Total    1070602
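The total in Table 1 can be checked by simple addition, and the same numbers show how evenly the five sub-corpora are balanced. The following is a minimal sketch; the dictionary name `clec_tokens` is an illustrative choice, and the counts are copied from Table 1.

```python
# Token counts per learner group, copied from Table 1.
clec_tokens = {
    "ST2": 208088,
    "ST3": 209043,
    "ST4": 212855,
    "ST5": 214510,
    "ST6": 226106,
}

# Sum of the five sub-corpora; this should equal the reported total.
total = sum(clec_tokens.values())

# Share of the whole corpus contributed by each group, in percent.
shares = {group: 100 * count / total for group, count in clec_tokens.items()}

print("total tokens:", total)
for group, share in shares.items():
    print(f"{group}: {share:.2f}%")
```

Each group contributes roughly one fifth of the corpus, which is why frequencies can be compared across groups after a light normalization.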
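Principles of error annotation
1. Simple, sensible, and easy to apply systematically. Many people take part in the annotation, and an overly elaborate scheme would be hard to master. We use a two-level classification. The first level has 11 categories: word form (fm), verb phrase (vp), noun phrase (np), pronoun (pr), adjective phrase (aj), adverb (ad), prepositional phrase (pp), conjunction (cj), word (wd), collocation (cc), and sentence (sn). Each category is subdivided by number. For example, [cc] marks a faulty collocation: [cc1] is a noun/noun collocation, [cc2] a noun/verb collocation, [cc3] a verb/noun collocation, and so on.
2. A moderate number of categories. A coarse scheme is easy to apply consistently but carries too little information to support analysis of learners' errors; a fine-grained scheme is hard to apply consistently, and the same kind of error easily ends up in different categories. Our current approach is to be fine-grained for common errors (vp and np each have 9 subtypes) and coarse for rare ones (cj has only 2 subtypes). The present scheme has 61 error codes, which makes it a medium-sized scheme.
3. Enough information about each error: the error itself, its type, and its scope. For example, in "In the past, people are [vp6,4-] kind to each other…", the error is marked with square brackets placed after it. In [vp6,4-], vp6 means that "are" is a verb phrase (vp) error of type 6 (tense), and 4- gives the scope of the error: the hyphen marks the position of the error, and 4 means that four words precede it. Only by taking those four words into account can one judge that "are" is wrong.
4. Openness. Researchers may add error types or subdivide existing ones as needed. For example, [sn8] marks a structurally deficient sentence, and a researcher can split this error into several finer types. This requires retrieving every sn8 error and then defining third-level categories such as sn81, sn82, and so on.
5. Register and the source of an error are not annotated for now, because both call for a good deal of subjective judgment from the annotator and are even harder to apply consistently.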
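The tag format in principle 3 and the retrieval step in principle 4 can both be handled mechanically. The following is a minimal sketch, assuming every tag directly follows the flagged word and has the shape [code] or [code,N-M]. The names `ErrorTag`, `parse_tags`, and `select` are hypothetical. Reading the digits after the hyphen as the number of following context words is also an assumption, since the text above explains only the leading number.

```python
import re
from typing import List, NamedTuple


class ErrorTag(NamedTuple):
    word: str      # token immediately before the tag, i.e. the flagged word
    category: str  # first-level code such as "vp"
    subtype: int   # second-level number such as 6
    before: int    # words of preceding context needed to judge the error
    after: int     # ASSUMPTION: words of following context (digits after "-")


# A tag follows the flagged word: [code] or [code,N-M]; N and M may be empty.
TAG_RE = re.compile(r"(\S+)\s*\[([a-z]{2})(\d+)(?:,(\d*)-(\d*))?\]")


def parse_tags(text: str) -> List[ErrorTag]:
    """Extract every error tag in `text` together with the word it flags."""
    tags = []
    for word, category, subtype, before, after in TAG_RE.findall(text):
        # Missing scope numbers come back as empty strings; treat them as 0.
        tags.append(
            ErrorTag(word, category, int(subtype), int(before or 0), int(after or 0))
        )
    return tags


def select(tags: List[ErrorTag], code: str) -> List[ErrorTag]:
    """Return the tags whose full code (category plus subtype) equals `code`."""
    return [t for t in tags if f"{t.category}{t.subtype}" == code]


sample = "In the past, people are [vp6,4-] kind to each other"
tags = parse_tags(sample)
print(tags)
print(select(tags, "vp6"))
```

Calling `select(tags, "sn8")` over a whole sub-corpus would give the pool of structurally deficient sentences from which third-level categories could then be defined.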
Error classification table (total: 61)

Word form               Verb phrase                Noun phrase                Pronoun
Code  Type              Code  Type                 Code  Type                 Code  Type
fm1   spelling          vp1   pattern              np1   pattern              pr1   reference
fm2   word building     vp2   set phrase           np2   set phrase           pr2   anticipatory it
fm3   capitalization    vp3   agreement            np3   agreement            pr3   agreement
                        vp4   finite/non-finite    np4   case                 pr4   case
                        vp5   non-finite           np5   countability         pr5   wh-
                        vp6   tense                np6   number               pr6   indefinite
                        vp7   voice                np7   article
                        vp8   mood                 np8   quantifiers
                        vp9   modal/auxiliary      np9   other determiners

Adjective phrase                Adverb                Prepositional phrase    Conjunction
Code  Type                      Code  Type            Code  Type              Code  Type
aj1   pattern                   ad1   order           pp1   pattern           cj1   pattern
aj2   set phrase                ad2   modification    pp2   set phrase        cj2   set phrase
aj3   degree                    ad3   degree
aj4   -ed/-ing confusion
aj5   predicative/attributive

Word                    Collocation         Sentence
Code  Type              Code  Type          Code  Type
wd1   order             cc1   noun/noun     sn1   run-on sentence
wd2   part of speech    cc2   noun/verb     sn2   sentence fragment
wd3   substitution      cc3   verb/noun     sn3   dangling modifier
wd4   absence           cc4   adj/noun      sn4   illogical comparison
wd5   redundancy        cc5   verb/adv      sn5   topic prominence
wd6   repetition        cc6   adv/adj       sn6   coordination
wd7   ambiguity                             sn7   subordination
                                            sn8   structural deficiency
                                            sn9   punctuation
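The stated total of 61 codes can be verified from the number of subtypes under each first-level category. The following is a minimal sketch; the names `SUBTYPE_COUNTS`, `all_codes`, and `is_valid_code` are illustrative, and the counts are read off the classification table.

```python
# Number of second-level subtypes under each first-level category,
# read off the classification table.
SUBTYPE_COUNTS = {
    "fm": 3, "vp": 9, "np": 9, "pr": 6, "aj": 5, "ad": 3,
    "pp": 2, "cj": 2, "wd": 7, "cc": 6, "sn": 9,
}


def all_codes(counts):
    """Enumerate every two-level code, e.g. fm1..fm3, vp1..vp9, and so on."""
    return [f"{cat}{i}" for cat, n in counts.items() for i in range(1, n + 1)]


CODES = all_codes(SUBTYPE_COUNTS)
CODE_SET = set(CODES)


def is_valid_code(code: str) -> bool:
    """Check whether `code` is one of the two-level codes in the scheme."""
    return code in CODE_SET


print(len(SUBTYPE_COUNTS), "categories,", len(CODES), "codes")
print(is_valid_code("vp6"), is_valid_code("cj3"))
```

A check like `is_valid_code` could be run over annotated text to catch mistyped tags. Third-level codes that researchers add later (such as sn81) would need to be registered separately.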
Annotation notes

Code  Class     Category                  Description
fm1   word      spelling                  spelling, coinage, abbreviation, apostrophe
fm2   word      word building             derivation, inflection, compounding, plurality (noun), irregularity (verb), 3rd person singular form (verb), syllabification, hyphenation, word division or fusion
fm3   word      capitalization            lower initial letter for upper initial letter or vice versa
vp1   vbphr     pattern (transitivity)    error in transitivity (vi as vt or vice versa), transitive verb pattern / grammatical (cf. Oxford Advanced Learner's Dictionary of Current English, edited by A. S. Hornby)
vp2   vbphr     set phrase                phrasal verb and verbal phrase: error in form or use
vp3   vbphr     agreement (subject-verb)  number agreement with its subject (noun or pronoun)
vp4   vbphr     finite/non-finite         finite verb for non-finite verb or vice versa
vp5   vbphr     non-finite                infinitive error: form and use / infinitive for participle or vice versa / -ed participle for -ing participle or vice versa
vp6   vbphr     tense                     error in tense use within a sentence / the sequence of tenses between sentences
vp7   vbphr     voice                     error in the use of voice: active for passive or vice versa
vp8   vbphr     mood                      error in the use of mood: imperative, subjunctive / improper structure of conditional sentences
vp9   vbphr     modal/auxiliary           misuse of modal/auxiliary verbs / wrong form of modal verb (or auxiliary verb) and verb combination (e.g. tense form, voice form, etc.)
np1   nnphr     pattern (noun)            error in combination with other words / grammatical
np2   nnphr     set phrase                omission or replacement of a fixed element that goes after a certain noun
np3   nnphr     agreement                 number agreement of a noun with its determiner or a word that refers to it
np4   nnphr     case                      possessive case error: form or use
np5   nnphr     countability              uncountable noun used as countable noun
np6   nnphr     number                    countable noun used with no determiner or -s / a or -s with plural noun
np7   nnphr     article                   a/an confusion or definite/indefinite confusion
np8   nnphr     quantifiers               misuse or confusion between many/much, (a) few/(a) little, some/any, etc.
np9   nnphr     other determiners         misuse or confusion of demonstratives, wh-determiners, numerals, etc.
pr1   pron      reference                 incorrect/ambiguous pronoun reference / anaphoric
pr2   pron      anticipatory it           improper or wrong use of anticipatory it / it replaced by a demonstrative, etc.
pr3   pron      agreement                 number agreement with a noun it refers to
pr4   pron      case                      case error of any personal pronoun
pr5   pron      wh-                       misuse or confusion of interrogative, relative and conjunctive pronouns
pr6   pron      indefinite                misuse or confusion of indefinite pronouns such as all/both, few/little, some/any, either/neither, etc.
aj1   adj       pattern (adjective)       error in the combination with other words / grammatical
aj2   adj       set phrase                error in the idiomatic use of an adjectival phrase / omission or replacement of a fixed element that goes after a certain adjective
aj3   adj       degree                    adjective degree error: form and use
aj4   adj       -ed/-ing confusion        -ed adjective for -ing adjective or vice versa
aj5   adj       predicative/attributive   predicative adjective used as attributive adjective
ad1   adv       order                     improper adverb placement / wrong position
ad2   adv       modification              adjective modifier used as verb modifier / other kinds of confusion
ad3   adv       degree                    adverb degree error: form and use
pp1   prep      pattern (preposition)     unacceptable combination with other words / grammatical
pp2   prep      set phrase                error in the formation or use of an idiomatic prepositional phrase
cj1   conj      pattern (conjunction)     unacceptable combination with other words / grammatical
cj2   conj      set phrase                error in the formation or use of a phrase functioning as a conjunction
wd1   word      order                     misplacement of any word other than an adverb
wd2   word      part of speech            error in part of speech: right root but wrong word class
wd3   word      substitution              error in word choice: right word class but wrong selection (any part of speech)
wd4   word      absence                   omission of a word (any part of speech)
wd5   word      redundancy                oversuppliance of a word (any part of speech)
wd6   word      repetition                unnecessary repeating of a word
wd7   word      ambiguity                 not clear word meaning / semantic
cc1   notional  n/n collocation           improper noun (phrase) and noun (phrase) combination / semantic
cc2   notional  n/v collocation           improper noun (phrase) and verb (phrase) combination / semantic
cc3   notional  v/n collocation           improper verb and noun (phrase) combination / semantic
cc4   notional  a/n collocation           improper adjective and noun (phrase) combination / semantic
cc5   notional  v/ad collocation          improper verb and adverb (or ad/v) combination / semantic
cc6   notional  ad/a collocation          improper adverb and adjective combination / semantic
sn1   sentence  run-on sentence           improper addition of clauses / fused sentence
sn2   sentence  sentence fragment         subordinate clause as a sentence / any phrase as a sentence
sn3   sentence  dangling modifier         illogical adverbial modification of a clause
sn4   sentence  illogical comparison      error in the comparison of words or phrases in a sentence which cannot be compared
sn5   sentence  topic prominence          the co-occurrence of an initial noun phrase and its equivalent (usually a pronoun) in the same sentence
sn6   sentence  coordination              faulty parallelism of clauses (or words/phrases) in a sentence
sn7   sentence  subordination             faulty attachment of a subordinate clause to the main clause
sn8   sentence  structural deficiency     error in the grammatical construction of a sentence: improper splitting, pattern shifting, confusing structure, etc.
sn9   sentence  punctuation               overuse, absence, choice, apostrophe, comma splice, etc.
Error frequencies after normalization, with percentages

Error type        ST2      ST3      ST4      ST5      ST6     Total  Percent (%)
fm1            1928.8   2877.4   2112.6   1826.7   1686.7   10432.2        17.47
fm2             349.3    448.9    438.9    226.9    328.7    1792.7            3
fm3            1474.4    731.8    405.8    694.1    174.6    3480.7         5.83
vp1             259.4    325.9    498.4    103.4    200.8    1387.9         2.32
vp2               179    139.3     61.2    104.2     22.1     505.8         0.85
vp3               374    524.6    785.2    273.1      327    2283.9         3.82
vp4             140.8    159.1    110.8     63.9     51.6     526.2         0.88
vp5               140    118.7    107.4     89.9     46.7     502.7         0.84
vp6            1165.7      356    311.6    379.8    215.6    2428.7         4.07
vp7             172.7    104.1     98.4     63.9     46.7     485.8         0.81
vp8              27.1     16.3      8.3     25.2     11.5      88.4         0.15
vp9             111.4    274.3    278.5     42.9     86.1     793.2         1.33
np1              46.9     33.5     28.9     16.8     10.7     136.8         0.23
np2              24.7     22.4     17.4     19.3      2.5      86.3         0.14
np3             202.1    247.7    249.6    210.9      186    1096.3         1.84
np4              66.8     55.9     26.4     22.7     21.3     193.1         0.32
np5              58.9       98     71.9     60.5     84.4     373.7         0.63
np6               374    654.4      481    358.8    354.1    2222.3         3.72
np7             237.9    107.5     89.3    174.8     54.9     664.4         1.11
np8
35
65.4
47.9
1