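English Education Measurement and Evaluation
Graduate Degree Course Exam Paper
School (department, institute): Foreign Languages College    Major: English
Exam subject: English Education Measurement and Evaluation    Second semester
Graduate student name: 戎竞雄    Student ID: 132300176
Exam score:
Supervisor's comments:
Supervisor's signature:
Date (year / month / day):
Notes
1. All degree-course exam questions and answer papers must be bound together with this cover sheet.
The grading supervisor must mark the paper in red ink, enter the score and comments in the designated places on this cover sheet, and hand it to the academic affairs officer of the school (department, institute) office within two weeks (one month for thesis-type exams). The officer records the grade promptly and, before the end of the semester or at the start of the following semester, forwards the grade sheet to the Graduate Office for filing.
Exam questions and answer papers are kept by the school (department, institute) office.
2. Apart from computer printing paper and 16-mo small-grid manuscript paper, degree-course exams must use the "degree course exam paper" printed by the Graduate Office.
3. Print this cover sheet double-sided on A4 paper, with these notes on the back.
Shanghai Normal University Standard Exam Paper
Academic year 2013-2014, second semester    Exam date: August __, 2014
Subject: English Education Measurement and Evaluation
Program: Subject Teaching + Curriculum and Pedagogy    Degree: Master's    Class of 2013    Name: 戎竞雄    Student ID: 132300176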
Item      I    II   III   IV   V    VI   VII  VIII  Total
Points    45   55                                   100
Score
I pledge to abide by the Shanghai Normal University Examination Rules and to take this exam honestly.
Signature: 戎竞雄
I. Answer in detail the following questions:
(45%, 15 points for each)
1. Suggest the differences between proficiency tests and achievement tests. Give examples if necessary.
Answer:
A proficiency test assesses the general knowledge or skills commonly required for entry into a group of similar institutions. One example is TOEFL. Proficiency tests are norm-referenced tests (NRTs) because NRTs have all the qualities desirable for proficiency decisions. An achievement test, by contrast, must be designed with very specific reference to a particular course. Achievement tests are often directly based on course objectives and will therefore be criterion-referenced. Such tests will typically be administered at the end of a course to determine how effectively students have mastered the instructional objectives. Achievement tests must be not only very specifically designed to measure the objectives of a given course but also flexible enough to help teachers readily respond to what they learn from the test about the students' abilities, the students' needs, and the students' learning of the course objectives. One example is a final test given at the end of a course.
2. The following are two different kinds of score distribution.
[Two score-distribution figures are missing from the source.]
What information do these two figures convey to us?
(Discuss from the score distribution of NRT and that of CRT)
Answer:
The first figure shows the score distribution of a norm-referenced test (NRT), which is designed to measure global language abilities (for instance, overall English language proficiency including listening ability, reading comprehension, and so on). Each student's score on such a test is interpreted relative to the scores of all other students who took the test. Such comparisons are usually done with reference to the concept of the normal distribution (familiarly known as the bell curve). The purpose of an NRT is to spread students out along a continuum of scores so that those with "low" language abilities are at one end of the normal distribution, while those with "high" abilities are at the other end (with the bulk of the students falling near the middle).
The second figure shows the score distribution of a criterion-referenced test (CRT), which is usually produced to measure well-defined and fairly specific objectives. Often these objectives are specific to a particular course or program. Each student's score is meaningful without reference to the other students' scores. A student's score on a particular objective indicates the percent of the knowledge or skill in that objective that the student has learned.
Moreover, the distribution of scores on a CRT need not necessarily be normal. If all the students know 100% of the material on all the objectives, then all the students should receive the same score.
The purpose of a CRT is to measure the amount of learning that a student has accomplished on each objective. In most cases, the students would know in advance what types of questions, tasks, and content to expect for each objective.
3. What are reliability and validity?
What is the relationship between reliability and validity?
To assess a candidate's oral language ability in an oral test, the examining body often asks two examiners to score that candidate's performance. Similarly, when an examiner is grading a composition for a certain test, e.g. TEM4, the same composition can be marked by the same examiner on two occasions. Explain in detail why such measures should be taken.
Answer:
Test reliability is defined as the extent to which the results can be considered consistent or stable. Test validity is defined here as the degree to which a test measures what it claims, or purports, to be measuring.
If teachers administer a placement test to their students on one occasion, they would like the scores to be very much the same if they were to administer the same test again. The degree to which a test is consistent, or reliable, can be estimated by calculating a reliability coefficient, which can go as high as +1.0 for a perfectly reliable test or as low as 0 when the results on the test are totally unreliable. Once the tests are administered twice and the pairs of scores for each student are lined up, simply calculate a Pearson product-moment correlation coefficient between the two sets of scores. The correlation coefficient will provide a conservative estimate (that is, an underestimate) of the reliability of the test over time. This reliability estimate can be interpreted as the percent of reliable variance on the test.
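The test-retest procedure described above can be sketched in a few lines of Python. This is only an illustration: the two score lists are hypothetical values invented for the example (they do not come from the exam), and `pearson_r` is a helper name chosen here, not something from SPSS.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores of six students on two administrations of the same test.
first_administration = [55, 62, 70, 48, 81, 66]
second_administration = [58, 60, 73, 50, 79, 70]

reliability = pearson_r(first_administration, second_administration)
print(f"Test-retest reliability estimate: {reliability:.3f}")
```

A coefficient close to +1.0 would indicate that students kept nearly the same relative standing on both occasions.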
For example, if a test claims to measure proficiency in German listening comprehension, that is just what it should assess.
II. Discussion (55%, 17 points for 1 and 2, 21 points for 3)
1. Look at the following table and answer the questions that follow:
1) Calculate the total standard scores for the two students.
2) Compare the total standard scores between the two students, see which student scored higher, and explain briefly why a teacher had better use the total standard scores instead of the total raw scores.
Subject                   Mean   SD   Student A   Student B
Psychology                  81    6          85          80
Writing                     85    9          80          91
Listening Comprehension     70    5          76          85
Reading                     74   10          93          66
Literature                  88    3          90          95
Total                                       424         417
Answer:
1)
Subject                   Mean   SD   Student A   Standard score (A)   Student B   Standard score (B)
Psychology                  81    6          85                0.667          80                -0.17
Writing                     85    9          80               -0.556          91                0.667
Listening Comprehension     70    5          76                  1.2          85                    3
Reading                     74   10          93                  1.9          66                 -0.8
Literature                  88    3          90                0.667          95                 2.33
Total                                       424                 3.88         417                 5.03
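2) The standard scores calculated in the table above show that, although Student A's total raw score (424) is higher than Student B's (417), Student B's standard scores are clearly higher than Student A's in three of the five subjects, that is, in most of them.
In terms of total standard scores, Student A's total is 3.88 and Student B's total is 5.03, which is higher than Student A's.
Student B therefore performed better.
Different tests are not on the same scale, so adding up raw scores from different tests is not meaningful and does not give a true picture of a student's overall performance.
The correct approach is to convert the score in each subject to a standard score first, then sum the standard scores and compare the totals in order to judge how Students A and B performed. This is the more sound procedure.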
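The standard-score calculation above can be reproduced with a short Python sketch. The means, standard deviations, and raw scores are copied from the table in the question; the function and variable names are chosen here for illustration only.

```python
# (mean, SD, Student A raw score, Student B raw score), copied from the question's table.
subjects = {
    "Psychology": (81, 6, 85, 80),
    "Writing": (85, 9, 80, 91),
    "Listening Comprehension": (70, 5, 76, 85),
    "Reading": (74, 10, 93, 66),
    "Literature": (88, 3, 90, 95),
}

def z_score(raw, mean, sd):
    """Standard score: distance of a raw score from the mean, in SD units."""
    return (raw - mean) / sd

total_a = sum(z_score(a, mean, sd) for mean, sd, a, _ in subjects.values())
total_b = sum(z_score(b, mean, sd) for mean, sd, _, b in subjects.values())

print(round(total_a, 2), round(total_b, 2))  # prints: 3.88 5.03
```

Student B's large advantage comes mainly from Listening Comprehension and Literature, where the standard deviations are small, so a raw-score lead of a few points counts for much more than it does in Reading.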
2. Use SPSS 16.0
(1) to calculate the correlation coefficient between two sets of writing scores marked by two teachers for the same group of students and then make the scatter plot. Discuss if there is any correlation between these two sets of scores.
(2) to do the dependent-sample t-test to see whether there is any significant difference between the two teachers' marking of the same paper. Report and discuss the result. If there exists a significant difference, give suggestions as to how to solve this problem.
(3) to calculate the correlation coefficients between three sets of writing scores marked by three teachers for the same group of students and make the scatter plots between Rater 1 and Rater 2, Rater 1 and Rater 3, and Rater 2 and Rater 3. Discuss if there is any correlation between these three sets of scores.
(4) to do ANOVA to see whether there is any significant difference among the three teachers' marking of the same paper (use LSD and SNK, and draw Means Plot). Report and discuss the result.
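(1)
Correlations
                                   Teacher A   Teacher B
Teacher A   Pearson Correlation            1        .321
            Sig. (2-tailed)                         .073
            N                             32          32
Teacher B   Pearson Correlation         .321           1
            Sig. (2-tailed)             .073
            N                             32          32
The significance value of the correlation, .073, is greater than 0.05. Together with the scatter plot, this shows that there is no statistically significant correlation between the writing scores given by the two teachers (r = .321, N = 32).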
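As a cross-check on the reported significance, the correlation can be converted to a t statistic with n - 2 degrees of freedom. The sketch below uses only the r and N reported in the table; the critical value 2.042 (two-tailed, alpha = 0.05, df = 30) is taken from a standard t table and is an assumption of this example, not part of the SPSS output.

```python
import math

R = 0.321            # Pearson correlation reported in the table
N = 32               # number of students
T_CRIT_DF30 = 2.042  # assumed: two-tailed critical t at alpha = 0.05, df = 30 (standard table)

def t_from_r(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t_value = t_from_r(R, N)
significant = abs(t_value) > T_CRIT_DF30

print(f"t({N - 2}) = {t_value:.3f}, significant at 0.05: {significant}")
```

The t value falls short of the critical value, which agrees with the reported two-tailed significance of .073 being above 0.05.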
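(2)
Paired Samples Test
Pair 1: Teacher A - Teacher B
Paired differences
  Mean                                                 -3.250
  Std. Deviation                                        8.316
  Std. Error Mean                                       1.470
  95% Confidence Interval of the Difference, Lower     -6.248
  95% Confidence Interval of the Difference, Upper      -.252
t                                                      -2.211
df                                                         31
Sig. (2-tailed)                                          .035
The Sig. (2-tailed) value in the paired samples test above is .035, which is less than 0.05, so there is a significant difference between the scores given by the two teachers.
This problem can be addressed by converting each teacher's scores to standard scores.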
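The figures in the paired-samples table are internally linked, and the relationship can be checked from the summary values alone. The sketch below recomputes the standard error, t, and the 95% confidence interval from the reported mean difference, standard deviation, and sample size. The critical value 2.0395 (two-tailed, alpha = 0.05, df = 31) comes from a standard t table and is an assumption of this example.

```python
import math

MEAN_DIFF = -3.250    # mean of (Teacher A - Teacher B), from the table
SD_DIFF = 8.316       # standard deviation of the paired differences, from the table
N_PAIRS = 32          # df = 31, so there are 32 pairs
T_CRIT_DF31 = 2.0395  # assumed: two-tailed critical t at alpha = 0.05, df = 31 (standard table)

# Standard error of the mean difference, then the paired t statistic.
std_error = SD_DIFF / math.sqrt(N_PAIRS)
t_value = MEAN_DIFF / std_error

# 95% confidence interval for the mean difference.
ci_lower = MEAN_DIFF - T_CRIT_DF31 * std_error
ci_upper = MEAN_DIFF + T_CRIT_DF31 * std_error

print(f"SE = {std_error:.3f}, t = {t_value:.3f}")
print(f"95% CI = [{ci_lower:.3f}, {ci_upper:.3f}]")
```

Because the interval lies entirely below zero, Teacher A's marks are on average lower than Teacher B's, which matches the significant result reported above.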
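(3)
Correlations
                                 Rater 1   Rater 2   Rater 3
Rater 1   Pearson Correlation          1    .632**    .599**
          Sig. (2-tailed)                     .000      .000
          N                           55        55        55
Rater 2   Pearson Correlation     .632**         1    .652**
          Sig. (2-tailed)           .000                .000
          N                           55        55        55
Rater 3   Pearson Correlation     .599**    .652**         1
          Sig. (2-tailed)           .000      .000
          N                           55        55        55
**. Correlation is significant at the .01 level (2-tailed).
The correlation values and the scatter plots show that all three pairs of raters are positively correlated, and each correlation is significant at the .01 level: Rater 1 and Rater 2 (r = .632), Rater 1 and Rater 3 (r = .599), and Rater 2 and Rater 3 (r = .652). The correlation between Rater 2 and Rater 3 is the strongest and that between Rater 1 and Rater 3 is the weakest, but all three are moderate rather than high.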
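(4)
Score
Student-Newman-Keuls(a)
Group    N     Subset for alpha = 0.05
                   1
3        55     68.95
1        55     69.40
2        55     71.75
Sig.             .302
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 55.000.
In the table above, all three group means fall into a single homogeneous subset, and the significance value is .302, which is greater than alpha = 0.05. The three means therefore do not differ significantly, that is, there is no significant difference among the three teachers' marking.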
3. A teacher wants to apply a new teaching method in one of the two classes he is teaching and to see whether this method can be effective. Before he begins his experiment, he uses the T-Test of SPSS to compare the scores of the mid-term examination between the two classes and he finds the significance value is 0.547. Then, he adopts the new method in EC (experiment class) and for CC (comparison class), he still uses the old method. After the two months of experiment, the teacher uses the T-Test of SPSS again to compare the scores of the final examination between the two classes and this time, he finds the significance value is 0.03. Discuss the importance of the two significance values from two T-Test of SPSS.
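Answer:
The significance value of the t-test on the mid-term examination for the experimental class and the comparison class is 0.547, which is greater than 0.05, so the two classes did not differ significantly on the pretest and can be treated as comparable at the start of the experiment. Two months later, the significance value of the t-test on the final examination is 0.03, which is less than 0.05, so the two classes did differ significantly on the posttest. Provided the experimental class is the one with the higher mean, this indicates that the new method adopted by the teacher was effective for the students.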