完整word版语言测试学资料 2word文档良心出品.docx
《完整word版语言测试学资料 2word文档良心出品.docx》由会员分享,可在线阅读,更多相关《完整word版语言测试学资料 2word文档良心出品.docx(8页珍藏版)》请在冰豆网上搜索。
完整word版语言测试学资料2word文档良心出品
Chapter2
(第二章)
TheValidityofLanguageTesting
(语言测试的效度)
Whatisvalidity?
Atestissaidtobevalidifitmeasuresaccuratelywhatitisintendedtomeasure.
Validityhasanumberofaspects:
•Contentvalidity内容效度
•Criterion-relatedvalidity标准相关效度
•Constructvalidity编制效度
•Facevalidity表面效度
•Theuseofvalidity效度的用途
ContentValidity:
Atestissaidtohavecontentvalidityifitscontentconstitutesarepresentativesampleofthelanguageskills,structures,etc.withwhichitismeanttobeconcerned.
e.g.
Agrammartestmustbemadeupofitemstestingknowledgeorcontrolofgrammar.
Butthisinitselfdoesnotensurecontentvalidity.Thetestwouldhavecontentvalidityonlyifitincludedapropersampleoftherelevantstructures.
Whataretherelevantstructureswilldependuponthepurposeofthetest.
Inordertojudgewhetherornotatesthascontentvalidity,weneedaspecificationoftheskillsorstructuresetc.thatitismeanttocover.
Suchaspecificationshouldbemadeataveryearlystageintestconstruction.
Itisn’ttobeexpectedthateverythinginthespecificationwillalwaysappearinthetest;theremaysimplybetoomanythingsforallofthemtoappearinssingletest.
Butitwillprovidethetestconstructorwiththebasisformakingaprincipledselectionofelementsforinclusioninthetest.
Acomparisonoftestspecificationandtestcontentisthebasisforjudgmentsastocontentvalidity.Ideallythesejudgmentsshouldbemadebypeoplewhoarefamiliarwithlanguageteachingandtestingbutwhoarenotdirectlyconcernedwiththeproductionofthetestinquestion.
Whatistheimportanceofcontentvalidity?
First,thegreateratest’scontentvalidity,themorelikelyitistobeanaccuratemeasureofwhatitissupposedtomeasure.
Atestinwhichmajorareasidentifiedinthespecificationareunder-represented-----ornotrepresentedatall-----isunlikelytobeaccurate.
Secondly,suchatestislikelytohaveaharmfulbackwasheffect.Areaswhicharenottestedarelikelytobecomeareasignoredinteachingandlearning.Toooftenthecontentoftestsisdeterminedbywhatiseasytotestratherthanwhatisimportanttotest.
Thetestsafeguardagainstthisistowritefulltestspecificationsandtoensurethatthetestcontentisafairreflectionofthese.
Discussion
Case1
Doyouthinkanachievementtestforintermediatelearnerstocontainjustthesamesetofstructuresasoneforadvancedlearnershascontentvalidity?
No.
Case2
About20yearsago,thecandidatesofuniversityentranceexaminationinAmericawasgivenacompositiontopic:
Isphotographyanartorscience?
Discuss.
Doyouthinkthistesthasvalidity?
No.
Case3
Theintentionofotherpeopleconcerned,suchastheMinisterofDefense,toinfluencethegovernmentleaderstoadapttheirpolicytofitinwiththedemandsoftherightwing,cannotbeignored.
Whatisthesubjectof“cannotbeignored”?
A.theintention
B.otherpeopleconcerned
C.theMinisterofDefense
D.thedemandsoftherightwing
Whatdoesthisitemwanttomeasure,readingcomprehensionorsentencestructure?
Criterion-relatedvalidity:
Anotherapproachtotestvalidityistoseehowfarresultsonthetestagreewiththoseprovidedbysomeindependentandhighlydependableassessmentofthecandidate’sability.Thisindependentassessmentisthusthecriterionmeasureagainstwhichthetestisvalidated.
Thereareessentiallytwokindsofcriterion-relatedvalidity:
concurrentvalidity(共时效度)andpredictivevalidity(预时效度).
Whatisconcurrentvalidity?
Concurrentvalidityisestablishedwhenthetestandthecriterionareadministeredataboutthesametime.
e.g.
Courseobjectivescallforanoralcomponentaspartofthefinalachievementtest.
Theobjectivesmaylistalargenumberof‘functions’whichstudentsareexpectedtoperformorally,totestallofwhichmighttake45minutesforeachstudent.Thiscouldwellbeimpractical.
Perhapsitisfeltthatonlytenminutescanbedevotedtoeachstudentfortheoralcomponent.
Thequestionthenarises:
Cansuchaten-minutesessiongiveasufficientlyaccurateestimateofthestudentsabilitywithrespecttothefunctionsspecifiedinthecourseobjectives?
Isitavalidmeasure?
Fromthepointofviewofcontentvalidity,thiswilldependonhowmanyofthefunctionsaretestedinthecomponent,andhowrepresentativetheyareofthecompletesetoffunctionsincludedintheobjectives.
Everyeffortshouldbemadewhendesigningtheoralcomponenttogiveitcontentvalidity.Oncethishasbeendone,however,wecangofurther.Wecanattempttoestablishtheconcurrentvalidityofthecomponent.
Howtodoit?
Weshouldchooseatrandomasampleofallthestudentstakingthetest.
Thesestudentswouldthenbesubjectedtothefull45minuteoralcomponentnecessaryforcoverageofallthefunctions,usingperhapsfourscorerstoensurereliablescoring.
Thiswouldbethecriteriontestagainstwhichtheshortertestwouldbejudged.
Thestudents’scoresonthefulltestwouldbecomparedwiththeonestheyobtainedontheten-minutesession,whichwouldhavebeenconductedandscoredintheusualway,withoutknowledgeoftheirperformanceonthelongerversion.
Ifthecomparisonbetweenthetwosetsofscoresrevealsahighlevelofagreement,thentheshorterversionoforalcomponentmaybeconsideredvalid,inasmuchasitgivesresultssimilartothoseobtainedwiththelongerversion.
If,ontheotherhand,thetwosetsofscoresshowlittleagreement,theshorterversioncannotbeconsideredvalid;itcannotbeusedasadependablemeasureofachievementwithrespecttothefunctionsspecifiedintheobjectives.
Ofcourse,iftenminutesreallyisallthatcanbesparedforeachstudent,thentheoralcomponentmaybeincludedforthecontributionthatitmakestotheassessmentofstudents’overallachievementandforitsbackwasheffect.Butitcannotberegardedasanaccuratemeasureinitself.
‘ahighlevelofagreement’
‘littleagreement’
Howisthelevelofagreementmeasured?
Standardproceduresforcomparingsetsofscores:
‘validitycoefficient’
amathematicalmeasureofsimilarity
Perfectagreementbetweentwosetsofscoreswillresultinavaliditycoefficientof1.
Totallackofagreementwillgiveacoefficientofzero.
Itisbesttosquarethatcoefficient.
acoefficientof0.7betweenthetwooraltests
Squared
0.49
convertedtoapercentage,
49percent
Onthebasisofthis,wecansaythatthescoresontheshorttestpredict49percentofthevariationinscoresonthelongertest.
Inbroadterms,thereisalmost50percentagreementbetweenonesetofscoresandtheother.
Acoefficientof0.5wouldsignify25percentagreement;
Acoefficientof0.8wouldindicate64percentagreement.
Itisimportanttonotethata‘levelofagreement’of50percentdoesnotmeanthat50percentofthestudentswouldeachhaveequivalentscoresonthetwoversions.Wearedealingwithanoverallmeasureofagreementthatdoesnotrefertotheindividualscoresofstudents.
Whatispredictivevalidity?
Predictivevalidityconcernsthedegreetowhichatestcanpredictcandidates’futureperformance.
e.g.
Howwellcouldaproficiencytestpredictastudent’sabilitytocopewithagraduatecourseataBritishuniversity?
Thechoiceofcriterionmeasureraisesinterestingissues:
Shouldwerelyonthesubjectiveanduntrainedjudgmentsofsupervisors?
HowhelpfulisittousefinaloutcomeasthecriterionmeasurewhensomanyfactorsotherthanabilityinEnglish(suchassubjectknowledge,intelligence,motivation,healthandhappiness)willhavecontributedtoeveryoutcome?
Whereoutcomeisusedasthecriterionmeasure,avaliditycoefficientofaround0.4(only20percentagreement)isaboutashighasonecanexpect.
Thisispartlybecauseoftheotherfactors,andpartlybecausethosestudentswhoseEnglishthetestpredictedwouldbeinadequatearenotnormallypermittedtotakethecourse,andsothetest’s(possible)accuracyinpredictingproblemsforthosestudentsgoesunrecognized.Asaresult,avaliditycoefficientofthisorderisgenerallyregardedassatisfactory.
e.g.
Tovalidateaplacementtest:
Placementtestsattempttopredictthemostappropriateclassforanyparticularstudent.Validationwouldinvolveanenquiry,oncecourseswereunderway,intotheproportionofstudentswhowerethoughttobemisplaced.Itwouldthenbeamatterofcomparingthenumberofmisplacements(andtheireffectonteachingandlearning)withthecostofdevelopingandadministeringatestwhichwouldplacestudentsmoreaccurately.
Whatcriterionmeasureshouldwechoose?
Shouldwechooseanassessmentofthestudent’sEnglishasperceivedbyhisorhersupervisorattheuniversity,ortheoutcomeofthecourse(pass/failetc.)?
Constructvalidity
Atest,partofatest,oratestingtechniqueissaidtohaveconstructvalidityifitcanbedemonstratedthatitmeasuresjusttheabilitywhichitissupposedtomeasure.
Theword‘construct’referstoanyunderlyingability(ortrait)whichishypothesizedinatheoryoflanguageability.
Onemighthypothesize,forexample,thattheabilitytoreadinvolvesanumberofsub-abilities,suchastheabilitytoguessthemeaningofunknownwordsfromthecontextinwhichtheyaremet.
Itwouldbeamatterofempiricalresearchtoestablishwhetherornotsuchadistinctabilityexistedandcouldbemeasured.Ifweattemptedtomeasurethatabilityinaparticulartest,
Itwouldbeamatterofempiricalresearchtoestablishwhetherornotsuchadistinctabilityexistedandcouldbemeasured.Ifweattemptedtomeasurethatabilityin