Language Testing Study Materials (complete Word version)


Chapter 2

The Validity of Language Testing

What is validity?

A test is said to be valid if it measures accurately what it is intended to measure.

Validity has a number of aspects:

• Content validity

• Criterion-related validity

• Construct validity

• Face validity

• The use of validity

Content validity:

A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned.

For example, a grammar test must be made up of items testing knowledge or control of grammar.

But this in itself does not ensure content validity. The test would have content validity only if it included a proper sample of the relevant structures.

What the relevant structures are will depend upon the purpose of the test.

In order to judge whether or not a test has content validity, we need a specification of the skills or structures etc. that it is meant to cover.

Such a specification should be made at a very early stage in test construction.

It isn’t to be expected that everything in the specification will always appear in the test; there may simply be too many things for all of them to appear in a single test.

But the specification will provide the test constructor with the basis for making a principled selection of elements for inclusion in the test.

A comparison of test specification and test content is the basis for judgments as to content validity. Ideally these judgments should be made by people who are familiar with language teaching and testing but who are not directly concerned with the production of the test in question.

What is the importance of content validity?

First, the greater a test’s content validity, the more likely it is to be an accurate measure of what it is supposed to measure.

A test in which major areas identified in the specification are under-represented (or not represented at all) is unlikely to be accurate.

Secondly, such a test is likely to have a harmful backwash effect. Areas which are not tested are likely to become areas ignored in teaching and learning. Too often the content of tests is determined by what is easy to test rather than what is important to test.

The best safeguard against this is to write full test specifications and to ensure that the test content is a fair reflection of these.
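The comparison of specification and test content can be partly mechanised. The sketch below is only an illustration under assumptions: the area names, the intended proportions, the `tolerance` threshold and both function names are hypothetical, and a count of items per area does not replace the expert judgment described above.

```python
from collections import Counter


def coverage_report(specification, item_areas):
    """Return {area: (intended share, actual share)} for every specified area.

    specification: dict mapping each area to its intended share of the test.
    item_areas: list holding one area label per test item.
    """
    counts = Counter(item_areas)
    total = len(item_areas)
    return {
        area: (intended, counts[area] / total if total else 0.0)
        for area, intended in specification.items()
    }


def under_represented(specification, item_areas, tolerance=0.5):
    """List areas whose actual share is below tolerance * intended share.

    The tolerance of 0.5 is an arbitrary illustrative threshold.
    """
    report = coverage_report(specification, item_areas)
    return sorted(
        area
        for area, (intended, actual) in report.items()
        if actual < tolerance * intended
    )


# Hypothetical grammar-test specification and a ten-item draft test.
spec = {
    "tenses": 0.4,
    "articles": 0.2,
    "conditionals": 0.2,
    "relative clauses": 0.2,
}
draft_items = ["tenses"] * 7 + ["articles"] * 3

missing = under_represented(spec, draft_items)
```

Here `missing` lists the specified areas that the draft barely touches or omits, which is where a harmful backwash effect would be most likely.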

Discussion

Case 1

Do you think an achievement test for intermediate learners that contains just the same set of structures as one for advanced learners has content validity?

No.

Case 2

About 20 years ago, candidates for a university entrance examination in America were given a composition topic:

Is photography an art or a science? Discuss.

Do you think this test has validity?

No.

Case 3

The intention of other people concerned, such as the Minister of Defense, to influence the government leaders to adapt their policy to fit in with the demands of the right wing, cannot be ignored.

What is the subject of “cannot be ignored”?

A. the intention

B. other people concerned

C. the Minister of Defense

D. the demands of the right wing

What is this item meant to measure: reading comprehension or sentence structure?

Criterion-related validity:

Another approach to test validity is to see how far results on the test agree with those provided by some independent and highly dependable assessment of the candidate’s ability. This independent assessment is thus the criterion measure against which the test is validated.

There are essentially two kinds of criterion-related validity: concurrent validity and predictive validity.

What is concurrent validity?

Concurrent validity is established when the test and the criterion are administered at about the same time.

For example, course objectives call for an oral component as part of the final achievement test.

The objectives may list a large number of ‘functions’ which students are expected to perform orally, to test all of which might take 45 minutes for each student. This could well be impractical.

Perhaps it is felt that only ten minutes can be devoted to each student for the oral component.

The question then arises:

Can such a ten-minute session give a sufficiently accurate estimate of the student’s ability with respect to the functions specified in the course objectives?

Is it a valid measure?

From the point of view of content validity, this will depend on how many of the functions are tested in the component, and how representative they are of the complete set of functions included in the objectives.

Every effort should be made when designing the oral component to give it content validity. Once this has been done, however, we can go further. We can attempt to establish the concurrent validity of the component.

How is this done?

We should choose at random a sample of all the students taking the test.

These students would then be subjected to the full 45-minute oral component necessary for coverage of all the functions, using perhaps four scorers to ensure reliable scoring.

This would be the criterion test against which the shorter test would be judged.

The students’ scores on the full test would be compared with the ones they obtained on the ten-minute session, which would have been conducted and scored in the usual way, without knowledge of their performance on the longer version.

If the comparison between the two sets of scores reveals a high level of agreement, then the shorter version of the oral component may be considered valid, inasmuch as it gives results similar to those obtained with the longer version.

If, on the other hand, the two sets of scores show little agreement, the shorter version cannot be considered valid; it cannot be used as a dependable measure of achievement with respect to the functions specified in the objectives.

Of course, if ten minutes really is all that can be spared for each student, then the oral component may be included for the contribution that it makes to the assessment of students’ overall achievement and for its backwash effect. But it cannot be regarded as an accurate measure in itself.
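The first step of this procedure, choosing at random the students who will also sit the full-length criterion test, could be sketched as follows. The student identifiers, the cohort size of 120, the sample size of 20 and the function name `draw_validation_sample` are all assumptions made for illustration.

```python
import random


def draw_validation_sample(student_ids, sample_size, seed=None):
    """Pick students at random to also sit the full-length criterion test."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(list(student_ids), sample_size))


# Hypothetical cohort of 120 students, labelled S001 to S120.
cohort = [f"S{n:03d}" for n in range(1, 121)]
criterion_group = draw_validation_sample(cohort, sample_size=20, seed=1)
```

Drawing the sample at random, rather than letting teachers nominate students, keeps the criterion group representative of everyone taking the test.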

How is the level of agreement (‘a high level of agreement’, ‘little agreement’) measured?

There are standard procedures for comparing sets of scores. They yield a ‘validity coefficient’, a mathematical measure of similarity.

Perfect agreement between two sets of scores will result in a validity coefficient of 1.

Total lack of agreement will give a coefficient of zero.

To interpret a coefficient, it is best to square it. Suppose the coefficient between the two oral tests is 0.7. Squared, this becomes 0.49; converted to a percentage, 49 percent.

On the basis of this, we can say that the scores on the short test predict 49 percent of the variation in scores on the longer test.

In broad terms, there is almost 50 percent agreement between one set of scores and the other.

A coefficient of 0.5 would signify 25 percent agreement;

a coefficient of 0.8 would indicate 64 percent agreement.

It is important to note that a ‘level of agreement’ of 50 percent does not mean that 50 percent of the students would each have equivalent scores on the two versions. We are dealing with an overall measure of agreement that does not refer to the individual scores of students.
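The arithmetic can be sketched in Python. The notes do not name the statistic behind the validity coefficient; the sketch assumes it is the Pearson correlation, which is a common choice for comparing two sets of scores. The function names and the sample scores are hypothetical.

```python
from math import sqrt


def validity_coefficient(short_scores, long_scores):
    """Pearson correlation between two equally long lists of scores."""
    n = len(short_scores)
    if n != len(long_scores) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_s = sum(short_scores) / n
    mean_l = sum(long_scores) / n
    cov = sum((s - mean_s) * (l - mean_l)
              for s, l in zip(short_scores, long_scores))
    var_s = sum((s - mean_s) ** 2 for s in short_scores)
    var_l = sum((l - mean_l) ** 2 for l in long_scores)
    if var_s == 0 or var_l == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_s * var_l)


def percent_agreement(coefficient):
    """Square the coefficient and express the result as a percentage."""
    return round(coefficient ** 2 * 100, 1)


# Made-up scores for six students on the short and the full oral test.
short_test = [12, 15, 9, 18, 14, 11]
full_test = [55, 61, 40, 70, 66, 47]

r = validity_coefficient(short_test, full_test)
shared = percent_agreement(r)
```

With this helper, `percent_agreement(0.7)` gives 49.0, `percent_agreement(0.5)` gives 25.0 and `percent_agreement(0.8)` gives 64.0, matching the figures above.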

What is predictive validity?

Predictive validity concerns the degree to which a test can predict candidates’ future performance.

For example, how well could a proficiency test predict a student’s ability to cope with a graduate course at a British university?

The choice of criterion measure raises interesting issues:

Should we rely on the subjective and untrained judgments of supervisors?

How helpful is it to use final outcome as the criterion measure when so many factors other than ability in English (such as subject knowledge, intelligence, motivation, health and happiness) will have contributed to every outcome?

Where outcome is used as the criterion measure, a validity coefficient of around 0.4 (0.4 squared is 0.16, so less than 20 percent agreement) is about as high as one can expect.

This is partly because of the other factors, and partly because those students whose English the test predicted would be inadequate are not normally permitted to take the course, and so the test’s (possible) accuracy in predicting problems for those students goes unrecognized. As a result, a validity coefficient of this order is generally regarded as satisfactory.
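The second reason is often called restriction of range, a term these notes do not use. A small simulation illustrates it. Everything here is invented for illustration: the normal distributions, the noise levels, the admission cutoff and the function names. The coefficient is again assumed to be a Pearson correlation.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n_candidates=5000, cutoff=0.0, seed=0):
    """Compare the coefficient for all candidates with that for admitted ones."""
    rng = random.Random(seed)
    test_scores, outcomes = [], []
    for _ in range(n_candidates):
        english = rng.gauss(0, 1)  # simulated ability in English
        # The test measures English ability with some error.
        test_scores.append(english + rng.gauss(0, 1))
        # The course outcome also reflects many non-language factors.
        outcomes.append(english + rng.gauss(0, 1.5))
    r_all = pearson(test_scores, outcomes)
    # Only candidates at or above the cutoff are allowed onto the course.
    admitted = [(t, o) for t, o in zip(test_scores, outcomes) if t >= cutoff]
    r_admitted = pearson([t for t, _ in admitted], [o for _, o in admitted])
    return r_all, r_admitted


r_all, r_admitted = simulate()
```

In this setup `r_admitted` should come out lower than `r_all`. In practice only the admitted group can be observed, so the coefficient that is reported understates how well the test would predict outcomes across all candidates.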

For example, to validate a placement test:

Placement tests attempt to predict the most appropriate class for any particular student. Validation would involve an enquiry, once courses were under way, into the proportion of students who were thought to be misplaced. It would then be a matter of comparing the number of misplacements (and their effect on teaching and learning) with the cost of developing and administering a test which would place students more accurately.

What criterion measure should we choose?

Should we choose an assessment of the student’s English as perceived by his or her supervisor at the university, or the outcome of the course (pass/fail etc.)?

Construct validity

A test, part of a test, or a testing technique is said to have construct validity if it can be demonstrated that it measures just the ability which it is supposed to measure.

The word ‘construct’ refers to any underlying ability (or trait) which is hypothesized in a theory of language ability.

One might hypothesize, for example, that the ability to read involves a number of sub-abilities, such as the ability to guess the meaning of unknown words from the context in which they are met.

It would be a matter of empirical research to establish whether or not such a distinct ability existed and could be measured. If we attempted to measure that ability in a particular test,
