lecture6

Chapter 3 Diagnostics and Remedial Measures Set 1

It is important to examine the aptness of the model for the data before inferences based on that model are undertaken. In this chapter, we discuss some simple graphical methods for studying the appropriateness of a model, as well as some statistical tests for doing so. We also consider some remedial techniques for when model (2.1) is not appropriate for the data.

Lecture 6 Residuals

Departures from model (2.1) (the simple linear regression model with normal errors) to be studied by residuals

1. The regression function is not linear.

2. The error terms do not have constant variance.

3. The error terms are not independent.

4. The model fits all but one or a few outlier observations.

5. The error terms are not normally distributed.

6. One or several important predictor variables have been omitted from the model.

Residuals

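Suppose the data set $\{(X_i, Y_i),\ i = 1, \dots, n\}$ has been used to fit the least squares regression line $\hat{Y} = b_0 + b_1 X$.

The $i$th residual is $e_i = Y_i - \hat{Y}_i = Y_i - b_0 - b_1 X_i$.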

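Lemma 1. Under model (2.1), for $i = 1, \dots, n$ the residual $e_i$ follows a normal distribution with $E(e_i) = 0$ and $\operatorname{Var}(e_i) = \sigma^2\left[1 - \frac{1}{n} - \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n}(X_j - \bar{X})^2}\right]$.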
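The variance in Lemma 1 can be motivated by the following sketch. The weights $h_{ij}$ are notation introduced here for convenience and are an assumption of this sketch, not symbols used in the lecture.

$$
\hat{Y}_i = \sum_{j=1}^{n} h_{ij} Y_j, \qquad
h_{ij} = \frac{1}{n} + \frac{(X_i - \bar{X})(X_j - \bar{X})}{\sum_{k=1}^{n}(X_k - \bar{X})^2}, \qquad
\sum_{j=1}^{n} h_{ij}^2 = h_{ii}.
$$

$$
e_i = (1 - h_{ii})\,Y_i - \sum_{j \ne i} h_{ij} Y_j
\;\Longrightarrow\;
\operatorname{Var}(e_i) = \sigma^2\Big[(1 - h_{ii})^2 + \sum_{j \ne i} h_{ij}^2\Big]
= \sigma^2\,(1 - 2h_{ii} + h_{ii})
= \sigma^2\,(1 - h_{ii}).
$$

The second line uses the independence of the $Y_j$ with common variance $\sigma^2$. Since $e_i$ is a linear combination of independent normal variables, it is itself normal, and $E(e_i) = 0$ because $E(\hat{Y}_i) = E(Y_i)$.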

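Note:

When $n$ is large and the X's are well spaced, $\operatorname{Var}(e_i) \approx \sigma^2$ and $\operatorname{Cov}(e_i, e_j) \approx 0$ for $i \ne j$.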

As we already know, $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} X_i e_i = 0$.

Remarks:

The residuals $e_i = Y_i - \hat{Y}_i$ are in a sense the estimators of the unobserved random error terms $\varepsilon_i = Y_i - E(Y_i)$.
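As a minimal sketch of how residuals arise from a least squares fit, the following Python computes $b_0$, $b_1$ and the residuals by hand. The five data points are hypothetical and chosen only for illustration; they are not the lecture's vehicle data. The function name `fit_line` is likewise an assumption of this sketch.

```python
# Hypothetical data for illustration only (not the lecture's Cars data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Return the least squares intercept b0 and slope b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# The least squares normal equations force both of these sums to be zero
# (up to floating point rounding).
sum_e = sum(residuals)
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))

print("b0 =", round(b0, 4), " b1 =", round(b1, 4))
print("sum of residuals      =", sum_e)
print("sum of X_i * residual =", sum_xe)
```

Both sums come out as zero up to rounding for any data set, because they are exactly the two normal equations that define $b_0$ and $b_1$.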

Studentized and Semi-studentized Residuals

Let $s\{e_i\}$ denote the estimated standard deviation (i.e., standard error) of the $i$th residual. Then the $i$th Studentized residual is given by $r_i = e_i / s\{e_i\}$.

Such modified residuals are said to be Studentized since they are obtained from the $i$th residual by subtracting the mean $E(e_i) = 0$ and dividing by its standard error. This procedure mimics the procedure used to compute the test statistic for testing a hypothesis about the mean of a normal population when the population variance is unknown. Such a test statistic has a Student t distribution (with $n - 1$ degrees of freedom). The modified residual defined above does not actually have a Student t distribution ($e_i$ and SSE are not independent), but it is obtained by the same type of transformation used to construct random variables having the Student t distribution. Thus the name, Studentized.
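To make the definition concrete, here is a small sketch that divides each residual by its standard error and compares the result with the Student Residual column that SAS prints. The numbers are copied from the Output Statistics table of the vehicle weight versus MPG example in these notes; the variable names are assumptions of this sketch. Small discrepancies are expected because the printed standard errors are rounded to three decimals.

```python
# Residual and "Std Error Residual" columns from the Output Statistics table.
residual = [0.2071, -0.4045, -0.1032, 0.2019, -0.009677,
            0.3097, 0.2968, -0.4006, 0.3942, -0.4916]
se_residual = [0.310, 0.325, 0.330, 0.334, 0.287,
               0.287, 0.330, 0.335, 0.319, 0.300]

# "Student Residual" column as printed by SAS, used only for comparison.
table_student = [0.668, -1.243, -0.313, 0.605, -0.0337,
                 1.080, 0.899, -1.195, 1.237, -1.641]

# Studentized residual: residual divided by its own standard error.
studentized = [e / s for e, s in zip(residual, se_residual)]

# Largest disagreement with the printed column (rounding error only).
max_gap = max(abs(r - t) for r, t in zip(studentized, table_student))

for i, (r, t) in enumerate(zip(studentized, table_student), start=1):
    print(f"obs {i:2d}: computed {r:8.4f}   SAS {t:8.4f}")
print("largest gap:", round(max_gap, 4))
```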

Remarks

1. On page 103 of your textbook, the authors discuss semi-studentized residuals. The $i$th semi-studentized residual is simply $e_i^* = e_i / \sqrt{MSE}$.

When $n$ is large and the X's are well spaced, $s\{e_i\} \approx \sqrt{MSE}$ so that $r_i \approx e_i^*$. The semi-studentized residuals are certainly easier to compute (assuming that you had to make the computation yourself), but SAS will compute the actual Studentized residuals upon request, so why not go with the “real thing”?

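2. SAS can also compute what SAS calls Rstudentized residuals. The difference between an ordinary studentized residual and an Rstudentized residual is that for the Rstudentized residual $MSE$ is replaced with $MSE_{(i)}$, the mean square error obtained when the $i$th observation is deleted, where $MSE_{(i)} = \frac{(n-2)\,MSE - e_i^2/(1-h_{ii})}{n-3}$ and $h_{ii} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n}(X_j - \bar{X})^2}$.

Rstudentized residuals were proposed by Belsley, Kuh and Welsch in their book, Regression Diagnostics (Wiley, 1980). In the context of simple regression, they claim that for each $i$, Rstudent has approximately a Student t distribution with $n - 3$ degrees of freedom. Rstudent residuals are good for detecting outliers since an observation with a large residual tends to inflate the $MSE$. Deleting this observation in computing the $MSE_{(i)}$ means that $MSE_{(i)} < MSE$.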

However, the authors of your text do not discuss Rstudentized residuals (they call them deleted studentized residuals) until Chapter 9 or 10, so to keep things simple, we will wait till then to discuss this type of residual.
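The two remarks can be illustrated numerically. The sketch below uses the Residual and Std Error Mean Predict columns of the Output Statistics table for the vehicle weight versus MPG example. It assumes the standard identities $h_{ii} = s^2\{\hat{Y}_i\}/MSE$ and $SSE_{(i)} = SSE - e_i^2/(1 - h_{ii})$, which are not stated in the lecture, and all variable names are hypothetical. Because the table values are rounded, results agree with SAS only to about two or three decimals.

```python
import math

# Columns copied from the Output Statistics table.
residual = [0.2071, -0.4045, -0.1032, 0.2019, -0.009677,
            0.3097, 0.2968, -0.4006, 0.3942, -0.4916]
se_mean_predict = [0.1696, 0.1381, 0.1259, 0.1171, 0.2067,
                   0.2067, 0.1259, 0.1124, 0.1529, 0.1876]

n = len(residual)
sse = sum(e ** 2 for e in residual)
mse = sse / (n - 2)                      # two parameters are estimated

# Assumed identity: leverage h_ii = s{Yhat_i}^2 / MSE.
leverage = [s ** 2 / mse for s in se_mean_predict]

# Semi-studentized residual: every residual is scaled by the same sqrt(MSE).
semi_studentized = [e / math.sqrt(mse) for e in residual]

# Rstudent: MSE is replaced by the MSE computed without observation i.
rstudent = []
for e, h in zip(residual, leverage):
    mse_deleted = (sse - e ** 2 / (1 - h)) / (n - 3)
    rstudent.append(e / math.sqrt(mse_deleted * (1 - h)))

# PRESS, reported by SAS under the table, uses the same deleted residuals.
press = sum((e / (1 - h)) ** 2 for e, h in zip(residual, leverage))

for i, (s, t) in enumerate(zip(semi_studentized, rstudent), start=1):
    print(f"obs {i:2d}: semi-studentized {s:7.3f}   Rstudent {t:7.3f}")
print("sum of leverages:", round(sum(leverage), 3))
print("PRESS:", round(press, 3))
```

Two checks on the sketch: in simple linear regression the leverages sum to 2, and the PRESS value should be close to the 1.60514 printed by SAS.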

Residual Plots

In simple linear regression (one predictor variable), residuals are usually plotted against their corresponding $X_i$ value or against their corresponding predicted value $\hat{Y}_i$. (In multiple regression, where there are many different $X$'s, residuals are usually plotted against $\hat{Y}_i$.) Residuals are never plotted against the corresponding actual $Y_i$ because these terms have a positive covariance, which would appear as a positive trend in the plot. On the other hand, $e_i$ and $\hat{Y}_i$ are uncorrelated.

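$\operatorname{Cov}(e_i, Y_i) = \operatorname{Cov}(e_i, \hat{Y}_i + e_i) = \operatorname{Cov}(e_i, \hat{Y}_i) + \operatorname{Var}(e_i)$.

So, $\operatorname{Cov}(e_i, \hat{Y}_i) = 0$ and $\operatorname{Cov}(e_i, Y_i) = \operatorname{Var}(e_i) > 0$.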
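A sample analogue of this argument can be checked on the vehicle weight versus MPG output. The sketch below computes the Pearson correlation of the residuals with the predicted values and with the observed responses, using the Dependent Variable, Predicted Value and Residual columns of the Output Statistics table. The helper `correlation` is a hypothetical name, and the sample correlation is only an illustration of the covariance statements, not a proof of them.

```python
import math

# Columns copied from the Output Statistics table.
y = [18.3, 15.9, 16.4, 17.5, 15.5, 18.8, 16.8, 16.5, 16.5, 17.8]
predicted = [18.0929, 16.3045, 16.5032, 17.2981, 15.5097,
             18.4903, 16.5032, 16.9006, 16.1058, 18.2916]
residual = [0.2071, -0.4045, -0.1032, 0.2019, -0.009677,
            0.3097, 0.2968, -0.4006, 0.3942, -0.4916]


def correlation(u, v):
    """Pearson sample correlation of two equal-length lists."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


corr_with_predicted = correlation(residual, predicted)
corr_with_observed = correlation(residual, y)

print("corr(residual, predicted):", round(corr_with_predicted, 4))
print("corr(residual, observed): ", round(corr_with_observed, 4))
```

The first correlation is zero up to rounding of the printed values, while the second is clearly positive, which is why a plot of residuals against the observed $Y_i$ would show a spurious upward trend.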

If the assumptions of model (2.1) are correct for the data, the residuals (plotted on the ordinate) against either $X_i$ or $\hat{Y}_i$ (on the abscissa) should be randomly distributed about the horizontal axis.

Example:

In the vehicle weight versus MPG example, the residuals and the studentized residuals are given in the output of the following SAS code:

PROC REG DATA=Cars;
  MODEL mpg = weight / R;
  OUTPUT out=Carsout P=PredMPG R=Residual Student=Stud_Res Rstudent=Rstud_Res;
RUN;

goptions reset=global gunit=pct border
         ftext=swissb htitle=6 htext=3
         hsize=8in vsize=5in cback=white;

/* graphing symbols, their interpolations and colors */
symbol1 v=circle h=3 c=red;
symbol2 v=square h=4 c=green;
symbol3 v=diamond h=3 c=red;
run;

title color=blue 'Stud_res, Rstud_res and residual v.s. wgt';
PROC GPLOT;
  PLOT Residual*weight=2 Stud_Res*weight=1 Rstud_Res*weight=3 / legend overlay Vref=0;
RUN;

title color=blue 'Stud_res v.s. PredMPG';
PROC GPLOT;
  PLOT Stud_Res*Predmpg=3 / Vref=0;
RUN;

 

Output Statistics

     Dependent  Predicted     Std Error              Std Error   Student                      Cook's
Obs   Variable      Value  Mean Predict    Residual   Residual  Residual     -2-1 0 1 2            D

  1    18.3000    18.0929        0.1696      0.2071      0.310     0.668   |      |*     |     0.067
  2    15.9000    16.3045        0.1381     -0.4045      0.325    -1.243   |    **|      |     0.139
  3    16.4000    16.5032        0.1259     -0.1032      0.330    -0.313   |      |      |     0.007
  4    17.5000    17.2981        0.1171      0.2019      0.334     0.605   |      |*     |     0.023
  5    15.5000    15.5097        0.2067   -0.009677      0.287   -0.0337   |      |      |     0.000
  6    18.8000    18.4903        0.2067      0.3097      0.287     1.080   |      |**    |     0.303
  7    16.8000    16.5032        0.1259      0.2968      0.330     0.899   |      |*     |     0.059
  8    16.5000    16.9006        0.1124     -0.4006      0.335    -1.195   |    **|      |     0.080
  9    16.5000    16.1058        0.1529      0.3942      0.319     1.237   |      |**    |     0.176
 10    17.8000    18.2916        0.1876     -0.4916      0.300    -1.641   |   ***|      |     0.528

Sum of Residuals                        0
Sum of Squared Residuals          0.99961
Predicted Residual SS (PRESS)     1.60514

 

Nonlinearity of the Regression Function

If the plot of the residuals (or the studentized residuals) against the predictor variable (or the predicted response variable, $\hat{Y}$) is not randomly distributed about the horizontal axis, it could be an indication that the regression function is nonlinear. Nonlinearity of the regression function can also be ascertained from the scatter plot, but the scatter plot is not always as effective as a residual plot. See Figure 3.3 on page 105. Also see Figure 3.4(b) on page 106.

Note:

The (studentized) residual plots in our vehicle weight vs MPG example show a more or less random dispersion about the horizontal axis. Thus the linear model in this instance appears to be adequate. This conclusion is supported by the relatively small root mean square error of 0.35348 and the relatively high $R^2$.
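The two summary figures mentioned in this note can be recomputed from the Output Statistics table. The sketch below is only a check on the arithmetic, using the Dependent Variable and Residual columns. It assumes the usual definitions $\sqrt{MSE} = \sqrt{SSE/(n-2)}$ and $R^2 = 1 - SSE/SSTO$, and the variable names are hypothetical. Because the printed residuals are rounded, the values match the SAS output only to about four decimals.

```python
import math

# Columns copied from the Output Statistics table.
y = [18.3, 15.9, 16.4, 17.5, 15.5, 18.8, 16.8, 16.5, 16.5, 17.8]
residual = [0.2071, -0.4045, -0.1032, 0.2019, -0.009677,
            0.3097, 0.2968, -0.4006, 0.3942, -0.4916]

n = len(y)
y_bar = sum(y) / n

sse = sum(e ** 2 for e in residual)            # error sum of squares
ssto = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares

root_mse = math.sqrt(sse / (n - 2))            # n - 2 error degrees of freedom
r_squared = 1 - sse / ssto

print("SSE      =", round(sse, 5))
print("Root MSE =", round(root_mse, 5))
print("R^2      =", round(r_squared, 4))
```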

Nonconstancy of Error Variance

A plot of the residuals (or the studentized residuals) against the predictor variable $X$ or the predicted response variable $\hat{Y}$ is also useful in assessing whether or not the error variance is constant as assumed in the model. If the magnitude of the residuals tends to increase or (less likely) to decrease as $X$ increases, this is indicative of a situation in which the error variance is changing as the value of the independent variable $X$ changes. Since $\hat{Y}$ is linearly related to the predictor variable $X$, a similar statement can be made regarding a plot of the residuals (or studentized residuals) against $\hat{Y}$. Systematic changes in the magnitude of the residuals violate the assumption that the errors have constant variance. A “wedge shaped” residual plot as in Figure 3.4(c), page 106, would typify this situation. A less likely, but possible, situation is that the error variance decreases as $X$ increases. This situation would result in a “reversed wedge” plot.

Note:

The studentized residual plots in the weight-MPG example do not exhibit any wedge-shaped pattern, which indicates that the variance is more or less constant. However, the residual plot in Figure 3.5, page 107, shows a tendency for the variability of the residuals to increase with $X$.
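As an informal numeric companion to the visual check, one can compare the typical size of the residuals in the lower and upper halves of the predicted values. The sketch below does this for the weight-MPG output using the Predicted Value and Residual columns of the Output Statistics table. Splitting at the median and comparing mean absolute residuals is an assumption of this sketch, not a procedure given in the lecture, and it is not a formal test.

```python
# Columns copied from the Output Statistics table.
predicted = [18.0929, 16.3045, 16.5032, 17.2981, 15.5097,
             18.4903, 16.5032, 16.9006, 16.1058, 18.2916]
residual = [0.2071, -0.4045, -0.1032, 0.2019, -0.009677,
            0.3097, 0.2968, -0.4006, 0.3942, -0.4916]

# Order the observations by predicted value and split them into two halves.
pairs = sorted(zip(predicted, residual))
half = len(pairs) // 2
low_group = [abs(e) for _, e in pairs[:half]]
high_group = [abs(e) for _, e in pairs[half:]]

mean_abs_low = sum(low_group) / len(low_group)
mean_abs_high = sum(high_group) / len(high_group)

# A ratio far from 1 would hint at a wedge (or reversed wedge) pattern.
spread_ratio = mean_abs_high / mean_abs_low

print("mean |residual|, lower half:", round(mean_abs_low, 4))
print("mean |residual|, upper half:", round(mean_abs_high, 4))
print("ratio upper / lower:", round(spread_ratio, 3))
```

With only ten observations, a ratio moderately different from 1 is unremarkable, so the plot remains the primary diagnostic.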

Outliers

Outliers are extreme observations. They can be identified by residual plots against either $X$ or $\hat{Y}$. Studentized residual plots are particularly helpful in this context. A rough rule of thumb (when $n$ is large) is to consider an observation $(X_i, Y_i)$ whose studentized residual $r_i$ exceeds a large cutoff in absolute value to be an outlier. Actually this rule is rather conservative. A more aggressive rule is to declare observation $(X_i, Y_i)$ to be an outlier if $|r_i|$ exceeds a smaller cutoff, and some statisticians recommend using 2.5. More refined procedures for identifying outliers will be discussed in Chapter 10.

The big question is “Should outliers, once identified, be discarded?” It is always tempting to discard outliers since they tend to destroy the least squares fit, particularly in small to moderate samples. So, the residual plots may improperly suggest a lack of fit of the linear regression model, in addition to flagging the outlier. Figure 3.7, page 109, clearly illustrates this situation.

However, in most statisticians' opinions, an outlier should be discarded if and only if

1. The observation causing the outlier involves a data input error, or

2. The observation causing the outlier involves an extraneous case.

By an extraneous case we mean that the outlying observation was collected under conditions substantially different from those of the other observations. Unfortunately, the statistician may not always be able to ascertain whether or not situation 1 and/or 2 pertains.

The automatic discarding of outliers can result in overfitting the linear model to the remaining data points. Furthermore, outliers may convey significant information, such as when the outlier is the result of an interaction with some other predictor variable that is not included in the model.

Note:

Figure 3.6, page 108, shows a residual plot with an outlier. There are no outliers in our vehicle data. Refer to the residual and studentized residual plots.
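The cutoff rule can be sketched as a small function applied to the Student Residual column of the Output Statistics table for the vehicle data. The function name `flag_outliers` is hypothetical, and the cutoff is left as a parameter. The value 2.5 is the one mentioned in the text, while 1.5 is used below only to show what a flagged observation looks like and is not a recommended threshold.

```python
# "Student Residual" column copied from the Output Statistics table.
studentized = [0.668, -1.243, -0.313, 0.605, -0.0337,
               1.080, 0.899, -1.195, 1.237, -1.641]


def flag_outliers(studentized_residuals, cutoff):
    """Return the 1-based observation numbers with |r_i| > cutoff."""
    return [i for i, r in enumerate(studentized_residuals, start=1)
            if abs(r) > cutoff]


flagged = flag_outliers(studentized, cutoff=2.5)
flagged_demo = flag_outliers(studentized, cutoff=1.5)   # illustration only

print("flagged at cutoff 2.5:", flagged)
print("flagged at cutoff 1.5:", flagged_demo)
```

At the 2.5 cutoff no observation is flagged, which agrees with the note that the vehicle data contain no outliers.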

Nonindependence of the Error Terms

Although the actual error terms are assumed to be
