Regrdiag.docx

Uploaded by: b****7 | Document ID: 8868963 | Upload date: 2023-02-02 | Format: DOCX | Pages: 9 | Size: 22.06KB
Regrdiag

Regression Diagnostics

 

The strength of a regression model, however powerful it may appear on a computer screen, rests upon its assumptions being reasonably valid in practice. The least squares regression model is a powerful research tool, but when applied inappropriately it can easily lead to nonsensical results. This danger is particularly acute now that we have easy access to powerful computers which allow us to run a multitude of regressions at virtually no cost at all. Unfortunately, there can be a tendency for the researcher to arrive, through trial and error, at a regression which looks good despite possibly fragile foundations. Good data analysis, however, requires that we take modelling and the assumptions upon which it rests seriously. With the wide availability of computer graphics, it is now possible to explore the properties of the models rather more fully. Regression diagnostics allow us to determine whether there is anything strange about any of the observations. Data which are in some way strange can arise in several ways:

1. There may be gross errors in either the response or the explanatory variables.

2. The linear model may be inadequate to describe the systematic structure of the data.

3. It may be more appropriate to analyse the data in another scale, e.g. logarithmic.

4. It may be that the error distribution of the response variable is not normal.

The kind of questions we want to investigate, so as to check whether the model assumptions are reasonably valid in practice, are as follows:

1. Does the relation between Y and the X’s follow a linear pattern?

2. Are the residuals approximately normally distributed?

3. Are the residuals reasonably homoscedastic?

4. Are the residuals autocorrelated?

5. Do all data points contribute roughly equally to determine the position of the line, or do some points exert undue influence on the outcome?

6. Are there any outlying data points which clearly do not fit the general pattern?
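As a rough numerical companion to question 4, the sketch below fits a straight line to a made-up time series and computes the lag-1 autocorrelation of the residuals. The data, the helper names `fit_line` and `lag1_autocorrelation`, and the choice of this particular statistic are illustrative assumptions, not something prescribed by these notes.

```python
def fit_line(x, y):
    """Least squares intercept and slope for one explanatory variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def lag1_autocorrelation(e):
    """Sum of e[t] * e[t-1] divided by the sum of squared residuals."""
    return sum(a * b for a, b in zip(e[1:], e[:-1])) / sum(a * a for a in e)


# Hypothetical data: a linear trend plus a slow wave that the model omits.
t = [float(v) for v in range(20)]
wave = [1.0] * 5 + [-1.0] * 5 + [1.0] * 5 + [-1.0] * 5
y = [2.0 + 0.5 * ti + wi for ti, wi in zip(t, wave)]

a, b = fit_line(t, y)
residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
r1 = lag1_autocorrelation(residuals)
print("lag-1 autocorrelation of residuals:", round(r1, 3))
```

A value well above zero, as here, means that neighbouring residuals tend to share the same sign, which is the pattern one would look for when asking whether the residuals are autocorrelated.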

A number of plots can be produced, but it has to be remembered that they are not always sufficiently powerful and they can be misleading. Such plots are as follows:

1. Scatter plot of y v x_i

Use with care, but may suggest non-linearity.

2. Residuals / Standardised Residuals v x_i

The presence of a curvilinear relationship suggests that a higher-order term, e.g. quadratic, should be added to the model, or that a transformation, such as a log, should be considered. This plot can also indicate the existence of outliers, structural breaks and non-constant variance of the error term, i.e. heteroscedasticity.

3. Residuals v explanatory variables not in the model

The presence of a relationship would suggest that the explanatory variable should be included in the model.

4. Residuals v predicted values of y

If the variance of the residuals changes with the predicted values, then heteroscedasticity is indicated. Outliers, non-linearity and structural breaks may also be indicated.

5. Residuals v time

In the case of time series data, correlation between the error terms can be detected, suggesting the presence of autocorrelation. This may indicate missing variable(s) in the model.

6. Variables v time

A problem associated with non-stationary variables, and frequently faced by econometricians when dealing with time series data, is the spurious regression problem. If at least one of the explanatory variables in a regression equation is non-stationary, in the sense that it displays a distinct stochastic trend, it is very likely that the dependent variable in the equation will display a similar trend. If such a problem is detected, then error correction models (ECM) and cointegration analysis will have to be considered.

7. Normal plot of the residuals

The use of normality plots can help detect abnormalities with the data and the model. If the model is correctly specified, then the residuals should look like a sample from a normal distribution.

Note:

Any systematic pattern in the residuals of a regression equation should be regarded as suggestive of the possibility of misspecification.
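To make the idea of a systematic pattern in the residuals concrete, the following sketch fits a straight line to data that are in fact quadratic and then checks the residuals numerically. The data and the helper names `fit_line` and `pearson` are assumptions made for illustration. In practice one would inspect the residual plot itself; the correlations merely summarise what that plot would show.

```python
import math


def fit_line(x, y):
    """Least squares intercept and slope for one explanatory variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Hypothetical data: the true relation is quadratic, but a line is fitted.
x = [float(v) for v in range(11)]
y = [xi ** 2 for xi in x]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# OLS residuals are uncorrelated with x by construction ...
corr_with_x = pearson(residuals, x)
# ... but they track the omitted squared term almost perfectly.
x_bar = sum(x) / len(x)
corr_with_square = pearson(residuals, [(xi - x_bar) ** 2 for xi in x])

print("corr(residuals, x)             :", round(corr_with_x, 6))
print("corr(residuals, (x - x_bar)^2) :", round(corr_with_square, 6))
```

The linear correlation with x is zero, so a single summary number hides the problem, whereas the curved pattern is obvious once the residuals are compared with the squared term. This is why the residuals v x_i plot is worth drawing.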

 

Normality Tests

Recall that one of the assumptions in the classical regression model is that the errors are normally distributed about their zero mean. The assumption is necessary if the inferential aspects of classical regression (t tests, F tests etc.) are to be valid in small samples.

There are several tests of normality that can be used.

1. Histogram of Residuals

A simple graphical device, but rather subjective.

2. Normal Probability Plot

A comparatively simple graphical device. MINITAB will produce normal scores by means of the NSCORES command. These can be plotted against the residuals, and an approximately straight line should be produced if the residuals are normally distributed. There should also be an extremely high correlation between the residuals and the normal scores; statistical tables will be required to check the significance of the results.

3. Normal Probability Tests produced by MINITAB

Anderson-Darling normality test

Ryan-Joiner normality test

Kolmogorov-Smirnov normality test

These tests are constructed using different assumptions about the data (for details see the HELP facility within MINITAB). They all take the null hypothesis to be one of normality. Therefore, normality of the residuals will be rejected if the quoted p-value is smaller than the significance level, α.

4. The Jarque-Bera Test for Normality

A test of normality which is found in a number of econometric packages is the Jarque-Bera (JB) test. This is an asymptotic or large sample test and is based on OLS residuals. The test hinges on the values for skewness and kurtosis, which for a normal distribution are 0 and 3 respectively. These are measured using the third and fourth moments of the OLS residuals.

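Under a null hypothesis of normally distributed disturbances, we have skewness $\alpha_3 = 0$ and kurtosis $\alpha_4 = 3$.

It can be shown that

$$Z_3 = \frac{\alpha_3 \sqrt{n}}{\sqrt{6}} \quad \text{and} \quad Z_4 = \frac{(\alpha_4 - 3)\sqrt{n}}{\sqrt{24}}$$

both have a standard normal distribution in large samples.

Therefore, $Z_3^2 + Z_4^2$ will have a $\chi^2$ distribution with 2 df.

Hence,

$H_0$: residuals are normally distributed, i.e. $\alpha_3 = 0$ and $\alpha_4 = 3$

v

$H_1$: residuals are not normally distributed, i.e. $\alpha_3 \neq 0$ or $\alpha_4 \neq 3$ or both

We reject $H_0$ if

$$JB = Z_3^2 + Z_4^2 = \frac{n}{6}\,\alpha_3^2 + \frac{n}{24}\,(\alpha_4 - 3)^2 = n\left[\frac{\alpha_3^2}{6} + \frac{(\alpha_4 - 3)^2}{24}\right] > \chi^2_{(2)}$$

where $\chi^2_{(2)}$ is the critical value with 2 df at the chosen significance level, n is the sample size, $\alpha_3$ is skewness and $\alpha_4$ is kurtosis.

Now, from the sample of residuals

$$\mu_2 = \frac{\sum e_i^2}{n}, \quad \mu_3 = \frac{\sum e_i^3}{n}, \quad \mu_4 = \frac{\sum e_i^4}{n}$$

It can be shown that

$$\alpha_3^2 = \frac{\mu_3^2}{\mu_2^3} \quad \text{and} \quad \alpha_4 = \frac{\mu_4}{\mu_2^2}$$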
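The JB calculation above can be sketched in a few lines of Python. The function name `jarque_bera` and both sample residual lists are hypothetical; the moments follow the definitions given in the text, with the residuals centred first (which changes nothing for OLS residuals from a model with an intercept, since their mean is zero). The p-value uses the fact that the upper tail probability of a chi-square distribution with 2 df is exp(-x/2).

```python
import math


def jarque_bera(residuals):
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [e - mean for e in residuals]
    mu2 = sum(e ** 2 for e in centred) / n
    mu3 = sum(e ** 3 for e in centred) / n
    mu4 = sum(e ** 4 for e in centred) / n
    skewness = mu3 / mu2 ** 1.5
    kurtosis = mu4 / mu2 ** 2
    jb = n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)
    # Chi-square with 2 df: P(X > x) = exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return skewness, kurtosis, jb, p_value


# Hypothetical residuals: a small symmetric sample ...
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
# ... and a sample with one extreme positive value (mean is still zero).
skewed = [-1.0] * 19 + [19.0]

CRITICAL_5_PERCENT = 5.991  # chi-square critical value, 2 df, 5% level

for name, sample in (("symmetric", symmetric), ("skewed", skewed)):
    skew, kurt, jb, p = jarque_bera(sample)
    verdict = "reject H0" if jb > CRITICAL_5_PERCENT else "do not reject H0"
    print(name, round(skew, 3), round(kurt, 3), round(jb, 3), verdict)
```

Because the test is asymptotic, the verdict for samples as small as these should not be taken seriously; the lists are kept short only so that the arithmetic can be followed by hand.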

Outliers, Leverage and Influence

In regression analysis, you should always beware of points which do not fit the general pattern or which exert undue influence on the outcome of our numerical summaries. There are three types of data points which should concern us. These are:

an outlier, a point of high leverage, and an influential point.

An outlier in a regression is a data point which has a large residual (usually more than three standard deviations from the mean (= 0)).

A point of high leverage can be defined thus:

‘A data point has a high leverage if it is extreme in the X-direction, i.e. it is a disproportionate distance away from the middle range of the X-values’. These points can exert undue influence on the outcome of an OLS regression line. They are capable of exerting a strong pull on the slope of the regression line. Whether they do so or not is another matter.

An influential point is a point which, if removed from the sample, would markedly change the position of the least squares regression line. Hence, influential data points pull the regression line in their direction. Note that influential data points do not necessarily produce large residuals, that is, they are not always outliers as well, although they can be. It is precisely because they draw the regression line towards themselves that they may end up with small residuals. Conversely, an outlier is not necessarily an influential point, particularly when it is a point with little leverage.

In general we note:

outliers are not necessarily influential

but they can be so (depending on leverage)

yet high leverage points are not always influential

and influential points are not necessarily outliers.
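The distinctions above can be illustrated with a small sketch, under stated assumptions: the data are invented, and leverage is computed with the standard formula for a regression with one explanatory variable, h_i = 1/n + (x_i - x_bar)^2 / Sxx, which is not given in these notes. Case A adds an outlier in Y at the mean of X; case B adds a point that is extreme in X and lies off the original line.

```python
def fit_line(x, y):
    """Least squares intercept and slope for one explanatory variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x, y):
    """OLS residuals from the fitted line."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def leverages(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / Sxx (simple regression only)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical base data lying close to the line y = x.
base_x = [float(v) for v in range(1, 11)]
base_y = [xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(base_x)]
_, slope_base = fit_line(base_x, base_y)

# Case A: an outlier in Y placed at the mean of X (low leverage).
xa, ya = base_x + [5.5], base_y + [15.0]
_, slope_a = fit_line(xa, ya)
res_a, lev_a = residuals(xa, ya), leverages(xa)[-1]

# Case B: a point far out in the X-direction and off the line (high leverage).
xb, yb = base_x + [30.0], base_y + [10.0]
_, slope_b = fit_line(xb, yb)
res_b, lev_b = residuals(xb, yb), leverages(xb)[-1]

print("base slope:", round(slope_base, 3))
print("A: slope", round(slope_a, 3), "leverage", round(lev_a, 3),
      "residual", round(res_a[-1], 3))
print("B: slope", round(slope_b, 3), "leverage", round(lev_b, 3),
      "residual", round(res_b[-1], 3))
```

In case A the added point has by far the largest residual yet leaves the slope unchanged, so it is an outlier without being influential on the slope. In case B the added point drags the slope well away from its original value while its own residual is not the largest in the sample, so it is influential without standing out as an outlier.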

The presence of outliers or of influential points often gives us a clear signal that our model is probably misspecified. In terms of visual displays, outliers can be spotted with residual plots, whereas influential points really need scatter plots, which are not always so meaningful when dealing with several explanatory variables. Apart from these graphical methods, we can also use some special statistics designed to detect outliers, points of leverage and points of influence.

Studentised Residuals

A good way to detect outliers is to investigate one observation at a time, using an OLS regression with the relevant observation excluded, and testing whether the prediction error for that observation is significantly large. This can be most easily done by including an observation-specific dummy variable. For example, to investigate the ith observation in a data set, we define a dummy variable taking a value of unity for the ith observation and zero for all other observations. If we include this dummy in the OLS regression, its coefficient will equal the required prediction error. To test the prediction error for significance, we can examine its t ratio. This t ratio is referred to as a studentised residual. It has a Student’s t distribution with (T-1-K-1) degrees of freedom, where T is the number of observations and K is the number of explanatory variables.

We define the studentised residuals ($e_i^*$) as

$$e_i^* = \frac{e_i}{s(i)\sqrt{1 - h_i}} = \frac{s\,e_i'}{s(i)}$$

where s is the standard error estimate of the regression fitted to all observations, s(i) is the standard error estimate of the regression fitted after deleting the ith observation, $h_i$ is a measure of leverage, and $e_i' = e_i / (s\sqrt{1 - h_i})$ is the standardised residual.
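As a minimal sketch of the definition above, assuming a regression with a single explanatory variable (K = 1) and invented data, the code below computes a studentised residual in two ways: from the formula using s(i) and h_i, and as the t ratio of the prediction error obtained by refitting without the ith observation. The function names and the simple-regression leverage formula h_i = 1/n + (x_i - x_bar)^2 / Sxx are illustrative assumptions.

```python
import math


def fit_line(x, y):
    """Least squares intercept and slope for one explanatory variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def std_error(x, y):
    """Standard error of the regression, with n - 2 degrees of freedom."""
    a, b = fit_line(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


def studentised_residual(x, y, i):
    """e_i / (s(i) * sqrt(1 - h_i)), with s(i) from the fit without i."""
    n = len(x)
    a, b = fit_line(x, y)
    e_i = y[i] - (a + b * x[i])
    x_bar = sum(x) / n
    sxx = sum((xj - x_bar) ** 2 for xj in x)
    h_i = 1 / n + (x[i] - x_bar) ** 2 / sxx
    s_del = std_error(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    return e_i / (s_del * math.sqrt(1 - h_i))


def prediction_error_t_ratio(x, y, i):
    """t ratio of the error made when predicting y[i] from the other points."""
    x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    a, b = fit_line(x_del, y_del)
    m = len(x_del)
    x_bar = sum(x_del) / m
    sxx = sum((xj - x_bar) ** 2 for xj in x_del)
    pred_error = y[i] - (a + b * x[i])
    se = std_error(x_del, y_del) * math.sqrt(
        1 + 1 / m + (x[i] - x_bar) ** 2 / sxx)
    return pred_error / se


# Hypothetical data close to y = x, with the fifth observation shifted up by 3.
x = [float(v) for v in range(1, 11)]
y = [xi + (0.1 if k % 2 == 0 else -0.1) for k, xi in enumerate(x)]
y[4] += 3.0

t_formula = studentised_residual(x, y, 4)
t_refit = prediction_error_t_ratio(x, y, 4)
print(round(t_formula, 4), round(t_refit, 4))
```

The two numbers agree, which reflects the equivalence described in the text between the studentised residual and the t ratio of the prediction error. With T = 10 and K = 1, the reference distribution here would be Student's t with 10 - 1 - 1 - 1 = 7 degrees of freedom.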

Unusual Y values will clearly stand o
