Galton Invents Regression

2001 by Jeffrey M. Stanton, all rights reserved.

This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

Key Words: Correlation; Francis Galton; History of statistics; Karl Pearson.

Abstract

An examination of publications of Sir Francis Galton and Karl Pearson revealed that Galton's work on inherited characteristics of sweet peas led to the initial conceptualization of linear regression. Subsequent efforts by Galton and Pearson brought about the more general techniques of multiple regression and the product-moment correlation coefficient. Modern textbooks typically present and explain correlation prior to introducing prediction problems and the application of linear regression. This paper presents a brief history of how Galton originally derived and applied linear regression to problems of heredity. This history illustrates additional approaches instructors can use to introduce simple linear regression to students.

1. Introduction

The complete name of the correlation coefficient deceives many students into a belief that Karl Pearson developed this statistical measure himself. Although Pearson did develop a rigorous treatment of the mathematics of the Pearson Product Moment Correlation (PPMC), it was the imagination of Sir Francis Galton that originally conceived modern notions of correlation and regression. Galton, a cousin of Charles Darwin and an accomplished 19th century scientist in his own right, has often been criticized in this century for his promotion of "eugenics" (planned breeding of humans; see, for example, Paul 1995). Historians have also suggested that his cousin's lasting fame unfairly overshadowed the substantial scientific contributions Galton made to biology, psychology and applied statistics (see, for example, FitzPatrick 1960). Galton's fascination with genetics and heredity provided the initial inspiration that led to regression and the PPMC.

The thoughts that prompted the development of the PPMC began with a then-vexing problem of heredity: understanding how strongly the characteristics of one generation of living things manifested in the following generation. Galton initially approached this problem by examining characteristics of the sweet pea plant. He chose the sweet pea because that species could self-fertilize; daughter plants express genetic variations from mother plants without contribution from a second parent. This characteristic eliminated, or at least postponed, having to deal with the problem of statistically assessing genetic contributions from multiple sources. Galton's first insights about regression sprang from a two-dimensional diagram plotting the sizes of daughter peas against the sizes of mother peas. As described below, Galton used this representation of his data to illustrate basic foundations of what statisticians still call regression. The generalization of these efforts into the product-moment correlation and the more complex multiple regression came much later. Current textbooks of behavioral science statistics typically reverse this order: the PPMC is presented first and linear regression is covered later. Many instructors may also feel more comfortable starting with correlation and building up to regression.

The present paper provides historical background and illustrative examples that statistics instructors may find useful in introducing these concepts to college level classes in applied statistics. By briefly tracing the historical development of regression and correlation, this paper shows how introductory statistics instructors can use engaging and historically accurate examples to introduce regression and correlation to students. A number of articles concerning the teaching of regression and correlation indicate that students often have difficulty understanding these concepts and the connection between them (see, for example, Williams 1975; Duke 1978; Karylowski 1985; Goldstein and Strube 1995). The present article provides new ideas for instruction based on the historical origins of these statistical techniques.

2. Galton's Early Considerations of Regression

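Besides his role as a colleague of Galton's and a researcher in Galton's laboratory, Karl Pearson also became Galton's biographer after the latter's death in 1911 (Pearson 1922). In his four-volume biography of Galton, Pearson described the genesis of the discovery of the regression slope (Pearson 1930). In 1875, Galton distributed seven packets of sweet pea seeds to seven friends; the seeds within each packet were of uniform weight, but the weights differed substantially from packet to packet (also see Galton 1894). The friends planted the seeds and sent the harvested peas back to Galton (see Appendix A).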

Galton plotted the weights of the daughter seeds against the weights of the mother seeds. Galton realized that the median weights of daughter seeds from a particular size of mother seed approximately described a straight line with positive slope less than 1.0:

"Thus he naturally reached a straight regression line, and the constant variability for all arrays of one character for a given character of a second. It was, perhaps, best for the progress of the correlational calculus that this simple special case should be promulgated first; it is so easily grasped by the beginner." (Pearson 1930, p. 5)

The simple, special case that Pearson referred to is, of course, both the roughly equivalent variability of the two measures and their identical units of measurement. Figure 1 uses a simple, invented data set to illustrate Galton's earliest findings. The parent sweet pea size on the X-axis and the offspring sweet pea size on the Y-axis have approximately equal variability. Thus, the slope of the line connecting the means of the different columns of points is equivalent both to the regression slope and the correlation coefficient. For Galton's purposes, any slope smaller than 1.0 indicated regression to the mean for that generation of peas. The phenomenon of regression to the mean is illustrated by the configuration of points: The y-coordinates of most of the points in Figure 1 are closer to the horizontal offspring mean than their x-coordinates are to the vertical parent mean. Galton's first documented study of this type suggested a slope of 0.33 (obtained through careful inspection of his scatterplots), which indicated to him that extremely large or small mother seeds typically generated substantially less extreme daughter seeds. This finding is, of course, prototypical of regression to the mean: For many variables, natural processes work to "dampen" extreme outliers and bring them closer to their respective means.
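In modern notation, the special case described above can be sketched as follows. The symbols b (least-squares slope), r (product-moment correlation), and s_x, s_y (standard deviations of parent and offspring sizes) are introduced here for illustration only; they are not Galton's notation, and Galton himself worked from medians read off his plots rather than from this formula.

```latex
% b: least-squares slope of offspring size (y) on parent size (x)
% r: product-moment correlation; s_x, s_y: standard deviations of x and y
b = r \, \frac{s_y}{s_x}, \qquad \text{so that } s_x = s_y \;\Rightarrow\; b = r .

% Predicted offspring deviation from the offspring mean:
\hat{y} - \bar{y} = b \, (x - \bar{x}), \qquad
|b| < 1 \;\Rightarrow\; |\hat{y} - \bar{y}| < |x - \bar{x}| \quad (x \neq \bar{x}).
```

With a slope of 0.33, for instance, a mother seed 3 units above the parental mean predicts a daughter seed only about 1 unit above the offspring mean.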

Figure 1. Connecting the means of the individual columns of data provides a crude approximation of the regression line. The slope is exactly 0.50 and the correlation is approximately r = 0.51. Many, though not all, of the points are closer to the offspring pea size mean of 9 on the Y-axis than to the parental pea size mean of 10 on the X-axis. The numeric data appear in Appendix B.

Nonetheless, only a horizontal line would have indicated no heritability in seed size whatsoever, so Galton's finding affirmed his basic assumptions concerning the heritability of "characters." Figure 1, greatly simplified from Galton's original graph, illustrates how a line connecting the means of the columns of data points indicates the degree to which extreme values in the first generation (on the X-axis) tend to regress toward the mean of the second generation (on the Y-axis). I invented the data points, which are listed in Appendix B, to simplify hand calculation in a classroom setting. In contrast to Figure 1, Galton's original data did not produce a perfectly smooth line, but he was able to draw, by hand, a single line that fit all the data reasonably well (Galton's first regression line was presented at a lecture in 1877; see Pearson 1930). The slope of this line he designated "r" for regression. Only under Pearson's later treatment did r come to stand for the correlation coefficient (Pearson 1896).
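For classroom use, the idea of connecting column means can be sketched in a few lines of Python. The data below are hypothetical values made up for this sketch; they are neither Galton's measurements nor the Appendix B values, and the function names are illustrative. The sketch uses means and least squares, the modern approach, rather than Galton's medians and hand-drawn line.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Hypothetical (parent_size, offspring_size) pairs, invented for this sketch.
# These are NOT Galton's data and NOT the Appendix B values.
pairs = [
    (8, 7), (8, 7), (8, 10),
    (10, 8), (10, 8), (10, 11),
    (12, 9), (12, 9), (12, 12),
]


def column_means(data):
    """Mean offspring size for each distinct parent size (one plot column each)."""
    columns = defaultdict(list)
    for x, y in data:
        columns[x].append(y)
    return {x: mean(ys) for x, ys in sorted(columns.items())}


def _sums_of_squares(data):
    """Return (Sxx, Syy, Sxy): sums of squared and cross deviations from the means."""
    x_bar = mean(x for x, _ in data)
    y_bar = mean(y for _, y in data)
    sxx = sum((x - x_bar) ** 2 for x, _ in data)
    syy = sum((y - y_bar) ** 2 for _, y in data)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    return sxx, syy, sxy


def least_squares_slope(data):
    """Slope of the least-squares line predicting offspring size from parent size."""
    sxx, _, sxy = _sums_of_squares(data)
    return sxy / sxx


def pearson_r(data):
    """Product-moment correlation between parent and offspring sizes."""
    sxx, syy, sxy = _sums_of_squares(data)
    return sxy / sqrt(sxx * syy)


print(column_means(pairs))
print(least_squares_slope(pairs), pearson_r(pairs))
```

In this made-up data set the column means rise by 1 for every 2 units of parent size, and the two variables were chosen to have equal variability, so the slope and the correlation coincide at 0.5.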

Galton's progress was both eased and hobbled by his choices for descriptive statistics; he used the median as a measure of central tendency and the semi-interquartile range as a measure of variability. One advantage of these measures lay in the simplicity of obtaining them. Galton was nearly fanatical about graphing and tabulating every available data point. These descriptive values could emerge from an inspection of the resulting figure or table with a minimum of computation. It is understood now that the median and semi-interquartile range do not have the favorable mathematical properties of the mean and standard deviation (for example, they cannot be manipulated using covariance algebra). But Galton was not a sophisticated enough mathematician to recognize the deficiency. So Galton's progress toward a more general implementation of regression was delayed by his choice of descriptive statistics. In Natural Inheritance (Galton 1894), Galton expended a page or two making various arguments about the exact value of the slope of a regression line as calculated with various techniques to estimate the change in Y versus the change in X on the scatterplot. At that point in time, his efforts lacked the mathematical foundation to derive the slope from the data themselves. As an interesting footnote, in the late 1870s, Galton did not have access to a mechanical calculating machine, whereas Pearson had one for personal use on his desk no later than 1910 (Pearson 1938).
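The contrast between Galton's descriptive statistics and their modern counterparts can be sketched with Python's standard `statistics` module. The numbers are hypothetical values invented for this sketch, and the "inclusive" quartile method is one assumed convention among several; it is not meant to reproduce how Galton read quartiles off his tables.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical seed sizes, invented for this sketch (not Galton's data).
weights = [14, 15, 16, 17, 18, 19, 20, 21, 22]


def semi_interquartile_range(values):
    """Half the distance between the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2


# Galton's pair of measures versus the modern pair.
print("median:", median(weights), "semi-IQR:", semi_interquartile_range(weights))
print("mean:", mean(weights), "sd:", stdev(weights))

# One algebraic convenience of the mean that the median lacks:
# the mean of paired sums always equals the sum of the means.
x = [1, 2, 9]
y = [9, 2, 1]
sums = [a + b for a, b in zip(x, y)]
print("means add:  ", mean(sums) == mean(x) + mean(y))
print("medians add:", median(sums) == median(x) + median(y))
```

For the second data set the means add as expected (4 + 4 = 8), whereas the median of the sums is 10 and the sum of the medians is 4. This is a small instance of the algebraic awkwardness mentioned above.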

3. Galton's Recognition of the Generality of Regression Slope

Even with his poor choice of descriptive statistics, Galton was able to generalize his work over a variety of heredity problems. He tackled personality temperament, artistic ability, and dis
