Clinical Prediction Models A Practical Approach to Development Validation and Updating by Ewout W. Steye.pdf
Statistics for Biology and Health

Ewout W. Steyerberg

Clinical Prediction Models
A Practical Approach to Development, Validation, and Updating
Second Edition

Statistics for Biology and Health

Series Editors
Mitchell Gail, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
Jonathan M. Samet, Department of Epidemiology, School of Public Health, Johns Hopkins University, Baltimore, MD, USA
B. Singer, Department of Statistics, University of California at Berkeley, Berkeley, CA, USA

More information about this series at http:

Ewout W. Steyerberg
Department of Biomedical Data Sciences
Leiden University Medical Center
Leiden, The Netherlands

ISSN 1431-8776
ISSN 2197-5671 (electronic)
Statistics for Biology and Health
ISBN 978-3-030-16398-3
ISBN 978-3-030-16399-0 (eBook)
https://doi.org/10.1007/978-3-030-16399-0

1st edition: Springer Science+Business Media, LLC 2009
2nd edition: Springer Nature Switzerland AG 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is:
Gewerbestrasse 11, 6330 Cham, Switzerland

For Aleida, Matthijs, Laurens and Suzanne
For my father Wim

Preface

The first edition of this book was made during the years 2005–2007. Since then quite some new developments have taken place, both in the general scientific direction that prediction research is taking and specific technical innovations. These developments have been addressed as far as possible in the second edition. Many new references have been added. Some detailed material has been moved from print to the web. Many figures have been redrawn in color for better clarity and attractiveness. In all, many changes have been made to nearly every chapter.

Prediction models are important in widely diverse fields, including medicine, physics, engineering, meteorology, and finance. Prediction models are becoming more relevant in the medical field with the increase in biological knowledge on potential predictors of outcome, e.g., from “omics” (including genomics, transcriptomics, proteomics, glycomics, metabolomics). Also, the Big Data era implies we will have increasing access to large volumes of routinely collected data. The number of applications for prediction models will increase, e.g., with targeted early detection of disease, and individualized approaches to diagnostic testing and treatment.

We are moving to an era of personalized evidence-based medicine that asks for an individualized approach to shared medical decision-making. Evidence-based medicine has a central place for meta-analysis to summarize results from randomized controlled trials; prediction models summarize the effects of predictors to provide individualized predictions of the absolute risk of a diagnostic or prognostic outcome. Prediction models and related algorithms will increasingly form the basis for personalized evidence-based medicine and individualized decision-making.

Why Read This Book?
My motivation for working on the first and second editions of this book stems primarily from the fact that the development and applications of prediction models are often suboptimal in medical publications. With this book, I hope to contribute to better understanding of relevant issues and give practical advice on better modeling strategies than are nowadays used. Issues include the following:

(a) Better predictive modeling is sometimes readily possible, e.g., a large data set with high-quality data is available, but all continuous predictors are dichotomized, which is known to have several disadvantages.
(b) Small samples are used:
    - Studies are underpowered, implying unreliable answers to difficult questions such as “Which are the most important predictors in this prediction problem?”
    - The problem of small sample size is aggravated by doing a complete case analysis which discards information from nearly complete records. Statistical imputation methods are nowadays available to exploit all available information, especially “multiple imputations.”
    - Predictors are omitted that should reasonably have been included based on subject matter knowledge. Analysts rely too much on the limited data that they have available in their data set, instead of wisely combining information from several sources, such as medical literature and experts in the field.
    - Stepwise selection methods are abundant when researchers apply regression modeling, while these methods are suboptimal, especially in small data sets.
    - Modeling approaches are used that require higher numbers. Data-hungry techniques, such as neural network modeling, machine learning or artificial intelligence techniques, should not be used in small data sets.
    - No attempts are made towards validation, or validation is done inefficiently. For example, a split-sample approach is followed, leading to a smaller sample for model development and a smaller sample for model validation. Better methods are nowadays available and should be used far more often, specifically bootstrap resampling.
(c) Claims are exaggerated:
    - Often, we see statements such as “the independent predictors were identified”; in many instances, such findings are purely exploratory and may not be reproducible; they may largely represent noise.
    - Models are not internally valid, with overoptimistic expectations of model performance in new patients.
    - One modern machine learning method with a fancy name is claimed as being superior to a more traditional regression approach, while no convincing evidence is presented, and a suboptimal modeling strategy was followed for the regression model. Fair comparisons between well-used statistical methods and machine learning methods are required.
    - Researchers are insufficiently aware of overfitting, implying that their apparent findings are merely coincidental.
(d) Poor generalizability:
    - If models are not internally valid, we cannot expect them to generalize to new patients.
    - Models are developed for each local situation, discarding earlier findings on effects of predictors and earlier models; a framework for continuous improvement and updating of prediction models is required.

In this book, I suggest many small improvements in modeling strategies. Combined, these improvements should lead to better development, validation, and updating of prediction models.

Intended Audience

Readers should have a basic knowledge of biostatistics, especially regression analysis, but no strong background in mathematics is required. The number of formulas is deliberately kept small. The focus is on concepts in prediction research, which are also relevant to computer scientists and data scientists working on prediction in the field of Predictive Analytics. Usually, a bottom-up approach is followed in teaching regression analysis techniques, starting with the required type of data, model assumptions, estimation methods, and basic interpretation. This book is more top-down: given that we want to predict an outcome, how can we best utilize regression and related techniques?

Three levels of readers are envisioned:

(a) The core intended audience is formed by epidemiologists and applied biostatisticians who want to develop, validate, or update a prediction model. Both students and professionals should find practical guidance in this book, especially by the proposed seven steps to develop a valid model (Part II).
(b) The second group is formed by clinicians, policy-makers, and health care professionals who want to judge a study that presents or validates a prediction model. This book should aid them in a critical appraisal, providing explanations of terms and concepts that are common in publications on prediction models. They should try to read chapters of particular interest, or read the main text of the chapters. They can skip the examples and more technical sections (indicated with *).
(c) The third group includes more theoretical researchers, such as (bio)statisticians and computer scientists, who want to improve the methods that we use in prediction models. They may find inspiration for further theoretical work and simulation studies in this book.

Many of the methods in prediction modeling are not fully developed yet, and common sense or intuition underlies some of the proposed approaches in this book. Improvements are welcome!

Other Sources

Many excellent textbooks exist on regression analysis techniques, but these usually do not have a focus on modeling strategies for prediction. The main exception is Frank Harrell's book “Regression Modeling Strategies”. He brings advanced biostatistical concepts to practical application, supported by the rms package for R. Harrell's book may, however, be too advanced for clinical and epidemiological researchers. This also holds for the quite thorough textbook by Hastie, Tibshirani, and Friedman, “The Elements of Statistical Learning”. These books are very useful for a more in-depth discussion of statistical techni