Centroids, Clusters, and Crime
Boulder, CO
Advisor: Anne Dougherty
Abstract
A particularly challenging problem in crime prediction is modeling the behavior of a serial killer. Since finding associations between the victims is difficult, we predict where the criminal will strike next, instead of whom. Such prediction of a criminal’s spatial patterns is called geographic profiling.
Research shows that most violent serial criminals tend to commit crimes in a radial band around a central point: home, workplace, or other area of significance to the criminal’s activities (for example, a part of town where prostitutes abound). These “anchor points” provide the basis for our model.
We assume that the entire domain of analysis is a potential crime spot, movement of the criminal is uninhibited, and the area in question is large enough to contain all possible strike points. We consider the domain a metric space on which predictive algorithms create spatial likelihoods. Additionally, we assume that the offender is a “violent” serial criminal, since research suggests that serial burglars and arsonists are less likely to follow spatial patterns.
There are substantial differences between one anchor point and several. We treat the single-anchor-point case first, taking the spatial coordinates of the criminal’s last strikes and the sequence of the crimes as inputs. Estimating the point to be the centroid of the previous crimes, we generate a “likelihood crater,” where height corresponds to the likelihood of a future crime at that location. For the multiple-anchor-point case, we use a cluster-finding and sorting method: We identify groupings in the data and build a likelihood crater around the centroid of each. Each cluster is given weight according to recency and number of points. We test single point vs. multiple points by using the previous crimes to predict the most recent one and comparing with its actual location.
We extract seven data sets from published research. We use four of the data sets in developing our model and examining its response to changes in sequence, geographic concentration, and total number of points. Then we evaluate our models by running blind on the remaining three data sets.
The results show a clear superiority for multiple anchor points.
The UMAP Journal 31 (2) (2010) 129–148. © Copyright 2010 by COMAP, Inc. All rights reserved. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice. Abstracting with credit is permitted, but copyrights for components of this work owned by others than COMAP must be honored. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior permission from COMAP.
Introduction
The literature on geographic patterns in serial crimes shows a strong patterning around an anchor point, a location of daily familiarity for the criminal. We build prediction schemes based on this underlying theory and produce a surface of likelihood values and a robust metric.
The first scheme finds a single anchor point using a center-of-mass method; the second scheme assumes two to four anchor points and uses a cluster-finding algorithm to sort and group points. Both schemes use a statistical technique that we call cratering to predict future crime locations.
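The center-of-mass step of the first scheme can be illustrated with a minimal sketch. It assumes crime sites are given as planar (x, y) coordinates in sequence order; the function names `centroid` and `holdout_distance` and the sample coordinates are hypothetical, chosen only for illustration. The sketch estimates the anchor from all but the most recent crime and reports the raw distance from that estimate to the held-out crime. It does not build the likelihood crater and is not the evaluation metric used in this paper.

```python
from math import hypot


def centroid(points):
    """Return the center of mass (mean x, mean y) of a list of (x, y) points."""
    n = len(points)
    if n == 0:
        raise ValueError("need at least one crime location")
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def holdout_distance(points):
    """Estimate the anchor from all but the last crime.

    Returns the estimated anchor and the straight-line distance from it
    to the held-out (most recent) crime.
    """
    if len(points) < 2:
        raise ValueError("need at least two crime locations")
    *earlier, last = points
    anchor = centroid(earlier)
    return anchor, hypot(last[0] - anchor[0], last[1] - anchor[1])


# Made-up coordinates, in the order the crimes occurred.
crimes = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (2.0, 5.0)]
anchor, distance = holdout_distance(crimes)
print(anchor, distance)
```

Because crimes are expected in a band around the anchor rather than at it, a nonzero distance here is not by itself a prediction error; it is the kind of quantity a crater-shaped likelihood would need to account for.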
Background
The arrest in 1981 (and subsequent conviction) of Peter Sutcliffe as the “Yorkshire Ripper” marked a victory for Stuart Kind, a forensic biologist whose application of mathematical principles had successfully predicted where the Yorkshire Ripper lived.
Today, information-intensive models can be constructed using heat-map techniques to identify the hot spots for a specific type of crime, or to derive associations between the rate of criminal activity and attributes of a location (such as lighting, urbanization, etc.) [Boba 2005].
“Geographically profiling” the crimes of a single criminal has focused on locating the criminal’s anchor points: locations (such as a home, workplace, or a relative’s house) at which he spends substantial amounts of time and to which he returns regularly between crimes.
Canter and Larkin [1993] proposed that a serial criminal’s home (or other anchor point) tends to be contained within a circle whose diameter is the line segment between the two farthest-apart crime locations; and this is true in the vast majority of cases [Kocsis and Irwin 1997]. Canter et al. [2000] found that for serial murders, generalizations of such techniques on average reduce the area to be searched by nearly a factor of 10.
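The circle described by Canter and Larkin can be computed directly from the crime coordinates. The following is a minimal sketch, assuming planar (x, y) coordinates and Euclidean distance; the names `canter_larkin_circle` and `inside` and the sample points are hypothetical. It finds the two farthest-apart crime sites by brute force, takes their midpoint as the center and half their separation as the radius, and checks whether a candidate anchor falls inside.

```python
from itertools import combinations
from math import dist


def canter_larkin_circle(points):
    """Return (center, radius) of the circle whose diameter is the segment
    joining the two farthest-apart points."""
    if len(points) < 2:
        raise ValueError("need at least two crime locations")
    # Brute-force search over all pairs; fine for the small sets in a series.
    a, b = max(combinations(points, 2), key=lambda pair: dist(*pair))
    center = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return center, dist(a, b) / 2


def inside(point, center, radius):
    """True if the point lies in or on the circle."""
    return dist(point, center) <= radius


# Made-up crime sites; the farthest pair is (0, 0) and (6, 8).
crimes = [(0.0, 0.0), (6.0, 8.0), (1.0, 5.0), (5.0, 1.0)]
center, radius = canter_larkin_circle(crimes)
print(center, radius)
print(inside((3.0, 3.0), center, radius), inside((9.0, 9.0), center, radius))
```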
By contrast, forecasting where a criminal will strike next has not been explored deeply [Rossmo 1999]. Paulsen and Robinson [2009] observe that for many U.S. police departments there are substantial practical, ethical, and legal issues involved in collecting the data for a detailed mapping of criminal tendencies, with the result that only 16% of them employ a computerized mapping technique.
Our treatment of the problem will employ an anchor-point-finding algorithm. We generate likelihood surfaces that act as a prioritization scheme for regions to monitor, patrol, or search.
Assumptions
Domain is Approximately Urban
We use the word “urban” to denote features of an urbanized area that simplify our treatment: The entire domain is a potential crime spot, the movement of the criminal is completely unconstrained, and the area is large enough to contain all possible strike points. It is important to note, however, that even for serial crime committed in suburbs, villages, or spread between towns, the urbanization condition holds on the subset of the map in which crimes are regularly committed. To see this, consider the three urbanization conditions separately:
• Entire domain is a potential crime spot. Every neighborhood contains a possible crime location. Such an assumption is made by nearly all geographic profiling techniques [Canter et al. 2000; Rossmo 1999].
It is obvious that every domain will violate these conditions to some extent: All but the most inventive serial killers, for example, will not commit a crime in the middle of a lake, or in the uninhabited farmland between small towns. Nevertheless, this observation simply requires that the output of the model be interpreted intelligently. In other words, while we assume for simplicity that the entire map is a potential target, police officers interpreting the results can easily ignore any predictions we make which fall into an obvious “dead zone.”
• Criminal’s movement is unconstrained. Because of the difficulty of finding real-world distance data, we invoke the “Manhattan assumption”: There are enough streets and sidewalks in a sufficiently grid-like pattern that movement along real-world routes is the same as “straight-line” movement in a space discretized into city blocks [Rossmo 1999]. Kent [2006] demonstrated that across several types of serial crime, the Euclidean and Manhattan distances are essentially interchangeable in predicting anchor points.
• Domain contains all possible strike points. This condition says that the two conditions above hold on a sufficiently large area.
Taken together, these three conditions describe the region of interest as a metric space in which
• the subset of potential targets is dense,
• the metric is the L2 norm, and
• the space is “complete”: sequences of crimes do not lead to predictions of crimes outside the space.
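The two distance measures discussed above can be compared side by side. This is a minimal sketch assuming planar (x, y) coordinates measured in the same units; the function names `euclidean` and `manhattan` are illustrative and not taken from the paper.

```python
from math import hypot


def euclidean(p, q):
    """Straight-line (L2) distance between two (x, y) points."""
    return hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """City-block (L1) distance: travel along axis-aligned streets only."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


# Made-up points: three blocks east and four blocks north of each other.
a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean(a, b), manhattan(a, b))
```

For any two points in the plane, the Manhattan distance is at least the Euclidean distance and at most the square root of 2 times it, so the two measures never disagree by more than that factor.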
Violent Serial Crimes by a Single Offender
• Focus on violent crimes. Geographic profiling is most successful for murders and rapes, with the average anchor-point prediction algorithm being 30% less effective for criminals who are serial burglars or arsonists [Canter et al. 2000; Rossmo 1999].
• Serial crimes. We take serial killing (or violent crime) as involving “three or more people over a period of 30 or more days, with a significant cooling-off period between” [Holmes and Holmes 1998].
• Single offender.
Spatial Focus
Use of temporal data is problematic. Time data can be inaccurate. Also, while research has found cyclical patterns within the time between crimes, these patterns do not bear directly on predicting the next geographic location. What is useful is general trends in spatial movement over an ordering of the locations. We hence ignore specific time data in crime sets except for ordering of the crime sequence.
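Reducing time data to a bare ordering might look like the following minimal sketch. It assumes each record is a (date, x, y) tuple; the record layout, the name `to_sequence`, and the sample values are hypothetical. Dates are used only to sort the records and are then discarded, leaving an ordinal position alongside each location.

```python
from datetime import date


def to_sequence(records):
    """Sort (date, x, y) records chronologically and return a list of
    (order, x, y) tuples, with order starting at 1 and the dates dropped."""
    ordered = sorted(records, key=lambda record: record[0])
    return [(i, x, y) for i, (_, x, y) in enumerate(ordered, start=1)]


# Made-up records, deliberately out of chronological order.
records = [
    (date(1980, 5, 2), 4.0, 1.0),
    (date(1979, 11, 20), 0.5, 2.5),
    (date(1980, 1, 14), 3.0, 3.0),
]
print(to_sequence(records))
```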
Developing a Serial Crime Test Set
Existing Crime Sets
Researchers have compiled databases of serial crimes for their own use: Rossmo’s FBI and SFU databases [Rossmo 1999], LeBeau’s San Diego Rape Case data set [LeBeau 1992], and Canter’s Baltimore crime set [Canter et al. 2000]. Each of these databases was developed with specific methods of integrity and specific source locations. These proprietary databases are not available to us, so we are faced with two options: simulate serial criminal data or find an indirect way of using the private data.
The Problem with Simulation
Simulation might seem like an attractive solution to the lack of data. However, utterly random crime-site generation would contradict the underlying assumption of a spatial pattern to serial crimes, while generating sites according to an underlying distribution would prejudge the pattern! Actual data must be used if there is to be any confidence in the model.
An Alternative: Pixel Point Analysis
Instead, we “mine” the available data, in Rossmo [1995] and in the spatial analysis of journey-to-crime patterns in serial rape cases in LeBeau [1992]. LeBeau depicts the dat