W4: Who? When? Where? What?
A Real Time System for Detecting and Tracking People
Ismail Haritaoglu, David Harwood and Larry S. Davis
Computer Vision Laboratory
University of Maryland
College Park, MD 20742
Abstract
W4 is a real time visual surveillance system for detecting and tracking people and monitoring their activities in an outdoor environment. It operates on monocular grayscale video imagery, or on video imagery from an infrared camera. Unlike many systems for tracking people, W4 makes no use of color cues. Instead, W4 employs a combination of shape analysis and tracking to locate people and their parts (head, hands, feet, torso) and to create models of people's appearance so that they can be tracked through interactions such as occlusions. W4 is capable of simultaneously tracking multiple people even with occlusion. It runs at 25 Hz for 320x240 resolution images on a dual-Pentium PC.
1. Introduction
W4 is a real time system for tracking people and their body parts in monochromatic imagery. It constructs dynamic models of people's movements to answer questions about what they are doing, and where and when they act. It constructs appearance models of the people it tracks so that it can track people through occlusion events in the imagery. In this paper we describe the computational models employed by W4 to detect and track people and their parts. These models are designed to allow W4 to determine types of interactions between people and objects, and to overcome the inevitable errors and ambiguities that arise in dynamic image analysis, such as instability in segmentation processes over time, splitting of objects due to coincidental alignment of object parts with similarly colored background regions, etc. W4 employs a combination of shape analysis and robust techniques for tracking to detect people, and to locate and track their body parts. It builds "appearance" models of people so that they can be identified after occlusions or after other interactions during which W4 cannot track them individually.
W4 has been designed to work with only monochromatic video sources, either visible or infrared. While most previous work on detection and tracking of people has relied heavily on color cues, W4 is designed for outdoor surveillance tasks, and particularly for night time or other low light level situations. In such cases, color will not be available, and people need to be detected and tracked based on weaker appearance and motion cues. W4 is a real time system. It is currently implemented on a dual processor Pentium PC and can process between 20 and 30 frames per second, depending on the image resolution (typically lower for IR sensors than video sensors) and the number of people in its field of view. In the long run, W4 will be extended with models to recognize the actions of the people it tracks. Specifically, we are interested in interactions between people and objects, e.g., people exchanging objects, leaving objects in the scene, taking objects from the scene. The descriptions of people (their global motions and the motions of their parts) developed by W4 are designed to support such activity recognition.
W4 currently operates on video taken from a stationary camera, and many of its image analysis algorithms would not generalize easily to images taken from a moving camera. Other ongoing research in our laboratory attempts to develop both appearance and motion cues from a moving sensor that might alert a system to the presence of people in its field of regard [9]. At this point, the surveillance system might stop and invoke a system like W4 to verify the presence of people and recognize their actions. More generally, however, one would be interested in detecting and tracking people from a moving surveillance platform, and this is a topic currently being investigated in our laboratory also.
In W4, foreground regions are detected in every frame by a combination of background analysis and simple low level processing of the resulting binary image. The background scene is statistically modeled by the minimum and maximum intensity values and maximal temporal derivative for each pixel recorded over some period, and is updated periodically. These algorithms are described in Section 3. Each foreground region is matched to the current set of objects using a combination of shape analysis and tracking. These include simple spatial occupancy overlap tests between the predicted locations of objects and the locations of detected foreground regions, and "dynamic" template matching algorithms that correlate evolving appearance models of objects with foreground regions. Second-order motion models, which combine robust techniques for region tracking and matching of silhouette edges with recursive least square estimation, are used to predict the locations of objects in future frames. These algorithms are described in Section 4. A cardboard human model of a person in a standard upright pose is used to model the human body and to predict the location of human body parts (head, torso, hands, legs and feet). The locations of these parts are verified and refined using dynamic template matching methods. W4 can detect and track multiple people in complicated scenes at 25 Hz for 320x240 resolution on a 300 MHz dual-Pentium PC. W4 has also been applied to infrared video imagery at 30 Hz for 160x120 resolution on the same PC.
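As a rough illustration of two ingredients named in the overview above, the sketch below shows a constant-acceleration (second-order) position prediction and a bounding-box overlap test between a predicted object location and a detected region. The function names, the (x0, y0, x1, y1) box convention and the unit time step are assumptions made for illustration. The recursive least squares estimation and silhouette edge matching used to fit the motion parameters are not shown.

```python
def predict_position(position, velocity, acceleration, dt=1.0):
    """Second-order prediction: p + v*dt + 0.5*a*dt^2, per coordinate.

    All arguments are (x, y) tuples; dt is the time step in frames
    (a unit step is an assumption of this sketch).
    """
    return tuple(
        p + v * dt + 0.5 * a * dt * dt
        for p, v, a in zip(position, velocity, acceleration)
    )


def shift_box(box, offset):
    """Translate a bounding box (x0, y0, x1, y1) by an (dx, dy) offset."""
    x0, y0, x1, y1 = box
    dx, dy = offset
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)


def boxes_overlap(a, b):
    """True if two axis-aligned boxes (x0, y0, x1, y1) share any area or edge."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


if __name__ == "__main__":
    # Hypothetical object state: centre, velocity and acceleration in pixels.
    centre = (10.0, 5.0)
    predicted = predict_position(centre, (2.0, 0.0), (1.0, 0.0))
    offset = (predicted[0] - centre[0], predicted[1] - centre[1])
    predicted_box = shift_box((8, 0, 12, 10), offset)
    detected_box = (12, 2, 16, 9)
    print(predicted, boxes_overlap(predicted_box, detected_box))
```

A zero acceleration term reduces the prediction to a first-order (constant velocity) model, so the same function covers both cases.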
2. Previous Tracking Systems
Pfinder [1] is a real-time system for tracking a person which uses a multi-class statistical model of color and shape to segment a person from a background. It finds and tracks people's head and hands under a wide range of viewing conditions.
The system described in [5] is a general purpose system for moving object detection and event recognition, in which moving objects are detected using change detection and tracked using first-order prediction and nearest neighbor matching. Events are recognized by applying predicates to a graph formed by linking corresponding objects in successive frames.
KidRooms [2, 8] is a tracking system based on "closed-world regions". These are regions of space and time in which the specific context of what is in the regions is assumed to be known. These regions are tracked in real-time domains where object motions are not smooth or rigid, and where multiple objects are interacting. Bregler uses many levels of representation based on mixture models, EM, and recursive Kalman and Markov estimation to learn and recognize human dynamics [4]. Deformable trackers that track small images of people are described in [6].
3. Background Scene Modeling and Foreground Region Detection
Frame differencing in W4 is based on a model of background variation obtained while the scene contains no people. The background scene is modeled by representing each pixel by three values: its minimum and maximum intensity values and the maximum intensity difference between consecutive frames observed during this training period. These values are estimated over several seconds of video and are updated periodically for those parts of the scene that W4 determines to contain no foreground objects.
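The three per-pixel values described above can be computed in a single pass over the training frames. The following is a minimal sketch in plain Python, assuming each frame is a list of rows of integer intensities; the function name and data layout are illustrative assumptions, and the periodic update of the model is not shown.

```python
def build_background_model(frames):
    """Per-pixel minimum, maximum and largest interframe absolute difference.

    `frames` is a list of equally sized grayscale images (lists of rows of
    integer intensities) recorded while the scene contains no people.
    Returns three images of the same size: (minimum, maximum, max_diff).
    """
    if len(frames) < 2:
        raise ValueError("need at least two training frames")
    height, width = len(frames[0]), len(frames[0][0])
    # Start the min/max images from a copy of the first frame.
    minimum = [row[:] for row in frames[0]]
    maximum = [row[:] for row in frames[0]]
    max_diff = [[0] * width for _ in range(height)]
    # Walk over consecutive frame pairs (previous, current).
    for prev, cur in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                value = cur[y][x]
                minimum[y][x] = min(minimum[y][x], value)
                maximum[y][x] = max(maximum[y][x], value)
                max_diff[y][x] = max(max_diff[y][x], abs(value - prev[y][x]))
    return minimum, maximum, max_diff


if __name__ == "__main__":
    # Three tiny 1x2 training frames (made-up intensities).
    training = [[[10, 20]], [[12, 18]], [[11, 25]]]
    print(build_background_model(training))
```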
Foreground objects are segmented from the background in each frame of the video sequence by a four stage process: thresholding, noise cleaning, morphological filtering and object detection.
Each pixel is first classified as either a background or a foreground pixel using the background model. Given the minimum (M), maximum (N), and largest interframe absolute difference (D) images that represent the background scene model, pixel x from image I is a foreground pixel if:
|M(x) - I(x)| > D(x)  or  |N(x) - I(x)| > D(x)    (1)
Figure 1: Motion estimation of body using silhouette edge matching between two successive frames. a: input image; b: detected foreground regions; c: alignment of silhouette edges based on difference in median; d: final alignment after silhouette correlation.
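The pixel test of Equation 1 can be sketched as follows. This is an illustrative plain-Python version, assuming the background model is given as three images (minimum, maximum and largest interframe difference) stored as lists of rows; the function name and data layout are assumptions, not the system's actual implementation.

```python
def classify_foreground(image, minimum, maximum, max_diff):
    """Return a binary mask with 1 where Equation 1 marks a foreground pixel.

    A pixel is foreground if its intensity differs from the background
    minimum or from the background maximum by more than the largest
    interframe difference observed at that pixel during training.
    """
    mask = []
    for y, row in enumerate(image):
        mask_row = []
        for x, value in enumerate(row):
            d = max_diff[y][x]
            is_foreground = (
                abs(minimum[y][x] - value) > d
                or abs(maximum[y][x] - value) > d
            )
            mask_row.append(1 if is_foreground else 0)
        mask.append(mask_row)
    return mask


if __name__ == "__main__":
    # Made-up 1x2 background model and input frame.
    minimum, maximum, max_diff = [[10, 18]], [[12, 25]], [[2, 7]]
    print(classify_foreground([[11, 40]], minimum, maximum, max_diff))
```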
Thresholding alone, however, is not sufficient to obtain clear foreground regions; it results in a significant level of noise, for example, due to illumination changes. W4 uses region-based noise cleaning to eliminate noise regions. After thresholding, one iteration of erosion is applied to foreground pixels to eliminate one-pixel thick noise. Then, a fast binary connected-component operator is applied to find the foreground regions, and small regions are eliminated. Since the remaining regions are smaller than the original ones, they should be restored to their original sizes by processes such as erosion and dilation.
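The noise cleaning steps just described (one erosion pass, connected-component labelling, removal of small regions) can be sketched as below. The 3x3 structuring element, 4-connectivity, and the `min_area` parameter are assumptions chosen for illustration; the text does not specify them.

```python
from collections import deque


def erode(mask):
    """One iteration of binary erosion with a 3x3 structuring element.

    Pixels outside the image are treated as background, so border pixels
    are always cleared.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(mask[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def label_regions(mask):
    """4-connected component labelling; returns one pixel list per region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def remove_small_regions(mask, min_area):
    """Keep only connected regions with at least `min_area` pixels."""
    out = [[0] * len(mask[0]) for _ in mask]
    for pixels in label_regions(mask):
        if len(pixels) >= min_area:
            for y, x in pixels:
                out[y][x] = 1
    return out
```

A typical call would be `remove_small_regions(erode(mask), min_area)`, with `min_area` tuned to the expected size of a person at the camera's resolution.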
Generally, finding a satisfactory combination of erosion and dilation steps is quite difficult, and no fixed combination works well in general on our outdoor images. Instead, W4 applies morphological operators to foreground pixels only after noise pixels are eliminated. So, W4 reapplies background subtraction, followed by one iteration each of dilation and erosion, but only to those pixels inside the bounding boxes of the foreground regions that survived the size thresholding operation.
As the final step of foreground region detection, a binary connected component analysis is applied to the foreground pixels to assign a unique label to each foreground object. W4 generates a set of features for each detected foreground object, including its local label, centroid, median, and bounding box.
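The per-object features listed above can be computed directly from the pixel coordinates of a labelled region. The sketch below assumes a region is given as a list of (row, column) pairs and takes the median to be the coordinate-wise median of the region's pixels; the function name, dictionary keys and (x0, y0, x1, y1) box convention are illustrative assumptions.

```python
from statistics import median


def region_features(label, pixels):
    """Compute label, centroid, median and bounding box for one region.

    `pixels` is a non-empty list of (y, x) coordinates belonging to the
    region. Points are reported as (x, y); the box as (x0, y0, x1, y1).
    """
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return {
        "label": label,
        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
        "median": (median(xs), median(ys)),
        "bounding_box": (min(xs), min(ys), max(xs), max(ys)),
    }


if __name__ == "__main__":
    # A 2x2 blob plus one stray pixel further down (made-up data).
    blob = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 1)]
    print(region_features(1, blob))
```

In this made-up example the stray pixel pulls the centroid's y coordinate to 1.4 while the median stays at 1, which shows how the median is less sensitive to a few outlying pixels.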
4. Object Tracking
The goals of the object tracking stage are:
To determine when a new object enters the system's field of view, and initialize motion models for tracking that object.
To compute the correspondence between the foreground regions detected by the background subtraction and the objects currently being tracked by W4.
To employ tracking algorithms to estimate the position of the torso of each object, and update the motion model used for tracking. W4 employs second order motion models (including a velocity and, possibly zero, acceleration terms) to model both the overall m