Video Algorithm Engineering: Video Codec Algorithms, Algorithm Porting and Optimization
An FPGA implementation of HW/SW codesign architecture for H.263 video coding
Electronics and Communications
In this paper, we present an efficient HW/SW codesign architecture for H.263 video encoder and its FPGA implementation. Each module of the encoder is investigated to find which approach between HW and SW is better to achieve real-time processing speed as well as flexibility. The hardware portions include the Discrete Cosine Transform (DCT), inverse DCT (IDCT), quantization (Q) and inverse quantization (IQ). Remaining parts were realized in software executed by the NIOS II softcore processor. This paper also introduces efficient design methods for HW and SW modules. In hardware, an efficient architecture for the 2-D DCT/IDCT is suggested to reduce the chip size. A NIOS II Custom instruction logic is used to implement Q/IQ. Software optimization technique is also explored by using the fast block-matching algorithm for motion estimation (ME). The whole design is described in VHDL language, verified in simulations and implemented in Stratix II EP2S60 FPGA. Finally, the encoder has been tested on the Altera NIOS II development board and can work up to 120 MHz. Implementation results show that when HW/SW codesign is used, a 15.8-16.5 times improvement in coding speed is obtained compared to the software based solution.
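The abstract above assigns motion estimation to software and mentions a fast block-matching algorithm without naming it. As a point of reference only, the following minimal sketch shows plain exhaustive (full-search) block matching with a sum-of-absolute-differences (SAD) cost, which is the baseline that fast algorithms try to approximate with fewer candidate checks. It is not the paper's method; the function names, the 4x4 block size, the +/-2 search range and the synthetic frames are all assumptions chosen for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=4, search=2):
    """Return (dx, dy, cost) of the lowest-SAD candidate within +/- `search` pixels."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def make_frame(w, h, shift_x=0, shift_y=0):
    """Synthetic test frame (hypothetical data): a fixed texture moved by the given shift."""
    return [[((x - shift_x) * 7 + (y - shift_y) * 13) % 251 for x in range(w)]
            for y in range(h)]


ref = make_frame(12, 12)              # previous (reference) frame
cur = make_frame(12, 12, shift_x=1)   # same texture moved one pixel to the right
print(full_search(cur, ref, bx=4, by=4))  # prints (-1, 0, 0)
```

The vector (-1, 0) says the block at (4, 4) in the current frame is found one pixel to the left in the reference frame. Full search evaluates every candidate in the window, so its cost grows with the square of the search range; that cost is the usual motivation for fast search patterns in a software ME module.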
Article Outline
1. Introduction
2. Baseline H.263 video coding
2.1. Picture format and organization
2.2. Overview of the H.263 video coding standard
2.2.1. Motion estimation and compensation
2.2.2. DCT transform
2.2.3. Quantization
2.2.4. Entropy coding
3. The HW/SW codesign platform
3.1. FPGA platform
3.2. The NIOS II development board – the HW/SW platform
3.2.1. NIOS II CPU
3.2.2. NIOS II custom instruction logic
3.3. The HW/SW codesign process
3.4. Using embedded Linux with codesign
4. Timing optimization of the H.263 encoder
4.1. Timing optimization
4.2. Hardware/software partitioning
4.2.1. Optimization in motion estimation
4.2.2. Optimization in DCT and IDCT
4.2.3. Optimization in quantization and inverse quantization
5. Design environment and FPGA implementation of H.263 coder
5.1. Overview of the STRATIX II FPGA architecture
5.2. FPGA implementation of H.263 video coder
5.2.1. System environment
5.2.2. 2-D DCT/IDCT coprocessor core
5.3. Implementation results
6. Experimental results
7. Conclusions
A real-time versatile roadway path extraction and tracking on an FPGA platform
Computer Vision and Image Understanding
This paper presents an algorithm for roadway path extraction and tracking and its implementation in a Field Programmable Gate Array (FPGA) device. The implementation is particularly suitable for use as a core component of a Lane Departure Warning (LDW) system, which requires high-performance digital image processing as well as low-cost semiconductor devices, appropriate for the high volume production of the automotive market. The FPGA technology proved to be a proper platform to meet these two contrasting requirements. The proposed algorithm is specifically designed to be completely embedded in FPGA hardware to process wide VGA resolution video sequences at 30 frames per second. The main contributions of this work lie in (i) the proper selection, customization and integration of the main functions for road extraction and tracking to cope with the addressed application, and (ii) the subsequent FPGA hardware implementation as a modular architecture of specialized blocks. Experiments on real road scenario video sequences running on the FPGA device illustrate the good performance of the proposed system prototype and its ability to adapt to varying common roadway conditions, without the need for a per-installation calibration procedure.
Article Outline
1. Introduction
2. Related work
3. The proposed method
3.1. Road model
3.2. Pre-processing pipeline
3.3. Model fitting
3.3.1. K and M estimation
3.3.2. BL and BR estimation
3.4. Model tracking
4. FPGA implementation
5. Experimental results and discussion
5.1. FPGA performance
5.2. Algorithm performance
6. Conclusions
Platform-independent MB-based AVS video standard implementation
Signal Processing: Image Communication
AVS1-P2 is the newest video standard of Audio Video coding Standard (AVS) workgroup of China, which provides close performance to H.264/AVC main profile with lower complexity. In this paper, a platform-independent software package with macroblock-based (MB-based) architecture is proposed to facilitate AVS video standard implementation on embedded system. Compared with the frame-based architecture, which is commonly utilized for PC platform oriented video applications, the MB-based decoder performs all of the decoding processes, except the high-level syntax parsing, in a set of MB-based buffers with adequate size for saving the information of the current MB and the neighboring reference MBs to minimize the on-chip memory and to save the time consumed in on-chip/off-chip data transfer. By modifying the data flow and decoding hierarchy, simulating the data transfer between the on-chip memory and the off-chip memory, and modularizing the buffer definition and management for low-level decoding kernels, the MB-based system architecture provides over 80% reduction in on-chip memory compared to the frame-based architecture when decoding 720p sequences. The storage complexity is also analyzed by referencing the performance evaluation of the MB-based decoder. The MB-based decoder implementation provides an efficient reference to facilitate development of AVS applications on embedded system. The complexity analysis provides rough storage complexity requirements for AVS video standard implementation and optimization.
Article Outline
1. Introduction
2. AVS1-P2 standard overview
3. System architecture
3.1. Frame-based AVS decoder
3.2. MB-based AVS decoder
4. MB-based AVS decoder implementation
4.1. MB-based buffer update
4.2. MB-based Intra prediction
4.3. MB-based motion compensation
4.4. MB-based de-blocking filter
5. Applications and complexity analysis
5.1. Applications
5.2. Complexity analysis
6. Conclusions
Hardware/software co-design of a real-time kernel based tracking system
Systems Architecture
Probabilistic visual tracking methods that use color histograms have been shown by recent research to be robust to target model variations and background illumination changes. However, the required computational cost is high due to intensive image data processing. An embedded solution for such algorithms becomes challenging due to the high computational power demand and algorithm complexity. This paper presents a hardware/software co-design architecture for implementation of the well-known kernel based mean shift tracking algorithm. The design uses the color histogram of the target as the tracking feature. The target is searched in the consecutive images by maximizing the statistical match of the color distributions. The target localization is based on a gradient based iterative search instead of an exhaustive search, which makes the system capable of achieving frame rates of up to hundreds of frames per second while tracking multiple targets. The design, which is fully standalone, is implemented on a low-cost medium-size field programmable gate array (FPGA) device. The hardware cost of the design is compared with some other tracking systems. The performance of the system in terms of speed is evaluated and compared with the software based implementation. It is expected that the proposed solution will find its utility in applications like embedded automatic video surveillance systems.
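The "statistical match of the color distributions" in kernel based mean shift tracking is conventionally measured with the Bhattacharyya coefficient between a target histogram and a candidate histogram, each weighted by an Epanechnikov kernel so that pixels near the window centre count more. The sketch below illustrates only that similarity measure, not the paper's coprocessor. It uses gray-level intensities instead of color, and the function names, the bin count and the patches are assumptions made for the example.

```python
import math


def epanechnikov_weight(r2):
    """Epanechnikov profile (constant factor omitted): 1 - r^2 inside the unit disc, else 0."""
    return 1.0 - r2 if r2 < 1.0 else 0.0


def weighted_histogram(patch, bins=8, levels=256):
    """Kernel-weighted, normalized histogram of a 2-D patch of integer intensities."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0   # window centre
    ry, rx = h / 2.0, w / 2.0               # half-sizes used to normalize distances
    hist = [0.0] * bins
    for y in range(h):
        for x in range(w):
            r2 = ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2
            hist[patch[y][x] * bins // levels] += epanechnikov_weight(r2)
    total = sum(hist)
    return [v / total for v in hist]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1.0 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Hypothetical 5x5 patches: a dark target, a matching candidate, a bright candidate.
target = [[10] * 5 for _ in range(5)]
same = [[12] * 5 for _ in range(5)]
bright = [[200] * 5 for _ in range(5)]

p = weighted_histogram(target)
print(bhattacharyya(p, weighted_histogram(same)))    # same bin as the target: high similarity
print(bhattacharyya(p, weighted_histogram(bright)))  # disjoint bins: no similarity
```

A mean shift tracker moves the candidate window in the direction that increases this coefficient, which is why it needs only a few window evaluations per frame rather than an exhaustive scan.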
Article Outline
1. Introduction
2. Design approach
3. Coprocessor architecture
3.1. Image decimation and cropping
3.2. Epanechnikov kernel calculation
3.3. Histogram calculation
3.4. Mean shift vector calculation
3.5. Bhattacharyya coefficient calculation
4. Hardware implementation
5. Performance evaluation
5.1. Comparison with other systems
5.2. System performance
6. Experimental results
7. Conclusion
Acknowledgements
Automated framework for partitioning DSP applications in hybrid reconfigurable platforms
Microprocessors and Microsystems
In this paper, we present a software framework that implements a formalized methodology for partitioning Digital Signal Processing applications between reconfigurable hardware blocks of different granularity. A hybrid generic reconfigurable architecture is considered, so that the methodology is applicable to a large variety of hybrid reconfigurable systems. The developed framework is composed of analysis, partitioning, and mapping tools. Although the framework is parametric with respect to the mapping procedures for the fine and coarse-grain reconfigurable units, we provide specific mapping algorithms for these types of hardware. In this work, the methodology is validated using five real-world digital signal processing applications: an orthogonal frequency division multiplexing transmitter, a cavity detector, a video compression technique, a JPEG encoder, and a wavelet-based image compressor. The experiments report that an average clock cycle decrease of 60.7%, relative to an all fine-grain mapping solution, is achieved using the developed framework for the considered applications.
Article Outline
1. Introduction
2. Related work
3. Partitioning methodology
3.1. Hybrid SoC platform
3.2. Methodology description
4. Framework description
4.1. CDFG creation
4.2. Analysis
4.3. Mapping to fine-grain reconfigurable hardware
4.3.1. High-level mapping phase
4.3.2. Low-level mapping phase
4.4. Mapping to coarse-grain reconfigurable hardware
4.4.1. Architecture of the coarse-grain reconfigurable data-path
4.4.2. Description of the mapping algorithm
4.5. Partitioning engine
5. Results
5.1. Experimental set-up
5.2. Experimentation
6. Conclusions
Dedicated optimized processors/chips for embedded video playback