Video Algorithm Engineering: Video Codec Algorithms, Algorithm Porting, and Optimization

An FPGA implementation of HW/SW codesign architecture for H.263 video coding

Electronics and Communications

In this paper, we present an efficient HW/SW codesign architecture for an H.263 video encoder and its FPGA implementation. Each module of the encoder is investigated to determine whether a HW or a SW approach better achieves real-time processing speed as well as flexibility. The hardware portions include the Discrete Cosine Transform (DCT), inverse DCT (IDCT), quantization (Q) and inverse quantization (IQ). The remaining parts are realized in software executed by the NIOS II softcore processor. This paper also introduces efficient design methods for the HW and SW modules. In hardware, an efficient architecture for the 2-D DCT/IDCT is suggested to reduce the chip size. NIOS II custom instruction logic is used to implement Q/IQ. A software optimization technique is also explored by using a fast block-matching algorithm for motion estimation (ME). The whole design is described in VHDL, verified in simulation and implemented on a Stratix II EP2S60 FPGA. Finally, the encoder has been tested on the Altera NIOS II development board and can work at up to 120 MHz. Implementation results show that when HW/SW codesign is used, a 15.8-16.5 times improvement in coding speed is obtained compared to the software-based solution.
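
The abstract does not name the fast block-matching algorithm it uses. As a rough illustration of the general idea only, the sketch below implements the classic three-step search with a sum-of-absolute-differences (SAD) cost. The choice of three-step search, the function names and the block and step sizes are all assumptions made for this example, not details taken from the paper.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def three_step_search(cur, ref, bx, by, n=16, step=4):
    """Return (dx, dy, cost) for the best match found by three-step search.

    Assumed illustration, not the paper's algorithm: each round tests the
    centre's eight neighbours at the current step size, then halves the step.
    """
    height, width = len(ref), len(ref[0])
    best_dx, best_dy = 0, 0
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        centre_dx, centre_dy = best_dx, best_dy
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                dx, dy = centre_dx + ox, centre_dy + oy
                # Skip candidates whose block would leave the reference frame.
                if not (0 <= bx + dx <= width - n and 0 <= by + dy <= height - n):
                    continue
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best_cost:
                    best_dx, best_dy, best_cost = dx, dy, cost
        step //= 2
    return best_dx, best_dy, best_cost
```

With steps of 4, 2 and 1 the search covers displacements up to ±7 pixels while evaluating at most 25 candidates instead of the 225 that a full search over the same window needs. The price is that it is a heuristic and can settle on a local minimum of the SAD surface.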

Article Outline

1. Introduction

2. Baseline H.263 video coding

2.1. Picture format and organization

2.2. Overview of the H.263 video coding standard

2.2.1. Motion estimation and compensation

2.2.2. DCT transform

2.2.3. Quantization

2.2.4. Entropy coding

3. The HW/SW codesign platform

3.1. FPGA platform

3.2. The NIOS II development board – the HW/SW platform

3.2.1. NIOS II CPU

3.2.2. NIOS II custom instruction logic

3.3. The HW/SW codesign process

3.4. Using embedded Linux with codesign

4. Timing optimization of the H.263 encoder

4.1. Timing optimization

4.2. Hardware/software partitioning

4.2.1. Optimization in motion estimation

4.2.2. Optimization in DCT and IDCT

4.2.3. Optimization in quantization and inverse quantization

5. Design environment and FPGA implementation of H.263 coder

5.1. Overview of the STRATIX II FPGA architecture

5.2. FPGA implementation of H.263 video coder

5.2.1. System environment

5.2.2. 2-D DCT/IDCT coprocessor core

5.3. Implementation results

6. Experimental results

7. Conclusions

A real-time versatile roadway path extraction and tracking on an FPGA platform

Computer Vision and Image Understanding

This paper presents an algorithm for roadway path extraction and tracking and its implementation in a Field Programmable Gate Array (FPGA) device. The implementation is particularly suitable for use as a core component of a Lane Departure Warning (LDW) system, which requires high-performance digital image processing as well as low-cost semiconductor devices, appropriate for the high volume production of the automotive market. The FPGA technology proved to be a proper platform to meet these two contrasting requirements. The proposed algorithm is specifically designed to be completely embedded in FPGA hardware to process wide VGA resolution video sequences at 30 frames per second. The main contributions of this work lie in (i) the proper selection, customization and integration of the main functions for road extraction and tracking to cope with the addressed application, and (ii) the subsequent FPGA hardware implementation as a modular architecture of specialized blocks. Experiments on real road scenario video sequences running on the FPGA device illustrate the good performance of the proposed system prototype and its ability to adapt to varying common roadway conditions, without the need for a per-installation calibration procedure.

Article Outline

1. Introduction

2. Related work

3. The proposed method

3.1. Road model

3.2. Pre-processing pipeline

3.3. Model fitting

3.3.1. K and M estimation

3.3.2. BL and BR estimation

3.4. Model tracking

4. FPGA implementation

5. Experimental results and discussion

5.1. FPGA performance

5.2. Algorithm performance

6. Conclusions

Platform-independent MB-based AVS video standard implementation

Signal Processing: Image Communication

AVS1-P2 is the newest video standard of the Audio Video coding Standard (AVS) workgroup of China, and it provides performance close to that of the H.264/AVC main profile with lower complexity. In this paper, a platform-independent software package with a macroblock-based (MB-based) architecture is proposed to facilitate AVS video standard implementation on embedded systems. Compared with the frame-based architecture, which is commonly utilized for PC platform oriented video applications, the MB-based decoder performs all of the decoding processes, except the high-level syntax parsing, in a set of MB-based buffers of adequate size for saving the information of the current MB and the neighboring reference MBs, in order to minimize the on-chip memory and to save the time consumed in on-chip/off-chip data transfer. By modifying the data flow and decoding hierarchy, simulating the data transfer between the on-chip memory and the off-chip memory, and modularizing the buffer definition and management for low-level decoding kernels, the MB-based system architecture provides over 80% reduction in on-chip memory compared to the frame-based architecture when decoding 720p sequences. The storage complexity is also analyzed by referencing the performance evaluation of the MB-based decoder. The MB-based decoder implementation provides an efficient reference to facilitate development of AVS applications on embedded systems. The complexity analysis provides rough storage complexity requirements for AVS video standard implementation and optimization.
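
To give a feel for the scale difference between frame-sized and macroblock-sized storage, the following sketch does the basic arithmetic for 8-bit 4:2:0 video with 16x16 macroblocks. The sample format and the helper names are assumptions made for illustration. The numbers are not the paper's measurements and do not model its actual buffer set, which also holds neighboring reference MB data.

```python
MB_SIZE = 16  # luma samples per macroblock side (assumed 16x16 macroblocks)


def frame_bytes(width, height):
    """Bytes for one 8-bit 4:2:0 frame: a luma plane plus two quarter-size chroma planes."""
    return width * height * 3 // 2


def macroblock_bytes():
    """Bytes for one 8-bit 4:2:0 macroblock: 16x16 luma plus two 8x8 chroma blocks."""
    return MB_SIZE * MB_SIZE + 2 * (MB_SIZE // 2) ** 2


def macroblocks_per_frame(width, height):
    """Macroblock count, rounding each dimension up to a multiple of MB_SIZE."""
    return -(-width // MB_SIZE) * -(-height // MB_SIZE)
```

Under these assumptions a single 720p frame (1280x720) occupies 1,382,400 bytes and contains 3,600 macroblocks of 384 bytes each, which is why keeping only the current macroblock and a limited neighborhood on chip can shrink the working set so sharply.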

Article Outline

1. Introduction

2. AVS1-P2 standard overview

3. System architecture

3.1. Frame-based AVS decoder

3.2. MB-based AVS decoder

4. MB-based AVS decoder implementation

4.1. MB-based buffer update

4.2. MB-based Intra prediction

4.3. MB-based motion compensation

4.4. MB-based de-blocking filter

5. Applications and complexity analysis

5.1. Applications

5.2. Complexity analysis

6. Conclusions

Hardware/software co-design of a real-time kernel based tracking system

Systems Architecture

Probabilistic visual tracking methods using color histograms have been proven to be robust to target model variations and background illumination changes, as shown by recent research. However, the required computational cost is high due to intensive image data processing. An embedded solution for such algorithms becomes challenging due to the high computational power demand and algorithm complexity. This paper presents a hardware/software co-design architecture for implementation of the well-known kernel based mean shift tracking algorithm. The design uses the color histogram of the target as the tracking feature. The target is searched for in consecutive images by maximizing the statistical match of the color distributions. The target localization is based on a gradient based iterative search instead of an exhaustive search, which makes the system capable of achieving frame rates of up to hundreds of frames per second while tracking multiple targets. The design, which is fully standalone, is implemented on a low-cost medium-size field programmable gate array (FPGA) device. The hardware cost of the design is compared with some other tracking systems. The performance of the system in terms of speed is evaluated and compared with the software based implementation. It is expected that the proposed solution will find its utility in applications like embedded automatic video surveillance systems.
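
The "statistical match of the color distributions" in kernel based tracking is commonly measured with the Bhattacharyya coefficient between kernel-weighted histograms. The sketch below shows only those two building blocks in software, under assumptions of this example: the patch is already quantized to histogram bin indices, the kernel is an Epanechnikov profile, and all function names are hypothetical. It is not the paper's coprocessor design and it leaves out the mean shift iteration itself.

```python
import math


def epanechnikov(r2):
    """Epanechnikov profile on normalised squared distance r2 (zero outside the unit disc)."""
    return 1.0 - r2 if r2 < 1.0 else 0.0


def weighted_histogram(patch, bins):
    """Kernel-weighted, normalised histogram of a 2-D patch of bin indices.

    Pixels near the patch centre get more weight than pixels near the edge,
    so background pixels at the border influence the model less.
    """
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    hist = [0.0] * bins
    for y in range(h):
        for x in range(w):
            r2 = ((y - cy) / (h / 2.0)) ** 2 + ((x - cx) / (w / 2.0)) ** 2
            hist[patch[y][x]] += epanechnikov(r2)
    total = sum(hist)
    return [v / total for v in hist]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalised histograms (1.0 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))
```

A tracker would compute the target histogram once, then compare it against candidate histograms in each new frame and move the candidate window in the direction that increases the coefficient.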

Article Outline

1. Introduction

2. Design approach

3. Coprocessor architecture

3.1. Image decimation and cropping

3.2. Epanechnikov kernel calculation

3.3. Histogram calculation

3.4. Mean shift vector calculation

3.5. Bhattacharyya coefficient calculation

4. Hardware implementation

5. Performance evaluation

5.1. Comparison with other systems

5.2. System performance

6. Experimental results

7. Conclusion

Acknowledgements

Automated framework for partitioning DSP applications in hybrid reconfigurable platforms

Microprocessors and Microsystems

In this paper, we present a software framework that implements a formalized methodology for partitioning Digital Signal Processing applications between reconfigurable hardware blocks of different granularity. A hybrid generic reconfigurable architecture is considered, so that the methodology is applicable to a large variety of hybrid reconfigurable systems. The developed framework is composed of analysis, partitioning, and mapping tools. Although the framework is parametric with respect to the mapping procedures for the fine- and coarse-grain reconfigurable units, we provide specific mapping algorithms for these types of hardware. In this work, the methodology is validated using five real-world digital signal processing applications: an orthogonal frequency division multiplexing transmitter, a cavity detector, a video compression technique, a JPEG encoder, and a wavelet-based image compressor. The experiments report that an average clock cycle decrease of 60.7%, relative to an all fine-grain mapping solution, is achieved using the developed framework for the considered applications.

Article Outline

1. Introduction

2. Related work

3. Partitioning methodology

3.1. Hybrid SoC platform

3.2. Methodology description

4. Framework description

4.1. CDFG creation

4.2. Analysis

4.3. Mapping to fine-grain reconfigurable hardware

4.3.1. High-level mapping phase

4.3.2. Low-level mapping phase

4.4. Mapping to coarse-grain reconfigurable hardware

4.4.1. Architecture of the coarse-grain reconfigurable data-path

4.4.2. Description of the mapping algorithm

4.5. Partitioning engine

5. Results

5.1. Experimental set-up

5.2. Experimentation

6. Conclusions

Dedicated optimized processors/chips for embedded video playback