SQL Server Integration Services SSISBest Practices.docx
《SQL Server Integration Services SSISBest Practices.docx》由会员分享,可在线阅读,更多相关《SQL Server Integration Services SSISBest Practices.docx(11页珍藏版)》请在冰豆网上搜索。
![SQL Server Integration Services SSISBest Practices.docx](https://file1.bdocx.com/fileroot1/2022-11/22/d33df945-63b4-45a3-b3ea-2e0af95341c9/d33df945-63b4-45a3-b3ea-2e0af95341c91.gif)
SQLServerIntegrationServicesSSISBestPractices
SQLServerIntegrationServices(SSIS)-BestPractices
WinaKindlewith5SQLServereBooks
Problem
SQLServerIntegrationServices(SSIS)hasgrownalotfromitspredecessor(前任;前輩)DTS(DataTransformationServices)tobecomeanenterprisewideETL(Extraction,TransformationandLoading)productintermsof(就….而論;在……方面)itsusability,performance,parallelismetc.Apartfrom(除…..之外)beinganETLproduct,italsoprovidesdifferentbuilt-intaskstomanageaSQLServerinstance.AlthoughtheinternalarchitectureofSSIShasbeendesignedtoprovideahighdegreeofperformanceandparallelismtherearestillsomebestpracticestofurtheroptimizeperformance.Inthistipseries,IwillbetalkingaboutbestpracticestoconsiderwhileworkingwithSSISwhichIhavelearnedwhileworkingwithSSISforthepastcoupleofyears.
Solution
Asmentionedabove,SSISisthesuccessorofDTS(ofSQLServer7/2000).IfyouarecomingfromaDTSbackground,SSISpackagesmaylooksimilartoDTSpackages,butit'snotthecaseinreality.WhatImeanis,SSISisnotanenhancementtoDTSbutratheranewproductwhichhasbeenwrittenfromscratchtoprovidehighperformanceandparallelismandasaresultofthisitovercomesseverallimitationsofDTS.
SSIS2008hasfurtherenhancedtheinternaldataflowpipelineenginetoprovideevenbetterperformance,youmighthaveheardthenewsthatSSIS2008hassetanETLWorldrecordofuploading1TBofdatainlessthanhalfanhour.
ThebestpartofSSISisthatitisacomponentofSQLserver. ItcomesfreewiththeSQLServerinstallationandyoudon'tneedaseparatelicenseforit.Becauseofthis,alongwithhardcoreBIdevelopers,databasedevelopersanddatabaseadministratorsarealsousing ittotransferandtransformdata.
BestPractice#1-PullingHighVolumesofData
Recentlywehadtopulldatafromasourcetablewhichhad300millionsrecordstoanewtargettable.InitiallywhentheSSISpackagestarted,everythinglookedfine,datawasbeingtransferredasexpectedbutgraduallytheperformancedegradedandthedatatransferratewentdowndramatically.Duringanalysiswefoundthatthetargettablehadaprimaryclusteredkeyandtwonon-clusteredkeys.Becauseofthehighvolumeofdatainsertsintothetargettabletheseindexesgotfragmentedheavilyupto85%-90%.Weusedtheonlineindexrebuildingfeaturetorebuild/defragtheindexes,butagainthefragmentationlevelwasbackto90%afterevery15-20minutesduringtheload.Thiswholeprocessofdatatransferandparallelonlineindexrebuildstookalmost12-13hourswhichwasmuchmorethanourexpectedtimefordatatransfer.
Thenwecamewithanapproachtomakethetargettableaheapbydroppingalltheindexesonthetargettableinthebeginning,transferthedatatotheheapandondatatransfercompletion,recreateindexesonthetargettable.Withthisapproach,thewholeprocess(bydroppingindexes,transferringdataandrecreatingindexes)tookjust3-4hourswhichwaswhatwewereexpecting.
Thiswholeprocesshasbeengraphicallyshowninthebelowflowchart.Sotherecommendationistoconsiderdroppingyourtargettableindexesifpossiblebeforeinsertingdatatoitspeciallyifthevolumeofinsertsisveryhigh.
BestPractice#2-AvoidSELECT*
TheDataFlowTask(DFT)ofSSISusesabuffer(achunkofmemory)orientedarchitecturefordatatransferandtransformation.Whendatatravelsfromthesourcetothedestination,thedatafirstcomesintothebuffer,requiredtransformationsaredoneinthebufferitselfandthenwrittentothedestination.
Thesizeofthebufferisdependantonseveralfactors,oneofthemistheestimatedrowsize.Theestimatedrowsizeisdeterminedbysummingthemaximumsizeofallthecolumnsintherow.Sothemorecolumnsinarowmeanslessnumberofrowsinabufferandwithmorebufferrequirementstheresultisperformancedegradation.Henceitisrecommendedtoselectonlythosecolumnswhicharerequiredatdestination.
Evenifyouneedallthecolumnsfromthesource,youshouldusethecolumnnamespecificallyintheSELECTstatementotherwiseittakesanotherroundforthesourcetogathermeta-dataaboutthecolumnswhenyouareusingSELECT*.
Ifyoupullcolumnswhicharenotrequiredatdestination(orforwhichnomappingexists)SSISwillemitwarningslikethis.
[SSIS.Pipeline]Warning:
Theoutputcolumn"SalariedFlag"(64)onoutput"OLEDBSourceOutput"(11)andcomponent"OLEDBSource"
(1)isnotsubsequentlyusedintheDataFlowtask.RemovingthisunusedoutputcolumncanincreaseDataFlowtaskperformance.
[SSIS.Pipeline]Warning:
Theoutputcolumn"CurrentFlag"(73)onoutput"OLEDBSourceOutput"(11)andcomponent"OLEDBSource"
(1)isnotsubsequentlyusedintheDataFlowtask.RemovingthisunusedoutputcolumncanincreaseDataFlowtaskperformance.
Bewarewhenyouareusing"Tableorview"or"Tablenameorviewnamefromvariable"dataaccessmodeinOLEDBsource.ItbehaveslikeSELECT*andpullsallthecolumns,usethisaccessmodeonlyifyouneedallthecolumnsofthetableorviewfromthesourcetothedestination.
Tip:
Trytofitasmanyrowsintothebufferwhichwilleventuallyreducethenumberofbufferspassingthroughthedataflowpipelineengineandimproveperformance.
BestPractice#3-EffectofOLEDBDestinationSettings
TherearecouplesofsettingswithOLEDBdestinationwhichcanimpacttheperformanceofdatatransferaslistedbelow.
DataAccessMode–Thissettingprovidesthe'fastload'optionwhichinternallyusesaBULKINSERTstatementforuploadingdataintothedestinationtableinsteadofasimpleINSERTstatement(foreachsinglerow)asinthecaseforotheroptions.Sounlessyouhaveareasonforchangingit,don'tchangethisdefaultvalueoffastload.Ifyouselectthe'fastload'option,therearealsoacoupleofothersettingswhichyoucanuseasdiscussedbelow.
KeepIdentity–Bydefaultthissettingisuncheckedwhichmeansthedestinationtable(ifithasanidentitycolumn)willcreateidentityvaluesonitsown.Ifyoucheckthissetting,thedataflowenginewillensurethatthesourceidentityvaluesarepreservedandsamevalueisinsertedintothedestinationtable.
KeepNulls–Againbydefaultthissettingisuncheckedwhichmeansdefaultvaluewillbeinserted(ifthedefaultconstraintisdefinedonthetargetcolumn)duringinsertintothedestinationtableifNULLvalueiscomingfromthesourceforthatparticularcolumn.Ifyoucheckthisoptionthendefaultconstraintonthedestinationtable'scolumnwillbeignoredandpreservedNULLofthesourcecolumnwillbeinsertedintothedestination.
TableLock–Bydefaultthissettingischeckedandtherecommendationistoletitbecheckedunlessthesametableisbeingusedbysomeotherprocessatsametime.Itspecifiesatablelockwillbeacquiredonthedestinationtableinsteadofacquiringmultiplerowlevellocks,whichcouldturnintolockescalationproblems.
CheckConstraints–Againbydefaultthissettingischeckedandrecommendationistoun-checkitifyouaresurethattheincomingdataisnotgoingtoviolateconstraintsofthedestinationtable.Thissettingspecifiesthatthedataflowpipelineenginewillvalidatetheincomingdataagainsttheconstraintsoftargettable.Ifyouun-checkthisoptionitwillimprovetheperformanceofthedataload.
BestPractice#4-EffectofRowsPerBatchandMaximumInsertCommitSizeSettings
Rowsperbatch–Thedefaultvalueforthissettingis-1whichspecifiesallincomingrowswillbetreatedasasinglebatch.Youcanchangethisdefaultbehaviorandbreakallincomingrowsintomultiplebatches.Theallowedvalueisonlypositiveintegerwhichspecifiesthemaximumnumberofrowsinabatch.
Maximuminsertcommitsize–Thedefaultvalueforthissettingis'2147483647'(largestvaluefor4byteintegertype)whichspecifiesallincomingrowswillbecommittedonceonsuccessfulcompletion.Youcanspecifyapositivevalueforthissettingtoindicatethatcommitwillbedoneforthosenumberofrecords.Youmightbewondering,changingthedefaultvalueforthissettingwillputoverheadonthedataflowenginetocommitseveraltimes.Yesthatistrue,butatthesametimeitwillreleasethepressureonthetransactionlogandtempdbtogrowtremendouslyspecificallyduringhighvolumedatatransfers.
Theabovetwosettingsareveryimportanttounderstandtoimprovetheperformanceoftempdbandthetransactionlog.Forexampleifyouleave'Maxinsertcommitsize'toitsdefault,thetransactionlogandtempdbwillkeepongrowingduringtheextractionprocessandifyouaretransferringahighvolumeofdatathetempdbwillsoonrunoutofmemoryasaresultofthisyourextractionwillfail.Soitisrecommendedtosetthesevaluestoanoptimumvaluebasedonyourenvironment.
Note:
TheaboverecommendationshavebeendoneonthebasisofexperiencegainedworkingwithDTSandSSISforthelastcoupleofyears.Butasnotedbeforethereareotherfactorswhichimpacttheperformance,oneofthethemisinfrastructureandnetwork.Soyoushoulddothoroughtestingbeforeputtingthesechangesintoyourproductionenvironment.
Problem
Inthefirsttip(SQLServerIntegrationServices(SSIS)-BestPractices-Part1)ofthisseriesIwroteaboutSSISdesignbestpractices.Tocontinuedownthatpath,thistipisgoingtocoverrecommendationsrelatedtotheSQLServerDestinationAdapter,asynchronoustransformations,DefaultBufferMaxSizeandDefaultBufferMaxRows,BufferTempStoragePathandBLOBTempStoragePathaswellastheDelayValidationproperty.
Solution
Inthistipmyrecommendationsarerelatedto