In order to understand the data step, it is helpful to understand the SAS data set. A SAS data set consists of observations and variables, which are respectively the rows and columns of data. In database terms, a SAS data set represents a table whose records and fields are the observations and variables.
SAS data sets can be temporary, existing during the life of the program, or they can be permanent and persist between programs. The SAS data set is a proprietary format that can only be accessed by the SAS System. However, SAS data sets can be written to database tables, text files, or PC files such as Excel or CSV format. And, as can be expected, SAS data sets can be created from any of these sources as well. The benefit of SAS data sets is the speed with which SAS can load data and begin processing.
Contents
∙ 1 Data Step Language
∙ 2 Statements
  o 2.1 Data Input and Output
    ▪ 2.1.1 Filename Statement
    ▪ 2.1.2 File Statement
    ▪ 2.1.3 Input Statement
    ▪ 2.1.4 Informat Statement
    ▪ 2.1.5 Format Statement
    ▪ 2.1.6 Put Statement
    ▪ 2.1.7 Libname Statement
    ▪ 2.1.8 Data Statement
    ▪ 2.1.9 Set Statement
    ▪ 2.1.10 Merge Statement
    ▪ 2.1.11 Output Statement
    ▪ 2.1.12 By Statement
  o 2.2 Assignment Statements
    ▪ 2.2.1 Expressions
    ▪ 2.2.2 Retain Statement
    ▪ 2.2.3 Arrays
  o 2.3 Logic
    ▪ 2.3.1 Subsetting If Statement
    ▪ 2.3.2 If...Then
    ▪ 2.3.3 If...Then...Else
    ▪ 2.3.4 Compound Statements
  o 2.4 Loops
    ▪ 2.4.1 Do While Loop
    ▪ 2.4.2 Do Until Loop
    ▪ 2.4.3 Incremental Do Loop
Data Step Language
In the SAS language, statements are written in a very free form with few rules. For example, a SAS statement can span several lines, or several statements can be placed on a single line. All SAS statements must end with a semicolon ";". However, it is useful to indent code in the appropriate places in order to make the program more readable. Comments can also be used to explain the purpose of each section of code.
SAS is not case sensitive; however, variable names retain the casing used when they were first defined within the program. This means variable names will appear in reports using the case established when they were created. Within the data step, the SAS language provides the input, output and logic for manipulating data.
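The free-form layout and case rules described above can be illustrated with a minimal sketch. The data set name work.demo and the variables x, y and total are made up for this illustration and do not come from the original text.

```sas
/* Several statements on one line, and one statement spanning two lines. */
data work.demo; x = 1; y = 2;
  total = x
        + y;
run;

* Keywords and variable names are not case sensitive, so TOTAL is total;
PROC PRINT DATA=work.demo; VAR X Y TOTAL; RUN;
```

Both comment styles shown here, `/* ... */` and a statement that starts with an asterisk and ends with a semicolon, are standard SAS comments.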
Each data step begins with the data statement, which defines the name of the SAS data set created by this step, and ends with the run statement. The statements between the data and run statements are executed for each observation in the input data set. Looping through observations is automatic within a SAS data step. The following example is a program that reads three variables from the data file referenced as apples and makes a calculation to create a new variable. The result is a SAS data set, also called apples, which contains every observation from the original file and four variables, three input and one calculated.
filename apples "c:\fruits\apples.txt";
data apples;
  infile apples;
  /* read three fixed-width fields from each line of the file */
  input Type $15. Quantity 6. Price_per_unit 6.2;
  /* compute the fourth variable */
  purchase_cost = Quantity * Price_per_unit;
run;
The file, apples.txt, contains three variables:
∙ Type, width 15 characters, name of apple
∙ Quantity, width 6 digits, amount of apples purchased
∙ Price_per_unit, width 6 with 2 decimals
The first seven observations of the file, apples.txt, look like this.
McIntosh          100  2.00
Red Delicious      75  2.25
Granny Smith      125  2.05
Jonathon          120  1.95
Rome              130  2.00
Gala              150  1.95
Fuji              200  2.25
The resulting SAS data set, apples, will have four variables for each observation. Three were read from apples.txt and one was created in the data step. The new variable, purchase_cost, is the product of the variables Quantity and Price_per_unit.
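To try the apples example without creating a file on disk, the same step can be sketched with in-stream data. This variant is an assumption for illustration: it replaces the filename and infile statements with datalines and uses only the first three sample observations, laid out in the column widths the input statement expects (15, 6 and 6 characters).

```sas
data apples;
  /* datalines stands in for apples.txt so the step is self-contained */
  input Type $15. Quantity 6. Price_per_unit 6.2;
  purchase_cost = Quantity * Price_per_unit;
  datalines;
McIntosh          100  2.00
Red Delicious      75  2.25
Granny Smith      125  2.05
;
run;

/* list the three observations and four variables */
proc print data=apples;
run;
```

The data step loops over the three lines automatically, so purchase_cost is computed once per observation (200, 168.75 and 256.25 for these rows).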
Statements
SAS has several types of statements used in the SAS data step. These statements provide the building blocks for designing powerful programming modules within the data step. There are several SAS procedures, or procs, that are closely tied to processing data in the data step. These procedures include proc format, proc print, proc sort, proc sql, and proc summary. They provide routines that work across several observations at once. Each proc has its own set of statements that provide parameters, options, variables, and output data sets. By interweaving data steps with the appropriate procedures, powerful SAS programs can be built. First we focus on the data step language statements.
Data Input and Output
Input and output statements are used to identify both the source and destination of data and how to read and write data to and from files. SAS has separate statements for using non-SAS data sets and for SAS data sets. However, SAS can treat data in a non-SAS database as if it is a SAS data set. There are a couple of steps that must always be followed. First, a logical link is established to the location of the physical data file. This is done with a filename statement (for non-SAS data sets) or a libname statement (for SAS data sets). The physical file could be a SAS data set in a SAS library, a text file, or a file from another vendor such as SAP, Oracle, or Microsoft. SAS data sets use a proprietary format optimized for the SAS system; SAS temporary files are also stored as SAS data sets. SAS can also access data from different database vendors such as Oracle, IBM DB2, and Microsoft Excel and Access as if they were SAS data sets. Text files are accessed using the filename statement with the infile or file statement.
Filename Statement
The filename statement specifies the name of a physical file that will either contain data to be read or be created and written to. Normally this is a text file. Variations of this statement allow access to files using FTP, HTTP, pipes, email and other protocols. Within the data step, the file and infile statements reference the filename statement. This is done by the name on the statement. In the example below, the infile statement, shipment, identifies the filename statement, shipment, which is associated with the physical file, fruit.txt.
filename shipment "c:\foods\fruit.txt";
data fruit_shipment;
  infile shipment;
  input shipment_number 5. +1 shipment_date mmddyy6. +1
        type_item $15. @30 price 6.2 @40 quantity 5.;
run;
File Statement
The file statement, apples, refers to the filename statement, apples, which is associated with the physical file, apple_list.txt. This file will be used to store output. If the file apple_list.txt exists, it will be overwritten; otherwise it will be created and written to.
filename apples "c:\foods\apple_list.txt";
data _null_;
  set fruit_shipment;
  file apples;
  /* write a line only for shipments of apples */
  if type_item = "APPLE" then
    put shipment_number 5. price 6.2 @40 quantity 5.;
run;
Input Statement
The input statement lists the names of the variables and the formats needed to read them. Formatting for input gives SAS the rules it needs to extract data from the input file. This involves moving the input pointer to the correct position, giving the name for the new variable, the type (numeric or character) and the width, or number of characters, that the input variable uses in the input file. The statement below lists five variables to input from a file.
input shipment_number 5. +1 shipment_date mmddyy6. +1
      type_item $15. @30 price 6.2 @40 quantity 5.;
Formats are used for input and output with text files. The $ is used for character strings. A format descriptor always contains a period, either at the end or before the number of decimal places. There are other format instructions which tell SAS where to move its input pointer. Below is a breakdown to explain the different format options used in the previous statement.
This list explains the input statement above while demonstrating much of its functionality.
∙ shipment_number 5.       a five-character numeric field with no decimals
∙ +1                       tells the input pointer to skip ahead one character (one position)
∙ shipment_date mmddyy6.   a six-character date field, 2 digits each for month, day and year
∙ +1                       skip a character
∙ type_item $15.           a character string of length 15; the "$" indicates characters
∙ @30                      move the input pointer to position 30
∙ price 6.2                a six-character numeric field with 2 decimals
∙ @40                      move to column 40
∙ quantity 5.              a five-character numeric field with no decimals
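The pointer controls in the breakdown above can be tried with a small self-contained sketch. The two data lines are invented for illustration and are laid out to match the column positions in the input statement; datalines is used in place of an external file, and the format statement is added only so the date prints readably.

```sas
data fruit_shipment;
  input shipment_number 5. +1 shipment_date mmddyy6. +1
        type_item $15. @30 price 6.2 @40 quantity 5.;
  /* display the stored date value as mm/dd/yyyy */
  format shipment_date mmddyy10.;
  datalines;
10001 031505 APPLE             1.25      120
10002 031605 PEAR              0.95       80
;
run;

proc print data=fruit_shipment;
run;
```

Here the price occupies columns 30 to 35 and the quantity columns 40 to 44, so the @30 and @40 controls land on the start of each field regardless of what precedes them.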
Informat Statement
The informat statement specifies the default format for inputting data from a text data file. This includes specifying fields with embedded commas, currency and date fields. Informat information is also saved in the SAS data set.
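As a hedged sketch of how an informat statement might be used, the step below declares informats for a date field and a field with embedded commas, then reads blank-separated values. The data set sales and the variables sale_date and amount are hypothetical names chosen for this example.

```sas
data sales;
  /* default informats for the variables read by the input statement */
  informat sale_date mmddyy10. amount comma10.;
  input type_item $ sale_date amount;
  datalines;
APPLE 03/15/2005 1,250.50
PEAR 03/16/2005 980.00
;
run;

proc print data=sales;
run;
```

Because the informats are declared once, the input statement itself only needs to list the variable names.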
Format Statement
The format statement specifies the default format for writing variables to reports, on SAS procedure output, and to text files. This includes writing fields with embedded commas, currency and date fields. A format statement in the data step attaches the format to the field for all subsequent output. A format statement can also be used in a proc step to override a default format. Format information is also saved in the SAS data set.
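The two uses of the format statement described above, attaching a default in the data step and overriding it in a proc step, can be sketched as follows. The data set sales and its variables are hypothetical and serve only as an illustration.

```sas
data sales;
  input type_item $ sale_date :mmddyy10. amount;
  /* default formats stored with the data set */
  format sale_date date9. amount dollar10.2;
  datalines;
APPLE 03/15/2005 1250.50
PEAR 03/16/2005 980.00
;
run;

/* uses the formats attached in the data step */
proc print data=sales;
run;

/* overrides the format of amount for this procedure only */
proc print data=sales;
  format amount comma10.2;
run;
```

The override in the second proc print does not change the format stored in the data set.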
Put Statement
The put statement allows data to be output as text. It is similar to the input statement. Each variable listed is followed by its format.
put type_fruit $15. @25 price 8.2 +5 special 8.2 +1 quantity 9.;
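A minimal way to experiment with the put statement above is to run it in a data step that creates no data set. The values assigned here are invented for illustration. With no file statement in the step, put writes its output to the SAS log.

```sas
data _null_;
  /* illustrative values only */
  type_fruit = "APPLE";
  price = 1.25;
  special = 0.99;
  quantity = 120;
  put type_fruit $15. @25 price 8.2 +5 special 8.2 +1 quantity 9.;
run;
```

Adding a file statement before the put, as in the File Statement section, would send the same line to a text file instead.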
Libname Statement
The libname statement refers to the location of the folder that will contain permanent SAS data sets. These are SAS data sets that will persist beyond the current program. The name myfruit will be used along wi