Data Analysis using the SAS Language/Data Step

In order to understand the data step, it is helpful to understand the SAS data set. A SAS data set consists of observations and variables; these are, respectively, the rows and columns of data. In database terms, a SAS data set is a table whose records and fields are represented as observations and variables.

SAS data sets can be temporary, existing only for the life of the program, or they can be permanent and persist between programs. The SAS data set is a proprietary format that can only be accessed by the SAS System. However, SAS data sets can be written to database tables, text files, or PC files such as Excel or CSV files, and, as can be expected, SAS data sets can be created from any of these sources as well. The benefit of SAS data sets is the speed with which SAS can load data and begin processing.

Completion status: this resource is considered to be complete.
Subject classification: this is a science resource.
Subject classification: this is a statistics resource.
Subject classification: this is an information technology resource.
Educational level: this is a tertiary (university) resource.

Contents
1 Data Step Language
2 Statements
    2.1 Data Input and Output
        2.1.1 Filename Statement
        2.1.2 File Statement
        2.1.3 Input Statement
        2.1.4 Informat Statement
        2.1.5 Format Statement
        2.1.6 Put Statement
        2.1.7 Libname Statement
        2.1.8 Data Statement
        2.1.9 Set Statement
        2.1.10 Merge Statement
        2.1.11 Output Statement
        2.1.12 By Statement
    2.2 Assignment Statements
        2.2.1 Expressions
        2.2.2 Retain Statement
        2.2.3 Arrays
    2.3 Logic
        2.3.1 Subsetting If Statement
        2.3.2 If ... Then
        2.3.3 If ... Then ... Else
        2.3.4 Compound Statements
    2.4 Loops
        2.4.1 Do While Loop
        2.4.2 Do Until Loop
        2.4.3 Incremental Do Loop

Data Step Language

In the SAS language, statements are written in a very free form with few rules. For example, a SAS statement can span several lines, or several statements can be placed on a single line. All SAS statements must end with a semicolon (;). However, it is useful to indent code in the appropriate places in order to make the program more readable. Comments can also be used to explain the purpose of each section of code.

SAS is not case sensitive; however, variable names retain the casing used when they were first defined within the program. This means variable names will appear in reports using the case established when they were created. Within the data step, the SAS language provides the input, output and logic for manipulating data.

Each data step begins with the data statement, which defines the name of the SAS data set created by this step, and ends with the run statement. The statements between the data and run statements are executed for each observation in the input data set. Looping through observations is automatic within a SAS data step.

The following example is a program that reads in three variables from the data file, apples, and makes a calculation to create a new variable. The result is a SAS data set, also called apples, which contains every observation from the original file and four variables: three input and one calculated.

    filename apples 'c:\fruits\apples.txt';

    data apples;
        infile apples;
        input Type $15. Quantity 6. Price_per_unit 6.2;
        purchase_cost = Quantity * Price_per_unit;
    run;

The file, apples.txt, contains three variables:

    Type: width 15 characters, name of apple
    Quantity: width 6 digits, amount of apples purchased
    Price_per_unit: width 6 with 2 decimals

The first seven observations of the file, apples.txt, look like this.

    McIntosh          100  2.00
    Red Delicious      75  2.25
    Granny Smith      125  2.05
    Jonathon          120  1.95
    Rome              130  2.00
    Gala              150  1.95
    Fuji              200  2.25

The resulting SAS data set, apples, will have four variables for each observation. Three were read from apples.txt and one was created in the data step. The new variable, purchase_cost, is the product of the variables Quantity and Price_per_unit.
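For readers without an apples.txt file at hand, the same calculation can be tried with in-stream data. This is only a sketch under assumptions: the data set name apples_demo is made up for illustration, and it uses list input with the & modifier rather than the fixed-width informats of the example above, so fields are separated by two or more blanks.

```sas
/* Sketch only: apples_demo is a hypothetical name, and the rows are
   a subset of the sample observations typed in as in-stream data.   */
data apples_demo;
    /* The & modifier lets Type contain single embedded blanks;
       fields must then be separated by two or more blanks.          */
    input Type & $15. Quantity Price_per_unit;
    /* Executed once for every observation that is read */
    purchase_cost = Quantity * Price_per_unit;
    datalines;
McIntosh  100  2.00
Red Delicious  75  2.25
Granny Smith  125  2.05
;
run;

/* List the resulting data set: three input variables plus purchase_cost */
proc print data=apples_demo;
run;
```

No explicit loop is written: the data step reads one line, runs the assignment, writes the observation, and repeats until the data lines are exhausted.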

Statements

SAS has several types of statements used in the SAS data step. These statements provide the building blocks for designing powerful programming modules within the data step. There are several SAS procedures, or procs, that are closely tied to processing data in the data step. These procedures include proc format, proc print, proc sort, proc sql, and proc summary. They provide routines that work across several observations at once. Each proc has its own set of statements that provide parameters, options, variables, and output data sets. By interweaving data steps with the appropriate procedures, powerful SAS programs can be built. First we focus on the data step language statements.
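As a rough illustration of interweaving a data step with procedures, the sketch below builds a small data set and then sorts, summarizes and prints it. The data set names (fruit, fruit_totals), the variable names and the data values are assumptions made up for this example.

```sas
/* Hypothetical data set and values, for illustration only */
data fruit;
    input type_item $ quantity price;
    cost = quantity * price;      /* computed once per observation */
    datalines;
APPLE 100 2.00
PEAR 75 2.25
APPLE 50 1.95
;
run;

/* Procedures work across observations: sort first, then summarize by group */
proc sort data=fruit;
    by type_item;
run;

proc summary data=fruit;
    by type_item;
    var cost;
    output out=fruit_totals sum=total_cost;
run;

proc print data=fruit_totals;
run;
```

The data step handles the row-by-row calculation, while proc sort and proc summary handle the work that needs to see many observations at once.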

Data Input and Output

Input and output statements identify both the source and the destination of data, and describe how to read data from and write data to files. SAS has separate statements for non-SAS files and for SAS data sets. However, SAS can treat data in a non-SAS database as if it were a SAS data set.

There are a couple of steps that must always be followed. First, a logical link is established to the location of the physical data file. This is done with a filename statement (for non-SAS files) or a libname statement (for SAS data sets). The physical file could be a SAS data set in a SAS library, a text file, or a file from another vendor such as SAP, Oracle, or Microsoft. SAS data sets use a proprietary format optimized for the SAS System; SAS temporary files are also stored as SAS data sets. SAS can also access data from different database vendors, such as Oracle, IBM DB2, and Microsoft Excel and Access, as if they were SAS data sets. Text files are accessed using the filename statement with the infile or file statement.
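The two kinds of logical link described above might be sketched as follows. The folder c:\foods\sasdata, the fileref rawfruit and the data set names are assumptions for illustration; the text file path and the libref myfruit are the ones this page uses elsewhere.

```sas
/* Hypothetical locations: adjust to folders that exist on your system */
filename rawfruit 'c:\foods\fruit.txt';   /* fileref: link to one text file          */
libname  myfruit  'c:\foods\sasdata';     /* libref: link to a folder of data sets   */

/* A two-level name (libref.name) creates a permanent SAS data set */
data myfruit.shipments;
    infile rawfruit;                      /* read from the text file */
    input shipment_number 5. +1 shipment_date mmddyy6. +1
          type_item $15. @30 price 6.2 @40 quantity 5.;
run;

/* A one-level name goes to the temporary WORK library */
data shipments_copy;
    set myfruit.shipments;                /* read from the SAS data set */
run;
```

The fileref is used with infile or file, while the libref is used as the first part of a data set name on data and set statements.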

Filename Statement

The filename statement specifies the name of a physical file that will either contain data to be read or be created and written to. Normally this is a text file. Variations of this statement allow access to files using FTP, HTTP, pipes, email and other protocols. Within the data step, the file and infile statements reference the filename statement through the name given on it. In the example below, the infile statement names shipment, the name that the filename statement associates with the physical file fruit.txt.

    filename shipment 'c:\foods\fruit.txt';

    data fruit_shipment;
        infile shipment;
        input shipment_number 5. +1 shipment_date mmddyy6. +1 type_item $15. @30 price 6.2 @40 quantity 5.;
    run;

File Statement

In the example below, the file statement names apples, the name that the filename statement associates with the physical file apple_list.txt. This file will be used to store output. If the file apple_list.txt exists it will be overwritten; otherwise it will be created and written to.

    filename apples 'c:\foods\apple_list.txt';

    data _null_;
        set fruit_shipment;
        file apples;
        if type_item = 'APPLE' then
            put shipment_number 5. price 6.2 @40 quantity 5.;
    run;

Input Statement

The input statement lists the names of the variables and the formats needed to read them. Formatting for input gives SAS the rules it needs to extract data from the input file. This involves moving the input pointer to the correct position and giving the name of the new variable, its type (numeric or character), and its width, that is, the number of characters the variable occupies in the input file. The statement below lists five variables to input from a file.

    input shipment_number 5. +1 shipment_date mmddyy6. +1 type_item $15. @30 price 6.2 @40 quantity 5.;

Formats are used for input and output with text files. The $ is used for character strings. A format always contains a period, either at the end or before the number of decimal places. There are other format instructions which tell SAS where to move its input pointer. Below is a breakdown of the different format options used in the previous statement.

This table explains the input statement above while demonstrating much of its functionality.

    shipment_number 5.       a five-character numeric with no decimals
    +1                       tells the input pointer to skip ahead one character (one position)
    shipment_date mmddyy6.   a six-character date field: two digits each for month, day and year
    +1                       skip a character
    type_item $15.           a character string of length 15; the $ indicates characters
    @30                      move the input pointer to position 30
    price 6.2                a six-character numeric field with 2 decimals
    @40                      move to column 40
    quantity 5.              a five-character numeric with no decimals
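The breakdown above can be tried without an external file by placing a few lines of data directly in the program. The two records below are invented for illustration, and the data set name shipment_demo is an assumption; the data lines are laid out so that the number occupies columns 1 to 5, the date columns 7 to 12, the item columns 14 to 28, the price columns 30 to 35 and the quantity columns 40 to 44.

```sas
/* Sketch with made-up records; column positions matter for formatted input */
data shipment_demo;
    input shipment_number 5.      /* columns 1-5                     */
          +1                      /* skip column 6                   */
          shipment_date mmddyy6.  /* columns 7-12, e.g. 010508       */
          +1                      /* skip column 13                  */
          type_item $15.          /* columns 14-28                   */
          @30 price 6.2           /* jump to column 30, read 6 wide  */
          @40 quantity 5.;        /* jump to column 40, read 5 wide  */
    datalines;
10001 010508 APPLE             2.25      150
10002 010608 PEAR              1.80      200
;
run;

/* Show the date as a readable value instead of the stored day count */
proc print data=shipment_demo;
    format shipment_date date9.;
run;
```

Because the price values already contain a decimal point, the .2 in the 6.2 informat does not rescale them; it only matters for data written without a decimal point.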

Informat Statement

The informat statement specifies the default format for inputting data from a text data file. This includes specifying fields with embedded commas, currency fields and date fields. Informat information is also saved in the SAS data set.

Format Statement

The format statement specifies the default format for writing variables to reports, to SAS procedure output, and to text files. This includes writing fields with embedded commas, currency fields and date fields. A format statement in the data step attaches the format to the field for all subsequent output. A format statement can also be used in a proc step to override a default format. Format information is also saved in the SAS data set.
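A minimal sketch of how the informat and format statements might work together is given below. The data set name priced, the variable names and the values are assumptions chosen for illustration; mmddyy10., comma10., date9., dollar12.2 and comma12.2 are standard SAS informats and formats.

```sas
/* Hypothetical data set and values, for illustration only */
data priced;
    /* Default rules for reading: a date with slashes, a number with commas */
    informat sale_date mmddyy10. amount comma10.;
    /* Default rules for writing, stored with the data set */
    format   sale_date date9.    amount dollar12.2;
    input sale_date amount;
    datalines;
01/05/2008 1,250.50
02/14/2008 12,000.00
;
run;

/* Uses the formats attached in the data step */
proc print data=priced;
run;

/* A format statement in the proc step overrides the stored format
   for this step only */
proc print data=priced;
    format amount comma12.2;
run;
```

The informat only affects how the text is converted on input; the stored values are ordinary numbers, and the format decides how they are displayed.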

Put Statement

The put statement allows data to be output as text. It is similar to the input statement: each variable listed is followed by its format.

    put type_fruit $15. @25 price 8.2 +5 special 8.2 +1 quantity 9.;

Libname Statement

The libname statement references the location of the folder that will contain permanent SAS data sets. These are SAS data sets that will persist beyond the current program. The name myfruit will be used along wi