Experiment 2: Lexical Analyzer

魏陈强 23020092204168

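I. Objective

Understand how a lexical analyzer is constructed, and master at least one of two implementation methods: hand-coding or LEX.

II. Task

Write a lexical analyzer that converts an input source program into a sequence of tokens and outputs it.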

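III. Requirements

1. Either hand-coding or the LEX tool may be used; the development environment is VC.

2. The source language is defined in Appendix A.1 of the textbook; its terminals are the tokens that lexical analysis must produce.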

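(1) Keywords of the language:

if while do break real true false int char bool float (int, char, boolean, and float appear as basic in the productions)

All keywords are reserved words and must be lowercase.

(2) Regular-expression definitions of id and num;

(3) Special symbols:

+ - * / < <= > >= == != = ; , ( ) [ ] { } /* */

(4) Whitespace consists of blanks, newlines, and tabs.

Whitespace is normally ignored, except where it is needed to separate IDs, NUMs, and keywords.

(5) Comments.

Comments are enclosed in /* and */.

A comment may appear anywhere whitespace may appear and may span more than one line.

Comments cannot be nested.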

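3. Points to note when implementing the lexical analyzer:

(1) distinguishing keywords from identifier names;

(2) converting numbers;

(3) handling tokens such as ">=" and ">".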
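Requirement 2(2) asks for regular-expression definitions of id and num without spelling them out at this point. The sketch below assumes the common textbook forms id = letter (letter | digit)* and num = digit+ (integers only), and also shows the number conversion mentioned in point 3(2). The function names is_id, is_num, and num_value are hypothetical and chosen only for illustration.

```c
#include <ctype.h>

/* Assumed pattern: id = letter (letter | digit)* */
int is_id(const char *s)
{
    if (!isalpha((unsigned char)*s))
        return 0;
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s))
            return 0;
    return 1;
}

/* Assumed pattern: num = digit+ */
int is_num(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Convert a lexeme already accepted by is_num to its integer value.
   Overflow is not checked in this sketch. */
long num_value(const char *s)
{
    long v = 0;
    for (; *s != '\0'; s++)
        v = v * 10 + (*s - '0');
    return v;
}
```

A lexeme such as x1 satisfies is_id, while 1x satisfies neither predicate, so a scanner built on these definitions would have to decide separately how to report it.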

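IV. Approach

Both methods are used in this experiment.

1. LEX: write the analyzer with the LEX tool; this only requires familiarity with the layout of a LEX specification.

2. Hand-coding

Define a keyword array preloaded with all possible keywords. Create two of each of the keyword, ID, num, and special-symbol tables (a raw table and a de-duplicated one), a comment table, and an array that serves as a buffer.

The statement is entered on the command line. One line is read, and a while (1) loop reads its characters one at a time, ending at '\n'.

After reading a character, the program determines which category it belongs to and calls the routine for that category.

For example, if the character is a letter, void alpha(); is called. It reads the next character and keeps reading as long as the character is a letter or a digit. The string from the previous stopping point up to the current position is then stored in the buffer array and compared against each entry of the keyword array. If it matches a keyword it is stored in the keyword table; otherwise it is stored in the ID table.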
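The keyword-versus-identifier step can also be sketched on an in-memory string instead of standard input. This is an illustrative sketch rather than the report's own code: scan_word, is_keyword, and the KEYWORDS table (filled here with the keyword list from requirement 2(1)) are hypothetical names.

```c
#include <ctype.h>
#include <string.h>

static const char *KEYWORDS[] = {
    "if", "while", "do", "break", "real", "true",
    "false", "int", "char", "bool", "float"
};

/* Return 1 if word is one of the reserved words, 0 otherwise. */
int is_keyword(const char *word)
{
    size_t n = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(word, KEYWORDS[i]) == 0)
            return 1;
    return 0;
}

/* Copy the longest run of letters and digits starting at src into out
   (at most cap - 1 characters, always NUL-terminated; cap must be at
   least 1) and return the number of input characters consumed. */
size_t scan_word(const char *src, char *out, size_t cap)
{
    size_t n = 0, stored = 0;
    while (isalnum((unsigned char)src[n])) {
        if (stored + 1 < cap)
            out[stored++] = src[n];
        n++;
    }
    out[stored] = '\0';
    return n;
}
```

Because the whole run of letters and digits is collected before the lookup, a word such as while1 is compared as a unit and ends up classified as an identifier, not as the keyword while followed by 1.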

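Similarly, if a digit is encountered next, control passes to void digit();. After a string has been collected, the routine checks whether it contains a letter: if so, it is stored in the ID table; if not, it is stored in the digit table, i.e. the number table.

If '\t' or ' ' is encountered, nothing is done.

If '/' is encountered, control passes to void note(); to decide whether it starts a comment. The check is whether the string begins with /* and ends with */: if so, it is stored in the comment table; if not, an error is reported because an illegal character has appeared.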
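The comment check can be sketched as a function over a string buffer. This is a minimal sketch under the rules in requirement 2(5) (no nesting, so the first closing delimiter ends the comment); the name comment_length and its return convention are assumptions made for this example.

```c
#include <string.h>

/* If s starts with a comment, return its total length including both
   delimiters. Return 0 if s does not start with a comment opener and
   -1 if the comment is never closed. Comments do not nest, so the
   first closing delimiter ends the comment. */
long comment_length(const char *s)
{
    const char *end;

    if (s[0] != '/' || s[1] != '*')
        return 0;
    end = strstr(s + 2, "*/");
    if (end == NULL)
        return -1;
    return (long)(end - s) + 2;
}
```

A return value of 0 lets the caller treat a lone '/' as the division operator, while -1 signals an unterminated comment that should be reported as an error.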

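Finally, a character that fits none of the cases above is passed to void otherchar(); which decides whether it is a special symbol. void otherchar(); uses a switch statement whose cases list all the special symbols.

When > or < is encountered, the next character must also be checked: if it is =, the token is <= or >=.

If none of the cases match, an error is output because an illegal character has appeared.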
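The one-character lookahead for two-character symbols amounts to preferring the longest match. A minimal sketch on a string buffer, assuming the symbol set of requirement 2(3) without the comment delimiters; scan_operator is a hypothetical name.

```c
#include <string.h>

/* Match the longest special symbol at the start of s. The symbol is
   written to out (which needs room for 3 bytes) and its length is
   returned; 0 means s does not start with a special symbol. */
int scan_operator(const char *s, char *out)
{
    int len = 0;

    if (s[0] != '\0' && s[1] == '=' && strchr("<>=!", s[0]) != NULL)
        len = 2;                        /* <=  >=  ==  != */
    else if (s[0] != '\0' && strchr("+-*/<>=;,()[]{}", s[0]) != NULL)
        len = 1;                        /* single-character symbols */
    memcpy(out, s, (size_t)len);
    out[len] = '\0';
    return len;
}
```

Checking the two-character forms first is what keeps ">=" from being reported as ">" followed by "=". A lone '!' yields 0 here because the symbol list contains != but not !.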

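When the statement has been scanned, the tables hold every string of each category that appeared in it. What remains is to remove duplicates from each table. For each table, a second table of the same size is created. If the original table is non-empty, its first entry is copied to the new table. Each following entry of the original table is then compared with all entries of the new table: it is appended if it differs from all of them and skipped otherwise.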
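The de-duplication pass can be written once and reused for every table. The sketch below assumes tables of fixed-width 20-byte strings, as in the report's arrays; remove_duplicates and WORD_LEN are hypothetical names introduced for this example.

```c
#include <string.h>

#define WORD_LEN 20

/* Copy each distinct string of src[0..n-1] into dst, keeping the order
   of first occurrence, and return how many strings were copied. */
int remove_duplicates(char src[][WORD_LEN], int n, char dst[][WORD_LEN])
{
    int count = 0;

    for (int i = 0; i < n; i++) {
        int seen = 0;
        for (int j = 0; j < count; j++) {
            if (strcmp(src[i], dst[j]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            strcpy(dst[count++], src[i]);
    }
    return count;
}
```

Starting the count at 0 and letting the loop copy the first entry itself means an empty table needs no special case.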

Finally, the contents of each table are printed.

V. Code

1. Hand-coded version

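#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Keywords recognized by this version; the list differs from the
   keyword set given in requirement 2(1). */
char *keyword[8] = {"if", "for", "else", "while", "do", "float", "int", "break"};

/* Raw tables filled during scanning and their de-duplicated copies (re_*). */
char keywordtable[20][20], re_keywordtable[20][20];
char digittable[20][20], re_digittable[20][20];
char otherchartable[20][20], re_otherchartable[20][20];
char idtable[20][20], re_idtable[20][20];
char notetable[20][20];
char word[20];   /* buffer for the token currently being scanned */

void initialize(void);
void alpha(void);
void digit(void);
void error(void);
void otherchar(void);
void print(void);
void check(void);
void note(void);

/* Number of entries in each raw table. */
int digit_num = 0, keyword_num = 0, otherchar_num = 0, id_num = 0, note_num = 0;
/* Number of entries in each de-duplicated table; entry 0 is copied unconditionally. */
int redigit_num = 1, rekeyword_num = 1, reotherchar_num = 1, reid_num = 1;
int flag_error = 0;
char lookahead;   /* the character currently being examined */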

int main(void)
{
    printf("Enter the statement to analyze:\n");
    initialize();
    lookahead = getchar();
    while (1)
    {
        if (isalpha((unsigned char)lookahead))
        {
            alpha();
            initialize();
        }
        else if (isdigit((unsigned char)lookahead))
        {
            digit();
            initialize();
        }
        else if (lookahead == '\t' || lookahead == ' ')
        {
            ;   /* skip whitespace */
        }
        else if (lookahead == '\n' || lookahead == (char)EOF)
            break;   /* end of the input line (or of the input) */
        else if (lookahead == '/')
        {
            lookahead = getchar();
            if (lookahead == '*')
            {
                note();
                initialize();
            }
            else
            {
                /* a '/' not followed by '*' is the division operator */
                ungetc(lookahead, stdin);
                strcpy(otherchartable[otherchar_num++], "/");
                initialize();
            }
        }
        else
        {
            otherchar();
            initialize();
        }
        lookahead = getchar();
    }
    check();
    if (flag_error == 0)
        print();
    return 0;
}

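void alpha()
{
    int i = 1, flag;
    int ch;   /* int, so that the EOF value returned by getchar() fits */

    word[0] = lookahead;
    ch = getchar();
    while (isalpha(ch) || isdigit(ch))
    {
        if (i < 19)   /* keep room for the terminating '\0' */
            word[i++] = ch;
        ch = getchar();
    }
    word[i] = '\0';
    ungetc(ch, stdin);   /* push back the character that ended the word */
    flag = 0;
    for (i = 0; i < 8; i++)
    {
        if (strcmp(word, keyword[i]) == 0)
            flag = 1;
    }
    if (flag == 1)
        strcpy(keywordtable[keyword_num++], word);
    else
        strcpy(idtable[id_num++], word);
}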

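void digit()
{
    int i = 1, flag;
    int ch;   /* int, so that the EOF value returned by getchar() fits */

    word[0] = lookahead;
    ch = getchar();
    while (isalpha(ch) || isdigit(ch))
    {
        if (i < 19)   /* keep room for the terminating '\0' */
            word[i++] = ch;
        ch = getchar();
    }
    word[i] = '\0';
    ungetc(ch, stdin);   /* push back the character that ended the string */
    flag = 0;
    for (i = 0; word[i] != '\0'; i++)
    {
        if (word[i] < '0' || word[i] > '9')
            flag = 1;   /* a non-digit sends the whole string to the ID table */
    }
    if (flag == 1)
        strcpy(idtable[id_num++], word);
    else
        strcpy(digittable[digit_num++], word);
}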

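void otherchar()
{
    int ch = lookahead;

    switch (ch) {
    case '!':
    {
        ch = getchar();
        if (ch == '=')
            strcpy(otherchartable[otherchar_num++], "!=");
        else
        {
            ungetc(ch, stdin);
            error();   /* '!' on its own is not a special symbol */
        }
    }
    break;
    case '=':
    {
        ch = getchar();
        if (ch == '=')
            strcpy(otherchartable[otherchar_num++], "==");
        else
        {
            strcpy(otherchartable[otherchar_num++], "=");
            ungetc(ch, stdin);
        }
    }
    break;
    case '(':
        strcpy(otherchartable[otherchar_num++], "("); break;
    case ')':
        strcpy(otherchartable[otherchar_num++], ")"); break;
    case '[':
        strcpy(otherchartable[otherchar_num++], "["); break;
    case ']':
        strcpy(otherchartable[otherchar_num++], "]"); break;
    case ';':
        strcpy(otherchartable[otherchar_num++], ";"); break;
    case ',':
        strcpy(otherchartable[otherchar_num++], ","); break;
    case '{':
        strcpy(otherchartable[otherchar_num++], "{"); break;
    case '}':
        strcpy(otherchartable[otherchar_num++], "}"); break;
    case '|':
    case '&':
    {
        /* "||" and "&&" need two identical characters; a character
           constant such as '||' cannot be used as a case label for them */
        int first = ch;
        ch = getchar();
        if (ch == first)
            strcpy(otherchartable[otherchar_num++], first == '|' ? "||" : "&&");
        else
        {
            ungetc(ch, stdin);
            error();
        }
    }
    break;
    case '+':
        strcpy(otherchartable[otherchar_num++], "+"); break;
    case '-':
        strcpy(otherchartable[otherchar_num++], "-"); break;
    case '*':
        strcpy(otherchartable[otherchar_num++], "*"); break;
    case '>':
    {
        ch = getchar();
        if (ch == '=')
            strcpy(otherchartable[otherchar_num++], ">=");
        else
        {
            strcpy(otherchartable[otherchar_num++], ">");
            ungetc(ch, stdin);
        }
    }
    break;
    case '<':
    {
        ch = getchar();
        if (ch == '=')
            strcpy(otherchartable[otherchar_num++], "<=");
        else
        {
            strcpy(otherchartable[otherchar_num++], "<");
            ungetc(ch, stdin);
        }
    }
    break;
    default:
        error(); break;
    }
}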

void error()
{
    flag_error = 1;
    printf("Invalid input!\n");
}

/* Clear the token buffer before the next token is scanned. */
void initialize()
{
    int i;
    for (i = 0; i < 20; i++)
    {
        word[i] = '\0';
    }
}

/* Copy each raw table into its re_* table, skipping strings that are
   already present there. */
void check()
{
    int i, j, flag;

    strcpy(re_keywordtable[0], keywordtable[0]);
    for (i = 1; i < keyword_num; i++)
    {
        flag = 0;
        for (j = 0; j < rekeyword_num; j++)
        {
            if (strcmp(keywordtable[i], re_keywordtable[j]) == 0)
            {
                flag = 1;
                break;
            }
        }
        if (flag == 0)
            strcpy(re_keywordtable[rekeyword_num++], keywordtable[i]);
    }

    strcpy(re_digittable[0], digittable[0]);
    for (i = 1; i < digit_num; i++)
    {
        flag = 0;
        for (j = 0; j < redigit_num; j++)
        {
            if (strcmp(digittable[i], re_digittable[j]) == 0)
            {
                flag = 1;
                break;
            }
        }
        if (flag == 0)
            strcpy(re_digittable[redigit_num++], digittable[i]);
    }

    strcpy(re_otherchartable[0], otherchartable[0]);
    for (i = 1; i < otherchar_num; i++)
    {
        flag = 0;
        for (j = 0; j < reotherchar_num; j++)
        {
            if (strcmp(otherchartable[i], re_otherchartable[j]) == 0)
            {
                flag = 1;
                break;
            }
        }
        if (flag == 0)
            strcpy(re_otherchartable[reotherchar_num++], otherchartable[i]);
    }

    strcpy(re_idtable[0], idtable[0]);
    for (i = 1; i < id_num; i++)
    {
        flag = 0;
        for (j = 0; j < reid_num; j++)
        {
            if (strcmp(idtable[i], re_idtable[j]) == 0)
            {
                flag = 1;
                break;
            }
        }
        if (flag == 0)
            strcpy(re_idtable[reid_num++], idtable[i]);
    }
}

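/* Called after the comment opener has been read; collects the comment
   text up to the closing delimiter. */
void note()
{
    int ch;   /* int, so that the EOF value returned by getchar() fits */
    int i = 0;

    ch = getchar();
    while (1)
    {
        if (ch == EOF)
        {
            error();   /* the comment was never closed */
            return;
        }
        if (ch == '*')
        {
            ch = getchar();
            if (ch == '/')
                break;
            ungetc(ch, stdin);   /* examine this character again on the next pass */
            ch = '*';            /* the '*' itself belongs to the comment text */
        }
        if (i < 19)   /* keep room for the terminating '\0' */
            word[i++] = ch;
        ch = getchar();
    }
    word[i] = '\0';
    strcpy(notetable[note_num++], word);
}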

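/* The format strings for digits, special symbols, and identifiers did not
   survive in the source; the <num,...>, <op,...>, and <id,...> forms
   below are assumed reconstructions. */
void print()
{
    int i;

    //printf("Keywords:\n");
    if (keyword_num != 0)
        for (i = 0; i < rekeyword_num; i++)
            printf("<%s,>\n", re_keywordtable[i]);
    //printf("\nDigits:\n");
    if (digit_num != 0)
        for (i = 0; i < redigit_num; i++)
            printf("<num,%s>\n", re_digittable[i]);
    //printf("\nOtherchars:\n");
    if (otherchar_num != 0)
        for (i = 0; i < reotherchar_num; i++)
            printf("<op,%s>\n", re_otherchartable[i]);
    //printf("\nId:\n");
    if (id_num != 0)
        for (i = 0; i < reid_num; i++)
            printf("<id,%s>\n", re_idtable[i]);
    if (note_num != 0)
    {
        printf("Comments:\n");
        for (i = 0; i < note_num; i++)
            printf("%s", notetable[i]);
    }
    printf("\nLexical analysis complete!\n");
}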

2. LEX version

%option noyywrap
%{
#include <stdio.h>
%}
delim   [ \t\n]
ws      {delim}+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
%%
{ws}    {}
if|while|do|break|real|true|false|int|char|bool|float   {printf("Keyword: %s\n", yytext);}
{id}    {printf("Identifier: %s\n", yytext);}
{digit}+    {printf("Integer: %s\n", yytext);}
{digit}+"."{digit}*    {printf("float: %s\n", yytext);}
"+"|"-"|"*"|"/"|"<"|"<="|">"|">="|"=="|"!="|"="|";"|","|"("|")"|"["|"]"|"{"|"}"|"/*"|"*/"    {printf("Operator: %s\n", yytext);}
%%
int main() {
    extern FILE *yyin;
    /* the backslash must be doubled inside a C string literal */
    yyin = fopen("d:\\1.txt", "r");
    if (yyin == NULL) {
        perror("d:\\1.txt");
        return 1;
    }
    yylex();
    return 0;
}

VI. Results

1. Hand-coded version

2. Lex

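VII. Reflections

Problem ①:

In the hand-coded version, deciding which category a string belongs to involves many branches. The code was hard to follow, and debugging was tedious when something went wrong.

Solution:

Assume the input string belongs to only one category, write the code for each branch separately, and integrate the branches at the end. This at least ensures that each branch module works on its own, so during debugging a problem can be found by looking at the overall flow.

Problem ②:

An earlier attempt used '\0' to detect the end of the input statement, which caused errors.

Solution:

Because the statement is entered on the command line, the input ends with '\n'.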
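The end-of-input test can be sketched as a small line reader. This is an illustrative sketch, not the report's code: read_statement is a hypothetical helper, and besides '\n' it also stops at end of file, which the report does not discuss.

```c
#include <stdio.h>

/* Read one line from in into buf (at most cap - 1 characters, always
   NUL-terminated; cap must be at least 1). Returns the number of
   characters stored. Reading stops at '\n' or at end of file; the
   result of fgetc is kept in an int so that EOF can be told apart
   from every valid character. */
size_t read_statement(FILE *in, char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (n + 1 < cap)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return n;
}
```

The '\0' terminator only exists in the buffer after the program writes it there; the input stream itself never delivers one, which is why testing for '\0' while reading does not work.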

Problem ③:

A careless mistake with the switch statement caused the output of every case to be printed.

Solution:

Add a break after each case.
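The effect of a missing break can be seen by comparing two otherwise identical functions. Both are hypothetical examples written for this note, not taken from the analyzer.

```c
/* Each case ends with break, so exactly one assignment runs. */
int kind_with_break(char c)
{
    int kind = -1;
    switch (c) {
    case '+':
        kind = 1;
        break;
    case ';':
        kind = 2;
        break;
    default:
        kind = 0;
        break;
    }
    return kind;
}

/* Same switch without break: execution falls through every later
   case, so the last assignment always wins. */
int kind_without_break(char c)
{
    int kind = -1;
    switch (c) {
    case '+':
        kind = 1;
        /* falls through */
    case ';':
        kind = 2;
        /* falls through */
    default:
        kind = 0;
    }
    return kind;
}
```

With '+' as input the first function returns 1 and the second returns 0, because the second keeps executing the statements of the cases that follow the matching one.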

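Problem ④:

When the tables recording the strings of each category were printed at the end, every occurrence was output, so repeated strings were printed repeatedly.

Solution:

Add a check(); function at the end to remove the duplicated strings.

Problem ⑤:

Because input is read from the command line rather than from a file, only a single-line statement can be entered; line breaks are not possible.

Solution:

Rather than rework the program to read from a file, the lex tool was used instead.

Problem ⑥:

Mostly lex syntax issues.

Solution:

Searched XX and Google for lex code examples to resolve the syntax problems, and discussed them with classmates.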

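Problem ⑦:

After lex.yy.c was generated and compiled with the VC compiler, compilation failed with a message that the header unistd.h could not be opened.

Solution:

unistd.h is a Linux header, and Windows does not support Linux system calls.

So the contents of unistd.h were looked up, a custom unistd.h was written, and the <unistd.h> in lex.yy.c was changed to "unistd.h".

Problem ⑧:

After lex.yy.c compiled under VC, it was unclear how to run lexical analysis on the code in a file.

Solution:

Open a DOS prompt from cmd, change into the Debug folder produced by compiling lex.yy.c, and use lex

Writing the program by hand and using the LEX tool gave a better understanding of how a lexical analyzer is constructed, consolidated the theory learned, improved programming skills, and provided a first working knowledge of the LEX tool.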
