Python Regular Expressions: The re Module

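Basic overview

The earlier articles on Python regular expressions covered only theory; this one covers Python's re module.

Import the re module:

import re

View the module's help text:

print(re.__doc__)

This prints the following help text: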

Support for regular expressions (RE).

This module provides regular expression matching operations similar to
those found in Perl.  It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.

Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves.  You can
concatenate ordinary characters, so last matches the string 'last'.

The special characters are:
    "."      Matches any character except a newline.
    "^"      Matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,?? Non-greedy versions of the previous three special characters.
    {m,n}    Matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    "|"      A|B, creates an RE that will match either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
    (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below).
    (?:...)  Non-grouping version of regular parentheses.
    (?P<name>...) The substring matched by the group is accessible by name.
    (?P=name)     Matches the text matched earlier by the group named name.
    (?#...)  A comment; ignored.
    (?=...)  Matches if ... matches next, but doesn't consume the string.
    (?!...)  Matches if ... doesn't match next.
    (?<=...) Matches if preceded by ... (must be fixed length).
    (?<!...) Matches if not preceded by ... (must be fixed length).
    (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                       the (optional) no pattern otherwise.

The special sequences consist of "\\" and a character from the list
below.  If the ordinary character is not on the list, then the
resulting RE will match the second character.
    \number  Matches the contents of the group of the same number.
    \A       Matches only at the start of the string.
    \Z       Matches only at the end of the string.
    \b       Matches the empty string, but only at the start or end of a word.
    \B       Matches the empty string, but not at the start or end of a word.
    \d       Matches any decimal digit; equivalent to the set [0-9].
    \D       Matches any non-digit character; equivalent to the set [^0-9].
    \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v].
    \S       Matches any non-whitespace character; equiv. to [^ \t\n\r\f\v].
    \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_].
             With LOCALE, it will match the set [0-9_] plus characters defined
             as letters for the current locale.
    \W       Matches the complement of \w.
    \\       Matches a literal backslash.

This module exports the following functions:
    match    Match a regular expression pattern to the beginning of a string.
    search   Search a string for the presence of a pattern.
    sub      Substitute occurrences of a pattern found in a string.
    subn     Same as sub, but also return the number of substitutions made.
    split    Split a string by the occurrences of a pattern.
    findall  Find all occurrences of a pattern in a string.
    finditer Return an iterator yielding a match object for each match.
    compile  Compile a pattern into a RegexObject.
    purge    Clear the regular expression cache.
    escape   Backslash all non-alphanumerics in a string.

Some of the functions in this module takes flags as optional parameters:
    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.

This module also defines an exception 'error'.

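The help text above summarizes the basic syntax and the available functions.

The basic syntax itself is explained in the earlier article referenced above.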
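A few of the syntax items listed in the help text can be tried directly. The following is a minimal sketch; the sample strings are made up for illustration and do not come from the original article.

```python
import re

# Greedy vs. non-greedy: "<.*>" runs to the last ">", "<.*?>" stops at the first.
html = "<b>bold</b>"  # made-up sample text
greedy = re.match(r"<.*>", html).group()
lazy = re.match(r"<.*?>", html).group()
print(greedy)  # <b>bold</b>
print(lazy)    # <b>

# Named groups (?P<name>...) combined with the \d special sequence.
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-05-17")
year = m.group("year")
month = m.group("month")
print(year)   # 2024
print(month)  # 05

# Lookahead (?=...): match a word only if a comma follows, without consuming the comma.
before_comma = re.findall(r"\w+(?=,)", "a,b,c")
print(before_comma)  # ['a', 'b']
```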

The main functions are introduced below.

Functions in re

match

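View the help:

help(re.match)

Help on function match in module re:

match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning a match object, or None if no match was found.

re.match(pattern, string, flags=0)

What it does:

Tries to match pattern at the first position of string and returns a match object; the matched text is available through its group() method.

If there is no match, it returns None. flags is an optional parameter that controls how the regular expression is matched.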

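Example:

import re
# The dots are escaped so that they match a literal "." rather than any character.
pattern = r'[w]{3}\.[a-z]+\.(com)'
str1 = ""
str2 = "http:
re1 = re.match(pattern, str1)
print(re1.group(0))
re2 = re.match(pattern, str2)
print(re2.group(0))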

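This pattern matches a URL that begins with www at the very start of the string. The first call prints the matched URL; the second raises an error, because str2 does not match at its first position. re.match does return None there, exactly as the documentation says, and it is the call to group() on None that raises AttributeError.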
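Because re.match returns None when nothing matches, checking the result before calling group() avoids that error. A minimal sketch, in which the helper name first_match and the URL www.example.com are placeholders chosen for illustration, not values from the original example:

```python
import re

pattern = r'[w]{3}\.[a-z]+\.(com)'

def first_match(text):
    """Return the text matched at the start of `text`, or None if there is no match."""
    m = re.match(pattern, text)
    return m.group(0) if m is not None else None

# www.example.com is a placeholder URL, not the one used in the original example.
matched = first_match("www.example.com")
unmatched = first_match("http://www.example.com")
print(matched)    # www.example.com
print(unmatched)  # None
```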

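search

View the help:

help(re.search)

Help on function search in module re:

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found.

re.search(pattern, string, flags=0)

What it does:

Scans string for the first place where pattern matches and returns a match object; returns None if there is no such place.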

Example:

import re
pattern = r'[w]{3}\.[a-z]+\.(com)'
str1 = ""
str2 = "http:
re1 = re.search(pattern, str1)
print(re1.group())
re2 = re.search(pattern, str2)
print(re2.group())

First output:

Second:

An error is raised: the match failed.
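The difference between match and search shows up when the pattern occurs somewhere other than the start of the string. The sketch below uses a placeholder URL (an assumption, not the original example's value):

```python
import re

pattern = r'[w]{3}\.[a-z]+\.(com)'
url = "http://www.example.com"  # placeholder URL for illustration

at_start = re.match(pattern, url)   # the string starts with "http", so no match
anywhere = re.search(pattern, url)  # scans forward until the pattern fits

print(at_start)           # None
print(anywhere.group())   # www.example.com
print(anywhere.group(1))  # com
print(anywhere.start())   # 7
```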

sub

View the help:

help(re.sub)

Help on function sub in module re:

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used.

re.sub(pattern, repl, string, count=0, flags=0)

What it does:

Replaces the substrings of string that match pattern with repl. count defaults to 0, which replaces every match; count=2 replaces only the first two.

Example:

import re
pattern = r'[w]{3}\.[a-z]+\.(com)'
repl = ''
str3 = "ilove,tomlove"
# count=1 replaces only the first match.
re3 = re.sub(pattern, repl, str3, count=1)
print(re3)

Output:

ilove,tomlove
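The effect of count, and of passing a callable as repl, is easier to see on a string that actually contains matches. A small sketch with a made-up input string:

```python
import re

text = "cat bat rat"  # made-up sample text

# count=1 replaces only the leftmost match; the default count=0 replaces them all.
first_only = re.sub(r"[a-z]at", "dog", text, count=1)
all_of_them = re.sub(r"[a-z]at", "dog", text)
print(first_only)   # dog bat rat
print(all_of_them)  # dog dog dog

# repl may also be a callable that receives the match object
# and returns the replacement string.
shouted = re.sub(r"[a-z]at", lambda m: m.group().upper(), text)
print(shouted)      # CAT BAT RAT
```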

subn

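Works much like re.sub, except that it returns a tuple holding the new string and the number of substitutions made.

Example:

import re
pattern = r'[w]{3}\.[a-z]+\.(com)'
repl = ''
str3 = "ilove,tomlove"
re3 = re.subn(pattern, repl, str3, count=2)
print(re3)

Output:

('ilove,tomlove', 2)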
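Since subn returns a (new_string, number_of_substitutions) tuple, it can be unpacked directly. A brief sketch on a made-up string:

```python
import re

text = "a1b22c333"  # made-up sample text

# Unpack the tuple into the new string and the substitution count.
result, n = re.subn(r"\d+", "#", text)
print(result)  # a#b#c#
print(n)       # 3

# With count=2 only the first two runs of digits are replaced.
limited = re.subn(r"\d+", "#", text, count=2)
print(limited)  # ('a#b#c333', 2)
```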

split

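View the help:

help(re.split)

Help on function split in module re:

split(pattern, string, maxsplit=0, flags=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.

re.split(pattern, string, maxsplit=0, flags=0)

What it does:

Splits string wherever pattern matches and returns the pieces in a list.

maxsplit is the maximum number of splits to perform; the default, 0, means no limit.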

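Example:

import re
# Named "names" rather than "str" so that the built-in str type is not shadowed.
names = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
pattern = ","
print(re.split(pattern, names))

Output:

['xiaoming', 'xiaohua', 'xiaoli', 'xiaoqiang', 'xiaozhang']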
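Where re.split goes beyond str.split is with separators that vary. The sketch below, on a made-up variant of the name list, splits on any run of commas, semicolons, or whitespace and also shows maxsplit:

```python
import re

names = "xiaoming, xiaohua;xiaoli  xiaoqiang"  # made-up variant with mixed separators

# One character class covers every separator; "+" treats a run of them as one.
parts = re.split(r"[,;\s]+", names)
print(parts)  # ['xiaoming', 'xiaohua', 'xiaoli', 'xiaoqiang']

# maxsplit=1 stops after the first split and leaves the rest untouched.
first_rest = re.split(r"[,;\s]+", names, maxsplit=1)
print(first_rest)  # ['xiaoming', 'xiaohua;xiaoli  xiaoqiang']
```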

findall

View the help:

help(re.findall)

Help on function findall in module re:

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.

re.findall(pattern, string, flags=0)

What it does:

Finds every non-overlapping substring of string that matches the pattern and returns them in a list; the list is empty if nothing matches.

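Example:

import re
names = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
pattern = r"\w+"
print(re.findall(pattern, names))

The result is the same as for split above, but the approach differs: split cuts the string at the separators, whereas findall collects the pieces that match:

['xiaoming', 'xiaohua', 'xiaoli', 'xiaoqiang', 'xiaozhang']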
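As the help text notes, what findall returns depends on how many groups the pattern contains. A short sketch with made-up name=number pairs:

```python
import re

text = "xiaoming=18,xiaohua=20"  # made-up sample data

# No group: each list item is the whole match.
no_group = re.findall(r"\w+=\d+", text)
print(no_group)    # ['xiaoming=18', 'xiaohua=20']

# One group: each list item is just that group's text.
one_group = re.findall(r"\w+=(\d+)", text)
print(one_group)   # ['18', '20']

# Two groups: each list item is a tuple with one entry per group.
two_groups = re.findall(r"(\w+)=(\d+)", text)
print(two_groups)  # [('xiaoming', '18'), ('xiaohua', '20')]
```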

finditer

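Like findall, it finds every substring that the regular expression matches, but it returns them as an iterator of match objects.

Example:

import re
names = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
pattern = r"\w+"
re4 = re.finditer(pattern, names)
for i in re4:
    print(i.group())

The result is an iterator, so a for loop is used to print it:

>>> for i in re4:
...     print(i.group())
...
xiaoming
xiaohua
xiaoli
xiaoqiang
xiaozhang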
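Each item that finditer yields is a full match object, so positions are available as well as text. The sketch below also shows that the iterator can be consumed only once:

```python
import re

names = "xiaoming,xiaohua,xiaoli"

matches = re.finditer(r"\w+", names)

# Collect the matched text together with its start and end offsets.
found = [(m.group(), m.start(), m.end()) for m in matches]
print(found)  # [('xiaoming', 0, 8), ('xiaohua', 9, 16), ('xiaoli', 17, 23)]

# The iterator is now exhausted; looping over it again yields nothing.
leftover = list(matches)
print(leftover)  # []
```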

compile

View the help:

help(re.compile)

Help on function compile in module re:

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.

re.compile(pattern, flags=0)

What it does:

Converts the regular expression pattern into a regular expression object.

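Example:

import re
names = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
pattern = r"\w+"
patternobj = re.compile(pattern)
# Call finditer on the compiled object instead of passing the pattern string again.
re4 = patternobj.finditer(names)
for i in re4:
    print(i.group())

The result is the same as in the previous example. compile turns the pattern into an object once, and the matching methods are then called on that object.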
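A compiled pattern object offers the same operations as the module-level functions, which is convenient when one pattern is used several times. A minimal sketch (the variable name word is chosen here for illustration):

```python
import re

word = re.compile(r"\w+")  # compiled once, reused below
names = "xiaoming,xiaohua,xiaoli"

all_words = word.findall(names)
first_word = word.match(names).group()
masked = word.sub("*", names)

print(all_words)     # ['xiaoming', 'xiaohua', 'xiaoli']
print(first_word)    # xiaoming
print(masked)        # *,*,*
print(word.pattern)  # \w+
```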

purge

View the help:

help(re.purge)

Help on function purge in module re:

purge()
    Clear the regular expression cache

What it does:

Clears the cache of compiled regular expressions.

escape

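View the help:

help(re.escape)

Help on function escape in module re:

escape(pattern)
    Escape all non-alphanumeric characters in pattern.

What it does:

Puts a backslash in front of the non-alphanumeric characters in a string, so that the string can be used inside a pattern to match itself literally.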

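Example:

>>> pattern
'\\w+'
>>> re.escape(pattern)
'\\\\w\\+'

The result differs from the input: the backslash and the + have each gained a backslash of their own, so the escaped pattern matches the literal text \w+ instead of a run of word characters.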
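A typical use of re.escape is searching for text that happens to contain regex metacharacters. The sketch below uses made-up strings to compare the unescaped and escaped behaviour:

```python
import re

literal = "1+1=2"                   # made-up text to search for; "+" is a metacharacter
text = "we know 1+1=2, not 11=2"    # made-up text to search in

# Unescaped, "1+" means "one or more 1", so the pattern finds "11=2" instead.
unescaped = re.findall(literal, text)
print(unescaped)  # ['11=2']

# Escaped, every character stands for itself.
escaped = re.findall(re.escape(literal), text)
print(escaped)    # ['1+1=2']
```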

flags

    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.
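The flags are passed as the flags argument and can be combined with "|". A minimal sketch on a made-up three-line string:

```python
import re

text = "Xiaoming\nxiaohua\nXIAOLI"  # made-up sample with three lines

# IGNORECASE: "xiao" also matches "Xiao" and "XIAO".
ignore_case = re.findall(r"xiao\w+", text, re.IGNORECASE)

# MULTILINE: "^" and "$" match at every line boundary, not just the string's ends.
whole_string_only = re.findall(r"^\w+$", text)
per_line = re.findall(r"^\w+$", text, re.MULTILINE)

# DOTALL: "." also matches the newline, so ".+" spans all three lines.
first_line = re.match(r".+", text).group()
everything = re.match(r".+", text, re.DOTALL).group()

# VERBOSE: whitespace and comments inside the pattern are ignored.
name = re.compile(r"""
    ^xiao   # each name starts with xiao
    \w+     # followed by the rest of the name
    $
""", re.VERBOSE | re.IGNORECASE | re.MULTILINE)
combined = name.findall(text)

print(ignore_case)        # ['Xiaoming', 'xiaohua', 'XIAOLI']
print(whole_string_only)  # []
print(per_line)           # ['Xiaoming', 'xiaohua', 'XIAOLI']
print(first_line)         # Xiaoming
print(combined)           # ['Xiaoming', 'xiaohua', 'XIAOLI']
```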
