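Python Regular Expressions: the re Module
Python regular expressions
Overview
The earlier material on Python regular expressions was all theory; this article covers Python's re module.
Import the re module:
import re
View the module's help text:
print(re.__doc__)
The help text that is printed: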
Support for regular expressions (RE).
This module provides regular expression matching operations similar to
those found in Perl. It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.
Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves. You can
concatenate ordinary characters, so last matches the string 'last'.
The special characters are:
    "."      Matches any character except a newline.
    "^"      Matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,?? Non-greedy versions of the previous three special characters.
    {m,n}    Matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    "|"      A|B, creates an RE that will match either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
    (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below).
    (?:...)  Non-grouping version of regular parentheses.
    (?P<name>...) The substring matched by the group is accessible by name.
    (?P=name)     Matches the text matched earlier by the group named name.
    (?#...)  A comment; ignored.
    (?=...)  Matches if ... matches next, but doesn't consume the string.
    (?!...)  Matches if ... doesn't match next.
    (?<=...) Matches if preceded by ... (must be fixed length).
    (?<!...) Matches if not preceded by ... (must be fixed length).
    (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                       the (optional) no pattern otherwise.
The special sequences consist of "\\" and a character from the list
below. If the ordinary character is not on the list, then the
resulting RE will match the second character.
    \number  Matches the contents of the group of the same number.
    \A       Matches only at the start of the string.
    \Z       Matches only at the end of the string.
    \b       Matches the empty string, but only at the start or end of a word.
    \B       Matches the empty string, but not at the start or end of a word.
    \d       Matches any decimal digit; equivalent to the set [0-9].
    \D       Matches any non-digit character; equivalent to the set [^0-9].
    \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v].
    \S       Matches any non-whitespace character; equiv. to [^ \t\n\r\f\v].
    \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_].
             With LOCALE, it will match the set [0-9_] plus characters defined
             as letters for the current locale.
    \W       Matches the complement of \w.
    \\       Matches a literal backslash.
This module exports the following functions:
    match    Match a regular expression pattern to the beginning of a string.
    search   Search a string for the presence of a pattern.
    sub      Substitute occurrences of a pattern found in a string.
    subn     Same as sub, but also return the number of substitutions made.
    split    Split a string by the occurrences of a pattern.
    findall  Find all occurrences of a pattern in a string.
    finditer Return an iterator yielding a match object for each match.
    compile  Compile a pattern into a RegexObject.
    purge    Clear the regular expression cache.
    escape   Backslash all non-alphanumerics in a string.
Some of the functions in this module takes flags as optional parameters:
    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.
This module also defines an exception 'error'.
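Two entries in the help text are easier to see in action: greedy versus non-greedy repetition, and named groups with a backreference. The following is a minimal sketch; the sample string and the group names name and body are made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" takes as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.*>", text).group()

# Non-greedy: ".*?" stops at the first ">" that lets the pattern succeed.
lazy = re.search(r"<.*?>", text).group()

# Named groups plus a backreference: an opening tag, its content,
# and the closing tag with the same name.
tag = re.search(r"<(?P<name>\w+)>(?P<body>.*?)</(?P=name)>", text)

print(greedy)             # <b>bold</b> and <i>italic</i>
print(lazy)               # <b>
print(tag.group("name"))  # b
print(tag.group("body"))  # bold
```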
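The help text above summarizes the basic syntax and lists the module's functions.
The basic syntax was explained in the earlier material.
The main functions are introduced below.
Functions in re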
match
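View the help:
help(re.match)
Help on function match in module re:
match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning a match object, or None if no match was found.
re.match(pattern, string, flags=0)
Purpose:
Tries to match pattern starting at the first position of string and returns a match object; call group() on that object to get the matched text.
If the match fails, it returns None. flags is an optional parameter that controls how the regular expression is matched.
Example: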
import re
pattern = r'[w]{3}\.[a-z]+\.(com)'
str1 = ""        # URL missing from the source
str2 = "http:"   # URL truncated in the source
re1 = re.match(pattern, str1)
print(re1.group(0))
re2 = re.match(pattern, str2)
print(re2.group(0))
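The pattern describes a URL of the form www.<name>.com, and re.match only tries it at the start of the string. The first call prints the matched URL. The second call raises an error because its string does not start with a match: re.match does return None, as the documentation says, and the error (AttributeError) comes from calling group(0) on that None.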
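One way to avoid that error is to test the result before calling group(). This is a minimal sketch; the two sample strings use example.com as a placeholder domain and are not the strings from the original example.

```python
import re

pattern = r"[w]{3}\.[a-z]+\.(com)"

# Hypothetical sample strings; example.com is a placeholder domain.
samples = ["www.example.com", "http://www.example.com"]

results = []
for s in samples:
    m = re.match(pattern, s)       # only tries position 0 of the string
    if m is None:
        results.append(None)       # no match: do not call m.group()
    else:
        results.append(m.group(0))

print(results)  # ['www.example.com', None]
```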
search
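View the help:
help(re.search)
Help on function search in module re:
search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found.
re.search(pattern, string, flags=0)
Purpose:
Scans string for the first substring that matches pattern and returns a match object for it; returns None if there is no such substring.
Example: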
import re
pattern = r'[w]{3}\.[a-z]+\.(com)'
str1 = ""        # URL missing from the source
str2 = "http:"   # URL truncated in the source
re1 = re.search(pattern, str1)
print(re1.group())
re2 = re.search(pattern, str2)
print(re2.group())
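First output:
Second:
An error, because the match failed: re.search returned None, and calling group() on None raises AttributeError.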
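The difference between match and search can be sketched on a single string. The URL below is a placeholder chosen for illustration, not the string used in the original example: match is anchored at position 0, while search scans forward until the pattern fits.

```python
import re

pattern = r"[w]{3}\.[a-z]+\.(com)"
s = "http://www.example.com"   # hypothetical placeholder URL

at_start = re.match(pattern, s)    # anchored at position 0
anywhere = re.search(pattern, s)   # scans forward for the first match

print(at_start)            # None
print(anywhere.group())    # www.example.com
print(anywhere.group(1))   # com
print(anywhere.span())     # (7, 22)
```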
sub
View the help:
help(re.sub)
Help on function sub in module re:
sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a string, backslash escapes in it are processed. If it is
    a callable, it's passed the match object and must return
    a replacement string to be used.
re.sub(pattern, repl, string, count=0, flags=0)
Purpose:
Replaces the substrings of string that match pattern with repl. count defaults to 0, which replaces every occurrence; count=2 replaces only the first two.
Example:
import re
pattern = r'[w]{3}\.[a-z]+\.(com)'
# The URLs in repl and str3 are missing from the source.
repl = ''
str3 = "ilove,tomlove"
re3 = re.sub(pattern, repl, str3, 1)
print(re3)
Output:
ilove,tomlove
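Here is a small sketch of the count parameter and of passing a function as repl, both described in the help text. The sentence and the replacement word are invented for illustration, with example.com as a placeholder domain.

```python
import re

pattern = r"[w]{3}\.[a-z]+\.(com)"
text = "i love www.example.com, tom loves www.example.com"  # hypothetical

# count=1 replaces only the first occurrence.
first_only = re.sub(pattern, "python", text, count=1)

# With count left at 0, every occurrence is replaced.
all_of_them = re.sub(pattern, "python", text)

# repl may also be a function that receives the match object.
shouted = re.sub(pattern, lambda m: m.group(0).upper(), text)

print(first_only)   # i love python, tom loves www.example.com
print(all_of_them)  # i love python, tom loves python
print(shouted)      # i love WWW.EXAMPLE.COM, tom loves WWW.EXAMPLE.COM
```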
subn
Works like re.sub, except that it returns a tuple: the new string together with the number of substitutions made.
Example:
import re
pattern = r'[w]{3}\.[a-z]+\.(com)'
# The URLs in repl and str3 are missing from the source.
repl = ''
str3 = "ilove,tomlove"
re3 = re.subn(pattern, repl, str3, 2)
print(re3)
Output:
('ilove,tomlove', 2)
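Because subn returns a tuple, it is often convenient to unpack the result directly. A minimal sketch on an invented string:

```python
import re

# Unpack the (new_string, number_of_substitutions) tuple.
new_text, n = re.subn(r"-", "/", "a-b-c")

# With count=1 only one substitution is made, and the counter reflects that.
limited_text, limited_n = re.subn(r"-", "/", "a-b-c", count=1)

print(new_text)      # a/b/c
print(n)             # 2
print(limited_text)  # a/b-c
print(limited_n)     # 1
```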
split
View the help:
help(re.split)
Help on function split in module re:
split(pattern, string, maxsplit=0, flags=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.
re.split(pattern, string, maxsplit=0, flags=0)
Purpose:
Splits string at every occurrence of pattern and returns the pieces in a list.
maxsplit is the maximum number of splits; the default 0 means no limit.
Example:
import re
text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
pattern = ","
print(re.split(pattern, text))
Output:
['xiaoming', 'xiaohua', 'xiaoli', 'xiaoqiang', 'xiaozhang']
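A plain comma could also be handled by str.split; re.split becomes useful when the delimiter varies. The sketch below uses an invented string with mixed delimiters and also shows maxsplit.

```python
import re

names = "xiaoming, xiaohua;xiaoli  xiaoqiang,xiaozhang"  # mixed delimiters

# A character class plus "+" treats any run of commas, semicolons,
# or whitespace as a single delimiter.
parts = re.split(r"[,;\s]+", names)

# maxsplit=2 performs at most two splits; the rest stays in the last item.
limited = re.split(r"[,;\s]+", names, maxsplit=2)

print(parts)    # ['xiaoming', 'xiaohua', 'xiaoli', 'xiaoqiang', 'xiaozhang']
print(limited)  # ['xiaoming', 'xiaohua', 'xiaoli  xiaoqiang,xiaozhang']
```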
findall
View the help:
help(re.findall)
Help on function findall in module re:
findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.
    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.
    Empty matches are included in the result.
re.findall(pattern, string, flags=0)
Purpose:
Finds every substring of string that matches the pattern and returns them in a list; the list is empty if nothing matches.
Example:
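import re
text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
pattern = r"\w+"
print(re.findall(pattern, text))
The result is the same as in the split example, but it is reached differently: split cuts the string at the commas, while findall collects each run of word characters.
['xiaoming', 'xiaohua', 'xiaoli', 'xiaoqiang', 'xiaozhang']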
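The help text notes that groups change what findall returns. A short sketch on invented name=age pairs:

```python
import re

text = "xiaoming=18,xiaohua=20,xiaoli=19"   # hypothetical name=age pairs

whole = re.findall(r"\w+=\d+", text)       # no groups: whole matches
ages = re.findall(r"\w+=(\d+)", text)      # one group: only that group
pairs = re.findall(r"(\w+)=(\d+)", text)   # two groups: a list of tuples
nothing = re.findall(r"zzz", text)         # no match: an empty list

print(whole)    # ['xiaoming=18', 'xiaohua=20', 'xiaoli=19']
print(ages)     # ['18', '20', '19']
print(pairs)    # [('xiaoming', '18'), ('xiaohua', '20'), ('xiaoli', '19')]
print(nothing)  # []
```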
finditer
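Similar to findall: it finds every substring that the regular expression matches, but returns an iterator of match objects instead of a list of strings.
Example: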
import re
text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
pattern = r"\w+"
re4 = re.finditer(pattern, text)
for i in re4:
    print(i.group())
The result is an iterator, so it is printed with a for loop:
>>> for i in re4:
...     print(i.group())
...
xiaoming
xiaohua
xiaoli
xiaoqiang
xiaozhang
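Since finditer yields match objects, each result also carries its position, and the iterator can only be consumed once. A minimal sketch on a shortened version of the string:

```python
import re

text = "xiaoming,xiaohua,xiaoli"

# Each match object knows where it was found.
found = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

# An iterator is exhausted after one pass.
it = re.finditer(r"\w+", text)
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]

print(found)        # [('xiaoming', 0, 8), ('xiaohua', 9, 16), ('xiaoli', 17, 23)]
print(first_pass)   # ['xiaoming', 'xiaohua', 'xiaoli']
print(second_pass)  # []
```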
compile
View the help:
help(re.compile)
Help on function compile in module re:
compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.
re.compile(pattern, flags=0)
Purpose:
Turns the regular expression pattern into a pattern object.
Example:
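import re
text = "xiaoming,xiaohua,xiaoli,xiaoqiang,xiaozhang"
pattern = r"\w+"
patternobj = re.compile(pattern)
re4 = patternobj.finditer(text)
for i in re4:
    print(i.group())
The output is the same as in the previous example. compile simply turns the pattern into a pattern object, and the other operations are then called on that object.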
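A compiled pattern object offers the same operations as the module-level functions, as methods, and can be reused across calls. The sketch below uses invented variable names; the start position passed to search is an extra that the pattern methods accept.

```python
import re

word = re.compile(r"\w+")                    # compile once, reuse below
ci = re.compile(r"xiao\w+", re.IGNORECASE)   # flags are fixed at compile time

text = "xiaoming,xiaohua,xiaoli"

all_words = word.findall(text)
first = word.match(text).group()
from_pos = word.search(text, 9).group()      # start scanning at index 9
mixed_case = ci.findall("XiaoMing,xiaohua")

print(all_words)   # ['xiaoming', 'xiaohua', 'xiaoli']
print(first)       # xiaoming
print(from_pos)    # xiaohua
print(mixed_case)  # ['XiaoMing', 'xiaohua']
```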
purge
View the help:
help(re.purge)
Help on function purge in module re:
purge()
    Clear the regular expression cache
Purpose:
Clears the cache of compiled regular expressions.
escape
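View the help:
help(re.escape)
Help on function escape in module re:
escape(pattern)
    Escape all non-alphanumeric characters in pattern.
Purpose:
Puts a backslash in front of the non-alphanumeric characters in a string, so that the string can be used inside a pattern and be matched literally.
Example: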
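>>> pattern
'\\w+'
>>> re.escape(pattern)
'\\\\w\\+'
The result is different: the backslash and the + have each been prefixed with a backslash.
The escaped pattern therefore matches the literal text \w+ rather than one or more word characters.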
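A typical use of escape is searching for text that happens to contain regex metacharacters. In this minimal sketch the strings are invented; the exact escaped form is not printed because it can differ between Python versions.

```python
import re

literal = "1+1=2"
text = "we know that 1+1=2, not 11=2"

# Unescaped, "+" is a quantifier: the pattern means
# "one or more 1s, followed by 1=2".
unescaped = re.findall(literal, text)

# Escaped, every character is matched literally.
escaped = re.findall(re.escape(literal), text)

print(unescaped)  # ['11=2']
print(escaped)    # ['1+1=2']
```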
flags
    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.
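The effect of the most commonly used flags can be sketched on one two-line string. The string and variable names are invented for illustration; flags are combined with the | operator.

```python
import re

text = "First line\nsecond LINE"

ignorecase = re.findall(r"line", text, re.IGNORECASE)
one_line = re.findall(r"^\w+", text)                  # "^" only at string start
multiline = re.findall(r"^\w+", text, re.MULTILINE)   # "^" at every line start
no_dotall = re.search(r"First.*LINE", text)           # "." stops at the newline
dotall = re.search(r"First.*LINE", text, re.DOTALL)   # "." crosses the newline

# VERBOSE allows whitespace and comments inside the pattern.
two_words = re.compile(r"""
    (\w+)   # a word
    \s+     # whitespace
    (\w+)   # another word
""", re.VERBOSE)

combined = re.findall(r"^(?:first|second)", text, re.I | re.M)

print(ignorecase)                      # ['line', 'LINE']
print(one_line)                        # ['First']
print(multiline)                       # ['First', 'second']
print(no_dotall)                       # None
print(dotall.group() == text)          # True
print(two_words.match(text).groups())  # ('First', 'line')
print(combined)                        # ['First', 'second']
```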