GRETA: The GRETA Regular Expression Template Archive
Copyright Eric Niebler, 2002
The purpose of this document is to describe how to use the GRETA Regular Expression Template Archive. It describes the objects in the library, the methods defined on the objects, and the ways to use the objects and methods to perform regular expression pattern matching on strings in C++. It does not describe regular expression syntax. It is enough to say that the full Perl 5 syntax is supported. If you are not familiar with Perl’s regular expression syntax, I recommend reading Chapter 2 of Programming Perl, 2nd Ed. (a.k.a. The Camel Book), one of the many fine books put out by O’Reilly publishers.
Overview
A Word about Speed
Notice to Users of Version 1.x
The rpattern Object
    rpattern::string_type
    rpattern::rpattern
    rpattern::match
    rpattern::substitute
    rpattern::count
    rpattern::split
    rpattern::set_substitution
    rpattern::cgroups
match_results, subst_results and split_results
    match_results::cbackrefs
    match_results::backref
    match_results::rstart
    match_results::rlength
    match_results::all_backrefs
    subst_results::backref_str
    split_results::strings
The Syntax Module
    register_intrinsic_charset
Customizing Your Search
    NOCASE
    GLOBAL
    MULTILINE
    SINGLELINE
    EXTENDED
    RIGHTMOST
    NOBACKREFS
    ALLBACKREFS
    FIRSTBACKREFS
    NORMALIZE
Matching Modes
    MODE_FAST
    MODE_SAFE
    MODE_MIXED
Known Issues and Perl Incompatibilities
    Embedded Code in a Regular Expression
    Pattern Modifier Scope
    Comment Blocks Before Quantifiers
    Variable Width Look-Behind Assertions
    Recursive Patterns
Compile-Time Switches
    REGEX_WIDE_AND_NARROW
    REGEX_POSIX
    REGEX_NO_PERL
    REGEX_DEBUG
    REGEX_DEBUG_HEAP
    REGEX_STACK_ALIGNMENT
    REGEX_FOLD_INSTANTIATIONS
    REGEX_TO_INSTANTIATE
Miscellaneous
    Static Const Patterns
    Thread-safety
    Stack Usage
    DBCS
    STL
    VC7 and Managed Code
    Template Instantiation
Contact Information
Appendix 1: History
Appendix 2: Implementation Details
Overview
The regular expression template library contains objects and functions that make it possible to perform pattern matching and substitution on strings in C++. They are:
∙ rpattern: the pattern to use during the search.
∙ match_results/subst_results: container for the results of a match/substitution.
To perform a search or replace operation, you will typically first initialize an rpattern object by giving it a string representing the pattern against which to match. You will then call a method on the rpattern object (match() or substitute(), for instance), passing it a string to match against and a match_results object to receive the results of the match. If the match()/substitute() fails, the method returns false. If it succeeds, it returns true, and the match_results object stores the resulting array of backreferences internally. (Here, the term backreference has the same meaning as it does in Perl. Backreferences provide extra information about what parts of the pattern matched which parts of the string.) There are methods on the match_results object to make the backreference information available. For example:
    #include <iostream>
    #include <string>
    #include "regexpr2.h"
    using namespace std;
    using namespace regex;

    int main() {
        match_results results;
        string str("The book cost $12.34");
        rpattern pat("\\$(\\d+)(\\.(\\d\\d))?");

        // Match a dollar sign followed by one or more digits,
        // optionally followed by a period and two more digits.
        // The double-escapes are necessary to satisfy the compiler.
        match_results::backref_type br = pat.match(str, results);
        if (br.matched) {
            cout << "match success!" << endl;
            cout << "price: " << br << endl;
        } else {
            cout << "match failed!" << endl;
        }
        return 0;
    }
The above program would print out the following:
    match success!
    price: $12.34
The following sections discuss the rpattern object in detail and how to customize your searches to be faster and more efficient.
Note: all declarations in the header file (regexpr2.h) are contained in the regex namespace. To use any of the objects, methods or enumerations described in this document, you must prepend all declarations with “regex::” or you must have the “using namespace regex;” directive somewhere within the enclosing scope of your declarations. For simplicity, I’ve left off the “regex::” prefixes in the rest of my code snippets.
A Word about Speed
Different regex engines are good on different types of patterns. That said, I have found my regex engine to be pretty quick. For a benchmark, I matched the pattern “^([0-9]+)(\-| |$)(.*)$” against the string “100- this is a line of ftp response which contains a message string”. GRETA is about 7 times faster than the regex library in boost (http://www.boost.org), and about 10 times faster than the regular expression classes in ATL7. For this input, GRETA is even faster than Perl, although Perl is faster for some other patterns. Most regex engines I have seen build up an NFA (non-deterministic finite state automaton) and execute it iteratively, often with a big, slow switch statement. I have a different approach: patterns are compiled into a directed, possibly cyclic graph, and matching happens by traversing this graph recursively. In addition, the code makes heavy use of templates to freeze the state of the flags into the compiled pattern so that they don’t need to be checked at match time. The result is a pretty lean blob of code that can match your pattern quickly.
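The idea of freezing a flag into the compiled pattern can be sketched as follows. This is a minimal illustration and not GRETA's actual code: the names node, literal_node and make_literal are hypothetical. The case-sensitivity flag is examined once, when the node is created, and selects a template instantiation. The match routine then carries no run-time flag test.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical node interface; GRETA's real node classes are not shown here.
struct node {
    virtual ~node() {}
    // Returns true if this node matches text starting at position pos.
    virtual bool match(const std::string& text, std::size_t pos) const = 0;
};

// The NoCase flag is a template parameter, so each instantiation
// needs only one kind of character comparison.
template <bool NoCase>
struct literal_node : node {
    explicit literal_node(const std::string& lit) : lit_(lit) {}

    bool match(const std::string& text, std::size_t pos) const override {
        if (pos > text.size() || text.size() - pos < lit_.size()) return false;
        for (std::size_t i = 0; i < lit_.size(); ++i) {
            if (!same(text[pos + i], lit_[i])) return false;
        }
        return true;
    }

private:
    static bool same(char a, char b) {
        if (NoCase) {  // compile-time constant; the compiler can drop the unused branch
            return std::tolower(static_cast<unsigned char>(a)) ==
                   std::tolower(static_cast<unsigned char>(b));
        }
        return a == b;
    }

    std::string lit_;
};

// The flag is inspected once, when the pattern is "compiled".
inline std::unique_ptr<node> make_literal(const std::string& lit, bool nocase) {
    if (nocase) return std::unique_ptr<node>(new literal_node<true>(lit));
    return std::unique_ptr<node>(new literal_node<false>(lit));
}
```

With this sketch, make_literal("abc", true)->match("xxABC", 2) succeeds, while the same call with nocase set to false fails.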
Even the best algorithms have their weaknesses, though. Matching regular expressions with backreferences is an NP-complete problem. There are patterns that will make any backtracking regex engine take exponential time to finish. (These usually involve nested quantifiers.) If you have a performance critical app, you would be smart to test your patterns for speed, or profile your app to make sure you are not spending too much time thrashing around in the regex code. You’ve been warned! Also, see the section VC7 and Managed Code for some advice for compiling GRETA under VC7.
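To make the warning about nested quantifiers concrete, here is a small hand-written backtracking matcher for the single pattern ^(a+)+b$. It is an illustration under stated assumptions, not GRETA code: the functions match_groups and count_steps are hypothetical, and the matcher handles only this one pattern. It counts how many times the matcher is entered when the input consists only of 'a' characters, so the match has to fail.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hand-written backtracking matcher for the one pattern ^(a+)+b$.
// Each call tries to match one more (a+) group starting at pos and
// increments steps so the amount of backtracking can be observed.
bool match_groups(const std::string& s, std::size_t pos, unsigned long& steps) {
    ++steps;
    // Length of the run of 'a' characters available at pos.
    std::size_t run = 0;
    while (pos + run < s.size() && s[pos + run] == 'a') ++run;
    // Inner quantifier: try every group length, longest first.
    for (std::size_t len = run; len >= 1; --len) {
        const std::size_t next = pos + len;
        // Outer quantifier: first try to match another group...
        if (match_groups(s, next, steps)) return true;
        // ...otherwise the rest must be exactly the trailing 'b'.
        if (next + 1 == s.size() && s[next] == 'b') return true;
    }
    return false;
}

// Number of calls needed to reject a string of n 'a' characters.
unsigned long count_steps(std::size_t n) {
    const std::string s(n, 'a');  // no 'b', so every alternative is explored
    unsigned long steps = 0;
    const bool matched = match_groups(s, 0, steps);
    assert(!matched);
    (void)matched;
    return steps;
}
```

In this sketch count_steps(10) returns 1024 and count_steps(20) returns 1048576: every extra character doubles the work, because each split of the run of 'a' characters into groups is tried before the matcher gives up.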
Notice to Users of Version 1.x
Many things have changed since version 1.x of the Regular Expression Template Archive. If you have code which uses version 1.x, you will not be able to use version 2 without making changes to your code. Sorry! There were a number of unsafe, unintuitive interface features of version 1 that I felt were worth fixing for version 2. If you need version 1, I have a copy and I’d be happy to give it to you.
Most notably, the regexpr object has gone away. It was a subclass of std::string, with match() and substitute() methods, and it stored the results of the match/substitute internally. Subclassing std::string is dangerous because std::string doesn’t have a virtual destructor. Also, matching is conceptually a const operation, and it seemed wrong that it should change internal state.
The match/count/substitute methods have moved to the rpattern object. The state that used to be stored in the regexpr object is now put in a match_results/subst_results container, which is passed as an out parameter to the match/substitute methods.
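The two design points behind this change can be sketched in standard C++. The types demo_pattern and demo_results below are hypothetical stand-ins, not GRETA classes, and a plain substring search takes the place of regular expression matching. The sketch shows that std::string is not meant to be a polymorphic base class, and that a const match method can report its results through an out parameter instead of mutating the pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// std::string has no virtual destructor, so deleting a derived object
// through a std::string* would be undefined behavior.
static_assert(!std::has_virtual_destructor<std::string>::value,
              "std::string is not designed to be a polymorphic base class");

// Hypothetical results container, filled in by the match call.
struct demo_results {
    bool matched = false;
    std::size_t start = 0;
};

// Hypothetical pattern that holds a std::string instead of deriving from it.
class demo_pattern {
public:
    explicit demo_pattern(const std::string& needle) : needle_(needle) {}

    // const member: matching writes only to the caller's results object.
    bool match(const std::string& text, demo_results& out) const {
        const std::size_t pos = text.find(needle_);
        out.matched = (pos != std::string::npos);
        out.start = out.matched ? pos : 0;
        return out.matched;
    }

private:
    std::string needle_;
};
```

Because match() is const, one demo_pattern object can be declared const and reused for many searches, each with its own demo_results.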
Also, the CSTRINGS flag has gone away. It is no longer necessary to optimize a pattern for use with C-style NULL-terminated strings. When you pass a C-style string to the rpattern::match method, the same optimization is used automatically. (In early 2.X versions of the library, there was a basic_rpattern_c object for performing this optimization, but it is no longer necessary and has been deprecated.)
Another minor change involves the register_intrinsic_charset() method. It used to be a part of rpattern’s interface, but it has moved to the syntax module.
Despite the sweeping interface changes, the majority of the back-end code is unchanged. You should expect patterns that worked in version 1.x to continue to work in version 2.
The rpattern Object
The rpattern object contains the regular expression pattern against which to match. It also exposes the match(), substitute(), and count() methods you will use to perform regular expression matches. When you instantiate an rpattern object, the pattern is “compiled” into a structure that speeds up pattern matching. Once compiled, you may reuse the same pattern for multiple match operations.
Here is how rpattern is declared:
    template< typename CI,
              typename SY = perl_syntax<typename std::iterator_traits<CI>::value_type> >
    class basic_rpattern {
        …
    };
    typedef basic_rpattern<std::basic_string<…>::const_iterator> rpattern;
    typedef basic_rpattern<…> rpattern_c;
The rpattern class is a template on iterator type. It is also a template on the syntax module. By default, the Perl syntax module is used, but you are free to write your own syntax and specify it as a template parameter. See the section on the Syntax Module.
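The shape of this design, a class templated on an iterator type with a replaceable syntax policy, can be sketched in a few lines. Everything here is hypothetical and far simpler than GRETA: dot_syntax, question_syntax and wildcard_pattern are made-up names, and the "pattern language" has a single wildcard character that matches any one character. The real interface a GRETA syntax module must provide is described in the Syntax Module section, not here.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical syntax policies: each decides which pattern character
// acts as the "match any one character" wildcard.
template <typename CharT>
struct dot_syntax {
    static bool is_wildcard(CharT c) { return c == CharT('.'); }
};

template <typename CharT>
struct question_syntax {
    static bool is_wildcard(CharT c) { return c == CharT('?'); }
};

// Templated on the iterator type CI and on a syntax policy SY,
// with a default policy derived from the iterator's character type.
template <typename CI,
          typename SY = dot_syntax<typename std::iterator_traits<CI>::value_type> >
class wildcard_pattern {
public:
    typedef typename std::iterator_traits<CI>::value_type char_type;
    typedef std::basic_string<char_type> string_type;

    explicit wildcard_pattern(const string_type& pat) : pat_(pat) {}

    // True if [first, last) has the same length as the pattern and each
    // character equals the pattern character or lines up with a wildcard.
    bool match(CI first, CI last) const {
        typename string_type::const_iterator p = pat_.begin();
        for (; first != last && p != pat_.end(); ++first, ++p) {
            if (!SY::is_wildcard(*p) && *p != *first) return false;
        }
        return first == last && p == pat_.end();
    }

private:
    string_type pat_;
};
```

With std::string::const_iterator as CI, wildcard_pattern<CI>("c.t") matches "cat" using the default policy, and wildcard_pattern<CI, question_syntax<char> >("c?t") matches it with the alternative one.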
The following sections describe the methods available on the rpattern object.
rpattern::string_type
rpattern::string_type is a typedef that is used in many of the following function prototypes. It is defined as follows:
    typedef CI const_iterator;
    typedef typename std::iterator_traits<const_iterator>::value_type char_type;
    typedef std::basic_string<char_type> string_type;
The typedef is a little complicated, but its effect is what you would expect. If the result of dereferencing a const_iterator is a char, then string_type is the same as std::string. If dereferencing a const_iterator results in a wchar_t, then string_type is the same as std::wstring.
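This mapping from iterator type to string type can be checked with standard C++ alone. The helper string_type_of below is hypothetical; it only reproduces the typedef chain described above outside of GRETA so that the result can be verified at compile time.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical helper that mirrors the typedef chain described above.
template <typename CI>
struct string_type_of {
    typedef CI const_iterator;
    typedef typename std::iterator_traits<const_iterator>::value_type char_type;
    typedef std::basic_string<char_type> string_type;
};

// Iterators over char yield std::string...
static_assert(std::is_same<string_type_of<std::string::const_iterator>::string_type,
                           std::string>::value,
              "char iterators map to std::string");
static_assert(std::is_same<string_type_of<const char*>::string_type,
                           std::string>::value,
              "const char* maps to std::string");

// ...and iterators over wchar_t yield std::wstring.
static_assert(std::is_same<string_type_of<const wchar_t*>::string_type,
                           std::wstring>::value,
              "const wchar_t* maps to std::wstring");
```

Note that std::iterator_traits strips the const from a pointer's element type, which is why const char* maps to a string of plain char.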
rpattern::rpattern
There are two constructors for instantiating an rpattern object. Here are their prototypes:
    rpattern::rpattern(
        const string_type & pat,
        REGEX_FLAGS flags = NOFLAGS,
        REGEX_MODE mode = MODE_DEFAULT ); //throw(bad_alloc,bad