GRETA Regular Expressions

GRETA: The GRETA Regular Expression Template Archive

Copyright Eric Niebler, 2002

The purpose of this document is to describe how to use the GRETA Regular Expression Template Archive. It describes the objects in the library, the methods defined on the objects, and the ways to use the objects and methods to perform regular expression pattern matching on strings in C++. It does not describe regular expression syntax. It is enough to say that the full Perl 5 syntax is supported. If you are not familiar with Perl's regular expression syntax, I recommend reading Chapter 2 of Programming Perl, 2nd Ed. (a.k.a. The Camel Book), one of the many fine books put out by O'Reilly publishers.

GRETA: The GRETA Regular Expression Template Archive
    Overview
    A Word about Speed
    Notice to Users of Version 1.x
    The rpattern Object
        rpattern::string_type
        rpattern::rpattern
        rpattern::match
        rpattern::substitute
        rpattern::count
        rpattern::split
        rpattern::set_substitution
        rpattern::cgroups
    match_results, subst_results and split_results
        match_results::cbackrefs
        match_results::backref
        match_results::rstart
        match_results::rlength
        match_results::all_backrefs
        subst_results::backref_str
        split_results::strings
    The Syntax Module
        register_intrinsic_charset
    Customizing Your Search
        NOCASE
        GLOBAL
        MULTILINE
        SINGLELINE
        EXTENDED
        RIGHTMOST
        NOBACKREFS
        ALLBACKREFS
        FIRSTBACKREFS
        NORMALIZE
    Matching Modes
        MODE_FAST
        MODE_SAFE
        MODE_MIXED
    Known Issues and Perl Incompatibilities
        Embedded Code in a Regular Expression
        Pattern Modifier Scope
        Comment Blocks Before Quantifiers
        Variable Width Look-Behind Assertions
        Recursive Patterns
    Compile-Time Switches
        REGEX_WIDE_AND_NARROW
        REGEX_POSIX
        REGEX_NO_PERL
        REGEX_DEBUG
        REGEX_DEBUG_HEAP
        REGEX_STACK_ALIGNMENT
        REGEX_FOLD_INSTANTIATIONS
        REGEX_TO_INSTANTIATE
    Miscellaneous
        Static Const Patterns
        Thread-safety
        Stack Usage
        DBCS
        STL
        VC7 and Managed Code
        Template Instantiation
    Contact Information
    Appendix 1: History
    Appendix 2: Implementation Details

Overview

The regular expression template library contains objects and functions that make it possible to perform pattern matching and substitution on strings in C++. They are:

    rpattern: the pattern to use during the search.
    match_results/subst_results: container for the results of a match/substitution.

To perform a search or replace operation, you will typically first initialize an rpattern object by giving it a string representing the pattern against which to match. You will then call a method on the rpattern object (match() or substitute(), for instance), passing it a string to match against and a match_results object to receive the results of the match. If the match()/substitute() fails, the method returns false. If it succeeds, it returns true, and the match_results object stores the resulting array of backreferences internally. (Here, the term backreference has the same meaning as it does in Perl. Backreferences provide extra information about what parts of the pattern matched which parts of the string.) There are methods on the match_results object to make the backreference information available. For example:

    #include <iostream>
    #include <string>
    #include "regexpr2.h"
    using namespace std;
    using namespace regex;

    int main()
    {
        match_results results;
        string str( "The book cost $12.34" );
        rpattern pat( "\\$(\\d+)(\\.(\\d\\d))?" );
        // Match a dollar sign followed by one or more digits,
        // optionally followed by a period and two more digits.
        // The double-escapes are necessary to satisfy the compiler.

        match_results::backref_type br = pat.match( str, results );
        if( br.matched )
        {
            cout << "match success!" << endl;
            cout << "price: " << br << endl;
        }
        else
        {
            cout << "match failed!" << endl;
        }
        return 0;
    }

The above program would print out the following:

    match success!
    price: $12.34

The following sections discuss the rpattern object in detail and how to customize your searches to be faster and more efficient.

Note: all declarations in the header file (regexpr2.h) are contained in the regex namespace. To use any of the objects, methods or enumerations described in this document, you must prepend
all declarations with "regex::" or you must have the "using namespace regex;" directive somewhere within the enclosing scope of your declarations. For simplicity, I've left off the "regex::" prefixes in the rest of my code snippets.

A Word about Speed

Different regex engines are good on different types
of patterns. That said, I have found my regex engine to be pretty quick. For a benchmark, I matched the pattern "([0-9]+)(-| |$)(.*)$" against the string "100- this is a line of ftp response which contains a message string". GRETA is about 7 times faster than the regex library in boost (http://www.boost.org), and about 10 times faster than the regular expression classes in ATL7. For this input, GRETA is even faster than Perl, although Perl is faster for some other patterns.

Most regex engines I have seen build up an NFA (non-deterministic finite state automaton) and execute it iteratively, often with a big, slow switch statement. I have a different approach: patterns are compiled into a directed, possibly cyclic graph, and matching happens by traversing this graph recursively. In addition, the code makes heavy use of templates to freeze the state of the flags into the compiled pattern so that
they don't need to be checked at match time. The result is a pretty lean blob of code that can match your pattern quickly.

Even the best algorithms have their weaknesses, though. Matching regular expressions with backreferences is an NP-complete problem. There are patterns that will make any backtracking regex engine take exponential time to finish. (These usually involve nested quantifiers.) If you have a performance-critical app, you would be smart to test your patterns for speed, or profile your app to make sure you are not spending too much time thrashing around in the regex code. You've been warned!

Also, see the section VC7 and Managed Code for some advice for compiling GRETA under VC7.

Notice to Users of Version 1.x

Many things have changed since version 1.x of the Regular Expression Template Archive. If you have code which uses version 1.x, you will not be able to use version 2 without
making changes to your code. Sorry! There were a number of unsafe, unintuitive interface features of version 1 that I felt were worth fixing for version 2. If you need version 1, I have a copy and I'd be happy to give it to you.

Most notably, the regexpr object has gone away. It was a subclass of std::string, with match() and substitute() methods, and it stored the results of the match/substitute internally. Subclassing std::string is dangerous because std::string doesn't have a virtual destructor. Also, matching is conceptually a const operation, and it seemed wrong that it should change internal state.

The match/count/substitute methods have moved to the rpattern object. The state that used to be stored in the regexpr object is now put in a match_results/subst_results container, which is passed as an out parameter to the match/substitute methods.

Also, the CSTRINGS flag has gone away. It is no longer necessary to optimize a pattern for use with C-style NULL-terminated strings. When you pass a C-style string to the rpattern::match method, the same optimization is used automatically. (In early 2.X versions of the library, there was a basic_rpattern_c object for performing this optimization, but it is no longer necessary and has been deprecated.)

Another minor change involves the register_intrinsic_charset() method. It used to be a part of rpattern's interface, but it has moved to the syntax module.

Despite the sweeping interface changes, the majority of the back-end code is unchanged. You should expect patterns that
worked in version 1.x to continue to work in version 2.

The rpattern Object

The rpattern object contains the regular expression pattern against which to match. It also exposes the match(), substitute(), and count() methods you will use to perform regular expression matches. When you instantiate an rpattern object, the pattern is "compiled" into a structure that speeds up pattern matching. Once compiled, you may reuse the same pattern for multiple match operations.

Here is how rpattern is declared:

    template< typename CI,
              typename SY = perl_syntax<typename std::iterator_traits<CI>::value_type> >
    class basic_rpattern;

    typedef basic_rpattern<std::basic_string<TCHAR>::const_iterator> rpattern;
    typedef basic_rpattern<const TCHAR *> rpattern_c;

The rpattern class is a template on iterator type. It is also a template on the syntax module. By default, the Perl syntax module is used, but you are free to write your own syntax and specify it as a template parameter. See the section on the Syntax Module.

The following sections describe the methods available on the rpattern object.

rpattern::string_type

rpattern::string_type is a typedef that is used in many of the following function prototypes. It is defined as follows:

    typedef CI const_iterator;
    typedef typename std::iterator_traits<const_iterator>::value_type char_type;
    typedef std::basic_string<char_type> string_type;

The typedef is a little complicated, but its effect is what you would expect. If the result of dereferencing a const_iterator is a char, then string_type is the same as std::string. If dereferencing a const_iterator results in a wchar_t, then string_type is the same as std::wstring.

rpattern::rpattern

There are two constructors for instantiating an rpattern object. Here are their prototypes:

    rpattern::rpattern( const string_type & pat, REGEX_FLAGS flags = NOFLAGS, REGEX_MODE mode = MODE_DEFAULT ); // throw(bad_alloc,bad
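
The string_type typedef chain described under rpattern::string_type can be tried out in isolation, without the library. The following is a minimal sketch under stated assumptions: string_type_of, copy_range and demo are hypothetical names invented for this illustration and are not part of GRETA. It only shows how the character type, and from it the string type, can be derived from an iterator type with std::iterator_traits.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the typedef chain shown above.
// The name string_type_of is an assumption, not a GRETA type.
template <typename CI>
struct string_type_of {
    typedef CI const_iterator;
    typedef typename std::iterator_traits<const_iterator>::value_type char_type;
    typedef std::basic_string<char_type> string_type;
};

// Narrow iterators and pointers yield std::string.
static_assert(std::is_same<string_type_of<std::string::const_iterator>::string_type,
                           std::string>::value, "narrow iterator gives std::string");
static_assert(std::is_same<string_type_of<const char*>::string_type,
                           std::string>::value, "const char* gives std::string");

// Wide iterators and pointers yield std::wstring.
static_assert(std::is_same<string_type_of<std::wstring::const_iterator>::string_type,
                           std::wstring>::value, "wide iterator gives std::wstring");
static_assert(std::is_same<string_type_of<const wchar_t*>::string_type,
                           std::wstring>::value, "const wchar_t* gives std::wstring");

// Hypothetical helper: copies an iterator range into the deduced string type.
template <typename CI>
typename string_type_of<CI>::string_type copy_range(CI first, CI last) {
    return typename string_type_of<CI>::string_type(first, last);
}

// Usage sketch: the deduced string type follows the iterator's character type.
inline void demo() {
    const char* narrow = "abc";
    assert(copy_range(narrow, narrow + 3) == "abc");

    const std::wstring wide = L"xyz";
    assert(copy_range(wide.begin(), wide.end()) == wide);
}
```

Because the character type comes from std::iterator_traits rather than from the container, the same chain works for raw pointers as well as for string iterators, which is consistent with the library offering both rpattern and rpattern_c.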