Greedy vs. Reluctant vs. Possessive Quantifiers

November 20, 2021

Question:

I found this tutorial on regular expressions, and while I intuitively understand what "greedy", "reluctant" and "possessive" quantifiers do, there seems to be a serious hole in my understanding.

Specifically, in the following example:

Enter your regex: .*foo // Greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo // Reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // Possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.
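
For anyone who wants to reproduce the three runs without the interactive test harness, here is a minimal sketch using java.util.regex. The class name QuantifierDemo and the helper findAll are made up for this illustration; only Pattern, Matcher, find(), group(), start() and end() come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: collects every match as "text[start,end)"
    // in the order Matcher.find() reports them.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group() + "[" + m.start() + "," + m.end() + ")");
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println("greedy:     " + findAll(".*foo", input));
        System.out.println("reluctant:  " + findAll(".*?foo", input));
        System.out.println("possessive: " + findAll(".*+foo", input));
    }
}
```

The greedy pattern should report one match covering the whole string, the reluctant pattern two shorter matches, and the possessive pattern none, which is the same picture as the transcript above.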

The explanation mentions eating the entire input string, letters being consumed, the matcher backing off, the rightmost occurrence of "foo" being regurgitated, etc.

Unfortunately, despite the nice metaphors, I still don't understand what is eaten by whom... Do you know of another tutorial that explains (concisely) how regular expression engines work?

Alternatively, if someone can explain the following paragraphs in somewhat different phrasing, that would be much appreciated:

The first example uses the greedy quantifier .* to find "anything", zero or more times, followed by the letters "f", "o", "o". Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point, the overall expression cannot succeed, because the last three letters ("f", "o", "o") have already been consumed [by whom?]. So the matcher slowly backs off [from right-to-left?] one letter at a time until the rightmost occurrence of "foo" has been regurgitated [what does this mean?], at which point the match succeeds and the search ends.
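
One way to picture the "backing off" in the paragraph above is to model it by hand. The sketch below is a deliberately simplified model, not how java.util.regex is implemented: it only handles a pattern of the shape .* followed by a literal, tries a single start position (index 0), and assumes the input has no line terminators so that the dot matches every character. The names GreedyTrace and greedyKeep are hypothetical.

```java
class GreedyTrace {
    // Simplified model of ".*foo" tried at index 0: ".*" begins by holding
    // the whole input and gives back one character per failed attempt.
    // Returns how many characters ".*" still holds when the literal
    // matches right after it, or -1 if it never does.
    static int greedyKeep(String input, String literal) {
        for (int held = input.length(); held >= 0; held--) {
            boolean ok = input.startsWith(literal, held);
            System.out.println(".* holds " + held + " chars; \"" + literal
                    + "\" at index " + held + "? " + ok);
            if (ok) {
                return held;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String literal = "foo";
        int held = greedyKeep("xfooxxxxxxfoo", literal);
        System.out.println("match spans [0," + (held + literal.length()) + ")");
    }
}
```

In this model it is the .* part that "consumes" characters, and "regurgitating" simply means the loop variable held shrinking from 13 to 10, at which point the three characters at indices 10 to 12 are free to match the literal.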

The second example, however, is reluctant, so it starts by first consuming [by whom?] "nothing". Because "foo" doesn't appear at the beginning of the string, it's forced to swallow [who swallows?] the first letter (an "x"), which triggers the first match at 0 and 4. Our test harness continues the process until the input string is exhausted. It finds another match at 4 and 13.
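
The reluctant case can be modelled the same way, with the loop running in the opposite direction. Again this is only a hand-written illustration under the same assumptions (a pattern of the shape .*? followed by a literal, no line terminators in the input); ReluctantTrace and reluctantEnd are made-up names, and the real engine is more general.

```java
class ReluctantTrace {
    // Simplified model of ".*?foo" tried at position 'start': ".*?" begins
    // by holding nothing and takes one more character per failed attempt.
    // Returns the end index of the match, or -1 if the literal never fits.
    static int reluctantEnd(String input, String literal, int start) {
        for (int held = 0; start + held <= input.length(); held++) {
            if (input.startsWith(literal, start + held)) {
                return start + held + literal.length();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        int start = 0;
        // Mimic the test harness: resume searching where the last match ended.
        while (start <= input.length()) {
            int end = reluctantEnd(input, "foo", start);
            if (end < 0) {
                break;
            }
            System.out.println("match [" + start + "," + end + ")");
            start = end;
        }
    }
}
```

Starting at 0, .*? has to take one "x" before "foo" fits, giving the match from 0 to 4; resuming at 4, it has to take six "x" characters, giving the match from 4 to 13.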

The third example fails to find a match because the quantifier is possessive. In this case, the entire input string is consumed by .*+ [how?], leaving nothing left over to satisfy the "foo" at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off [what does back off mean?]; it will outperform the equivalent greedy quantifier in cases where the match is not immediately found.
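
To see what "never backing off" means in practice, it may help to compare the possessive pattern with two relatives. The sketch below uses only Pattern and Matcher.find() from java.util.regex; the class PossessiveDemo, the helper found, and the extra patterns (?>.*)foo and [^f]*+foo are additions for this illustration and do not appear in the tutorial.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then gives characters back until "foo" fits.
        System.out.println(".*foo      -> " + found(".*foo", input));
        // Possessive: .*+ takes everything and refuses to give anything back.
        System.out.println(".*+foo     -> " + found(".*+foo", input));
        // Atomic group: written differently, but it also keeps what it took.
        System.out.println("(?>.*)foo  -> " + found("(?>.*)foo", input));
        // Possessive on a narrower class: [^f]*+ stops before the first "f",
        // so there is still input left for "foo".
        System.out.println("[^f]*+foo  -> " + found("[^f]*+foo", input));
    }
}
```

The last pattern suggests where a possessive quantifier is useful: when the quantified part can never overlap with what follows it, giving characters back could not help anyway, so skipping those retries costs nothing and saves work on inputs that do not match.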


Solution:

Reference: https://stackoom.com/en/question/MJvs