Infinite left-recursion when seed parent might be empty #35
Comments
After some tinkering, using my suggested fix breaks some of the unit tests, so that is not a proper solution for this problem. |
Hi @CptWesley, Thanks for finding and reporting this case. It has been a long time since I looked at this algorithm, but off the top of my head, you should be able to detect this case and not add the parent rule if the rule already matched the empty string, and the rule is its own parent. However, things get trickier when there is an indirect cycle that consumes zero characters for each loop around the cycle (i.e. if a rule is only its own indirect parent). I suspect the more general fix is to not move upwards to parent phrases if the match at a given start position is exactly the same length as a previous match in that position. Actually I thought that's what the logic already did (since the matching logic requires that matches get at least one character longer for each loop around a grammar cycle), so the implementation may just be buggy. Let me know if that helps at all... |
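The "only move upwards when the match got longer" idea from this comment can be sketched roughly as follows. This is a toy model, not the reference implementation: `MemoTable`, `add_match`, `propagate` and the `seed_parents` mapping are hypothetical names, and the sketch assumes that a parent re-matched over a zero-width cycle gets the same length as its child.

```python
from dataclasses import dataclass, field


@dataclass
class MemoTable:
    # Maps (clause_name, start_pos) -> length of the best match seen so far.
    best: dict = field(default_factory=dict)

    def add_match(self, clause, start, length):
        """Record a match; return True only if it is strictly longer
        than the previous match of this clause at this position."""
        key = (clause, start)
        prev = self.best.get(key)
        if prev is not None and length <= prev:
            return False  # same or shorter: do not propagate upwards
        self.best[key] = length
        return True


def propagate(seed_parents, initial, max_steps=1000):
    """Toy propagation loop: a match is pushed to its seed parents
    only when add_match reports a strict improvement."""
    memo = MemoTable()
    queue = list(initial)  # items are (clause, start, length)
    steps = 0
    while queue:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("queue never drained")
        clause, start, length = queue.pop()
        if memo.add_match(clause, start, length):
            # Assumption: around a zero-width cycle the parent's match
            # has the same length as the child's match.
            for parent in seed_parents.get(clause, ()):
                queue.append((parent, start, length))
    return memo.best, steps


# CHAIN is its own seed parent and has matched the empty string at position 0.
best, steps = propagate({"CHAIN": ["CHAIN"]}, [("CHAIN", 0, 0)])
```

In this model the second visit to `CHAIN` at position 0 is rejected because its length did not grow, so the queue drains. A real pika parser also has to decide which match is "better" for ordered choice, which this sketch ignores.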
@lukehutch Thank you for your response. When fiddling with things I came to a similar solution as you proposed, but I wasn't sure if that did not create new degenerate cases. In the current reference implementation the requirement of matches growing in size before looping is not enforced. Inside the I had a bit of an issue understanding the intuition of simply always adding seed parents that might be empty to the queue. I believe it's to ensure that other matches that are of size 0 also get considered along the line, but I found it hard to think of any scenario where this would be necessary. Removing this second condition also still allows all unit tests to pass. |
Hi @CptWesley ! I think your grammar rule
Reference code:
import PikaParser as P
rules = Dict(
    :A => P.first(P.token('a'), P.epsilon),
    :chain => P.seq(:chain, :A),
)
g = P.make_grammar([:chain, :A], P.flatten(rules, Char))
input = "aaaa"
p = P.parse(g, input)
Hope this helps! |
@exaexa The definition is only infinite for recursive descent parsing, because of the left recursion, which is the problem that this algorithm/paper tries to solve. Perhaps this specific case is not one of the cases that is supposed to be covered (it's been a while since I last looked into this). But this grammar works fine for the family of left-recursion growing parsers (https://dl.acm.org/doi/10.1145/1328408.1328424). Although the definition of "fine" might depend on who you ask, since from my understanding these tricks to support left recursion are usually not fully defined and often not complete (https://www.jstage.jst.go.jp/article/ipsjjip/29/0/29_174/_article). |
My main question was basically whether you are sure that you have the correct grammar there; yours is basically:
There, the chain recursion does not allow any termination ever -- the rule Thus your grammar would be perfectly equivalent to just:
...which corresponds to what I'm getting (in my test I get 4 independent matches of The closest to your grammar that the paper reports is IMO this one:
Anyway, your grammar can be converted to one where
...and there I'm actually getting the infinite recursion too. Time to debug! :] |
@exaexa You're right. I didn't properly inspect my grammar, I assumed it was written as the last snippet you posted. However, I would expect it should be able to detect this and terminate, rather than loop forever, so I think your Julia implementation does better in that regard 😅. |
haha yeah I put some time into making sure the "don't match self again" invariant holds, yet alas in this case it was obviously insufficient. I still think this is fixable though, the loop in the algorithm is detectable and is visibly invalid as it is trying to force new repeated matches of stuff that is already in the match table. The main logic problem is with the epsilon-shortcutting logic there, roughly:
I'll be back with updates. |
OK so the actual inconsistency in this grammar
seems to be as follows:
The few things we thus found so far:
rules = Dict(
    :A => P.token('a'),
    :X => P.seq(:chain),
    :chain => P.first(P.seq(:X, :A), :A),
)
In perspective, I think this here might actually give a nice defining property of pikaparsers -- problematic left recursion is skipped on the basis of the no-ε-on-the-left rule, reflecting upon the overall nature of PEGs. |
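The combination this thread keeps coming back to (a clause that can match the empty string and is also its own direct or indirect seed parent) can be flagged statically before parsing. The sketch below is one possible check, not part of any of the implementations discussed here. The grammar encoding (`"token"`, `"eps"`, `"seq"`, `"first"` tuples whose arguments are rule names) and all function names are assumptions made for illustration.

```python
def nullable_clauses(grammar):
    """Least fixpoint of the clauses that can match the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for name, (kind, args) in grammar.items():
            if name in nullable:
                continue
            if kind == "eps":
                ok = True
            elif kind == "token":
                ok = False
            elif kind == "seq":
                ok = all(a in nullable for a in args)
            else:  # "first": ordered choice
                ok = any(a in nullable for a in args)
            if ok:
                nullable.add(name)
                changed = True
    return nullable


def left_recursive_clauses(grammar, nullable):
    """Clauses that can reach themselves through leftmost positions."""
    edges = {}
    for name, (kind, args) in grammar.items():
        targets = []
        if kind == "first":
            targets = list(args)
        elif kind == "seq":
            for a in args:
                targets.append(a)
                if a not in nullable:
                    break  # later elements are not in leftmost position
        edges[name] = targets
    result = set()
    for start in grammar:
        seen, stack = set(), list(edges[start])
        while stack:
            node = stack.pop()
            if node == start:
                result.add(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(edges[node])
    return result


def risky_clauses(grammar):
    """Left-recursive clauses that may also match the empty string."""
    nullable = nullable_clauses(grammar)
    return sorted(left_recursive_clauses(grammar, nullable) & nullable)


# Hypothetical encoding of a left-recursive chain over an optional 'a'.
with_epsilon = {
    "a": ("token", "a"),
    "eps": ("eps", ()),
    "A": ("first", ["a", "eps"]),
    "step": ("seq", ["chain", "A"]),
    "chain": ("first", ["step", "A"]),
}
# Hypothetical encoding of the grammar with :X and no epsilon.
without_epsilon = {
    "A": ("token", "a"),
    "X": ("seq", ["chain"]),
    "XA": ("seq", ["X", "A"]),
    "chain": ("first", ["XA", "A"]),
}
```

With this encoding, `risky_clauses(with_epsilon)` reports `chain` and `step`, while `risky_clauses(without_epsilon)` reports nothing, since no clause in the second grammar can match the empty string.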
Hi. While working on a C# implementation of this algorithm, I ran into an interesting case which caused the parser to run into an infinite loop. To verify that it wasn't just a minor error of mine I also attempted to recreate the same grammar in the reference implementation and ran into the same problem.
The specific case:
If I'm not mistaken, this runs into an infinite loop because the CHAIN rule may match an empty string, but is also its own (indirect) seed parent, causing it to always get added to the priority queue (which will therefore never be empty). This gives a debug trace of:
Of course, this grammar can be easily rewritten in order to not be left-recursive, but from what I understand the algorithm is supposed to be able to handle these cases. I was wondering if you had any suggestions on how the implementation could be modified in order to be able to handle these cases? My initial thoughts are to simply check if we already attempted to match a clause at a specific position in the input and if so, do not add the clause to the priority queue. However, I'm not sure if this would have other unforeseen side effects on the workings of the algorithm.
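To make the failure mode and the proposed guard concrete, here is a heavily simplified model of the queue behaviour described above. It is not the reference implementation: `run_queue`, `seed_parents` and the `dedupe` flag are hypothetical, every match is assumed to be zero-width, and the real algorithm's ordering of positions and clauses is ignored. As a follow-up comment in this thread notes, this particular guard broke some unit tests in the reference implementation, so treat it as an illustration of the idea rather than a fix.

```python
import heapq


def run_queue(seed_parents, start_clause, pos, dedupe, max_steps=50):
    """Process a priority queue of (position, clause) items.

    Returns (steps, drained), where drained is False if the step cap
    was hit before the queue became empty.
    """
    queue = [(pos, start_clause)]
    attempted = set()  # (clause, position) pairs already tried
    steps = 0
    while queue:
        if steps >= max_steps:
            return steps, False
        p, clause = heapq.heappop(queue)
        steps += 1
        if dedupe:
            if (clause, p) in attempted:
                continue  # already tried this clause at this position
            attempted.add((clause, p))
        # Assumption: the clause matched the empty string, so each seed
        # parent is re-queued at the same position.
        for parent in seed_parents.get(clause, ()):
            heapq.heappush(queue, (p, parent))
    return steps, True


parents = {"CHAIN": ["CHAIN"]}
unguarded = run_queue(parents, "CHAIN", 0, dedupe=False)
guarded = run_queue(parents, "CHAIN", 0, dedupe=True)
```

Without the guard the queue is refilled on every step and only the step cap stops the loop; with the guard the second visit to `CHAIN` at position 0 is skipped and the queue drains.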