[question] `canMatchZeroChars` in cycles
Hello,
while reimplementing pikaparser in another language (Julia), I noticed a possible small discrepancy in how `canMatchZeroChars` values are initialized for clauses in grammar cycles. Imagine a grammar with a simple cycle and a single terminal t, such as this:
Start ← A
A ← B?
B ← C A
C ← t?
Here, in topological order, C can match zero chars, B should be able to match zero chars because both C and A can, and A (resp. Start) can trivially (resp. transitively) match zero chars.
Despite that, the topological-ordering initialization seems to initialize B with `canMatchZeroChars=false`, because the value at A is still false (not yet computed) when B is processed.
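To make the ordering effect concrete, here is a toy model in Python. The tuple encoding and the names `can_match_zero`, `single_pass` and `GRAMMAR` are hypothetical illustrations, not pikaparser's actual classes or its exact clause ordering. The model evaluates each rule exactly once in the order C, B, A, Start described above, and treats any rule that has not been computed yet as unable to match zero chars:

```python
# Toy clause encoding (hypothetical, not pikaparser's API):
#   ("term", s)          terminal matching the literal string s
#   ("ref", name)        reference to another rule
#   ("opt", e)           e?
#   ("seq", [e1, ...])   e1 e2 ...
#   ("first", [e1, ...]) e1 / e2 / ...

def can_match_zero(expr, known):
    """Whether expr can match zero chars, using per-rule values computed so far."""
    kind = expr[0]
    if kind == "term":
        return expr[1] == ""
    if kind == "ref":
        return known.get(expr[1], False)  # not yet computed -> treated as False
    if kind == "opt":
        return True
    if kind == "seq":
        return all(can_match_zero(e, known) for e in expr[1])
    if kind == "first":
        return any(can_match_zero(e, known) for e in expr[1])
    raise ValueError(f"unknown clause kind: {kind}")

def single_pass(grammar, order):
    """Compute each rule exactly once, in the given order."""
    known = {}
    for name in order:
        known[name] = can_match_zero(grammar[name], known)
    return known

GRAMMAR = {
    "Start": ("ref", "A"),
    "A": ("opt", ("ref", "B")),
    "B": ("seq", [("ref", "C"), ("ref", "A")]),
    "C": ("opt", ("term", "t")),
}

print(single_pass(GRAMMAR, ["C", "B", "A", "Start"]))
# {'C': True, 'B': False, 'A': True, 'Start': True}
```

Because A is only computed after B, B ends up `False` even though both C and A end up `True`. In this toy model, computing A before B (order C, A, B, Start) gives B = `True`, so the single-pass result depends on where the cycle is entered.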
Semantically this is probably not a very big deal, but I wonder whether it could affect grammar matching, for example if there were a rule:
D ← B t
…which (I guess) would then fail to match wherever B should match zero chars.
I understand that my above grammar is technically disallowed with B having `canMatchZeroChars=true`, because `B?` is equivalent to `B|ε`, which would be erroneous if B can match zero chars. Is there a proof that one really cannot construct a valid grammar that would trigger this possible discrepancy in the initialization of `canMatchZeroChars`? I didn’t find any counterexample, but of course that doesn’t prove much :]
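One way to see why the discrepancy matters is to run a validity check of the kind mentioned above. The check treats `e?` as the ordered choice `e / ε` and reports any non-final alternative that can match zero chars. The sketch below is hypothetical: the toy encoding, `check_grammar`, `walk`, and the two value tables are illustrations, not pikaparser's real validation code. It takes the per-rule `canMatchZeroChars` table as input, so it can compare the single-pass values from the question with the values you get once B is correctly marked as able to match zero chars:

```python
# Toy clause encoding (hypothetical, not pikaparser's API):
#   ("term", s), ("ref", name), ("opt", e), ("seq", [e1, ...]), ("first", [e1, ...])

def can_match_zero(expr, zero):
    """Whether expr can match zero chars, given per-rule values in `zero`."""
    kind = expr[0]
    if kind == "term":
        return expr[1] == ""
    if kind == "ref":
        return zero[expr[1]]
    if kind == "opt":
        return True
    if kind == "seq":
        return all(can_match_zero(e, zero) for e in expr[1])
    if kind == "first":
        return any(can_match_zero(e, zero) for e in expr[1])
    raise ValueError(f"unknown clause kind: {kind}")

def walk(rule, expr, zero, errors):
    """Collect an error for every non-last alternative that can match zero chars."""
    kind = expr[0]
    if kind == "opt":
        # e? is treated as the ordered choice e / ε, so e is a non-last alternative.
        alternatives = [expr[1], ("term", "")]
    elif kind == "first":
        alternatives = expr[1]
    elif kind == "seq":
        for e in expr[1]:
            walk(rule, e, zero, errors)
        return
    else:
        return  # "term" and "ref" have no subclauses to inspect
    for i, alt in enumerate(alternatives[:-1]):
        if can_match_zero(alt, zero):
            errors.append(f"{rule}: alternative {i} can match zero chars but is not last")
    for alt in alternatives:
        walk(rule, alt, zero, errors)

def check_grammar(grammar, zero):
    errors = []
    for rule, body in grammar.items():
        walk(rule, body, zero, errors)
    return errors

GRAMMAR = {
    "Start": ("ref", "A"),
    "A": ("opt", ("ref", "B")),
    "B": ("seq", [("ref", "C"), ("ref", "A")]),
    "C": ("opt", ("term", "t")),
}

single_pass = {"C": True, "B": False, "A": True, "Start": True}
corrected = {"C": True, "B": True, "A": True, "Start": True}

print(check_grammar(GRAMMAR, single_pass))  # []
print(check_grammar(GRAMMAR, corrected))    # ['A: alternative 0 can match zero chars but is not last']
```

With the single-pass values, the grammar passes the check. With the corrected values, `A ← B?` is flagged. So, at least in this toy model, the initialization order decides whether this grammar is rejected.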
Thank you very much for any help!
Top GitHub Comments
You’re right. In the worst case the performance will be O(C^2) in the number of clauses C. I’m OK with that.

OK, I committed a fix for this. It seems to work OK, but I haven’t tested it extensively. Feel free to try the fix and let me know if it works for you too. Thanks again for your astute observation that led to finding this problem, @exaexa!
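For reference, one standard order-independent approach is to iterate to a fixed point. Start with every rule marked as unable to match zero chars, then keep re-evaluating all rules until nothing changes. The following is a sketch under a hypothetical toy encoding (`can_match_zero`, `fixed_point` and `GRAMMAR` are illustrative names), not necessarily what the committed fix does. Values only ever flip from false to true, so the loop runs at most one pass per rule plus a final pass that confirms nothing changed. Each pass touches every clause once, which is consistent with the O(C^2) worst case mentioned above:

```python
# Toy clause encoding (hypothetical, not pikaparser's API):
#   ("term", s), ("ref", name), ("opt", e), ("seq", [e1, ...]), ("first", [e1, ...])

def can_match_zero(expr, zero):
    """Whether expr can match zero chars, given current per-rule values in `zero`."""
    kind = expr[0]
    if kind == "term":
        return expr[1] == ""
    if kind == "ref":
        return zero[expr[1]]
    if kind == "opt":
        return True
    if kind == "seq":
        return all(can_match_zero(e, zero) for e in expr[1])
    if kind == "first":
        return any(can_match_zero(e, zero) for e in expr[1])
    raise ValueError(f"unknown clause kind: {kind}")

def fixed_point(grammar):
    """Re-evaluate all rules until no value changes; return (values, number of passes)."""
    zero = {name: False for name in grammar}  # pessimistic start: nothing matches zero chars
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for name, body in grammar.items():
            # Values only flip False -> True, so the loop is guaranteed to terminate.
            if not zero[name] and can_match_zero(body, zero):
                zero[name] = True
                changed = True
    return zero, passes

GRAMMAR = {
    "Start": ("ref", "A"),
    "A": ("opt", ("ref", "B")),
    "B": ("seq", [("ref", "C"), ("ref", "A")]),
    "C": ("opt", ("term", "t")),
}

values, passes = fixed_point(GRAMMAR)
print(values, passes)  # {'Start': True, 'A': True, 'B': True, 'C': True} 3
```

Unlike a single pass, the result does not depend on the order in which the rules are visited. Only the number of passes does.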