Improve the compilation pipeline (introduce stages)
For now, we have a somewhat messed-up pipeline: the IR nodes do everything by themselves. Type checking, member resolution, lowering, codegen: we have everything everywhere. Right now, only parsing is properly isolated from everything else.
We should introduce more formal stages of compilation, and maybe even create several different layers of IR. For example, we have certain constructs that have to be lowered (such as +=-style operators). We may get rid of these constructs in the “lowered IR layer” and thus decrease the number of type checks required in the codegen; a rough sketch of what such layering could look like is below.
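Here is a minimal sketch of two IR layers with an explicit lowering stage between them, in the spirit of the proposal above. All type names are hypothetical placeholders, not existing types in the codebase:

```csharp
// Hypothetical IR layers; none of these types exist in the codebase yet.

// "High" IR: produced right after parsing, still contains sugar like compound assignments.
public abstract record HighExpression;
public sealed record HighIdentifier(string Name) : HighExpression;
public sealed record HighBinary(string Op, HighExpression Left, HighExpression Right) : HighExpression;
public sealed record HighCompoundAssignment(string Op, HighIdentifier Target, HighExpression Value) : HighExpression; // a += b

// "Lowered" IR: compound assignments simply cannot be represented here,
// so codegen never needs a type check for them.
public abstract record LoweredExpression;
public sealed record LoweredIdentifier(string Name) : LoweredExpression;
public sealed record LoweredBinary(string Op, LoweredExpression Left, LoweredExpression Right) : LoweredExpression;
public sealed record LoweredAssignment(LoweredIdentifier Target, LoweredExpression Value) : LoweredExpression;

// The lowering stage is a pure function from one IR layer to the next.
public static class LoweringStage
{
    public static LoweredExpression Lower(HighExpression expression) => expression switch
    {
        HighIdentifier(var name) => new LoweredIdentifier(name),
        HighBinary(var op, var left, var right) => new LoweredBinary(op, Lower(left), Lower(right)),
        // a += b  ==>  a = a + b
        HighCompoundAssignment(var op, var target, var value) => new LoweredAssignment(
            new LoweredIdentifier(target.Name),
            new LoweredBinary(op, new LoweredIdentifier(target.Name), Lower(value))),
        _ => throw new System.ArgumentOutOfRangeException(nameof(expression)),
    };
}
```

The point of this shape is that the lowered node set cannot represent a compound assignment at all, so codegen and any later checks never have to special-case it.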
I don’t know how it should be organized, though. Maybe just start by separating the IR layers, and everything else will click and fit into place?
Thoughts?
When implementing, look for the number 201 in the source and try to eliminate every instance of that number.
Top GitHub Comments
Your example demonstrates that we should have a way to preserve the source information after preprocessing, and I very much agree with that.
I still very strongly disagree with the idea of changing the grammar. We already have a lot of hacks, and it is very hard to reason about the parser: how does it correspond to the C standard? Does it have any parser conflicts? What should we change to migrate to C23 after it emerges?
And these questions will quickly become impossible to answer if we change the parser completely, inventing some unholy hybrid of C grammar and C preprocessor grammar. Moreover, I cannot see how it helps to achieve anything: this combined grammar won’t have any source information embedded, either.
I believe that the preprocessor should generate some kind of annotated result (so we know where each token comes from). If this was (part of) your point, then I agree. Simple plain-text output from the preprocessor won’t work, and I agree on that, too.
Unfortunately, Yoakke isn’t able to work with such token streams out of the box, I believe. We may choose either to migrate to some other library (or a manual parser), or to provide a bridge between the preprocessor-generated text and the source. One possible way to preserve the source information without changing Yoakke is the following: have the preprocessor return a pair of (string, Dictionary<TextRange, SourceInformation>). Of course, there may be some peculiarities involved when we are trying to determine “the origin” of a C token created by the preprocessor (since the token may be glued together from several different macros), but that’s in any case a question for our interpretation. The token ranges reported by the C parser can then be resolved through the Dictionary<TextRange, SourceInformation>, because at that point, both text ranges (from the dictionary and from the C parser) operate on the same source: the text blob from the preprocessor output. A rough sketch of this bridge is shown after this comment.
To me, it’s not a big deal whether we do it in this repo or contribute to Yoakke. The latter is a bit more complicated because we’ll need to invent abstractions useful for other people as well as for ourselves. But still, doable.
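To make the shape of that bridge a bit more concrete, here is a rough sketch under the assumption that we define our own TextRange and SourceInformation types; these names (and PreprocessedSource, ResolveOrigin) are placeholders, not existing Yoakke or Cesium APIs:

```csharp
// Placeholder types for the proposed bridge; not existing Yoakke or Cesium APIs.
using System.Collections.Generic;
using System.Linq;

public readonly record struct TextRange(int Start, int Length)
{
    public bool Covers(TextRange other) =>
        other.Start >= Start && other.Start + other.Length <= Start + Length;
}

public sealed record SourceInformation(string FileName, int Line, int Column);

// What the preprocessor would return: the flat text that the C parser consumes,
// plus a map from ranges in that text back to the original source locations.
public sealed record PreprocessedSource(string Text, Dictionary<TextRange, SourceInformation> SourceMap)
{
    // Resolve a token range reported by the C parser (relative to Text) back to its origin.
    // A token glued together from several macro expansions may have no single origin,
    // in which case this returns null and we treat it as synthesized by the preprocessor.
    public SourceInformation? ResolveOrigin(TextRange tokenRange) =>
        SourceMap.FirstOrDefault(entry => entry.Key.Covers(tokenRange)).Value;
}
```

Both the dictionary keys and the parser’s token ranges are offsets into the same preprocessed text, so the lookup itself stays trivial; the interesting part is deciding what the preprocessor records for tokens it glues together from several macro expansions.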