sg7: Re: [SG7] Metaprogramming

From: David Rector <davrec_at_[hidden]>
Date: Sun, 25 Oct 2020 10:52:06 -0400

> On Oct 24, 2020, at 9:55 PM, Richard Smith <richardsmith_at_[hidden]> wrote:
>
> On Sat, Oct 24, 2020 at 11:11 AM David Rector <davrec_at_[hidden]> wrote:
>> Richard, the git repo I linked to below works, but the source is a bit of a mess. To give you the broad strokes, these are the big changes that would be needed to parse string literals during constant evaluation, in case this were to be seriously considered:
>> - The Parse and Sema libraries would need to be combined.
> That wouldn't be acceptable, but you could register a callback instead; that's what we do for MSVC-compatibility-mode's delayed template parsing.
>> - Some additional code added to the Lexer/Preprocessor (nothing in the hot path), similar to the functionality used to expand macros.
>> - Some additional logic needed to add "metaparsed" Decls into template instantiations (setting up scopes etc.).
>> - There may be issues when the user invokes macros inside the code strings (e.g. `__metaparse("int SOMEMACRO(foo) = 4")`). I got crashes early on and added checks and diagnostics to address them, but they were needed in a number of strange spots, so a more comprehensive review would be warranted.
>> - It would be nice if diagnostics could point inside a string literal, e.g.:
>> ```
>> __metaparse("int i = missspelling;");
>> ^ unknown identifier
>> ```
>>
>> It’s been a while, but I think that covers it.
>
> I have no doubt that you can get simpler cases working this way. But the ways we process source during parsing and during template instantiation are fundamentally different, so I'd expect lots of things to not work (for example, you won't be able to do lookups that find local variables when injecting into a function scope).
>

In that proof-of-concept implementation, buggy though it is, lookups work as expected, even in template instantiations and even for local variables therein.

Example (NB this was before consteval):

```
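#include <iostream>

// __queue_metaparse and __concatenate are extensions provided by the clang-meta fork linked below, not standard C++.
constexpr void printVar(const char *name) {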
    __queue_metaparse("std::cout << ");
    const char *name_with_quotes = __concatenate("\"", name, "\"");
    __queue_metaparse(name_with_quotes);
    __queue_metaparse("\" = \" << ");
    __queue_metaparse(name);
    __queue_metaparse(";");
}

template<typename T>
void f(T t) {
    T localVarA = t;
    constexpr {
        printVar("t"); //ok
        printVar("localVarA"); //ok
        // printVar("localVarB"); //error
    }
    T localVarB;
}

int main() {
    int localVarA = 99999; //*not* referenced/printed
    f(2); // prints t = 2 / localVarA = 2
}
```

> But honestly, I don't think implementation complexity is the primary reason we shouldn't do this. I think the better reason is that such an approach is fundamentally distasteful and unhygienic in much the same way that macros are. You know what's worse than a vexing parse bug? A vexing parse bug caused by a compile-time program spitting out an ambiguous string of C++ code.

Assuming lookup can indeed work, or be made to work, as expected in all cases, this sort of functionality seems straightforward to debug: just examine which strings are being sent out for metaparsing. (And re macros: is it their effect/output that is distasteful, or merely the fact that they don’t take typed inputs or operate at compile time during parsing, as this does?)
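
To make the debugging point concrete, here is a minimal sketch in standard C++ that emulates the queue with an ordinary string buffer, so the text `printVar` would hand to the parser can be inspected directly. `MetaparseLog` and `emitPrintVar` are hypothetical names; the real `__queue_metaparse` builtin exists only in the clang-meta fork, and the sketch assumes queued pieces are simply concatenated in order.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for __queue_metaparse: rather than handing text to
// the parser, it just records each queued piece so it can be inspected.
struct MetaparseLog {
    std::vector<std::string> queued;

    void queue(const std::string& text) { queued.push_back(text); }

    // Assumption: the parser sees the queued pieces concatenated in order.
    std::string joined() const {
        std::string out;
        for (const std::string& piece : queued) out += piece;
        return out;
    }
};

// Mirrors printVar from the example, with __concatenate replaced by
// ordinary std::string concatenation.
void emitPrintVar(MetaparseLog& log, const std::string& name) {
    log.queue("std::cout << ");
    log.queue("\"" + name + "\"");
    log.queue("\" = \" << ");
    log.queue(name);
    log.queue(";");
}
```

For `emitPrintVar(log, "t")`, `log.joined()` yields `std::cout << "t"" = " << t;`, where the adjacent literals `"t"" = "` merge into `"t = "`. That string is exactly what one would look at when a metaparse goes wrong.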

> Good metaprogramming is semantic, not based on what a string happens to mean when interpreted in a particular context.

Is it? Programs are written in text. The user expresses their desired semantics via textual syntax. (It’s even a bit arbitrary, frankly; e.g. musical instructions aren’t written in text.) If you give users the ability to manipulate the textual input sent to the compiler during constant evaluation, in such a way that they can depend on the already-compiled information of their program while doing so, there is certainly no conceivable metaprogramming task they cannot do. (Note too that arbitrary recursion works: `constexpr { __queue_metaparse("constexpr { __queue_metaparse(\"int i = 42;\"); }"); }`.)
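
The recursion example also shows what nesting costs in quoting: each extra layer must escape the quotes (and backslashes) of the layer inside it. A small sketch of that escaping in standard C++; `escapeForLiteral` and `wrapInMetaparse` are hypothetical helpers, and only `"` and `\` are handled (newlines and other control characters are not).

```cpp
#include <string>

// Escape text so it can sit inside a C++ string literal: backslashes and
// double quotes each get a leading backslash. Control characters such as
// newlines are deliberately not handled in this sketch.
std::string escapeForLiteral(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '\\' || c == '"') out += '\\';
        out += c;
    }
    return out;
}

// Wrap code in one more metaparse layer: the result is the source text of a
// constexpr block that, when parsed, queues `code` for parsing in turn.
std::string wrapInMetaparse(const std::string& code) {
    return "constexpr { __queue_metaparse(\"" + escapeForLiteral(code) + "\"); }";
}
```

Applying `wrapInMetaparse` twice to `int i = 42;` reproduces, character for character, the recursion example in the paragraph above.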

The same cannot be said of fragment injection: because fragments don’t operate on text, they can never step fully outside the program. That is, there is almost certainly a tradeoff between power and safety. Fragments choose safety, but when asked to push the limits of heavy-duty higher-order metaprogramming, will they be found wanting?

On the other hand, might it be true that anything string injection can do but fragment injection cannot is necessarily unhygienic? That it can’t possibly be the best or only way to achieve something? Maybe; I could be convinced. But if there is any doubt, then perhaps C++ should err on the side of giving users too much rope, not too little. (Why should C++ leave any room for a higher-level language, either?)

I hope this at least has provided fodder for further discussion. Thank you Andrew and everyone else who has labored on this difficult set of features.

>> On Oct 24, 2020, at 1:35 PM, Ville Voutilainen <ville.voutilainen_at_[hidden]> wrote:
>>
>> On Sat, 24 Oct 2020 at 20:01, David Rector <davrec_at_[hidden]> wrote:
>>>
>>>
>>>
>>> On Oct 24, 2020, at 12:34 PM, Ville Voutilainen <ville.voutilainen_at_[hidden]> wrote:
>>>
>>> On Sat, 24 Oct 2020 at 19:17, David Rector via SG7 <sg7_at_[hidden]> wrote:
>>>
>>> # 3
>>> I think only strings should be injectable. Get rid of fragments; they are a source of needless complexity. IIUC, Andrew proposes the ability to inject arbitrary code strings via `|# … #|` syntax. E.g. `consteval { |# "private: int i = 42;" #| }` would inject a private `i` into the enclosing class context (and result in a parse error if not in a class).
>>>
>>>
>>> That doesn't seem to be correct. The paper says that a |# ... #| is an
>>> identifier splice, not an arbitrary string.
>>>
>>> I am under the impression that building injected code with just
>>> strings or with just tokens is rather horrible
>>> for implementations? For syntax validation and semantic analysis, most
>>> likely. Token-soup-injection
>>> as a possibility sure made our clang friends balk in Toronto.
>>>
>>>
>>> What’s so horrible about the implementation? Here’s an old clang implementation, scroll down to the "metaparsing" part for examples:
>>>
>>> https://github.com/drec357/clang-meta
>>
>> Perhaps Mr. Smith could illuminate us.

Received on 2020-10-25 09:52:11