
Subject: Re: Metaprogramming
From: Herb Sutter (hsutter_at_[hidden])
Date: 2020-10-27 17:46:13


I thought string-based and token-based approaches were already proposed and considered, and SG7 did not favor them... is that right?

I have heard about experience with string-based approaches in D. The production users I heard from, who were not on the language's core design team, found the general ability to generate more code at compile time extremely useful and game-changing (which is something we all want in C++, but is not specific to providing it via strings in particular). However, using strings to do it frequently led them to produce write-only code that couldn't be easily maintained and often had to be rewritten or discarded (at least in the experience of the people I heard from).

Also please remember IDEs and other tools: I would think it's easier to syntax-highlight, debug / step into, refactor, etc. a fragment that looks like actual C++ grammar (i.e., treat code as code, even if it contains placeholders, which we already have today with templates) than a series of string/stream concatenations/insertions (which would feel like the weaker parts of treating code as data).

Consider further the analogy with templates: if we had a time machine to reinvent templates with all the experience we have today, would we ever consider expressing them as string concatenation? I doubt it. Doing that would be unarguably more flexible and allow more things, but I think it would also be clearly less integrated and outside the language - it would be closer to using compile-time I/O to create a file and then #include-ing it (really, closer to an expanded preprocessor) than to writing actual first-class generic code in the language. There were good reasons we didn't implement templates using a preprocessor approach. This feels like a subset of the same question, or at least a related question.

Finally, it's not only about syntax - IIUC, in Andrew's excellent paper, both | | and |# #| are processed after parsing, not before. As I read the request below, string support of that kind would mean changing that model so that injection happens before parsing; is that right? I used to favor that, but I've been convinced I was probably wrong about that.



From: SG7 <sg7-bounces_at_[hidden]> On Behalf Of David Rector via SG7
Sent: Tuesday, October 27, 2020 12:38 PM
To: sg7_at_[hidden]
Cc: David Rector <davrec_at_[hidden]>
Subject: Re: [SG7] Metaprogramming

My position on the injection issue has evolved, so I will re-summarize it, given how important I think any imminent decision on the syntax is, and then bow out to give others space to consider it.

# 1
There is no need to make any semantic changes (e.g. to allow string injection in arbitrary contexts, as I earlier suggested and implemented), at least at this point, and perhaps ever. I.e., I would support leaving the semantics as Andrew et al. have implemented them, so that only identifiers may be constructed from strings (via `|# ... #|`, formerly `idexpr(...)`). (Andrew, I do not think you will ever be able to get rid of that, in spite of its unhygienic nature -- it is just too essential, as your book example below demonstrates, and its very necessity demonstrates the "power vs. safety" tradeoff of string injection vs. fully-semantically-structured injection. There will always be dirty jobs that require us to set hygiene aside; that we can do so in an encapsulated space must suffice.)

# 2
It is however worth considering at this point whether to tweak the *syntax* to leave room *in the future* to allow such additional string injections -- e.g. to eventually allow:
  - string-injected expressions (e.g. `<< meta_cast<int>("3 + " << generate_42_string())`), or
  - string-injected statements or declarations (`<< strange_decl_string_transformation("int i = 42;")`).
  - others?

To me, the best option to introduce future flexibility, while retaining Andrew's semantics, is to essentially *invert his syntax*: represent all non-dependent injected content as strings, i.e. in quotes, and leave all dependent injected content without additional wrapping (i.e. no `|# %{...} #|` needed). For reference, here are Andrew's proposed syntax and my proposed modification (to which Ville has expressed opposition):

```
//ANDREW'S SYNTAX:
template<typename T>
consteval meta::info property(string_view id) {
  string member_name = "m_" + id;
  string getter_name = "get_" + id;
  string setter_name = "set_" + id;
  return <class {
    private:
      T |# %{member_name} #|;
    public:
      T const& |# %{getter_name} #|() const {
        return |# %{member_name} #|;
      }
      void |# %{setter_name} #|(T const& x) {
        |# %{member_name} #| = x;
      }
  }>;
}

struct book {
  << property<string>("author");
  << property<string>("title");
  // other book properties
};


// PROPOSED ALTERNATIVE: i.e. same "semantics" (= generated AST nodes etc)
// after parsing, just different "syntax" (= Parser implem):

template<typename T>
consteval void inject_property(string_view id) {
  string member_name = "m_" + id;
  string getter_name = "get_" + id;
  string setter_name = "set_" + id;

  meta << "private:"
          " T " << member_name << ";"
          "public: "
          " T const& " << getter_name << "() const {"
          " return " << member_name << ";"
          " }"
          " void " << setter_name << "(T const& x) {"
       << member_name << " = x;"
          " }";
}

struct book {
  consteval {
    inject_property<string>("author");
    inject_property<string>("title");
    // other book properties
  }
};
```
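For comparison, the nearest thing to the `book` example in today's C++ is the preprocessor approach: the `##` operator pastes the property name onto fixed prefixes to form new identifiers. The following is a minimal sketch, assuming a hypothetical `PROPERTY` macro that is not part of either proposal; note that the name must be written as a literal token rather than computed from a `string_view` by a consteval function.

```cpp
#include <string>

// Hypothetical macro, for illustration only: ## pastes the property name onto
// the m_/get_/set_ prefixes, roughly what |# %{...} #| does with a computed
// string. Here the name must be a literal token, not a computed value.
#define PROPERTY(T, id)                              \
  private:                                           \
    T m_##id{};                                      \
  public:                                            \
    T const& get_##id() const { return m_##id; }     \
    void set_##id(T const& x) { m_##id = x; }

struct book {
  PROPERTY(std::string, author)
  PROPERTY(std::string, title)
  // other book properties
};
```

Usage is ordinary member access, e.g. `b.set_author("A. Author")` followed by `b.get_author()`. Because macros are expanded before parsing, a mistake in the macro body is diagnosed only at the point where `PROPERTY` is expanded.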

Implementation: the compiler would require that nearly all such injected strings, excepting only those representing identifiers, be *non-dependent*, such that their content can be parsed and thus semantically verified right away, i.e. as if they were not string literals at all. An error would be emitted prior to instantiation whenever this is not the case.

This gives us the option to relax these restrictions in the future, i.e. allow dependent expressions in additional contexts beyond identifiers, should metaprogramming expand beyond anything we could have foreseen here, and additional needs become apparent.

Rough examples:
```
meta << "T " << generate_member_name(dependentargs) << ";"; // ok: only the identifier expression is dependent
meta << generate_T() << generate_member_name(dependentargs) << ";"; // maybe ok, if generate_T() is non-dependent
meta << generate_type_and_member_name(dependentargs) << ";"; // ERROR (for now?)
```
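Under this rule, whenever an injected string is dependent, the compiler would have to confirm, once the string's value is known, that it spells exactly one identifier. A simplified sketch of that check, using a hypothetical `is_simple_identifier` helper: it accepts only the ASCII form `[A-Za-z_][A-Za-z0-9_]*` and ignores keywords and Unicode identifiers, whereas a real compiler would presumably run the string through its own lexer.

```cpp
#include <string_view>

// Hypothetical, simplified check: does s spell exactly one identifier?
// Only the ASCII subset [A-Za-z_][A-Za-z0-9_]* is accepted; keywords,
// reserved names and Unicode identifiers are deliberately not handled.
constexpr bool is_simple_identifier(std::string_view s) {
  auto is_alpha = [](char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
  };
  auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
  if (s.empty() || !is_alpha(s.front())) return false;
  for (char c : s.substr(1))
    if (!is_alpha(c) && !is_digit(c)) return false;
  return true;
}

// Stand-ins for the first and third rough examples above.
static_assert(is_simple_identifier("m_author"));       // identifier only: ok
static_assert(!is_simple_identifier("int m_author"));  // type + name: rejected
static_assert(!is_simple_identifier("3 + 42"));        // expression: rejected
```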

# 3
Two virtues might conflict here: Andrew's intention is to make meta content stand out better, so that it cannot be confused, at a glance, with non-meta content. Wise. However, if the syntax is too unfamiliar, it places an excessive burden on the average user.

By putting injected content in quotes, not only do we make it stand out while remaining familiar, but we also no longer need the unquote operator `%{...}`, the idexpr operator `|# ... #|`, and possibly not the fragment syntax (that one is arguable though -- it would probably still be nice to represent a fragment as a `meta::info` object which can be reflected etc.).

In particular, regarding familiarity: the output-stream syntax proposed above is how people metaprogram nowadays; the only difference is that they generate new C++ files and metaprogram via the build system. See, for example, the clang source itself: ClangAttrEmitter.cpp (https://github.com/llvm/llvm-project/blob/master/clang/utils/TableGen/ClangAttrEmitter.cpp) is used to generate an intermediate file (via TableGen) used during subsequent compilation of clang. (Which raises the question: given that this is clearly a necessary technique, is it any less hygienic or more difficult to debug to create C++ files out of strings via some intermediate program run during the build process -- during which time no semantic verification can be done until the next stage of the build -- than to do it at compile time, when the compiler can do at least some semantic checks?)
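To make the parenthetical concrete, here is a minimal sketch of the build-time alternative: an ordinary program, run as a build step much as a TableGen backend is, that writes the `book` members as C++ source text. The names `emit_property` and `emit_struct` are hypothetical and do not come from Clang's TableGen sources.

```cpp
#include <initializer_list>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical build-time generator: writes one property's members as C++
// source text. Nothing here is checked by a compiler until the emitted text
// is itself compiled in a later build step.
void emit_property(std::ostream& out, std::string_view type, std::string_view id) {
  const std::string member = "m_" + std::string(id);
  out << "private:\n"
      << "  " << type << ' ' << member << ";\n"
      << "public:\n"
      << "  " << type << " const& get_" << id << "() const { return " << member << "; }\n"
      << "  void set_" << id << '(' << type << " const& x) { " << member << " = x; }\n";
}

// Wraps a list of same-typed properties in a struct definition.
std::string emit_struct(std::string_view name, std::string_view type,
                        std::initializer_list<std::string_view> props) {
  std::ostringstream out;
  out << "struct " << name << " {\n";
  for (auto p : props) emit_property(out, type, p);
  out << "};\n";
  return out.str();
}
```

A build step would write the result of, say, `emit_struct("book", "std::string", {"author", "title"})` to a header that later translation units `#include`; a typo inside one of the string literals would not be diagnosed until that header is compiled.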

Thanks for your consideration all, and thank you Andrew et al for your hard work. Thank you Roland for having a look at my old implementation as well.

Dave



SG7 list run by sg7-owner@lists.isocpp.org
