sg7: Re: [SG7] Metaprogramming

From: David Rector <davrec_at_[hidden]>
Date: Tue, 27 Oct 2020 15:38:23 -0400

My position on the injection issue has evolved, so I will re-summarize, given the importance I perceive of any imminent decision on the syntax, then bow out to give others space to consider it.

# 1
There is no need to make any semantic changes, e.g. to allow string injection in arbitrary contexts as I earlier suggested/implemented, at least at this point, and perhaps not ever. I.e. I would support leaving the semantics as Andrew et al have implemented them, so that only identifiers may be constructed from strings (via `|# … #|`, formerly `idexpr(…)`). (Andrew, I do not think you will ever be able to get rid of that, in spite of its unhygienic nature: it is just too essential, as your book example below demonstrates, and its very necessity illustrates the "power vs. safety" tradeoff of string injection vs. fully-semantically-structured injection. There will always be dirty jobs which require us to set hygiene aside; that we can do so in an encapsulated space must suffice.)

# 2
It is, however, worth considering at this point whether to tweak the *syntax* to leave room *in the future* for such additional string injections, e.g. to eventually allow:
  - string-injected expressions (e.g. `<< meta_cast<int>("3 + " << generate_42_string())`), or
  - string-injected statements or declarations (`<< strange_decl_string_transformation("int i = 42;")`).
  - others?

To me, the best option to introduce future flexibility, while retaining Andrew’s semantics, is to essentially *invert his syntax*: represent all non-dependent injected content as strings, i.e. in quotes, and leave all dependent injected content without additional wrapping (i.e. no `|# %{…} #|` needed). For reference, here is a reproduction of Andrew’s proposed syntax, followed by my proposed modification (to which Ville has expressed opposition):

```
//ANDREW’S SYNTAX:
template<typename T>
consteval meta::info property(string_view id) {
  string member_name = "m_" + id;
  string getter_name = "get_" + id;
  string setter_name = "set_" + id;
  return <class {
    private:
      T |# %{member_name} #|;
    public:
      T const& |# %{getter_name} #|() const {
        return |# %{member_name} #|;
      }
      void |# %{setter_name} #|(T const& x) {
        |# %{member_name} #| = x;
      }
  }>;
}

struct book {
  << property<string>("author");
  << property<string>("title");
  // other book properties
};

// PROPOSED ALTERNATIVE: i.e. same "semantics" (= generated AST nodes etc)
// after parsing, just different "syntax" (= Parser implem):

template<typename T>
consteval void inject_property(string_view id) {
  string member_name = "m_" + id;
  string getter_name = "get_" + id;
  string setter_name = "set_" + id;

  meta << "private:"
          " T " << member_name << ";"
          "public: "
          " T const& " << getter_name << "() const {"
          " return " << member_name << ";"
          " }"
          " void " << setter_name << "(T const& x) {"
       << member_name << " = x;"
          " }";
}

struct book {
  consteval {
    inject_property<string>("author");
    inject_property<string>("title");
    // other book properties
  }
};
```

Implementation: the compiler would require that nearly all such injected strings, excepting only those representing identifiers, be *non-dependent*, such that their content can be parsed and thus semantically verified right away, i.e. as if they were not string literals at all. An error would be emitted prior to instantiation whenever this is not the case.

This gives us the option to relax these restrictions in the future, i.e. allow dependent expressions in additional contexts beyond identifiers, should metaprogramming expand beyond anything we could have foreseen here, and additional needs become apparent.

Rough examples:
```
meta << "T " << generate_member_name(dependentargs) << ";"; // ok: only identifier expression is dependent
meta << generate_T() << generate_member_name(dependentargs) << ";"; // maybe ok if generate_T() is non-dependent
meta << generate_type_and_member_name(dependentargs) << ";"; // ERROR (for now?)
```
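
The rule behind the rough examples above can be modelled as a toy outside the compiler. The following is a minimal sketch, not the proposed implementation: `Piece`, `is_identifier`, and `injection_ok` are hypothetical names, and the `dependent` flag is set by hand where a real compiler would compute it. It only checks that each dependent piece, once its value is known, spells a single identifier; it does not model diagnosing the error prior to instantiation.

```cpp
#include <cctype>
#include <string>
#include <vector>

// One operand of a hypothetical `meta << ...` chain.
struct Piece {
    std::string text;
    bool dependent;  // assumption: set by hand; a compiler would derive this itself
};

// True if `s` is spelled like a plain identifier (ASCII only; keywords are not rejected).
bool is_identifier(const std::string& s) {
    if (s.empty()) return false;
    auto uc = [](char c) { return static_cast<unsigned char>(c); };
    if (!(std::isalpha(uc(s[0])) || s[0] == '_')) return false;
    for (char c : s)
        if (!(std::isalnum(uc(c)) || c == '_')) return false;
    return true;
}

// Non-dependent pieces are always accepted (they could be parsed right away);
// a dependent piece is accepted only when it forms exactly one identifier.
bool injection_ok(const std::vector<Piece>& pieces) {
    for (const Piece& p : pieces)
        if (p.dependent && !is_identifier(p.text)) return false;
    return true;
}
```

Under this model, a chain like `"T "`, a dependent `m_author`, `";"` is accepted, while a single dependent piece carrying both a type and a name (such as `int m_author`) is rejected, mirroring the first and third examples.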

# 3
Two virtues might conflict here: Andrew’s intention is to make meta content stand out better, so that it cannot be confused, at a glance, with non-meta content. Wise. However, if the syntax is too unfamiliar, it places an excessive burden on the common user.

By putting injected content in quotes, not only do we make it stand out while remaining familiar, but we also no longer need the unquote operator `%{…}`, the idexpr operator `|# … #|`, nor, possibly, the fragment syntax (that one is arguable, though: it would probably still be nice to represent a fragment as a `meta::info` object which can be reflected, etc.).

In particular, regarding familiarity: the output stream syntax proposed above is how people metaprogram nowadays; the only difference is that they generate new C++ files and metaprogram via the build system. One need only look at the clang source itself: e.g. https://github.com/llvm/llvm-project/blob/master/clang/utils/TableGen/ClangAttrEmitter.cpp is used to generate an intermediate file (via tablegen) that is used during subsequent compilation of clang. (Which raises the question: given that this is clearly a necessary technique, is it any less hygienic or more difficult to debug to create C++ files out of strings via some intermediate program run during the build process, when no semantic verification can be done until the next stage of the build, than to do it at compile time, when the compiler can do at least some semantic checks?)
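
For comparison, here is a minimal sketch of that build-time style in ordinary C++, applied to the book example. It is not taken from ClangAttrEmitter.cpp; `emit_property` and `generate_book` are hypothetical names, and a real generator would write to a file consumed by a later build step rather than return a string.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Writes the members of one "property" as C++ source text.
// `type` and `id` are spliced in verbatim; nothing is semantically checked
// until the generated text is compiled in a later build step.
void emit_property(std::ostream& out, const std::string& type, const std::string& id) {
    assert(!id.empty());
    const std::string member_name = "m_" + id;
    const std::string getter_name = "get_" + id;
    const std::string setter_name = "set_" + id;

    out << "private:\n"
        << "  " << type << " " << member_name << ";\n"
        << "public:\n"
        << "  " << type << " const& " << getter_name << "() const {"
        << " return " << member_name << "; }\n"
        << "  void " << setter_name << "(" << type << " const& x) { "
        << member_name << " = x; }\n";
}

// Builds the source text of `struct book` with two properties.
std::string generate_book() {
    std::ostringstream out;
    out << "struct book {\n";
    emit_property(out, "std::string", "author");
    emit_property(out, "std::string", "title");
    out << "};\n";
    return out.str();
}
```

The shape of `emit_property` is close to the `meta << …` version of `inject_property` above; the difference is that here a typo in any of the literals would only surface when the generated file is compiled.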

Thanks for your consideration all, and thank you Andrew et al for your hard work. Thank you Roland for having a look at my old implementation as well.

Dave

Received on 2020-10-27 14:38:30