Date: Tue, 30 Apr 2024 00:32:13 +0200
On Mon, Apr 29, 2024 at 11:56 PM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:
> On 4/29/24 12:01 PM, Daveed Vandevoorde wrote:
>
>
>
> On Apr 28, 2024, at 11:49 AM, Tom Honermann <tom_at_[hidden]>
> <tom_at_[hidden]> wrote:
>
> SG16 reviewed P2996R2 (Reflection for C++26) <http://wg21.link/p2996r2>
> during its 2024-04-24 meeting and will continue review during the
> 2024-05-08 SG16 meeting. I am working on the meeting summary for the
> previous meeting now and hope to publish it in the next few days. In the
> meantime, I wanted to get some discussion going to help prepare for our
> next review (in ~10 days).
>
> Daveed's presentation slides are available here
> <https://docs.google.com/presentation/d/1XYGCTXfnxWyWio8UmLdskl4D4Z7HvKkHHIVmBgT19f8/edit?usp=sharing>
> for anyone that would like to review them. They appear to have had some
> minor updates since the SG16 review (e.g., slide 10 is new).
>
> Slide 6
> <https://docs.google.com/presentation/d/1XYGCTXfnxWyWio8UmLdskl4D4Z7HvKkHHIVmBgT19f8/edit#slide=id.g2cf22bb8e94_0_16>
> lists some requirements for the design. These include:
>
> - Round-tripping must work (e.g., names returned by name_of() must be
> valid input to data_member_spec() via data_member_options_t::name).
> - Output to std::cout must work reasonably (e.g., std::cout <<
> name_of(^int)).
> - Some text may not be source-like text (std::meta::display_name_of).
>
> I would like to further clarify these requirements. P2996R2 authors,
> please answer the following questions.
>
> Are names returned by qualified_name_of() required to be round-trippable?
>
>
> No.
>
> Is support for std::cout specifically required or would support for
> std::format() and std::print() suffice? If std::cout support is
> specifically required, what motivates that requirement?
>
>
> Although *any* reasonably reliable form of output would suffice (including
> printf), having `std::cout << name_of(refl)` just work is quite desirable,
> if only for didactic purposes.
>
> Thanks. I agree having that just work is desirable and I think we can get
> there if name_of() returns a dedicated type as opposed to std::string_view.
>
>
> Are all of name_of(), qualified_name_of(), and display_name_of() required
> to return text (perhaps not source-like text, but content that is
> nevertheless text)? In other words, can they be guaranteed to provide
> well-formed text in some encoding?
>
>
> Yes, I think so.
>
> Excellent.
>
>
> During the meeting, we briefly discussed use of an opaque type for the
> return type of name_of() and friends. I would like to see further
> exploration of this idea prior to our next review. I'm envisioning a type
> something like the following (this particular formulation follows existing
> precedent established by the std::filesystem::path native format
> observers, [fs.path.native.obs]
> <http://eel.is/c++draft/fs.path.native.obs>).
>
> class name {
>   std::string_view *internal-representation*; // exposition only.
>   name(/* unspecified */);
> public:
>   constexpr std::string string() const;       // ordinary literal encoding.
>   constexpr std::wstring wstring() const;     // wide literal encoding.
>   constexpr std::u8string u8string() const;   // UTF-8.
>   constexpr std::u16string u16string() const; // UTF-16.
>   constexpr std::u32string u32string() const; // UTF-32.
> };
>
> The intent is that the data accessed by the *internal-representation*
> member has static storage duration; perhaps a string literal. The observers
> would then provide access to the name in the above encodings. If I'm not
> mistaken, this should enable use of these names in std::basic_string
> objects during constant evaluation (so long as the object's lifetime is
> appropriately constrained) and run-time while only requiring static
> persistence of the internal representation.
>
>
> I think that option has some interest, but not with the
> internal-representation having static storage duration and not with all
> encodings.
>
> Can you elaborate? Why not give the internal-representation static storage
> duration and convert on demand? Is it because persisting a particular
> encoded result produced during constant evaluation to run-time is awkward
> and could potentially increase static storage (I would expect the linker to
> discard representations that are unreferenced at run-time)? I don't have
> much experience with reflection use cases, so this is a good opportunity
> for me to learn something.
>
> Instead, the internal representation could be “reflection-like”, and the
> member “conversion” functions could produce string/u8string for ephemeral
> results, string_view/u8string_view for persistent ones (i.e., the latter
> would create a static storage duration literal-like representation from the
> compiler-internal representation on demand), and perhaps even a char const*
> for easy interaction with C interfaces.
>
> This sounds like what I was trying to achieve with the above.
>
> What is the motivation for persisting a particular encoded result? If a
> converted result is used to generate a name that is consumed by
> data_member_spec() or similar, I would expect the compiler magic that
> powers those consumers to perform any necessary persistence; likely in the
> internal-representation format. Likewise, if a particular encoded result is
> needed at run-time, it can be produced at run-time from the persisted
> internal-representation (with commensurate overhead of course).
>
> How would you define the class?
>
>
> Support for all five of the standard specified encodings is not
> necessarily required. SG16 can provide a recommendation for LEWG. The paper
> should discuss the pros, cons, and implementation costs for support of each
> encoding. Given existing precedent, lack of support for any given encoding
> should be motivated.
>
> Note that the above type suffices to provide support for printing names
> via std::cout, std::format(), and std::print(). An iostream insertion
> operator and/or a std::formatter specialization could be defined to
> enable printing names without having to call one of the member functions.
>
> If I understand the intent correctly (it would be helpful to clarify this
> in a revision of the paper), names returned by name_of() do not reflect a
> scope but do not necessarily reflect an identifier either. For example,
> something like "operator bool" might be returned. Tangentially, I think
> the formatting of names accepted by data_member_spec() needs to be
> rigorously specified for programs to be portable.
>
>
> Right. For now, since we only propose the synthesis of data members in
> P2996, something like `operator+` or `operator bool` is not possible on the
> input side. But eventually, I’m sure those things will be proposed.
>
> We will need to make a decision regarding how characters that lack
> representation in the ordinary and wide literal encodings are to be
> handled. We have a few options.
>
> 1. Don't provide the above string() and wstring() member functions.
> 2. Constrain the above string() and wstring() member functions so that
> they are not callable (e.g., do not participate in overload
> resolution...) if the associated literal encoding is unable to represent
> all characters that might appear in an identifier.
> 3. Specify that the above string() and wstring() member functions fail
> constant evaluation or throw an exception (or similar error handling) if
> the name uses a character that is not representable in the associated
> literal encoding.
> 4. Specify a way to encode non-representable characters in the names
> returned by the above string() and wstring() member functions and
> specify that data_member_spec() accepts such encoded names.
>
>
> I much prefer option 4 here, and I would like the encoding to look like
> UCNs. Second is probably option 3. I think option 2 is impractical
> altogether.
>
> In retrospect, I agree that option 2 seems weird at best. I don't favor
> option 1 either.
>
> I offered an additional option in response to one of Peter's messages.
>
> 5. Require that consumers of names like data_member_spec() accept
> UCN-like encoded characters (UCN-like because these wouldn't actually be
> UCNs) and require that producers of names like name_of() only generate a
> UCN-like encoded character when the associated (ordinary/wide literal)
> encoding lacks representation for the character.
>
> This would enable both strictly portable names (e.g., a name provided for
> consumption could contain only basic literal character set characters with
> UCN-like escapes for other characters regardless of whether the associated
> literal encoding has support for additional characters; much like UCNs) and
> avoid a requirement to escape characters outside the basic literal
> character set.
>
Options 4 and 5 invent a new "portable" encoding, one that is portable to a
couple of C++ functions but not to, for example, a Python binding generator,
nor to any other library (in C++ or another language) or any tool.
While this does technically avoid producing mojibake, it fails to solve the
problem in a meaningful, user-friendly way.
Identifiers will overwhelmingly consist of basic characters. When they don't,
whoever gets an error in Python or JavaScript saying that \ is not valid in
an identifier will be in for a surprise.
Moreover, I'm extremely opposed to producing runtime strings that mirror
lexing elements; it is just too confusing to teach to people.
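
To make the concern concrete, here is a minimal sketch of what a UCN-like
escaping along the lines of options 4 and 5 might produce. It is an
illustration only: escape_name() is a hypothetical helper, not anything
proposed in P2996, and the sketch assumes (for simplicity) that the ordinary
literal encoding can represent only ASCII.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not part of P2996): render a name given as UTF-32
// into a string that uses only ASCII, writing every other code point as a
// UCN-like escape (\uXXXX for the BMP, \UXXXXXXXX otherwise).
// Assumption: the target encoding is ASCII-only.
std::string escape_name(std::u32string_view name) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char32_t cp : name) {
        assert(cp <= 0x10FFFF && "not a Unicode code point");
        if (cp < 0x80) {
            // Representable in the assumed target encoding: copy as-is.
            out.push_back(static_cast<char>(cp));
            continue;
        }
        // Not representable: emit a backslash escape with 4 or 8 hex digits.
        const int digits = (cp <= 0xFFFF) ? 4 : 8;
        out += (digits == 4) ? "\\u" : "\\U";
        for (int i = digits - 1; i >= 0; --i) {
            out.push_back(hex[(cp >> (4 * i)) & 0xF]);
        }
    }
    return out;
}
```

With this sketch, escape_name(U"caf\u00E9") yields the ten characters
c a f \ u 0 0 E 9, i.e. a run-time string containing a literal backslash.
That string round-trips only through a consumer that knows this convention;
handed to a Python or JavaScript code generator, it is not a valid identifier.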

> If data_member_spec() is modified to accept names specified by a class like
> name above, then it will be necessary to specify a way to construct an
> object of that type with a name constructed during constant evaluation.
> Assuming the *internal-representation* format is unspecified, this will
> require means to construct that format in a buffer with a lifetime that
> matches the lifetime of the constructed object.
>
> Yes, sort of. That “buffer” need not be something visible to the
> language. We have a reflection interface called “reflect_value” that can
> also be used for that purpose.
>
> I'm envisioning a change to data_member_options_t to store an (optional)
> object of type name instead of string_view. That would require a way to
> construct an object of type name given an appropriately encoded name. The
> version of name defined above is a reference-like type. Are you
> suggesting that there would be consteval constructors that utilize
> compiler magic like that needed for reflect_value() provided for this
> purpose? If so, that makes sense to me.
>
> Tom.
>
>
> Daveed
>
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2024-04-29 22:32:38