I've used universal character names to deal with porting issues with code written on Windows in European Latin-1 encodings. Fortunately, local convention already required ‘-*-coding: latin-1; -*-’ in the file.
It wasn't committed back to master, just transformed when moved to Linux, so the proper translation could still be done later. 
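
For concreteness, a minimal sketch of that kind of transformation; the file contents and identifier names here are invented for illustration:

    // Hypothetical illustration -- names invented.  Original Windows source,
    // with the first line carrying the local convention's coding cookie:
    //
    //     // -*- coding: latin-1; -*-
    //     double naïve_estimate = 0.0;   // 'ï' stored as the single byte 0xEF
    //
    // The copy transformed for the Linux build spells the same identifier
    // with a universal character name, so the file content is plain ASCII:
    double na\u00EFve_estimate = 0.0;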

On Thu, May 28, 2020 at 5:22 PM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
On 5/28/20 11:08 AM, Corentin via SG16 wrote:


On Thu, May 28, 2020, 16:55 Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:
Please also address all the uses of this term in the library section.

There would be no change, although the basic source character set appears in the library clauses in a few places (it would be redefined as the basic execution encoding)
At least allowing NUL where it was prohibited is probably not intended.

Sure, and I think being more explicit in the library wording would be an improvement.
 
 
  • The source character set is redefined as the Unicode character set
It seems like we're encouraging homoglyph issues. Do we expect open source projects to maintain coding guidelines that restrict characters outside the ASCII range?
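
For what it's worth, a hypothetical example of the homoglyph concern; the identifier names are invented for illustration:

    // Hypothetical homoglyph pair -- both declarations compile, and they
    // render identically in many fonts, but the second spells the name with
    // U+0430 CYRILLIC SMALL LETTER A (also writable as m\u0430sk), so the
    // two are distinct identifiers.
    int mask = 0;   // ASCII 'a'
    int mаsk = 1;   // Cyrillic 'а'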

This change wouldn't modify the set of characters that can appear in a source file.
Let's not underestimate the impact of making things "first class citizens" of the language where they were not such before.

Do we really expect people to ever type \uXXXX in C++20?

What has changed in C++20 that would negate prior motivation for the feature?

Tom.

They wouldn't be any more or less first-class citizens than they are today, given that we would not be changing the requirements on which characters must be supported by the physical character set.
 
 
It seems that encoding the stuff outside the basic source character set as UCNs in headers is exactly how one would avoid per-header encoding selection given the practical reality.
The practical reality being that most encodings that are intermixed have the same encoded value for most of the members of the basic source character set.
Thus, we have the concept of a basic source character set (and we also have digraphs and C's iso646.h).
Therefore, although not absolutely true, simply avoiding characters outside the basic source character set (and those requiring digraphs, etc.) is generally good enough for allowing headers to be included for compilation with source specified (via command line, etc.) as being in different encodings.
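
A hypothetical header along those lines; the file name, identifier, and the particular UCN are invented for illustration:

    // widget.h -- sticks to the basic source character set; the degree sign
    // is written as a UCN rather than a raw byte, so the header reads the
    // same whether the including TU is compiled as Latin-1, Windows-1252,
    // or UTF-8.
    #ifndef WIDGET_H
    #define WIDGET_H
    inline constexpr const char* temperature_suffix = " \u00B0C";  // " °C"
    #endif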

We could define a "portable subset". Interestingly, I don't think this is currently the case?
It's portable within "families" of encodings.

This wouldn't change.
 
As in, the current wording does not prevent a physical character set that doesn't contain the letter "a", for example.
Sure, for users that don't want to spell `char` or `template`...
Otherwise, the physical character set might be based on one that does not contain the letter "a", but the compiler likely is (in effect) defining one that does have "a".
 
This change wouldn't modify the portability of headers.
Changing what user can or cannot do will change user behaviour, which can change the portability of headers.
 
 
This mitigation for the problem you identified is not guaranteed. Lacking such mitigation, developers would be forced by libraries to switch to Unicode source even if they do not wish to.
(Okay, there is such a thing as compiler extensions for __asm__ names, but their usability is limited).
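
Roughly what that extension looks like, as a sketch in GCC/Clang asm-label syntax; the function name and symbol string below are made up:

    // The C++ identifier stays ASCII while the object-file symbol it binds
    // to is spelled out explicitly.  Aliasing a real C++ entity this way
    // requires its exact mangled name, which is compiler-specific -- part of
    // why the usability is limited.
    extern "C" int get_size(int item) __asm__("lib_get_size_v2");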

Agreed. We need to establish whether universal character names are actually used in identifiers in production code.
The body of production code in the world is not exactly something WG 21 has full access to.

