Date: Thu, 28 May 2020 17:35:03 -0400
I've used universal character names to deal with porting issues with code
written on Windows in European latin-1 encodings. Fortunately local
convention already required ‘-*-coding: latin-1; -*-’ in the file.
It wasn't committed back to master, just transformed when moved to Linux,
so the proper translation could still be done later.
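
A minimal sketch of that kind of transformation, for illustration only (this
is not the actual tool that was used). It assumes the input really is
ISO-8859-1, so every byte at or above 0x80 maps directly to the code point of
the same value; Windows-1252 differs in the 0x80-0x9F range and would need a
lookup table instead. The name latin1_to_ucn is hypothetical, and the sketch
does not distinguish string literals, comments, and identifiers.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: rewrite every non-ASCII ISO-8859-1 byte as a
// universal character name of the form \u00XX. ASCII bytes pass through.
// Assumption: the input is true ISO-8859-1, not Windows-1252.
std::string latin1_to_ucn(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    for (unsigned char c : src) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            char buf[7];  // backslash, 'u', four hex digits, terminating NUL
            std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

Because the output is plain ASCII, the result reads the same whichever
encoding the compiler is told the source file is in.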
On Thu, May 28, 2020 at 5:22 PM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:
> On 5/28/20 11:08 AM, Corentin via SG16 wrote:
>
>
>
> On Thu, May 28, 2020, 16:55 Hubert Tong <hubert.reinterpretcast_at_[hidden]>
> wrote:
>
>> Please also address all the uses of this term in the library section.
>>>>
>>>
>>> There would be no change, although the basic source character set appears in
>>> the library in a few places (it would be redefined as the basic execution encoding)
>>>
>> At least allowing NUL where it was prohibited is probably not intended.
>>
>
> Sure and I think being more explicit in library would be an improvement.
>
>>
>>
>>>
>>>
>>>>
>>>>> - Source character set is redefined as being the Unicode character
>>>>> set
>>>>>
>>>>> It seems like we're encouraging homoglyph issues. Do we expect open
>>>> source projects to maintain coding guidelines that restrict characters
>>>> outside the ASCII range?
>>>>
>>>
>>> This change wouldn't modify the set of characters that can appear in a
>>> source file.
>>>
>> Let's not underestimate the impact of making things "first class
>> citizens" of the language where they were not such before.
>>
>
> Do we really expect people to ever type \uxxxx in C++20?
>
> What has changed in C++20 that would negate prior motivation for the
> feature?
>
> Tom.
>
> They wouldn't be any more or less first-class citizens than they are today, given
> that we would not be changing the requirements on which characters must be
> supported by the physical character set.
>
>>
>>
>>>
>>>
>>>> It seems that encoding the stuff outside the basic source character set
>>>> as UCNs in headers is exactly how one would avoid per-header encoding
>>>> selection given the practical reality.
>>>>
>>> The practical reality being that most encodings that are intermixed have
>>>> the same encoded value for most of the members of the basic source
>>>> character set.
>>>> Thus, we have the concept of a basic source character set (and we also
>>>> have digraphs and C's iso646.h).
>>>> Therefore, although not absolutely true, simply avoiding characters
>>>> outside the basic source character set (and those requiring digraphs, etc.)
>>>> is generally good enough for allowing headers to be included for
>>>> compilation with source specified (via command line, etc.) as being in
>>>> different encodings.
>>>>
>>>
>>> We could define a "portable subset". Interestingly, I don't think this
>>> is currently the case?
>>>
>> It's portable within "families" of encodings.
>>
>
> This wouldn't change
>
>>
>>
>>> As in the current wording does not prevent a physical character set
>>> that doesn't contain the letter "a", for example
>>>
>> Sure, for users that don't want to spell `char` or `template`...
>> Otherwise, the physical character set might be based on one that does not
>> contain the letter "a", but the compiler likely is (in effect) defining one
>> that does have "a".
>>
>>
>>> This change wouldn't modify the portability of headers.
>>>
>> Changing what users can or cannot do will change user behaviour, which can
>> change the portability of headers.
>>
>>
>>>
>>>
>>>> This mitigation for the problem you identified is not guaranteed.
>>>> Lacking such mitigation, developers would be forced by libraries to switch
>>>> to Unicode source even if they do not wish to.
>>>> (Okay, there is such a thing as compiler extensions for __asm__ names,
>>>> but their usability is limited).
>>>>
>>>
>>> Agreed. We need to establish whether universal character names are
>>> actually used in identifiers in production code
>>>
>> The body of production code in the world is not exactly something WG 21
>> has full access to.
>>
>
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2020-05-28 16:38:39