Re: [SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?

From: Steve Downey <sdowney_at_[hidden]>
Date: Thu, 15 Aug 2019 08:35:12 -0400
I think it's clear now that we don't have an answer to Tom's question in
the title, and that the standard's language is both vague and archaic in
this area.
I think, before we make observable behavior changes, it would be worthwhile
to respecify [lex] using more modern language, particularly distinguishing
code points and encodings, and avoiding 'character', which is misleading.

As observed in [filesystem], unsigned and signed char do not have associated
encodings. Moving that forward into the front matter might be useful, as
would providing a (better) name for the presumed narrow/wide character
execution encoding and a better name for the narrow/wide character encoding
associated with the current default locale.

I'm willing to take a stab at it.
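
To make the code point versus encoding distinction concrete, here is a minimal sketch (not from the thread; the helper name `code_units_hex` is made up for illustration). It prints the code units stored for a string literal, so the same code point can be seen to produce different code unit sequences depending on the literal's encoding. It assumes C++14 or later.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <type_traits>

// Hypothetical helper: render the code units of a string literal as hex.
// It only looks at the stored values, so the result does not depend on
// any locale; it reflects the encoding chosen at translation time.
template <class CharT, std::size_t N>
std::string code_units_hex(const CharT (&s)[N]) {
    using U = std::make_unsigned_t<CharT>;  // avoid sign extension for plain char
    const int width = static_cast<int>(2 * sizeof(CharT));
    std::string out;
    char buf[32];
    for (std::size_t i = 0; i + 1 < N; ++i) {  // skip the terminating null
        const unsigned long long v = static_cast<U>(s[i]);
        std::snprintf(buf, sizeof buf, "%0*llx", width, v);
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

For the single code point U+00E9, `code_units_hex(u8"\u00e9")` gives `c3 a9`, `code_units_hex(u"\u00e9")` gives `00e9`, and `code_units_hex(U"\u00e9")` gives `000000e9`. What `code_units_hex("\u00e9")` and `code_units_hex(L"\u00e9")` give depends on the implementation's narrow and wide literal encodings, which is exactly the thing that lacks a good name.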

On Thu, Aug 15, 2019, 07:12 Steve Downey <sdowney_at_[hidden]> wrote:

> Execution encoding is a term we use in conversation; it's not actually a
> term in the standard. The standard speaks of execution character sets, the
> values of which are determined by locale, but which locale is not specified.
>
> On Wed, Aug 14, 2019, 23:21 Tom Honermann via Core <core_at_[hidden]>
> wrote:
>
>> On 8/14/19 10:57 AM, Peter Dimov wrote:
>> > Tom Honermann wrote:
>> >> On 8/14/19 3:54 AM, Peter Dimov wrote:
>> >>> Tom Honermann wrote:
>> >>>
>> >>>> I think we *might* be successful in using "execution encoding" to
>> >>>> apply to both the compile-time and run-time encodings by extending the
>> >>>> term with specific qualifiers; e.g., "presumed execution encoding" and
>> >>>> "run-time/system/native execution encoding".
>> >>> This would be implying that there's a single "execution" or "native"
>> >>> encoding, whereas there are many.
>> >>>
>> >>> - encoding used for character literals
>> >> I made the "presumed execution encoding" distinction specifically for this
>> >> case.
>> > Right, and I am saying that calling all the encodings "<adjective> execution
>> > encoding" implies that they are if not the same, then somehow related, and
>> > they aren't.
>> Ok, that is a fair critique.
>> >
>> > I would call the encoding used for narrow character literals "narrow literal
>> > encoding" and the encoding used for wide character literals "wide literal
>> > encoding". This is what they are.
>>
>> I feel some reluctance to changing a term that has been around for so
>> long, and this strikes me as too specific. There are other constructs
>> that are also encoded according to the (presumed) execution encoding.
>> For example, source locations exposed via the __FILE__ macro, function
>> names exposed via __func__, etc.
>>
>> We don't know at compile-time how encoded literals will be used at
>> run-time. They may be passed to the locale sensitive character
>> conversion functions, used as filenames, written to a terminal, etc...
>> All of these encodings are not known until run-time. I kind of like the
>> use of "presumed execution encoding" as indicating a compatible subset
>> of all of the encodings used at run-time.
>>
>> >
>> > "Execution encoding" made sense when a program was, say, written in
>> > Krasnoyarsk and intended to be executed in Kuala Lumpur. A Krasnoyarsk
>> > machine used the Krasnoyarsk encoding for everything, and a Kuala Lumpur
>> > machine used the Kuala Lumpur encoding for everything. Hence source and
>> > execution.
>>
>> It still very much makes sense when cross-compiling today.
>>
>> Tom.
>>
>> _______________________________________________
>> Core mailing list
>> Core_at_[hidden]
>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
>> Link to this post: http://lists.isocpp.org/core/2019/08/7062.php
>>
>
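
As an illustration of the run-time side of the discussion quoted above, the following sketch (not from the thread; `decode_with_current_locale` is a hypothetical helper) decodes a narrow string with `std::mbrtowc`. That function interprets the bytes according to the `LC_CTYPE` category of whatever C locale is current when it is called, regardless of the encoding the compiler used for the literal.

```cpp
#include <clocale>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <vector>

// Hypothetical helper: decode a narrow string using the encoding of the
// current C locale (LC_CTYPE). Returns false if the bytes do not form a
// valid, complete sequence in that encoding.
bool decode_with_current_locale(const char* s, std::vector<wchar_t>& out) {
    std::mbstate_t state{};          // initial conversion state
    std::size_t left = std::strlen(s);
    out.clear();
    while (left > 0) {
        wchar_t wc;
        const std::size_t n = std::mbrtowc(&wc, s, left, &state);
        if (n == static_cast<std::size_t>(-1) ||
            n == static_cast<std::size_t>(-2)) {
            return false;            // invalid or truncated in this locale
        }
        if (n == 0) break;           // decoded a null character; stop
        out.push_back(wc);
        s += n;
        left -= n;
    }
    return true;
}
```

In the default "C" locale, a string of basic characters such as `"hello"` decodes to five wide characters. For bytes outside the basic character set, for example `"\xC3\xA9"`, the result depends on the locale installed with `std::setlocale`: a UTF-8 locale (whose name and availability are platform dependent) would typically yield one wide character, while other locales may yield two or reject the input.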

Received on 2019-08-15 14:35:26