Subject: Re: [SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?
From: Tom Honermann (tom_at_[hidden])
Date: 2019-08-15 07:42:35


On 8/15/19 8:35 AM, Steve Downey wrote:
> I think it's clear now that we don't have an answer to Tom's question
> in the title. And that the standard's language is both vague and
> archaic in this area.
> I think, before we make observable behavior changes, it would be
> worthwhile to respecify [lex] using more modern language, particularly
> distinguishing codepoints and encodings, and avoiding 'character', as
> being misleading.
>
> As observed in [filesystem] unsigned and signed char do not have
> associated encodings. Moving that forward into the front matter might
> be useful. As well as providing a (better) name for presumed
> narrow/wide character execution encoding and a better name for current
> default locale associated narrow/wide character encoding.
>
> I'm willing to take a stab at it.

Excellent, me too.  I put this on the agenda for the next SG16 telecon
(on August 21st), but I strongly suspect we won't get to it as the first
item on the agenda for that meeting is web_view and I think it will take
most of our time.  So let's plan to have a draft (even if just a list of
potential replacement terms and general strategy for parts of the
standard to be updated) for the following telecon (September 4th or
11th).  We can collaborate offline in the meantime.

Tom.

>
> On Thu, Aug 15, 2019, 07:12 Steve Downey <sdowney_at_[hidden]> wrote:
>
> Execution encoding is a term we use in conversation, it's not
> actually a term in the standard. The standard speaks of execution
> character sets, the values of which are determined by locale.
> Which locale is not specified.
>
> On Wed, Aug 14, 2019, 23:21 Tom Honermann via Core <core_at_[hidden]> wrote:
>
> On 8/14/19 10:57 AM, Peter Dimov wrote:
> > Tom Honermann wrote:
> >> On 8/14/19 3:54 AM, Peter Dimov wrote:
> >>> Tom Honermann wrote:
> >>>
> >>>>   I think we *might* be successful in using "execution encoding" to
> >>>> apply to both the compile-time and run-time encodings by extending the
> >>>> term with specific qualifiers; e.g., "presumed execution encoding" and
> >>>> "run-time/system/native execution encoding".
> >>> This would be implying that there's a single "execution" or "native"
> >>> encoding, whereas there are many.
> >>>
> >>> - encoding used for character literals
> >> I made the "presumed execution encoding" distinction specifically for this
> >> case.
> > Right, and I am saying that calling all the encodings "<adjective> execution
> > encoding" implies that they are if not the same, then somehow related, and
> > they aren't.
> Ok, that is a fair critique.
> >
> > I would call the encoding used for narrow character literals "narrow literal
> > encoding" and the encoding used for wide character literals "wide literal
> > encoding". This is what they are.
>
> I feel some reluctance to changing a term that has been around for so
> long, and this strikes me as too specific.  There are other constructs
> that are also encoded according to the (presumed) execution encoding.
> For example, source locations exposed via the __FILE__ macro, function
> names exposed via __func__, etc.
>
> We don't know at compile-time how encoded literals will be used at
> run-time.  They may be passed to the locale-sensitive character
> conversion functions, used as filenames, written to a terminal, etc.
> None of these encodings is known until run-time.  I kind of like the
> use of "presumed execution encoding" as indicating a compatible subset
> of all of the encodings used at run-time.
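
A minimal sketch of the point above, assuming a hosted implementation: the bytes of a narrow literal are fixed when the program is compiled, but the number of characters they represent depends on whichever LC_CTYPE locale is active when a locale-sensitive function such as std::mbrlen is called.  The helper name count_chars_in_current_locale is hypothetical and used only for illustration, and which locale names (if any) std::setlocale accepts is platform-specific.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper (not from the standard or from this thread).
// Counts the characters in `s` as decoded by the multibyte encoding of
// the *current* C locale (LC_CTYPE). Returns -1 if the bytes do not form
// a valid, complete sequence in that encoding.
long count_chars_in_current_locale(const char* s) {
    assert(s != nullptr);
    std::mbstate_t state{};                    // initial conversion state
    std::size_t remaining = std::strlen(s);    // bytes left to examine
    long count = 0;
    while (remaining > 0) {
        std::size_t n = std::mbrlen(s, remaining, &state);
        if (n == static_cast<std::size_t>(-1) ||   // invalid sequence
            n == static_cast<std::size_t>(-2)) {   // truncated sequence
            return -1;
        }
        if (n == 0) {
            break;  // null character; not expected inside the strlen bound
        }
        s += n;
        remaining -= n;
        ++count;
    }
    return count;
}

// The two bytes below are chosen at compile time and never change.
// They are the UTF-8 encoding of U+00E9, but nothing in the program
// records that; only the run-time locale decides how they are decoded.
inline const char* two_fixed_bytes() { return "\xC3\xA9"; }
```

After a successful std::setlocale(LC_CTYPE, ...) call that selects a UTF-8 locale, count_chars_in_current_locale(two_fixed_bytes()) would report one character; under a locale with a different multibyte encoding the same two bytes may decode differently or be rejected, which is why the literal's encoding can only be "presumed" at compile time.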
>
> >
> > "Execution encoding" made sense when a program was, say, written in
> > Krasnoyarsk and intended to be executed in Kuala Lumpur. A Krasnoyarsk
> > machine used the Krasnoyarsk encoding for everything, and a Kuala Lumpur
> > machine used the Kuala Lumpur encoding for everything. Hence source and
> > execution.
>
> It still very much makes sense when cross-compiling today.
>
> Tom.
>
> _______________________________________________
> Core mailing list
> Core_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
> Link to this post: http://lists.isocpp.org/core/2019/08/7062.php
>



SG16 list run by sg16-owner@lists.isocpp.org