ISOCPP sg16 List: Re: [isocpp-sg16] Runtime behaviors should not require knowledge of the literal encoding

From: Steve Downey <sdowney_at_[hidden]>
Date: Fri, 13 Feb 2026 16:24:39 -0500

The partial escape hatch, such as it is, is buried in the requirements for
encoding digits. The digits 0 through 9 are required, in all encodings, to
be encoded as '0' + n, where n is the digit's value.
For other bases, it's less clear that this can work. Letters are not
actually required to be contiguous, so it's not as simple as 'A' + n, but a
table is still a possible implementation. The thing is that these
characters are based in the literal encoding, not the hopefully compatible
execution encoding, by way of starting with actual character literals.
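
To make that concrete, here is a minimal sketch of emitting digits from
character literals only. The names digit_to_char and to_string_base are
hypothetical, made up for illustration; they are not from P3876R0 or from
any library implementation, and the sketch assumes bases 2 through 36 and
lowercase output.

```cpp
#include <string>

// Hypothetical helper (illustration only): maps a digit value in [0, 36)
// to a character. Everything starts from character literals, so the result
// is in the literal encoding and no runtime encoding lookup is needed.
constexpr char digit_to_char(unsigned value) {
    // Letters need not be contiguous (EBCDIC is the usual example), so they
    // come from a table built out of a string literal, not from 'a' + n.
    constexpr char letters[] = "abcdefghijklmnopqrstuvwxyz";
    // Decimal digits are guaranteed to be contiguous, so '0' + n is fine.
    return value < 10 ? static_cast<char>('0' + value) : letters[value - 10];
}

// Hypothetical to_chars-like routine; precondition: 2 <= base <= 36.
std::string to_string_base(unsigned value, unsigned base) {
    std::string out;
    do {
        // Prepend the least significant digit of what is left.
        out.insert(out.begin(), digit_to_char(value % base));
        value /= base;
    } while (value != 0);
    return out;
}
```

Nothing here asks the locale, or anything else at runtime, how the digit 7
is encoded; the compiler has already decided that.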

This is actually important because we do not want to require that library
implementations actually look up, somehow, the current encoding of the
digit 7 in order to emit it.

Matching is similarly based in the literal encoding.
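
A sketch of what matching could look like under the same assumption, again
with a made-up name (char_to_digit) and not taken from any real
implementation: the input character is compared against character
literals, so the comparison happens in the literal encoding.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper (illustration only): returns the value of c as a
// digit in the given base (2..36), or nullopt if c is not such a digit.
constexpr std::optional<unsigned> char_to_digit(char c, unsigned base) {
    // Tables of literals, because letters are not required to be contiguous.
    constexpr std::string_view lower = "abcdefghijklmnopqrstuvwxyz";
    constexpr std::string_view upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    unsigned value = 36;  // sentinel: not a digit in any supported base
    if (c >= '0' && c <= '9') {
        // Decimal digits are guaranteed contiguous, so subtraction works.
        value = static_cast<unsigned>(c - '0');
    } else if (auto lo = lower.find(c); lo != std::string_view::npos) {
        value = 10 + static_cast<unsigned>(lo);
    } else if (auto up = upper.find(c); up != std::string_view::npos) {
        value = 10 + static_cast<unsigned>(up);
    }
    if (value < base) {
        return value;
    }
    return std::nullopt;
}
```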

The distinction between the literal encoding and the execution encoding is
firmly rooted in being able to describe the problem, but definitely not to
solve it. The earliest language specified that literals were encoded
according to the current locale, giving the possible, but wildly incorrect,
reading that encoded literals might be dynamically rendered, somehow.

On Fri, Feb 13, 2026 at 1:19 PM Corentin via SG16 <sg16_at_[hidden]>
wrote:

> Hey folks,
>
> This is a follow up to the discussion we had this week in the context
> of P3876R0 (Extending <charconv> support to more character types)
>
> I objected (and still do) to some wording, namely:
> > The output code points are inserted into the range [first, last) by
> encoding them in the respective literal encoding for character literals of
> the type of *first.
> (I think the paper has two such instances of that wording)
>
> The status quo is that the compiler encodes strings in an encoding
> described in the core wording (the literal encoding).
> During execution, the library assumes another encoding, the execution
> encoding.
>
> By the word of law, these things are completely unrelated today.
> Of course this is wrong, and we should fix it
> https://isocpp.org/files/papers/P3671R0.pdf
>
> But even if we admit a relation, that relation is not a relation of
> equivalence.
> It is still common, for example, to have variance in the representation of
> characters not in the basic literal character set.
> This is the case for example for ISO 8859 and the various EBCDIC code
> pages.
>
> So if we admit that the execution encoding need not be exactly the literal
> encoding,
> it is strange to talk about the literal encoding in the library at all.
>
> So, we should talk about something else.
> And because the execution encoding is locale-dependent, and because we do
> not want from_chars and to_chars to be locale-dependent, we should talk
> about the execution encoding of the "C" locale (i.e. the encoding known by
> the library to be associated with the non-locale locale). There is
> precedent in both C and C++.
>
> Because P1880 went nowhere, we should also specify that the encoding
> associated with char8_t is UTF-8 (ideally we would put all of that
> wording in [library.general] by introducing a term of art).
>
> For example:
>
> The locale-independent text encoding associated with a type T is
> - the narrow execution encoding associated with the "C" locale if T is
>   cv char*, string_view, or string,
> - the wide execution encoding associated with the "C" locale if T is
>   cv wchar_t*, wstring_view, or wstring,
> - UTF-8 if T is cv char8_t*, u8string_view, or u8string,
> - ...
>
> Then we can use that definition in the wording of to_chars, from_chars.
>
>
> One could argue that it does not matter for these two functions.
> Indeed, these families of functions consume and produce characters that
> are in the basic character set so if you assume a world where P3671 is
> adopted,
> their representation will always be the same in the literal and execution
> encodings.
>
> However, this is only true of these functions, and the paper's proposed
> wording relies too much on accidental happenstance and cannot be
> generalized to other functions such as the C character classification
> functions.
>
> In a world where we do not admit P3671, it would make from_chars/to_chars
> inconsistent with strto_/ato_, which seems undesirable.
>
> It might seem a bit academic, however these are indeed implementation
> concerns.
> As proposed by the paper, an implementation would need to have
> encoding/decoding tables for whatever the literal encoding is, which is
> not the case today.
> And I'd rather have wording that can be reused and exhibits a consistent
> model than "let's use the literal encoding for to_chars because 0 is in
> the basic character set and infinity is spelled INF rather than ∞ by
> printf so we are fine".
>
> I want to reiterate that the wording in P3876R0 is very novel indeed.
>
> Victor is correct that std::format does something similar to P3876R0 in a
> couple of places, i.e. the escaping of strings and in [time.format].
> We should tweak these.
>
> However, neither width estimation (which just asks the implementation to
> assume that the string is in some encoding that the implementation has to
> pick) nor printing (which just says "if the literal encoding is UTF-8,
> assume the string is UTF-8 and use vprint_unicode") does that. The latter
> is consistent with P3671 and perfectly fine, and it is a behavior decided
> at compile time, not a runtime behavior.
>
> Other parts of the standard correctly refer to the execution encoding of
> the "C" locale, or refer to strings produced at compile time (reflection,
> contracts).
>
> (Regardless of what we do, it does not affect the fact that from_chars
> will remain locale independent, and, IFF we adopt P3671, there will be no
> observable behavior difference)
>
> Cheers.
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
> Link to this post: http://lists.isocpp.org/sg16/2026/02/4670.php
>

Received on 2026-02-13 21:24:56