I (and SG16 in general) have been using the terms "execution
character set" and "execution encoding" to refer to both the
encoding known at compile-time that is used to encode character
and string literals and the locale-dependent encoding specified by
the LC_CTYPE locale category that is used at run-time by the
character classification and conversion functions. When necessary
to avoid confusion, I've been referring to the former as the
"presumed execution encoding" and the latter as the
"run-time execution encoding".
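
To make the distinction concrete, here is a minimal sketch of the two encodings as a program sees them. The helper names (literal_bytes, runtime_chars, current_ctype_locale) are hypothetical and only for illustration; the sketch assumes a hosted C++11 implementation.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: number of bytes the compiler emitted for a literal.
// This is fixed at translation time by the "presumed execution encoding".
template <std::size_t N>
constexpr std::size_t literal_bytes(const char (&)[N]) {
    return N - 1; // exclude the terminating null character
}

static_assert(literal_bytes("abc") == 3, "decided at compile-time");

// Hypothetical helper: number of characters in s as interpreted by the
// current LC_CTYPE locale (the "run-time execution encoding").
// Returns static_cast<std::size_t>(-1) if s is not valid in that encoding.
inline std::size_t runtime_chars(const char* s) {
    assert(s != nullptr);
    std::mbstate_t state{};
    return std::mbsrtowcs(nullptr, &s, 0, &state);
}

// Hypothetical helper: name of the LC_CTYPE locale currently in effect.
inline std::string current_ctype_locale() {
    const char* name = std::setlocale(LC_CTYPE, nullptr);
    return name ? name : "";
}
```

The result of literal_bytes never changes after translation, whereas the result of runtime_chars for the same bytes can change after a call to std::setlocale(LC_CTYPE, ...).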
A discussion with user 'alfps' on an r/cpp Reddit thread alerted me to the
possibility that I/we have been using these terms incorrectly. I
spent some time looking at both the C and C++ standards, and there
does appear to be evidence that "execution character set"
(encoding) refers solely to the encoding known at compile-time
that is used to encode literals. However, there doesn't seem to be a
clearly defined term for the locale-dependent run-time encoding that
governs the behavior of the character classification and
conversion functions. There is some evidence for this encoding
being referred to using the term "native".
From the C++ standard (though the definition provided here appears
to be specific to path names):
"The native encoding of an
ordinary character string is the operating system dependent
current encoding for path names. The native
encoding for wide character strings is the
implementation-defined execution wide-character set encoding."
(This paragraph, the next one, and p8 (not listed here)
constitute the only uses of "native (ordinary|wide) encoding" in
the C++ standard).
"char: The encoding is the native
ordinary encoding. ..."
"wchar_t: The encoding is the native
wide encoding. ..."
"The specializations required in Table 101 ([locale.category])
convert the implementation-defined native
character set. ... codecvt<wchar_t, char,
mbstate_t> converts between the native character sets for ordinary and
wide characters. ..."
"The specializations required in Table 101 ([locale.category]),
namely ctype<char> and ctype<wchar_t>,
implement character classing appropriate to the implementation's
native character set."
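
As an illustration of the facets named in the quotes above, the following minimal sketch queries std::ctype<char> through a std::locale object. The helper names (is_alpha_in, widen_in) are hypothetical; which encoding the facet treats as "native" depends on the locale object passed in, which is exactly the run-time dependence under discussion.

```cpp
#include <cassert>
#include <locale>

// Hypothetical helper: classify c according to the ctype<char> facet of loc.
inline bool is_alpha_in(const std::locale& loc, char c) {
    // ctype<char> is one of the facets every locale is required to provide.
    assert(std::has_facet<std::ctype<char>>(loc));
    return std::use_facet<std::ctype<char>>(loc).is(std::ctype_base::alpha, c);
}

// Hypothetical helper: convert c to the corresponding wide character
// according to the ctype<char> facet of loc.
inline wchar_t widen_in(const std::locale& loc, char c) {
    return std::use_facet<std::ctype<char>>(loc).widen(c);
}
```

For example, is_alpha_in(std::locale::classic(), 'a') is true in the classic "C" locale; passing std::locale("") instead selects the user's preferred locale, if the implementation can construct it.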
As far as I can tell, none of the highlighted terms above appear
in the C17 standard, but "native environment" appears in a related
context:
- 7.11.1.1p3 "The setlocale function":
"A value of "C" for locale specifies the minimal environment for
C translation; a value of "" for locale specifies the
locale-specific native environment.
Other implementation-defined strings may be passed as the second
argument to setlocale."
C17 suggests that "extended character set" may also be the right
term:
- 7.22p3 "General utilities <stdlib.h>":
"... that is the maximum number of bytes in a multibyte
character for the extended character set
specified by the current locale (category LC_CTYPE),
which is never greater than MB_LEN_MAX."
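
The quantity described in that quote is MB_CUR_MAX. A minimal sketch of how it tracks the LC_CTYPE category at run-time follows; max_bytes_per_char is a hypothetical helper, and returning 0 for an unavailable locale is a choice made only for this illustration.

```cpp
#include <cassert>
#include <climits>
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: MB_CUR_MAX as observed while LC_CTYPE is set to the
// named locale. Returns 0 if the implementation does not support that locale.
// The previous LC_CTYPE setting is restored before returning.
inline std::size_t max_bytes_per_char(const char* locale_name) {
    assert(locale_name != nullptr);
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    // Copy the name: a later setlocale call may invalidate the pointer.
    const std::string saved = current ? current : "C";
    if (std::setlocale(LC_CTYPE, locale_name) == nullptr) {
        return 0; // locale not available; nothing was changed
    }
    const std::size_t result = MB_CUR_MAX;
    std::setlocale(LC_CTYPE, saved.c_str());
    return result;
}
```

Calling max_bytes_per_char("") reports the value for the locale-specific native environment, and max_bytes_per_char("C") reports it for the minimal environment; the two can differ on the same machine, while MB_LEN_MAX is a compile-time constant.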
However, the C++ standard states (non-normatively) that the
"extended character set" extends the basic source character set
and (normatively) that it applies to both the source and execution
character sets:
"[ Note: The extended character set
is a superset of the basic character set ([lex.charset]). — end
note ]"
"... An implementation may use any internal encoding, so long as
an actual extended character
encountered in the source file, and the same extended character expressed in the
source file as a universal-character-name (e.g., using the \uXXXX
notation), are handled equivalently except where this
replacement is reverted ([lex.pptoken]) in a raw string literal."
"... The values of type wchar_t can represent
distinct codes for all members of the largest extended character set specified among
the supported locales ([locale])."
So, what term should we be using here? Perhaps a core issue
should be opened for this? A brief search didn't reveal an existing one.
(note: you may need to click "continue this thread" when reading
the Reddit thread to see all relevant comments).