I have a question for C experts as to the intended meaning of "execution encoding".
Each source character set member and escape sequence in character constants and string
literals is converted to the corresponding member of the execution character set; if there is no
corresponding member, it is converted to an implementation-defined member other than the
null (wide) character
Two sets of characters and their associated collating sequences shall be defined:
the set in which source files are written (the source character set), and the set interpreted in the
execution environment (the execution character set). Each set is further divided into a basic character
set, whose contents are given by this subclause, and a set of zero or more locale-specific members
(which are not members of the basic character set) called extended characters. The combined set is
also called the extended character set. The values of the members of the execution character set are
5.2.2 Alphabetic escape sequences representing nongraphic characters in the execution character set are
intended to produce actions on display devices as follows
The wording of the ctype.h functions uses the term "character" without specifying what the associated encoding is presumed to be.
C++ has the same lack of clarity.
As such, C++ will hopefully shift to "literal character set"/"literal character encoding" to describe the encoding of string & character literals.
The question then is what the intended behavior of, for example,
"isalpha('a')" is if the literal and execution encodings differ (say one is ASCII and the other EBCDIC).
Is the intent that:
- C assumes 'a' is a character in the environment execution encoding - and presumably it's UB if it isn't
- C is perfectly happy saying that isalpha('a') is false
I would have different questions for characters outside of the basic character sets,
say isalpha('é') assuming an ISO 8859-1 literal encoding (Latin Small Letter E with Acute, in case the mailing list butchers the text, the irony of which is delightful).
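To make the question concrete, here is a minimal sketch of how the answer depends on the locale rather than on the literal encoding. The helper name byte_is_alpha_in is hypothetical, and the byte 0xE9 is what an ISO 8859-1 literal encoding would produce for 'é'; the code passes the byte directly so that it does not depend on how this message or the source file is encoded.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: classify one byte with isalpha() under a given
 * LC_CTYPE locale.  Returns 1 or 0, or -1 if the locale cannot be set.
 * Note that it changes the program's LC_CTYPE locale as a side effect. */
int byte_is_alpha_in(const char *locale_name, unsigned char byte)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    /* isalpha() sees only an integer value; nothing records which
     * encoding the compiler used when it translated the literal. */
    return isalpha(byte) != 0;
}
```

In the "C" locale I would expect byte_is_alpha_in("C", (unsigned char)'a') to be 1 and byte_is_alpha_in("C", 0xE9) to be 0. Under an ISO 8859-1 locale the second call would presumably be 1, but the name of such a locale, and whether it is installed, is system-specific.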
What about putc('\\') if one encoding is ASCII and the other Shift-JIS?
In other words, is the intent that there exists a relation between the literal and execution encodings (the latter of which may be affected by the locale)?
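For the backslash case, the following sketch shows what I mean. It assumes an ASCII-compatible literal encoding, in which '\\' is the single byte 0x5C; the function names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: the value the compiler chose for the literal '\\'.
 * With an ASCII-compatible literal encoding (an assumption) this is 0x5C. */
unsigned char backslash_code_unit(void)
{
    return (unsigned char)'\\';
}

/* Prints the numeric value, then writes the byte itself to stdout.
 * What the byte is displayed as is decided by whatever consumes the
 * output, not by the program. */
void print_backslash(void)
{
    printf("0x%02X\n", (unsigned)backslash_code_unit());
    putchar('\\');
    putchar('\n');
}
```

A terminal that interprets the output as Shift-JIS would typically display that same 0x5C byte as a yen sign, so the program writes one "character" and the environment shows another.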
Is the "execution encoding" the encoding assumed by locale.h/stdlib.h functions?
I don't think that is explicitly stated either; the wording mentions these functions accepting "character"s without stating the presumed encoding of these characters.
There are the following definitions:
sequence of one or more bytes representing a member of the extended character set of either the
source or the execution environment
value representable by an object of type wchar_t, capable of representing any character in the
But it is unclear whether they apply to the language or the library. And of course, the ctype.h functions do not accept multibyte characters!
As a user, I would expect a precondition that the environment execution encoding is a superset of the literal encoding, but it is unclear to me whether that's stated or intended.
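To illustrate the multibyte point: since isalpha() takes a single value representable as unsigned char, classifying a multibyte character means decoding it first and then using the wctype.h counterpart. A minimal sketch, with a hypothetical helper name, where the decoding is governed by the current LC_CTYPE locale:

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: returns 1 if the first multibyte character of s is
 * alphabetic in the current locale, 0 if it is not, and -1 if s is empty
 * or does not start with a complete, valid multibyte character. */
int first_char_is_alpha(const char *s)
{
    mbstate_t state;
    wchar_t wc;
    size_t n;

    memset(&state, 0, sizeof state);          /* initial conversion state */
    n = mbrtowc(&wc, s, strlen(s), &state);   /* decode one character */
    if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
        return -1;
    return iswalpha((wint_t)wc) != 0;
}
```

Here too the bytes of s are interpreted in the encoding of the locale, whatever encoding the compiler used for the literal that produced them.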
I really hope you can shed light on the original intent and history! Thanks