Subject: Reading the tea leaves: What is the execution encoding in C?
From: Corentin (corentin.jabot_at_[hidden])
Date: 2021-02-23 03:42:43
I have a question for C experts as to the intended meaning of "execution
Each source character set member and escape sequence in character constants
and string literals is converted to the corresponding member of the
execution character set; if there is no corresponding member, it is
converted to an implementation-defined member other than the null (wide)
Two sets of characters and their associated collating sequences shall be
defined: the set in which source files are written (the
source character set), and the set interpreted in the execution environment
(the execution character set). Each set is further divided into a basic
character set, whose contents are given by this subclause, and a set of
zero or more locale-specific members (which are not members of the basic
character set) called extended characters. The combined set is also called
the extended character set. The values of the members of the execution
character set are implementation-defined.
5.2.2 Alphabetic escape sequences representing nongraphic characters in the
execution character set are intended to produce actions on display devices
The wording of ctype.h functions uses the term "character" without
specifying what the associated encoding is presumed to be.
C++ has the same lack of clarity.
As such, C++ will hopefully shift to "literal character set"/"literal
character encoding" to describe the encoding of string & character
The question then is what the intended behavior of, for example
"isalpha('a')" is if the literal and execution encodings differ (say one is
ASCII and the other EBCDIC).
Is the intent that:
- C assumes 'a' is a character in the environment execution encoding -
and presumably it's UB if it isn't
- C is perfectly happy saying that isalpha('a') is false
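
To make the two readings concrete, here is a minimal sketch. The helper names are hypothetical, and it assumes the "C" locale. The value of 'a' is chosen at translation time from the literal encoding, while isalpha() interprets that value at run time according to the current locale, so the result is only guaranteed to be true when the two encodings agree on that value.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: the numeric value the compiler chose for 'a'.
   This is 0x61 with an ASCII-compatible literal encoding and 0x81
   with an EBCDIC literal encoding. */
int literal_a_value(void)
{
    return 'a';
}

/* Hypothetical helper: ask the run-time classification whether that
   value is alphabetic. Assumes the "C" locale; the answer is only
   meaningful if the execution encoding assigns the same value to 'a'
   as the literal encoding did. */
int literal_a_is_alpha(void)
{
    setlocale(LC_CTYPE, "C");
    /* isalpha() requires EOF or a value representable as unsigned char. */
    return isalpha((unsigned char)literal_a_value()) != 0;
}
```

If the literal encoding were EBCDIC and the run-time environment ASCII-based, the same call would classify 0x81, which is not a letter there.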
Would say have different questions for characters outside of the basic
say isalpha('é') assuming iso 8859-1 literal encoding (Latin Small Letter E
with Acute, in case the mailing list butchers the text, the irony of which
What about putc('\\') if one encoding is ASCII and the other Shift-JIS?
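
A small sketch of the backslash case, with hypothetical helper names and an assumed ASCII literal encoding: the compiler fixes the code unit for '\\' at translation time, and putc() writes that byte unchanged. Under Shift-JIS, byte 0x5C is conventionally displayed as a yen sign, so the output device may not show a backslash at all.

```c
#include <stdio.h>

/* Hypothetical helper: the code unit emitted for a backslash literal.
   This is 0x5C with an ASCII literal encoding. */
unsigned char backslash_code_unit(void)
{
    return (unsigned char)'\\';
}

/* Hypothetical helper: write the literal to a stream. putc() performs
   no transcoding; it returns the byte written, or EOF on error. */
int write_backslash(FILE *stream)
{
    return putc('\\', stream);
}
```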
In other words, is the intent that there exists a relation between the
literal and execution encodings (the latter of which may be affected by
Is the "execution encoding" the encoding assumed by locale.h/stdlib.h
I don't think that is explicitly stated either; the wording mentions these
functions accepting "character"s without stating the presumed encoding of
There are the following definitions
sequence of one or more bytes representing a member of the extended
character set of either the source or the execution environment
value representable by an object of type wchar_t, capable of representing
any character in the current locale
But it is unclear whether they apply to the language or library. And of
course, ctype.h functions do not accept multibyte characters!
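
As an illustration of that last point, here is a minimal sketch. The helper name is hypothetical, and it assumes the default "C" locale and a UTF-8 encoded input. isalpha() sees one unsigned char value at a time, so a character encoded as several bytes is never classified as a whole.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the bytes of a NUL-terminated string for
   which isalpha() returns nonzero. Each byte is classified on its own;
   the function has no notion of multibyte sequences. */
size_t count_alpha_bytes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Cast to unsigned char: plain char may be negative. */
        if (isalpha((unsigned char)*s))
            ++n;
    }
    return n;
}
```

Called on "\xC3\xA9" (the UTF-8 encoding of é), this counts no alphabetic bytes in the "C" locale, even though the two bytes together encode a letter.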
As a user, I would expect a precondition that the environment
execution encoding is a superset of the literal encoding, but it is
unclear to me whether that's stated or intended.
I really hope you can shed light on the original intent and history! Thanks
SG16 list run by firstname.lastname@example.org