Date: Mon, 12 Aug 2019 13:24:01 -0700
On Monday, 12 August 2019 11:01:57 PDT Steve Downey wrote:
> I also believe that "execution character set" is used in opposition to the
> "source character set", and it is applied to the translation of string
> literals because that's when it comes up. On the other hand, this may be
> pre-locale wording that has survived, at least partly because no one wants
> to touch locale.
That may be, but I think the original idea was that the runtime and compiler-
presumed encodings would always be one and the same. This is especially true
when we're talking about encodings that aren't ASCII-compatible. If you had some
EBCDIC-encoded source and had
char str[] = "abc";
compiled to an ASCII execution charset, you'd expect the encoding in use at
runtime to be ASCII or a superset thereof.
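To make the EBCDIC case concrete, here is a minimal sketch of the kind of
translation a compiler would have to apply to the literal. The helper name is
hypothetical, and it covers only the letters a-i, which occupy 0x81..0x89 in
EBCDIC; a real translator would use a full 256-entry table for a specific
code page.

```cpp
#include <string>

// Hypothetical helper, for illustration only: translate the EBCDIC letters
// 'a'..'i' (bytes 0x81..0x89) to their ASCII values (0x61..0x69).
// Any other byte is replaced with '?' because this sketch has no full table.
std::string ebcdic_a_to_i_to_ascii(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c >= 0x81 && c <= 0x89) {
            out += static_cast<char>(0x61 + (c - 0x81));
        } else {
            out += '?';
        }
    }
    return out;
}
```

Under these assumptions, the EBCDIC bytes 0x81 0x82 0x83 for "abc" come out as
the ASCII bytes 0x61 0x62 0x63.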
This has broken down in recent decades because Clang and GCC pass bytes through
unchanged from the source charset to the narrow execution charset, so you can't
get the same guarantee for non-ASCII characters. The following source, if
encoded in Latin-1:
char str[] = "é";
will not behave properly at runtime under a UTF-8 execution charset. I don't
know if -finput-charset=latin1 makes a difference.
MSVC without /utf-8, on the other hand, has the traditional interpretation.
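The byte-level problem with the Latin-1 example can be sketched as follows.
This is only an illustration, not what any compiler does internally: both
helper names are hypothetical, the literals use hex escapes so the sketch does
not depend on how the file itself is encoded, and the validity check is
structural only (it does not reject every overlong form or surrogates).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: re-encode a Latin-1 byte string as UTF-8.
// Each Latin-1 byte is the code point of the same value (U+0000..U+00FF).
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII: unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}

// Hypothetical helper: structural UTF-8 check on lead and continuation bytes.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (lead < 0x80)                       extra = 0;
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;
        else return false;                     // stray continuation or bad lead
        if (s.size() - i <= extra) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

With these helpers, is_valid_utf8("\xE9") is false: the lone Latin-1 byte for
é looks like the start of a three-byte sequence that never completes. After
conversion, latin1_to_utf8("\xE9") yields the two bytes 0xC3 0xA9, which pass
the check.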
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products
Received on 2019-08-12 22:24:12