Date: Wed, 14 Aug 2019 12:53:12 -0700
On Wednesday, 14 August 2019 10:01:19 PDT Joseph Myers wrote:
> The present implementation-defined interpretation of the byte sequence in
> source files allows a default of "UTF-8 in strings, comments can use
> arbitrary bytes" (which thus allows existing source files in a range of
> ASCII-compatible 8-bit character sets if the non-ASCII characters only
> appear in comments, without needing to tell the compiler which character
> set is being used).
That's not correct. MSVC does interpret the bytes in comments and will
complain if it can't decode from the 8-bit ACS to UTF-16.
That also means most legacy 8-bit-encoded files with high-bit comments will
not compile with /utf-8.
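
To illustrate why such files are a problem once the source is declared to be UTF-8, here is a minimal sketch. It assumes a hypothetical source line whose comment contains "é" encoded as Latin-1. It only checks whether the bytes are well-formed UTF-8; it does not model MSVC's actual decoder or diagnostics, and the function name is made up for this example.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical legacy source line: the comment holds "é" as the single
# Latin-1 byte 0xE9, which is not a valid UTF-8 sequence on its own.
legacy_line = "int x = 0; // caf\u00e9\n".encode("latin-1")

# The same line saved as UTF-8, where "é" becomes the two bytes 0xC3 0xA9.
utf8_line = "int x = 0; // caf\u00e9\n".encode("utf-8")

print(first_invalid_utf8_offset(legacy_line))
print(first_invalid_utf8_offset(utf8_line))
```

The high-bit byte sits inside a comment, yet a decoder that processes the whole file as UTF-8 still trips over it, which is the situation described above.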
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Software Architect - Intel System Software Products
Received on 2019-08-14 21:53:14