Re: [SG16-Unicode] Draft: char8_t backward compatibility remediation paper

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 5 Dec 2018 22:33:17 -0500
None of the u8 strings I saw contained escape sequences.
Not that \u escapes would change the argument. They work identically in
source and explicit encoding.
Right now, u8"" means transcode from source encoding to UTF-8 rather than
to execution encoding.
I suspect there are often errors where, if the source encoding were not
UTF-8, the resulting string would not be the intended one.
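
A minimal sketch of the escape-sequence point, assuming a conforming C++17 or
C++20 compiler. The function names and the choice of U+00E9 are hypothetical,
picked only for illustration. Because a \u escape is written in plain ASCII,
the bytes of a u8 literal built from it should not depend on the source
encoding, whereas the same escape in an ordinary literal is encoded in the
execution encoding and may differ between builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// UTF-8 code units for U+00E9 (LATIN SMALL LETTER E WITH ACUTE) plus the
// terminating null. U+00E9 is an arbitrary example character.
static const unsigned char kExpected[] = {0xC3, 0xA9, 0x00};

// True if the u8 literal spelled with a \u escape holds exactly those bytes.
// The escape is pure ASCII in the source file, so the source encoding does
// not affect the result.
bool u8_escape_is_utf8() {
    // const char[3] before C++20, const char8_t[3] from C++20 onward.
    const auto& lit = u8"\u00E9";
    return sizeof(lit) == sizeof(kExpected) &&
           std::memcmp(lit, kExpected, sizeof(kExpected)) == 0;
}

// Number of bytes (excluding the terminator) the same character occupies in
// an ordinary literal. This is encoded in the execution encoding, so the
// value depends on how the compiler is invoked.
std::size_t ordinary_literal_size() {
    return sizeof("\u00E9") - 1;
}

// Hypothetical helper: checks only the u8 case, since that is the one the
// language pins down.
void check() {
    assert(u8_escape_is_utf8());
}
```

The case this sketch does not cover is the one I am worried about: a literal
non-ASCII character typed directly inside u8"", where the compiler has to
know the source encoding to transcode it correctly.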




On Wed, Dec 5, 2018, 22:19 Tom Honermann <tom_at_[hidden]> wrote:

> On 12/5/18 8:31 PM, Markus Scherer wrote:
>
> On Wed, Dec 5, 2018 at 3:34 PM Steve Downey <sdowney_at_[hidden]> wrote:
>
>> How many contain text that is not already UTF-8?
>
>
> I am not sure what you are asking. Most of the u8"literals" I am seeing
> contain non-ASCII characters. Many as literal characters, a bunch of
> \uhhhh, and a few \U00hhhhhh.
>
> I was likewise uncertain about this question.
>
> Steve, I'm guessing the question you're trying to get at is, would there
> be any behavioral difference if the u8 prefix was simply dropped? I think
> this is equivalent to asking the question, are the source files for these
> examples encoded as UTF-8 and is the compiler invoked such that the source
> encoding and presumed execution encoding are both UTF-8 (always the case
> for Clang, the default for gcc unless -finput-charset or -fexec-charset is
> used, and not the case for MSVC unless /utf-8 is used).
>
> Tom.
>

Received on 2018-12-06 04:33:31