Re: [wg14/wg21 liaison] [isocpp-core] [SG16-Unicode] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 14 Aug 2019 08:31:57 -0400
On 8/14/19 2:49 AM, Corentin Jabot via Core wrote:
>
>
> On Wed, Aug 14, 2019, 4:46 AM Tony V E <tvaneerd_at_[hidden]> wrote:
>
>
>
> On Tue, Aug 13, 2019 at 8:57 AM Corentin Jabot
> <corentinjabot_at_[hidden]> wrote:
>
>
>
> On Tue, 13 Aug 2019 at 14:52, Ville Voutilainen
> <ville.voutilainen_at_[hidden]> wrote:
>
> On Tue, 13 Aug 2019 at 15:35, Corentin Jabot via Core
> <core_at_[hidden]> wrote:
> >
> >
> > Chiming in with my favorite solution:
> > Forbid u8/u16/u32 literals in non-Unicode-encoded files
>
> But presumably not the ones that look like u8"\U1234" ?
>
>
> Yes, there is no reason to disallow that, as it can't be
> misinterpreted by either the compiler or people (and quite a
> lot of code would needlessly break).
>
>
> I find your lack of faith in people's ability to misinterpret
> something disturbing.
> :-)
>
>
> 😁 (Challenging your mail client)
>
>
> \Uxxxx is unambiguous.
>
> u8"é" is ambiguous. Both people and the compiler may interpret that in
> a variety of ways. Notably, if I have UTF-8 in that file, which I wrote
> on Linux, but then the MSVC compiler thinks it's Windows-1252...
> Mojibake.
There is no ambiguity there, just bog-standard mojibake due to incorrect
source file encoding assumptions. "é" has exactly the same set of
"problems" as L"é", u8"é", u"é", and U"é".
>
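A minimal sketch of that mojibake, assuming a simplified model of a
compiler that wrongly treats a UTF-8 source file as a single-byte
encoding. The function names are hypothetical. The byte-to-code-point
mapping used here is Latin-1, which agrees with Windows-1252 for the two
bytes of "é" but not for bytes 0x80 through 0x9F.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8 (no validation of surrogates
// or out-of-range values; this is only an illustration).
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical model of a u8 literal compiled under the wrong source
// encoding assumption: every source byte is decoded as its own code
// point (Latin-1 style), and each code point is then encoded as UTF-8.
std::string mojibake_u8(const std::string& source_bytes) {
    std::string out;
    for (unsigned char byte : source_bytes) {
        out += encode_utf8(byte);
    }
    return out;
}
```

Under this model, the source bytes C3 A9 (UTF-8 for "é") come out of
mojibake_u8 as C3 83 C2 A9, which a UTF-8 terminal displays as "Ã©".
The same misdecoding step applies whatever the literal prefix is; only
the final encoding step differs.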
>
> People also seem to be confused
>
> https://stackoverflow.com/questions/23471935/how-are-u8-literals-supposed-to-work

Yes, that is a typical example of someone learning that source file
encoding and execution encoding can be independently controlled. Note
that the example even illustrates the individual being confused about
handling of u8 literals and *then* becoming confused about handling of
ordinary literals after learning about gcc's -finput-charset option (but
apparently having not yet learned about gcc's -fexec-charset option).
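
The independence of the two settings can be seen from literal sizes. The
following is a small sketch, not taken from the linked question; the
constant names are made up for illustration. It spells the character as
a universal-character-name so that the source file encoding plays no
role at all.

```cpp
#include <cstddef>

// A u8 literal is always encoded as UTF-8, so U+00E9 occupies two code
// units regardless of the source or execution encoding in effect.
constexpr std::size_t u8_size = sizeof(u8"\u00E9") - 1;

// An ordinary literal is encoded in the execution encoding, so this
// value depends on compiler settings such as gcc's -fexec-charset.
constexpr std::size_t ordinary_size = sizeof("\u00E9") - 1;

static_assert(u8_size == 2, "U+00E9 is two UTF-8 code units");
```

With a UTF-8 execution encoding, ordinary_size is expected to be 2; with
a Latin-1 execution encoding it is expected to be 1. The u8 value does
not change.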

Tom.

>
>
> --
> Be seeing you,
> Tony
>
>
> Link to this post: http://lists.isocpp.org/core/2019/08/7049.php



Received on 2019-08-14 07:33:59