On 8/14/19 2:49 AM, Corentin Jabot via Core wrote:


On Wed, Aug 14, 2019, 4:46 AM Tony V E <tvaneerd@gmail.com> wrote:


On Tue, Aug 13, 2019 at 8:57 AM Corentin Jabot <corentinjabot@gmail.com> wrote:


On Tue, 13 Aug 2019 at 14:52, Ville Voutilainen <ville.voutilainen@gmail.com> wrote:
On Tue, 13 Aug 2019 at 15:35, Corentin Jabot via Core
<core@lists.isocpp.org> wrote:
>
>
> Chiming in with my favorite solution:
> Forbid u8/u16/u32 literals in non unicode encoded files

But presumably not the ones that look like u8"\U1234" ?

Yes, there is no reason to disallow that, as it can't be misinterpreted by either the compiler or people (and quite a lot of code would needlessly break).


I find your lack of faith in people's ability to misinterpret something disturbing.
:-)

😁 (Challenging your mail client)


\Uxxxx is unambiguous.

u8"é" is ambiguous. Both people and the compiler may interpret that in a variety of ways. Notably, if I have UTF-8 in that file, which I wrote on Linux, but then the MSVC compiler thinks it's Windows-1252...
Mojibake.
There is no ambiguity there, just bog standard mojibake due to incorrect source file encoding assumptions.  "é" has exactly the same set of "problems" as L"é", u8"é", u"é", and U"é".


People also seem to be confused

https://stackoverflow.com/questions/23471935/how-are-u8-literals-supposed-to-work

Yes, that is a typical example of someone learning that source file encoding and execution encoding can be independently controlled.  Note that the example even illustrates the individual being confused about handling of u8 literals and *then* becoming confused about handling of ordinary literals after learning about gcc's -finput-charset option (but apparently having not yet learned about gcc's -fexec-charset option).

Tom.



--
Be seeing you,
Tony

_______________________________________________
Core mailing list
Core@lists.isocpp.org
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
Link to this post: http://lists.isocpp.org/core/2019/08/7049.php