SG16

Subject: Re: Conversion of grapheme clusters to (wide) execution encoding
From: Jens Maurer (Jens.Maurer_at_[hidden])
Date: 2020-06-01 14:47:46


On 01/06/2020 11.08, Corentin wrote:
>
>
> On Mon, 1 Jun 2020 at 08:10, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> String literals also have an inherent length.  I'm mildly opposed to normatively
> specifying a required alteration of the "source-code-apparent" length for types
> whose encodings are not variable-width to start with (u8, u16).  That leaves
> 1 and 2 for me.
>
>
> The length of "©" will be different in UTF-8 or Latin-1, for example; it should be defined as the number of code units in the execution encoding.

Yes, but you sort-of expect a length of 1 to 4 octets for UTF-8.
Not so for Latin-1: For one character appearing in source code,
I'd expect one length unit.
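
To make the lengths under discussion concrete, here is a minimal sketch that counts the code units of U+00A9 (©) in several encodings at compile time. The helper name code_units is made up for illustration. The character is written as an escape so the result does not depend on the source file encoding, and the Latin-1 case is a hand-written byte (0xA9) standing in for what a narrow literal would presumably contain if the execution encoding were Latin-1.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00A9 COPYRIGHT SIGN in the Unicode literal forms.
constexpr std::size_t utf8_len  = code_units(u8"\u00A9"); // 0xC2 0xA9
constexpr std::size_t utf16_len = code_units(u"\u00A9");  // 0x00A9
constexpr std::size_t utf32_len = code_units(U"\u00A9");  // 0x000000A9

// Assumption: a single byte 0xA9, spelled out by hand, models the
// Latin-1 encoding of the same character.
constexpr std::size_t latin1_len = code_units("\xA9");

static_assert(utf8_len == 2, "two UTF-8 code units");
static_assert(utf16_len == 1, "one UTF-16 code unit");
static_assert(utf32_len == 1, "one UTF-32 code unit");
static_assert(latin1_len == 1, "one Latin-1 code unit");
```

So one character as it appears in source code is one length unit in Latin-1 but two in UTF-8, which is the difference both positions above are describing.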

As I said, it's only a mild preference.

However, I must say I'm missing a bit of a big picture here:
What's the actual problem to be solved?

Jens


SG16 list run by sg16-owner@lists.isocpp.org