On Mon, 1 Jun 2020 at 21:47, Jens Maurer <Jens.Maurer@gmx.net> wrote:
On 01/06/2020 11.08, Corentin wrote:
> On Mon, 1 Jun 2020 at 08:10, Jens Maurer <Jens.Maurer@gmx.net> wrote:

>     String literals also have an inherent length.  I'm mildly opposed to normatively
>     specifying a required alteration of the "source-code-apparent" length for types
>     whose encoding is not variable-width to start with (u8, u16).  That leaves
>     1 and 2 for me.
> The length of "©" will differ between UTF-8 and Latin-1, for example; it should be defined as the number of code units in the execution encoding.

Yes, but you sort of expect a length of 1-4 octets per character for UTF-8.
Not so for Latin-1: for one character appearing in source code,
I'd expect one length unit.

As I said, it's only a mild preference.

However, I must say I'm missing a bit of a big picture here:
What's the actual problem to be solved?

Parts of the standard were written with the assumption that 1 abstract character = 1 code point = 1 code unit = 1 glyph.
This is not the case, so I'm trying to identify what would need tweaking :)