sg16: Re: [SG16] Conversion of grapheme clusters to (wide) execution encoding

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Mon, 1 Jun 2020 21:47:46 +0200

On 01/06/2020 11.08, Corentin wrote:
>
>
> On Mon, 1 Jun 2020 at 08:10, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

>> String literals also have an inherent length. I'm mildly opposed to normatively
>> specifying a required alteration of the "source-code-apparent" length for types
>> whose encodings are not variable-width to start with (u8, u16). That leaves
>> 1 and 2 for me.
>
>
> The length of "©" differs between UTF-8 and Latin-1, for example; it should be defined as the number of code units in the execution encoding.

Yes, but you sort-of expect a length of 1 to 5 octets for UTF-8.
Not so for Latin-1: For one character appearing in source code,
I'd expect one length unit.
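
To make the length difference concrete, here is a minimal sketch that counts
code units for U+00A9 (the copyright sign) in a few literal kinds. The helper
code_units and the constant names are made up for illustration and are not
from any proposal. The Latin-1 case is modelled with the hex escape \xA9,
on the assumption that a Latin-1 execution encoding would store that single
byte; a plain "©" in a narrow literal would depend on the compiler's actual
source and execution encodings, so it is avoided here.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+00A9 is spelled with escapes so the result does not depend on
// the encoding of this source file.

// UTF-8 encodes U+00A9 as the two octets 0xC2 0xA9.
constexpr std::size_t utf8_len = code_units(u8"\u00A9");

// UTF-16 encodes U+00A9 as a single 16-bit code unit.
constexpr std::size_t utf16_len = code_units(u"\u00A9");

// Assumption: stands in for a Latin-1 (ISO 8859-1) execution encoding,
// where U+00A9 is the single byte 0xA9.
constexpr std::size_t latin1_len = code_units("\xA9");

static_assert(utf8_len == 2, "U+00A9 takes two UTF-8 code units");
static_assert(utf16_len == 1, "U+00A9 takes one UTF-16 code unit");
static_assert(latin1_len == 1, "byte 0xA9 is one narrow code unit");
```

So the same single source character has a length of 2 in a u8 literal and a
length of 1 in the other two cases, which is the mismatch under discussion.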

As I said, it's only a mild preference.

However, I must say I'm missing a bit of a big picture here:
What's the actual problem to be solved?

Jens

Received on 2020-06-01 14:50:55