sg16: Re: [SG16] Conversion of grapheme clusters to (wide) execution encoding

From: Corentin <corentin.jabot_at_[hidden]>
Date: Mon, 1 Jun 2020 22:33:47 +0200

On Mon, 1 Jun 2020 at 21:47, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 01/06/2020 11.08, Corentin wrote:
> >
> >
> > On Mon, 1 Jun 2020 at 08:10, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> > String literals also have an inherent length. I'm mildly opposed to
> > normatively specifying a required alteration of the
> > "source-code-apparent" length for types whose encoding are not
> > variable-width to start with (u8, u16). That leaves 1 and 2 for me.
> >
> >
> > The length of "©" will be different in utf8 or latin1 for example - it
> > should be defined in the number of code units in the execution encoding
>
> Yes, but you sort-of expect a length between 1-5 octets for UTF-8.
> Not so for Latin-1: For one character appearing in source code,
> I'd expect one length unit.
>
> As I said, it's only a mild preference.
>
> However, I must say I'm missing a bit of a big picture here:
> What's the actual problem to be solved?
>

Parts of the standard were written with the assumption that 1 abstract
character = 1 code point = 1 code unit = 1 glyph.
This is not the case, so I'm trying to identify what would need tweaking :)
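
To make the mismatch concrete, here is a minimal sketch (not part of the
original message) that counts code units in a few string literals. The helper
name code_units is hypothetical, and the examples other than "©" are chosen
purely for illustration. Escapes are used so the counts do not depend on the
source file encoding; "\xA9" stands in for the single byte Latin-1 would use
for "©".

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00A9 COPYRIGHT SIGN: one code point, but the code unit count
// depends on the encoding.
constexpr std::size_t copyright_utf8   = code_units(u8"\u00A9"); // 2 (C2 A9)
constexpr std::size_t copyright_utf16  = code_units(u"\u00A9");  // 1
constexpr std::size_t copyright_latin1 = code_units("\xA9");     // 1 byte, as in Latin-1

// "e" followed by U+0301 COMBINING ACUTE ACCENT: one grapheme cluster,
// two code points, three UTF-8 code units.
constexpr std::size_t e_acute_utf32 = code_units(U"e\u0301");    // 2
constexpr std::size_t e_acute_utf8  = code_units(u8"e\u0301");   // 3

// U+1F600: one code point, two UTF-16 code units, four UTF-8 code units.
constexpr std::size_t emoji_utf16 = code_units(u"\U0001F600");   // 2
constexpr std::size_t emoji_utf8  = code_units(u8"\U0001F600");  // 4
```

None of these counts equals "number of characters as seen in the source" in
every encoding, which is the assumption being questioned above.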

>
> Jens
>
>

Received on 2020-06-01 15:37:04