SG16

Subject: Re: Conversion of grapheme clusters to (wide) execution encoding
From: Corentin (corentin.jabot_at_[hidden])
Date: 2020-06-01 15:33:47


On Mon, 1 Jun 2020 at 21:47, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 01/06/2020 11.08, Corentin wrote:
> >
> >
> > On Mon, 1 Jun 2020 at 08:10, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> > > String literals also have an inherent length. I'm mildly opposed to
> > > normatively specifying a required alteration of the
> > > "source-code-apparent" length for types whose encoding are not
> > > variable-width to start with (u8, u16). That leaves 1 and 2 for me.
> >
> >
> > The length of "©" will be different in utf8 or latin1 for example - it
> > should be defined in the number of code units in the execution encoding
>
> Yes, but you sort-of expect a length between 1-5 octets for UTF-8.
> Not so for Latin-1: For one character appearing in source code,
> I'd expect one length unit.
>
> As I said, it's only a mild preference.
>
> However, I must say I'm missing a bit of a big picture here:
> What's the actual problem to be solved?
>

Parts of the standard were written with the assumption that 1 abstract
character = 1 code point = 1 code unit = 1 glyph.
This is not the case, so I'm trying to identify what would need tweaking :)
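
To make the mismatch concrete, here is a minimal sketch that counts code
units in a few Unicode string literals. The helper name code_units is
hypothetical, introduced only for this illustration. The characters are
written as universal-character-names so the result does not depend on the
source file encoding. Ordinary (narrow) literals are left out on purpose,
because their length depends on the implementation's execution encoding
(one code unit for U+00A9 in Latin-1, two in UTF-8).

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00A9 COPYRIGHT SIGN: one code point, one glyph.
static_assert(code_units(u8"\u00A9") == 2, "two UTF-8 code units");
static_assert(code_units(u"\u00A9") == 1, "one UTF-16 code unit");
static_assert(code_units(U"\u00A9") == 1, "one UTF-32 code unit");

// 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points,
// rendered as a single grapheme cluster.
static_assert(code_units(u8"e\u0301") == 3, "1 + 2 UTF-8 code units");
static_assert(code_units(u"e\u0301") == 2, "two UTF-16 code units");
static_assert(code_units(U"e\u0301") == 2, "two UTF-32 code units");

// U+1F600 GRINNING FACE: one code point outside the BMP.
static_assert(code_units(u8"\U0001F600") == 4, "four UTF-8 code units");
static_assert(code_units(u"\U0001F600") == 2, "a UTF-16 surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "one UTF-32 code unit");
```

None of the three examples has the same count across all encodings, and
the middle one shows that even UTF-32 code units do not correspond 1:1 to
what a reader perceives as one character.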

>
> Jens
>
>



SG16 list run by sg16-owner@lists.isocpp.org