On 01/06/2020 11.08, Corentin wrote:
> On Mon, 1 Jun 2020 at 08:10, Jens Maurer <Jens.Maurer@gmx.net <mailto:Jens.Maurer@gmx.net>> wrote:
> String literals also have an inherent length. I'm mildly opposed to normatively
> specifying a required alteration of the "source-code-apparent" length for types
> whose encodings are not variable-width to start with (u8, u16). That leaves
> 1 and 2 for me.
> The length of "©" will differ between UTF-8 and Latin-1, for example; it should be defined as the number of code units in the execution encoding.
Yes, but you sort of expect a length between 1 and 4 octets for UTF-8.
Not so for Latin-1: for one character appearing in source code,
I'd expect one length unit.
As I said, it's only a mild preference.
However, I must say I'm missing a bit of the big picture here:
what's the actual problem to be solved?
Parts of the standard were written with the assumption that 1 abstract character = 1 code point = 1 code unit = 1 glyph.
This is not the case, so I'm trying to identify what would need tweaking. :)