Re: [isocpp-sg16] What does Annex B mean by "character"

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Thu, 13 Jun 2024 20:50:28 +0200
On Thu, Jun 13, 2024 at 7:14 PM Alisdair Meredith via SG16 <
sg16_at_[hidden]> wrote:

> Several of the implementation quantities specified in Annex B
> talk about the number of characters in a line, or an identifier.
>
> Now that we have a clearer notion of supporting UTF-8 source
> files and Unicode in identifiers, do we have a clear understanding
> of what we mean by “character”?
>
> For the implementation quantities, I expect we mean code units
> in the source character set, but we might also interpret them as
> Unicode code points, which might comprise multiple code units
> in UTF-8.


Identifiers do not have a prescribed encoding, so it makes more sense to
talk about code points (but of what encoding?) or to keep "characters" (I'd
say "abstract character" to clarify).
For a string-literal, I assume we want to treat it as evaluated and
talk about code units (because that more closely matches implementations).
For source lines, "character" seems correct (i.e., there is no prescribed
encoding before phase 1) (I'd say "abstract character" to clarify).
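
To make the code unit / code point distinction concrete, here is a minimal
sketch, assuming the text is well-formed UTF-8. The helper names
count_code_units and count_code_points are hypothetical and chosen only for
illustration; nothing here reflects how any implementation counts against its
Annex B limits.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: in UTF-8, one code unit is one byte,
// so the code unit count is just the byte length.
constexpr std::size_t count_code_units(std::string_view s) {
    return s.size();
}

// Hypothetical helper: counts code points, ASSUMING well-formed UTF-8.
// Every code point starts with exactly one byte that is not a
// continuation byte (continuation bytes have the form 10xxxxxx).
constexpr std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// "café" spelled with explicit UTF-8 bytes: U+00E9 encodes as 0xC3 0xA9.
static_assert(count_code_units("caf\xC3\xA9") == 5, "five UTF-8 code units");
static_assert(count_code_points("caf\xC3\xA9") == 4, "four code points");
```

The same text therefore counts as 5 or 4 against a limit depending on which
unit is meant. An abstract character count could differ again, for example if
the é were written as e followed by a combining accent.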

Note that this sort of issue exists throughout the whole standard, not just
in Annex B.


>
> Should we bring some clearer language to bear in Annex B, and
> should we clarify our assumed understanding in each case?
>
> AlisdairM
> (On vacation in Thailand but cannot help myself)
>

Please enjoy your vacation, Alisdair.


Received on 2024-06-13 18:50:48