Date: Tue, 10 Aug 2021 17:39:38 +0000
All code points are complete grapheme clusters
From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Peter Brett via SG16
Sent: Monday, August 9, 2021 8:37 AM
To: sg16_at_[hidden]; Victor Zverovich <victor.zverovich_at_[hidden]>
Cc: Peter Brett <pbrett_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
Subject: Re: [SG16] LWG3576 - Clarifying fill character in std::format
Hi Corentin,
Thank you very much for bringing this up!
I think it makes logical sense to expect the 'fill character' to be a complete grapheme cluster, since only grapheme clusters have any defined width.
Allowing the fill character to be a code unit would be nonsensical.
How difficult would it be to say that filling should be performed with a grapheme cluster, but that filling with single code points that are not grapheme clusters is conditionally supported? That would permit the naïve implementation (and be backwards compatible) while allowing implementations to do the right thing in the future.
Peter
From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Corentin via SG16
Sent: 09 August 2021 16:30
To: SG16 <sg16_at_[hidden]>; Victor Zverovich <victor.zverovich_at_[hidden]>
Cc: Corentin <corentin.jabot_at_[hidden]>
Subject: [SG16] LWG3576 - Clarifying fill character in std::format
Hello,
I wanted to bring this new LWG issue to your attention.
https://cplusplus.github.io/LWG/issue3576
The author asks whether the fill character of std::format is
* a code unit
* a code point
* a grapheme cluster
This might be an ABI-breaking change, and implementations apparently already disagree.
My gut feeling is that it needs to be at least a code point.
I do not know whether allowing a grapheme cluster raises any implementation or performance concerns. There is definitely some motivation, especially for format strings that are not in NFC.
This sort of issue illustrates my point that using the term "character" in the standard can be problematic!
Thanks,
Have a great week,
Corentin
Received on 2021-08-10 12:39:42