Date: Tue, 10 Aug 2021 17:59:58 +0000
> How difficult would it be to say that filling should be performed with a grapheme cluster, but filling with non-grapheme-cluster single codepoints is conditionally supported? It would permit the naïve implementation (and be backwards compatible) but would allow implementations to DTRT in the future.
Do you mean the opposite of this? That is, filling should support at least a code point but may support a grapheme cluster?
It occurs to me that supporting a whole grapheme cluster might require dynamic allocation (you can make a grapheme cluster as long as you like using emoji and Extend characters).
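To make the allocation concern concrete, here is a minimal C++ sketch. The function name `e_with_combining_marks` and the constant `max_utf8_code_point_bytes` are hypothetical, for illustration only. The function appends copies of U+0301 COMBINING ACUTE ACCENT (an Extend character) to "e". Under UAX #29 the result stays a single extended grapheme cluster while its UTF-8 length grows without bound. One code point, by contrast, never needs more than 4 UTF-8 code units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns "e" followed by `marks` copies of U+0301
// COMBINING ACUTE ACCENT, encoded as UTF-8. U+0301 has
// Grapheme_Cluster_Break=Extend, so under UAX #29 the whole string remains
// one extended grapheme cluster no matter how large `marks` is.
std::string e_with_combining_marks(std::size_t marks) {
    std::string s = "e";
    for (std::size_t i = 0; i < marks; ++i) {
        s += "\xCC\x81";  // U+0301 takes 2 UTF-8 code units
    }
    return s;
}

// A single code point needs at most 4 UTF-8 code units, so it fits in
// fixed-size storage; a grapheme cluster has no such upper bound.
constexpr std::size_t max_utf8_code_point_bytes = 4;
```

With 10 marks the string is already 21 code units for what is still one grapheme cluster, so a fixed-size fill buffer cannot hold an arbitrary cluster.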
Personally I'd be fine with just specifying that it's one Unicode code point and leaving it at that. This would disallow some of the "single code unit" behavior that "used to work", but that's only things like padding text with unpaired surrogates, which is dubious at best.
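Here is a rough sketch of what "exactly one Unicode code point" could mean when parsing the fill. It assumes UTF-8 format strings and only the `<`, `^` and `>` alignment options. `decode_utf8`, `FillAlign` and `parse_fill_align` are hypothetical names, not any implementation's actual code. The decoder accepts only Unicode scalar values, so an encoded surrogate is rejected and cannot be used as padding. The parsed fill fits in a fixed `char32_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: decodes the first UTF-8 code point of `s` into `cp`.
// Returns the number of code units consumed, or 0 if `s` does not start with
// a well-formed sequence for a Unicode scalar value (overlong forms,
// surrogates U+D800..U+DFFF and values above U+10FFFF are all rejected).
std::size_t decode_utf8(std::string_view s, char32_t& cp) {
    if (s.empty()) return 0;
    const auto b0 = static_cast<unsigned char>(s[0]);
    std::size_t len = 0;
    char32_t min = 0;
    if (b0 < 0x80) { cp = b0; return 1; }
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;  // continuation byte or invalid lead byte
    if (s.size() < len) return 0;
    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return 0;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

struct FillAlign {
    char32_t fill;
    char align;
};

// Hypothetical parser for the start of a std::format-style spec under the
// "fill is exactly one code point" rule: one code point followed by '<',
// '^' or '>'. Returns std::nullopt if there is no valid fill-and-align.
std::optional<FillAlign> parse_fill_align(std::string_view spec) {
    char32_t cp = 0;
    const std::size_t n = decode_utf8(spec, cp);
    if (n == 0 || n >= spec.size()) return std::nullopt;
    const char a = spec[n];
    if (a != '<' && a != '^' && a != '>') return std::nullopt;
    if (cp == U'{' || cp == U'}') return std::nullopt;  // braces can't be fill
    return FillAlign{cp, a};
}
```

Under this rule `"*^10"` and `"\xE2\x98\x83>5"` (U+2603 SNOWMAN as fill) parse. An encoded surrogate such as `"\xED\xA0\x80<"` is rejected. So is a decomposed "é" (`"e\xCC\x81<"`), because its first code point is followed by U+0301 rather than an alignment character.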
From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Charlie Barto via SG16
Sent: Tuesday, August 10, 2021 10:40 AM
To: sg16_at_[hidden]; Victor Zverovich <victor.zverovich_at_[hidden]>
Cc: Charlie Barto <Charles.Barto_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
Subject: Re: [SG16] LWG3576 - Clarifying fill character in std::format
All code points are complete grapheme clusters
From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Peter Brett via SG16
Sent: Monday, August 9, 2021 8:37 AM
To: sg16_at_[hidden]; Victor Zverovich <victor.zverovich_at_[hidden]>
Cc: Peter Brett <pbrett_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
Subject: Re: [SG16] LWG3576 - Clarifying fill character in std::format
Hi Corentin,
Thank you very much for bringing this up!
I think it makes logical sense to expect the 'fill character' to be a complete grapheme cluster, since only graphemes have any defined width.
Allowing the fill character to be a code unit would be nonsensical.
How difficult would it be to say that filling should be performed with a grapheme cluster, but filling with non-grapheme-cluster single codepoints is conditionally supported? It would permit the naïve implementation (and be backwards compatible) but would allow implementations to DTRT in the future.
Peter
From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Corentin via SG16
Sent: 09 August 2021 16:30
To: SG16 <sg16_at_[hidden]>; Victor Zverovich <victor.zverovich_at_[hidden]>
Cc: Corentin <corentin.jabot_at_[hidden]>
Subject: [SG16] LWG3576 - Clarifying fill character in std::format
Hello,
I wanted to bring this new LWG issue to your attention.
https://cplusplus.github.io/LWG/issue3576
The author asks whether the fill character of std::format is:
- a code unit
- a code point
- a grapheme cluster
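To show how the three options diverge, here is a small C++ sketch that assumes valid UTF-8 and UTF-16 input. The counting helpers are hypothetical names. The grapheme-cluster counts in the comments come from UAX #29 and are not computed by the code, since that would need a full segmentation implementation. On platforms where wchar_t is UTF-16 (such as Windows), a single wchar_t code unit cannot hold U+1F600 at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts code points in valid UTF-8: every code unit that is not a
// continuation byte (bit pattern 10xxxxxx) begins a new code point.
std::size_t utf8_code_points(std::string_view s) {
    std::size_t n = 0;
    for (char c : s) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Counts code points in valid UTF-16: every code unit except a trailing
// (low) surrogate, 0xDC00..0xDFFF, begins a new code point.
std::size_t utf16_code_points(std::u16string_view s) {
    std::size_t n = 0;
    for (char16_t c : s) {
        if (c < 0xDC00 || c > 0xDFFF) ++n;
    }
    return n;
}

// "é" as U+0065 U+0301 (not NFC): 3 UTF-8 code units, 2 code points,
// 1 extended grapheme cluster (per UAX #29; not computed here).
constexpr std::string_view e_acute_nfd = "e\xCC\x81";
// The same text in NFC, U+00E9: 2 UTF-8 code units, 1 code point.
constexpr std::string_view e_acute_nfc = "\xC3\xA9";
// U+1F600 GRINNING FACE: 1 code point, but 2 UTF-16 code units.
constexpr std::u16string_view grin_utf16 = u"\U0001F600";
```

The decomposed "é" is the non-NFC case: it is one grapheme cluster but two code points, so a code-point-only fill cannot express it.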
This might be an ABI-breaking change, and implementations apparently already disagree.
My gut feeling is that it needs to at least be a code point.
I do not know if there are any concerns with allowing a grapheme in terms of implementation or performance. There is definitely some motivation, especially for non-NFC format strings.
This sort of issue illustrates my point that using the term "character" in the standard can be problematic!
Thanks,
Have a great week,
Corentin
Received on 2021-08-10 13:00:18