Date: Tue, 10 Aug 2021 17:44:09 +0000
Wording note: “any Unicode grapheme cluster other than { or }” may still admit grapheme clusters that merely begin with a brace, such as }̅ or similar.
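To make the concern concrete, here is a minimal sketch (not proposed wording and not taken from any implementation). It assumes the example cluster is } followed by U+0305 COMBINING OVERLINE, encoded as UTF-8. The helper names code_units and code_points and the constant overlined_brace are hypothetical, introduced only for illustration. Grapheme cluster segmentation itself needs the Unicode property tables and is not attempted here.

```cpp
#include <cstddef>
#include <string_view>

// Number of UTF-8 code units (bytes) in s.
constexpr std::size_t code_units(std::string_view s) {
    return s.size();
}

// Number of code points in s, assuming s is well-formed UTF-8:
// every byte that is not a continuation byte (10xxxxxx) starts a code point.
constexpr std::size_t code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Assumed example: '}' followed by U+0305 COMBINING OVERLINE, spelled as
// explicit UTF-8 bytes so the result does not depend on source encoding.
inline constexpr std::string_view overlined_brace = "}\xCC\x85";

// One user-perceived character, but several smaller units, and the first
// code unit is still a plain '}'.
static_assert(code_units(overlined_brace) == 3);
static_assert(code_points(overlined_brace) == 2);
static_assert(overlined_brace.front() == '}');
```

A parser that scans code units or code points would therefore see an ordinary } first, even though the cluster as a whole is "other than { or }".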
From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Victor Zverovich via SG16
Sent: Monday, August 9, 2021 8:36 AM
To: Corentin <corentin.jabot_at_[hidden]>
Cc: Victor Zverovich <victor.zverovich_at_[hidden]>; SG16 <sg16_at_lists.isocpp.org>
Subject: Re: [SG16] LWG3576 - Clarifying fill character in std::format
As an additional data point: the {fmt} library and Python's str.format use code points.
- Victor
On Mon, Aug 9, 2021 at 8:34 AM Victor Zverovich <mailto:victor.zverovich_at_gmail.com> wrote:
Thanks Corentin for bringing this up. I think this should be at least a code point (that was the original intent, which was lost to wording ambiguity); otherwise fill is pretty much useless. A grapheme cluster is an option but might be overkill.
Cheers,
Victor
On Mon, Aug 9, 2021 at 8:30 AM Corentin <mailto:corentin.jabot_at_gmail.com> wrote:
Hello,
I wanted to bring this new LWG issue to your attention.
https://cplusplus.github.io/LWG/issue3576
The author asks whether the fill character of std::format is
• a code unit
• a code point
• a grapheme cluster
This might be an ABI-breaking change, and implementations apparently already disagree.
My gut feeling is that it needs to be at least a code point.
I do not know whether there are any implementation or performance concerns with allowing a grapheme cluster. There is definitely some motivation, especially for non-NFC format strings.
This sort of issue illustrates my point that using the term character in the standard can be problematic!
Thanks,
Have a great week,
Corentin
Received on 2021-08-10 12:44:12