Re: [SG16] LWG3576 - Clarifying fill character in std::format

From: Charlie Barto <Charles.Barto_at_[hidden]>
Date: Tue, 10 Aug 2021 17:44:09 +0000
Wording note: “any Unicode grapheme cluster other than { or }” may include grapheme clusters such as }̅ or similar
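
A minimal sketch of this concern, assuming UTF-8 and assuming the example cluster is '}' followed by U+0305 COMBINING OVERLINE. The names brace_with_overline and find_closing_brace are hypothetical and are not taken from any implementation. The cluster as a whole is not equal to "}", so the quoted wording would permit it as a fill, yet its first code unit is still '}':

```cpp
#include <cstddef>
#include <string_view>

// Assumed example cluster: '}' followed by U+0305 COMBINING OVERLINE,
// written out as UTF-8 code units (0x7D 0xCC 0x85).
constexpr std::string_view brace_with_overline = "}\xCC\x85";

// Hypothetical helper: position of the first '}' code unit in a
// format-spec fragment, or std::string_view::npos if there is none.
std::size_t find_closing_brace(std::string_view spec) {
    return spec.find('}');
}
```

For a fragment such as "}\xCC\x85<10}", find_closing_brace returns 0, so a parser that scans code units for the closing brace would stop inside the fill cluster.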

From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Victor Zverovich via SG16
Sent: Monday, August 9, 2021 8:36 AM
To: Corentin <corentin.jabot_at_[hidden]>
Cc: Victor Zverovich <victor.zverovich_at_[hidden]>; SG16 <sg16_at_lists.isocpp.org>
Subject: Re: [SG16] LWG3576 - Clarifying fill character in std::format

As an additional data point: the {fmt} library and Python's str.format use code points.

- Victor

On Mon, Aug 9, 2021 at 8:34 AM Victor Zverovich <victor.zverovich_at_gmail.com> wrote:
Thanks, Corentin, for bringing this up. I think the fill should be at least a code point (that was the original intent, which was lost to wording ambiguity); otherwise fill is pretty much useless. A grapheme cluster is an option but might be overkill.

Cheers,
Victor

On Mon, Aug 9, 2021 at 8:30 AM Corentin <corentin.jabot_at_gmail.com> wrote:
Hello,

I wanted to bring this new LWG issue to your attention.
https://cplusplus.github.io/LWG/issue3576

The author asks whether the fill character of std::format is
• a code unit
• a code point
• a grapheme cluster
This might be an ABI-breaking change, and implementations apparently already disagree.
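
To make the first two options concrete, here is a small sketch assuming well-formed UTF-8 input. The function names are hypothetical. It shows that a single non-ASCII fill such as é (U+00E9) is one code point but two code units, so a fill limited to one code unit cannot hold it:

```cpp
#include <cstddef>
#include <string_view>

// Number of UTF-8 code units (bytes) in s.
std::size_t count_code_units(std::string_view s) {
    return s.size();
}

// Number of code points in s, assuming s is well-formed UTF-8:
// every byte that is not a continuation byte (10xxxxxx) starts a code point.
std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

With the fill "\xC3\xA9" (é), count_code_units gives 2 and count_code_points gives 1; with "*", both give 1.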

My gut feeling is that it needs to be at least a code point.
I do not know if there are any concerns with allowing a grapheme cluster in terms of implementation or performance. There is definitely some motivation, especially for non-NFC format strings.
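
As an illustration of the non-NFC case, the sketch below compares é in NFC (U+00E9) with its decomposed form (e followed by U+0301). It assumes well-formed UTF-8, and its cluster counting is a deliberate simplification: only U+0300..U+036F are treated as cluster-extending, which is not the full Unicode segmentation algorithm. All names are hypothetical.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Decode UTF-8 into code points. Sketch only: assumes well-formed input
// and performs no validation.
std::vector<char32_t> decode_utf8(std::string_view s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        // Sequence length from the lead byte.
        std::size_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        // Payload bits of the lead byte.
        char32_t cp = len == 1 ? b : b & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// ASSUMPTION: only the Combining Diacritical Marks block extends a cluster.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Approximate grapheme cluster count under the assumption above.
std::size_t count_clusters_approx(std::string_view s) {
    std::size_t n = 0;
    for (char32_t cp : decode_utf8(s)) {
        if (n == 0 || !is_combining_mark(cp)) ++n;
    }
    return n;
}
```

Under these assumptions, "\xC3\xA9" (NFC) decodes to one code point and "e\xCC\x81" (decomposed) to two, while count_clusters_approx returns 1 for both. A fill defined as one code point would therefore accept the first spelling and reject the second.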

This sort of issue illustrates my point that using the term character in the standard can be problematic!

Thanks,
Have a great week,

Corentin

Received on 2021-08-10 12:44:12