std-proposals

Re: [std-proposals] Formatting code points to character names

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Fri, 23 May 2025 16:27:43 +0100

On Fri, 23 May 2025 at 16:20, Jason McKesson via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> On Fri, May 23, 2025 at 1:27 AM Jan Schultke via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
> >
> > I think it would be useful if you were able to std::format a char32_t to its character name. That is:
> >
> > char32_t c = U'\N{NO-BREAK SPACE}';
> > std::string s = std::format("{?????}", c);
> > // s is now "NO-BREAK SPACE"
> >
> > Software that deals with Unicode frequently has to print out its text input, for error messages, logging, and similar purposes. When encountering a code point that is non-ASCII, there is a decent chance that it won't be displayed properly because the font is missing the necessary glyphs, or because the character has no visual representation (e.g. ZERO WIDTH JOINER).
> >
> > To cover that eventuality, software often prints out the "U+NNNN" representation of code points, but this is very difficult for humans to comprehend. Unless you happen to know the specific code point number (very few people do), you have to look it up on the internet to understand what's going on. This is a waste of productivity.
> >
> > Therefore, I think the ability to print out code point names is something universally useful and a good fit for standardization.
> >
> > Is this feasible? I know very little about std::format, so I'm not sure if one could even retroactively add such a formatting option to char32_t.
>
> I don't think this functionality is appropriate for `std::format`. The
> ability to convert a Unicode code point into its character name is a
> reasonable piece of functionality for Unicode processing. But I don't
> think it needs to be bound to `std::format`; it should just be a query
> that takes a `char32_t` and returns a string of some kind in an
> encoding of your preference.

Right, which could then be used as a transformation for each character
of a utf view.

> Speaking of which, are non-ASCII characters allowed in those names? If
> so, then the output encoding really needs to be something the user can
> specify.

No: https://www.unicode.org/reports/tr34/#Names
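
Character names are limited to Latin capital letters A-Z, digits 0-9, SPACE and HYPHEN-MINUS, so a plain `char` string is enough for the result. A minimal sketch of that rule as a check follows; `uses_name_alphabet` is a hypothetical helper, it assumes an ASCII-compatible execution character set, and it checks only the alphabet, not the further rules about where spaces and hyphens may appear.

```cpp
#include <string_view>

// Hypothetical helper: true if `name` is non-empty and uses only the
// characters permitted in Unicode character names (A-Z, 0-9, SPACE,
// HYPHEN-MINUS). Assumes an ASCII-compatible execution character set.
constexpr bool uses_name_alphabet(std::string_view name) {
    for (char ch : name) {
        const bool ok = (ch >= 'A' && ch <= 'Z') ||
                        (ch >= '0' && ch <= '9') ||
                        ch == ' ' || ch == '-';
        if (!ok) {
            return false;
        }
    }
    return !name.empty();
}
```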

Received on 2025-05-23 15:28:03