std-proposals

Re: [std-proposals] Formatting code points to character names

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Fri, 23 May 2025 08:30:42 +0100
On Fri, 23 May 2025, 06:27 Jan Schultke via Std-Proposals, <std-proposals_at_[hidden]> wrote:

> I think it would be useful if you were able to std::format a char32_t to
> its character name. That is:
>
> char32_t c = U'\N{NO-BREAK SPACE}';
> std::string s = std::format("{?????}", c);
> // s is now "NO-BREAK SPACE"
>
> Software that deals with Unicode frequently has to print out its text
> input, possibly for the purpose of error messages, logging, and all sorts
> of things. When encountering a code point that is non-ASCII, there is a
> decent chance that it won't be displayed properly because the font is
> missing the necessary glyphs, or because the character has no visual
> representation (e.g. ZERO WIDTH JOINER).
>
> To cover that eventuality, software often prints out the "U+NNNN"
> representation of code points, but this is very difficult for humans to
> comprehend, and unless you happen to know the specific code point number (very
> few people do), you will have to look it up on the internet to understand
> what's going on. This is a waste of productivity.
>
> Therefore, I think the ability to print out code point names is something
> universally useful and a good fit for standardization.
>

It would add about 2MB to the on-disk and in-memory footprint of every C++
application, for something most programs will never use.

The data file is publicly available; if your application needs to translate
U+NNNN to names, it can figure out how to do that as a post-processing
step. I don't think everybody needs this functionality.


> Is this feasible? I know very little about std::format, so I'm not sure
> if one could even retroactively add such a formatting option to char32_t.
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>

Received on 2025-05-23 07:31:05