[std-proposals] Formatting code points to character names

From: Jan Schultke <janschultke_at_[hidden]>
Date: Fri, 23 May 2025 07:26:46 +0200
I think it would be useful if you were able to std::format a char32_t to
its character name. That is:

char32_t c = U'\N{NO-BREAK SPACE}';
std::string s = std::format("{?????}", c);
// s is now "NO-BREAK SPACE"

Software that deals with Unicode frequently has to print out its text
input, for example in error messages or logs. When it encounters a
non-ASCII code point, there is a decent chance that the character won't be
displayed properly, either because the font is missing the necessary
glyphs or because the character has no visual representation
(e.g. ZERO WIDTH JOINER).

To cover that eventuality, software often prints out the "U+NNNN"
representation of code points, but this is very difficult for humans to
read. Unless you happen to know the specific code point number (very few
people do), you will have to look it up on the internet to understand
what's going on. This is a waste of productivity.

Therefore, I think the ability to print out code point names is something
universally useful and a good fit for standardization.

Is this feasible? I know very little about std::format, so I'm not sure if
one could even retroactively add such a formatting option to char32_t.

Received on 2025-05-23 05:27:01