Date: Fri, 23 May 2025 11:19:49 -0400
On Fri, May 23, 2025 at 1:27 AM Jan Schultke via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> I think it would be useful if you were able to std::format a char32_t to its character name. That is:
>
> char32_t c = U'\N{NO-BREAK SPACE}';
> std::string s = std::format("{?????}", c);
> // s is now "NO-BREAK SPACE"
>
> Software that deals with Unicode frequently has to print out its text input, possibly for the purpose of error messages, logging, and all sorts of things. When encountering a code point that is non-ASCII, there is a decent chance that it won't be displayed properly because the font is missing the necessary glyphs, or because the character has no visual representation (e.g. ZERO WIDTH JOINER).
>
> To cover that eventuality, software often prints out the "U+NNNN" representation of code points, but this is very difficult for humans to comprehend, and unless you happen to know the specific code point number (very few people do), you will have to look it up on the internet to comprehend what's going on. This is a waste of productivity.
>
> Therefore, I think the ability to print out code point names is something universally useful and a good fit for standardization.
>
> Is this feasible? I know very little about std::format, so I'm not sure if one could even retroactively add such a formatting option to char32_t.

I don't think this functionality is appropriate for `std::format`. The
ability to convert a Unicode codepoint into its textual form is a
reasonable piece of functionality for Unicode processing. But I don't
think it needs to be bound to `std::format`; it should just be a query
that takes a `char32_t` and returns a string of some kind in an
encoding of your preference.

Speaking of which, are non-ASCII characters allowed in those names? If
so, then the output encoding really needs to be something the user can
specify.
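
For illustration, the standalone query described above might look roughly like the sketch below. Everything in it is an assumption made for the example: `code_point_name`, `NameEntry`, and `kNames` are hypothetical names, not an existing or proposed standard API, and the four-entry table stands in for data that a real implementation would generate from the Unicode Character Database. The sketch returns a narrow `std::string_view` because Unicode character names are restricted to ASCII uppercase letters, digits, space, and hyphen-minus.

```cpp
#include <algorithm>
#include <array>
#include <optional>
#include <string_view>

// Hypothetical table entry pairing a code point with its Unicode name.
struct NameEntry {
    char32_t code_point;
    std::string_view name;
};

// Tiny hand-written stand-in for a table generated from the Unicode
// Character Database. Must stay sorted by code_point for the binary search.
inline constexpr std::array<NameEntry, 4> kNames{{
    {0x0020, "SPACE"},
    {0x00A0, "NO-BREAK SPACE"},
    {0x200D, "ZERO WIDTH JOINER"},
    {0x20AC, "EURO SIGN"},
}};

// Hypothetical query: returns the name of `c`, or std::nullopt when the
// code point is not present in the table.
inline std::optional<std::string_view> code_point_name(char32_t c) {
    const auto it = std::lower_bound(
        kNames.begin(), kNames.end(), c,
        [](const NameEntry& entry, char32_t value) {
            return entry.code_point < value;
        });
    if (it != kNames.end() && it->code_point == c) {
        return it->name;
    }
    return std::nullopt;
}
```

With such a function, a caller who wants the name inside formatted output can pass the result to `std::format` as an ordinary string argument, for example `std::format("{}", code_point_name(c).value_or("<unnamed>"))`, without any new format specifier for `char32_t`.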
Received on 2025-05-23 15:20:04