Re: [SG16] More notes on P2286R3 - Formatting Ranges

From: Charlie Barto <Charles.Barto_at_[hidden]>
Date: Tue, 7 Dec 2021 23:22:06 +0000
Why does the debug-printing functionality need to be part of formatting ranges anyway? It seems quite separate to me.

From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Corentin via SG16
Sent: Thursday, December 2, 2021 6:44 AM
To: SG16 <sg16_at_[hidden]>
Cc: Corentin <corentin.jabot_at_[hidden]>
Subject: [SG16] More notes on P2286R3 - Formatting Ranges

Hello,
I wanted to add some information about the paper we discussed yesterday.

I do not understand the motivation for being able to copy fmt's "debug" output back into a string literal and expect a consistent result. I'd like to see that explored more in the paper.

Unicode defines graphic characters as characters with General_Category L, M, N, P, S, or Zs (Unicode 14.0, Section 2.4).
Note that new Unicode versions can assign code points and make them graphic, so the output isn't stable from one version to the next.
Graphic excludes control and format code points, but not all graphic characters are visible.
Go defines an additional property, "printable" (unicode.IsPrint), which is like graphic but excludes spaces other than U+0020 SPACE.
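To make the difference between the two properties concrete, here is a minimal C++ sketch. Standard C++ has no API for querying a code point's General_Category, so the `general_category` enum below is a hypothetical stand-in for a real table lookup. `is_graphic` and `is_printable` are likewise illustrative names. `is_printable` follows the definition of Go's `unicode.IsPrint`.

```cpp
// Hypothetical stand-in for a Unicode General_Category lookup. Letters,
// marks, numbers, punctuation and symbols are collapsed to their major class;
// separators and "other" categories keep their subcategory.
enum class general_category { L, M, N, P, S, Zs, Zl, Zp, Cc, Cf, Cs, Co, Cn };

// Graphic: General_Category L, M, N, P, S or Zs.
constexpr bool is_graphic(general_category gc) {
    return gc == general_category::L || gc == general_category::M ||
           gc == general_category::N || gc == general_category::P ||
           gc == general_category::S || gc == general_category::Zs;
}

// Printable (as in Go's unicode.IsPrint): graphic, except that the only
// space separator allowed is U+0020 SPACE.
constexpr bool is_printable(char32_t cp, general_category gc) {
    return gc == general_category::Zs ? cp == U' ' : is_graphic(gc);
}
```

For example, U+00A0 NO-BREAK SPACE (category Zs) is graphic but not printable, so a debug formatter built on the printable property would escape it.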

The paper needs to decide which characters it escapes.
Both of these properties are reasonably compact tables.
"Printable" as defined by Go is probably a good default.

--
For non-Unicode encodings, I wonder whether escaping everything but basic Latin-1 would be reasonable.
Other options include converting to Unicode first, or pulling in std::isprint, which depends on the locale and only works for stateless encodings with single-code-unit characters. Neither of these works if the encoding is assumed incorrectly.
But that is also true when the encoding is assumed to be Unicode.
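Here is a minimal sketch of the first option. It assumes that "basic Latin" means the printable ASCII range 0x20 to 0x7E. Every other byte is escaped, so the output makes no claim about the encoding. The function names and the `\x{..}` escape syntax are choices made for this sketch and are not taken from the paper.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Maps a value in [0, 16) to its lowercase hex digit.
char hex_digit(unsigned v) {
    assert(v < 16);
    return "0123456789abcdef"[v];
}

// Hypothetical helper: quote the input, keeping printable ASCII (0x20..0x7E)
// as-is, backslash-escaping '"' and '\\', and escaping every other byte as
// \x{..} in lowercase hex without leading zeros.
std::string debug_escape_ascii(std::string_view s) {
    std::string out = "\"";
    for (char c : s) {
        const auto b = static_cast<unsigned char>(c);
        if (b == '"' || b == '\\') {
            out += '\\';
            out += c;
        } else if (b >= 0x20 && b <= 0x7E) {
            out += c;
        } else {
            out += "\\x{";
            if (b >= 0x10) out += hex_digit(b >> 4);
            out += hex_digit(b & 0xF);
            out += '}';
        }
    }
    out += '"';
    return out;
}
```

Because the sketch works byte by byte, the result is the same whatever encoding the input is actually in. The cost is that all non-ASCII text is escaped. For example, UTF-8 "é" becomes \x{c3}\x{a9}.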
Should the debug specifier have a text/binary mode?
Or do we assume that string-like things are always text, and if we want to output bytes, maybe std::as_bytes(std::span{some_string}); is the way to do it?
(My underlying assumption is that printing a string (a sequence of bytes) to a terminal implies an encoding, while dumping or debugging it does not.)

Anyway, I think that all of this might be interesting for the authors to consider.

Thanks,
Corentin




Received on 2021-12-07 17:22:09