Date: Thu, 2 Dec 2021 15:43:59 +0100
Hello,
I wanted to add some information about the paper we discussed yesterday.
I do not understand the motivation for being able to copy the output of
fmt's "debug" back to a string literal and expect a consistent result. I
think I'd like to see that explored more in the paper.
Unicode defines graphic characters as characters of the categories L, M, N,
P, S, Zs (Unicode 14, chapter 2.4).
Note that new Unicode versions can assign code points and make them graphic,
so the output isn't stable from one version to the next.
Graphic excludes control and format code points, but not all graphic
characters are visible.
Go defines an additional property, "Printable", which is like Graphic
but excludes spaces other than U+0020 SPACE.
The paper needs to decide which of these sets it leaves unescaped.
Both of these properties can be stored as reasonably compact tables.
"Printable" as defined by Go is probably a good default.
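To make the difference between the two properties concrete, here is a rough
sketch in Python (chosen only for brevity, not a proposal for the C++ wording).
The names is_graphic, is_printable and debug_escape, and the \u{...} escape
format, are made up for illustration. The results depend on the Unicode
version of the interpreter's unicodedata tables, which is the same stability
issue mentioned above.

```python
import unicodedata

# Major general categories shared by both properties: L, M, N, P, S.
GRAPHIC_MAJOR = {"L", "M", "N", "P", "S"}


def is_graphic(ch: str) -> bool:
    """Graphic as described above: L, M, N, P, S, plus space separators (Zs)."""
    cat = unicodedata.category(ch)
    return cat[0] in GRAPHIC_MAJOR or cat == "Zs"


def is_printable(ch: str) -> bool:
    """Like is_graphic, but U+0020 SPACE is the only space accepted."""
    if ch == " ":
        return True
    return unicodedata.category(ch)[0] in GRAPHIC_MAJOR


def debug_escape(s: str) -> str:
    """Escape every code point that is not printable.

    The \\u{hex} format is an assumption made for this sketch. A real
    round-trippable format would also have to escape quotes and backslashes.
    """
    out = []
    for ch in s:
        out.append(ch if is_printable(ch) else "\\u{%x}" % ord(ch))
    return "".join(out)
```

For example, U+00A0 NO-BREAK SPACE is graphic (Zs) but not printable, so
debug_escape would escape it, while U+0020 SPACE passes through unchanged.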

For non-Unicode, I wonder if escaping everything but basic latin1 would be
reasonable. Other solutions include converting to Unicode first, or pulling
in std::isprint, which ties in locale and only works for stateless, single
code units. None of these works if an encoding is assumed incorrectly. But
that's the case if the encoding is Unicode too.

Should the debug specifier have a text/binary mode? Or do we assume that
string-like things are always text, and if we want to output bytes, maybe
std::as_bytes(std::span{some_string}) is the way to do it?
(My underlying assumption is that printing a string (sequence of bytes) to a
terminal implies an encoding, while dumping/debugging it, etc. doesn't.)

Anyway, I think that all of this might be interesting for the authors to
consider.

Thanks,
Corentin
Received on 2021-12-02 08:44:18