
Re: Agenda for the 2024-02-21 SG16 meeting

From: Tiago Freire <tmiguelf_at_[hidden]>
Date: Tue, 20 Feb 2024 11:01:20 +0000
> If you have input that would allow the standard to enable developers to output Unicode cleanly on Windows, I'd be very happy to add anything that's required for it to the standard.

Yes, I have input on this. And to answer your question, “how would the standard enable developers to output Unicode cleanly on Windows?”
The answer is: you don’t! Or better yet, you don’t on any platform, be it Windows, Linux, or anything else.
When you write to the standard output (which std::print does), there’s an implicit contract that some form of IPC exists, along with an API you can call to transfer bytes from your user application to a console application on the other end of that IPC.

You can only copy bytes into this IPC, and there’s no explicit agreement that the application on the other end will interpret them as Unicode. Even if you tried to standardize this in C++, the application on the other side is often not written in C++ anyway, and it can casually ignore whatever rule you set forth.
The application that decides whether it’s Unicode or not is the console application, not the user’s application.
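
For illustration, a minimal sketch of what that contract amounts to in practice: the program below hands UTF-8 bytes to the stream, and whether they render as “héllo” is decided entirely by whatever console sits on the other end.

#include <cstdio>

int main()
{
    // The UTF-8 bytes for "héllo" -- nothing in this call promises
    // that the console on the other end will decode them as UTF-8.
    const char bytes[] = "h\xc3\xa9llo\n";
    std::fwrite(bytes, 1, sizeof(bytes) - 1, stdout); // -1: skip the '\0'
}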

Right now, things only work because most consoles implicitly agree to print ASCII for code points that fit in 7 bits (as a default, not universally true), 8-bit code points are a free-for-all with no consistency, and more recently there has been a shift toward interpreting the stream as UTF-8 (but nothing is guaranteed, on any platform, including Unix-based ones).

If a user wants to use something like “std::vprint_unicode”, which has been explicitly designed to operate with WriteConsoleW, they already have to acknowledge that their application is written specifically to target a Windows platform, and at that point they might as well just use WriteConsoleW themselves.
Note that this only works if the console application is the Windows terminal, or if it uses the Console Buffer IPC; otherwise WriteConsoleW does nothing.
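
For reference, a rough sketch (assuming a Windows toolchain and the documented Win32 signatures) of what such a user would write directly:

#include <windows.h>

int main()
{
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    // GetConsoleMode is the usual test for "is this actually a console?".
    // If stdout is redirected to a pipe or a file, WriteConsoleW fails and
    // a byte-oriented fallback (e.g. WriteFile) is needed.
    if (GetConsoleMode(out, &mode))
    {
        const wchar_t text[] = L"h\u00e9llo\n"; // wchar_t is UTF-16 on Windows
        DWORD written = 0;
        WriteConsoleW(out, text,
                      static_cast<DWORD>(sizeof(text) / sizeof(wchar_t) - 1),
                      &written, nullptr);
    }
}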

OK, so this is what you should do:

  1. Remove any function specific to Windows. Windows supports the standard 8-bit stream just fine, and you can output Unicode using it. If I have to write a special-case ifdef anyway, then let users write their own WriteConsoleW call. Don’t burden the language with the peculiarities of one specific OS when it is not needed.
  2. Acknowledge that the output stream is not Unicode (or UTF-8).
  3. Remove formatting from the interface. It’s a violation of the single-responsibility principle: it tries to do both formatting and outputting at the same time, and it can’t do either well. You need an API that just writes a sequence of bytes to the output stream and that’s it: two input parameters, one a pointer, the other the size of the buffer (see the sketch after this list).
  4. Give users the ability to format in whatever encoding they want. Give them a function that converts between UTF-8, UTF-16, ASCII, whatever, making it easier to write a valid Unicode buffer.
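
To make points 3 and 4 concrete, here is a rough sketch of what such an interface could look like. The names raw_out and transcode_utf16_to_utf8 are hypothetical, invented purely for illustration; nothing below is an existing or proposed standard API.

#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

namespace sketch {

// Point 3: output is just "write these bytes", nothing more.
// Two parameters: a pointer and the size of the buffer.
inline void raw_out(const char* data, std::size_t size)
{
    std::fwrite(data, 1, size, stdout);
}

// Point 4: encoding conversion as a separate, reusable facility.
// Minimal UTF-16 -> UTF-8 converter (no error handling for lone surrogates);
// other directions would look analogous.
inline std::string transcode_utf16_to_utf8(std::u16string_view in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        // Combine a surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size())
        {
            char32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80)
            out += static_cast<char>(cp);
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

} // namespace sketch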

The user can then check whatever system they are on, verify which code points their console supports, and if it all checks out, they can:
Step 1. Format in UTF-8.
Step 2. Send it to the output buffer.

(or format into UTF-16 and send it to WriteConsoleW; it is up to them)
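
In code, under the assumption that the ordinary string literal encoding is UTF-8 (e.g. /utf-8 on MSVC, the usual default on GCC/Clang), those two steps are just:

#include <cstdio>
#include <format>
#include <string>

int main()
{
    // Step 1: format in UTF-8 (literal encoding assumed to be UTF-8).
    std::string msg = std::format("{}: {}\n", "grüße", 42);

    // Step 2: send the bytes to the output stream; nothing else.
    std::fwrite(msg.data(), 1, msg.size(), stdout);
}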


I have been handling Unicode in my own console applications for a long time now, in an extremely easy way: UTF-8, UTF-16, UTF-32, all of them in the same stream at the same time, and all of it on the stack, easy peasy, because of one thing:
my formatting is not attached to my output. That coupling is what’s killing it.
If you want, I can show you some ideas about how I have been handling that in my library: https://github.com/tmiguelf/logger.
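
As a generic illustration of that separation (not code from the library itself): format into a stack buffer, then hand the resulting bytes to whichever sink applies.

#include <cstddef>
#include <cstdio>
#include <format>

int main()
{
    char buf[256]; // stack storage; no heap allocation needed here
    auto result = std::format_to_n(buf, static_cast<std::ptrdiff_t>(sizeof(buf)),
                                   "value = {}\n", 123);

    // The formatted text is now just a (pointer, size) pair; the choice of
    // sink -- stdout, a file, or WriteConsoleW after transcoding -- is a
    // completely separate decision.
    std::fwrite(buf, 1, static_cast<std::size_t>(result.out - buf), stdout);
}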

Received on 2024-02-20 11:01:25