> If you have input that would allow the standard to enable developers to output Unicode cleanly on Windows, I'd be very happy to add anything that's required for it to the standard.
Yes, I have input on this. And to answer your question, “how would the standard enable developers to output Unicode cleanly on Windows?”
The answer is: you don’t! Or rather, you don’t on any platform, be it Windows, Linux, or anything else.
When you write to standard output (which is what std::print does), there’s an implicit contract: some form of IPC exists, along with an API you can call to transfer your bytes from your user application to the console application on the other end of that IPC.
You can only copy bytes into this IPC, and there’s no explicit agreement that the application on the other end will interpret them as Unicode. Even if you tried to standardize this in C++, the application on the other side is often not written in C++ anyway, and it can just casually ignore whatever rule you set forth.
The application that decides whether those bytes are Unicode is the console application, not the user’s application.
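To make the “only bytes” point concrete, here is a minimal C++17 sketch (the helper name write_bytes is hypothetical). Whatever encoding you had in mind, a write to standard output only ever hands a pointer and a length to the other side:

```cpp
#include <cstddef>
#include <cstdio>
#include <string_view>

// Hypothetical helper: the only thing a write to standard output can
// transfer is a pointer and a length. No encoding information travels with it.
std::size_t write_bytes(std::FILE* out, std::string_view bytes) {
    return std::fwrite(bytes.data(), 1, bytes.size(), out);
}

// U+00E9 (é) encoded as UTF-8 is two bytes, 0xC3 0xA9.
// A receiver decoding them as UTF-8 shows "é"; one decoding them as
// Windows-1252 shows "Ã©". The sender has no say in which happens.
constexpr std::string_view e_acute_utf8 = "\xC3\xA9";
```

Calling write_bytes(stdout, e_acute_utf8) always sends the same two bytes; what appears on screen is decided entirely by the console.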
Right now, things only work because most consoles implicitly agree to display 7-bit values as ASCII (as a default, which is not true everywhere). For 8-bit values there is no consistency at all, and more recently there has been a shift toward interpreting text as UTF-8, but nothing is guaranteed on any platform, Unix-based ones included.
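A small sketch of the one thing that is reasonably safe by default: checking that a buffer stays within the 7-bit ASCII range. The function name is_7bit_ascii is hypothetical; the comment shows how a single high byte such as 0xE9 means different things to different receivers.

```cpp
#include <string_view>

// Returns true if every byte is in 0x00-0x7F, the range most consoles
// render as ASCII by default. Any byte with the high bit set is decoded
// according to the receiver's code page: 0xE9 is "é" in Windows-1252,
// "Θ" in code page 437, and an incomplete lead byte in UTF-8.
bool is_7bit_ascii(std::string_view bytes) {
    for (char c : bytes) {
        if (static_cast<unsigned char>(c) > 0x7F) return false;
    }
    return true;
}
```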
If a user wants to use something like “std::vprint_unicode”, which was explicitly designed to work with WriteConsoleW, they already have to acknowledge that their application is being written specifically for Windows, and at that point they might as well just call WriteConsoleW themselves.
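Note that this only works if the console application is the Windows terminal, or something else that uses the Console Buffer IPC; when the output goes anywhere else (for example, redirected to a file or a pipe), WriteConsoleW fails and nothing is written.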
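Whether WriteConsoleW can work at all depends on the handle actually being a console. A minimal sketch of that check, assuming the Win32 functions _get_osfhandle and GetConsoleMode on Windows and POSIX isatty elsewhere (the wrapper name is_console is hypothetical):

```cpp
#include <cstdio>
#ifdef _WIN32
#include <io.h>
#include <windows.h>
#else
#include <unistd.h>
#endif

// Hypothetical wrapper: true if the stream is attached to an interactive
// console. On Windows, GetConsoleMode fails for files and pipes, which is
// exactly the case where WriteConsoleW would fail too. The POSIX branch
// uses isatty as the closest analogue.
bool is_console(std::FILE* f) {
#ifdef _WIN32
    HANDLE h = reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(f)));
    DWORD mode = 0;
    return h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode) != 0;
#else
    return isatty(fileno(f)) == 1;
#endif
}
```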
Ok, so this is what you should do instead:
The user checks whatever system they have, verifies which code points their console supports, and if it all checks out, then they can:
Step 1. Format in UTF-8.
Step 2. Send it to the output buffer.
(Or format into UTF-16 and send it to WriteConsoleW; it is up to them.)
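The two steps could look roughly like this in C++17. format_status, send_utf8 and send_utf16_console are hypothetical names used for illustration; the point is only that formatting produces a buffer and a separate call decides where it goes. The Windows branch assumes the Win32 MultiByteToWideChar and WriteConsoleW functions and, as with any WriteConsoleW call, only succeeds on a real console.

```cpp
#include <cstdio>
#include <string>
#include <string_view>
#ifdef _WIN32
#include <windows.h>
#endif

// Step 1: format into a UTF-8 buffer. Nothing has been written yet, so the
// caller is still free to decide where and how these bytes go.
std::string format_status(std::string_view name, int count) {
    std::string out;
    out += name;
    out += ": ";
    out += std::to_string(count);
    out += " \xE2\x9C\x93\n";  // U+2713 CHECK MARK, encoded as UTF-8
    return out;
}

// Step 2: send the bytes as-is. The caller has already decided that the
// receiving console interprets them as UTF-8.
void send_utf8(std::FILE* out, std::string_view utf8) {
    std::fwrite(utf8.data(), 1, utf8.size(), out);
}

#ifdef _WIN32
// Alternative step 2 (Windows only): convert to UTF-16 and hand it to
// WriteConsoleW. Returns false if conversion or the console write fails.
bool send_utf16_console(std::string_view utf8) {
    if (utf8.empty()) return true;
    const int n = static_cast<int>(utf8.size());
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), n, nullptr, 0);
    if (len <= 0) return false;
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), n, wide.data(), len);
    DWORD written = 0;
    return WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), wide.data(),
                         static_cast<DWORD>(wide.size()), &written, nullptr) != 0;
}
#endif
```

Because format_status returns a plain buffer, the same result can go to send_utf8, to send_utf16_console, or to a log file without being formatted twice.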
I have been handling Unicode in my own console applications for a long time now, and it is extremely easy: UTF-8, UTF-16, and UTF-32, all in the same stream at the same time, all done on the stack. Easy peasy, because of one thing:
My formatting is not attached to my output; that is what’s killing it.
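As an illustration of keeping formatting detached from output, here is a hypothetical fixed-capacity buffer that lives on the stack and accepts UTF-8, UTF-16 and UTF-32 input, transcoding everything to UTF-8. It is a sketch of the idea under those assumptions only, not the API of the logger library.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical stack-allocated UTF-8 buffer. It only formats; it never
// writes anywhere. Output is a separate decision made by the caller.
template <std::size_t N>
class Utf8StackBuffer {
public:
    // Append text that is assumed to already be valid UTF-8.
    bool append(std::string_view utf8) {
        if (utf8.size() > N - size_) return false;
        for (char c : utf8) data_[size_++] = c;
        return true;
    }

    // Append UTF-16 text, combining surrogate pairs into one code point.
    bool append(std::u16string_view utf16) {
        for (std::size_t i = 0; i < utf16.size(); ++i) {
            char32_t cp = utf16[i];
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < utf16.size() &&
                utf16[i + 1] >= 0xDC00 && utf16[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (utf16[i + 1] - 0xDC00);
                ++i;
            }
            if (!push(cp)) return false;
        }
        return true;
    }

    // Append UTF-32 text.
    bool append(std::u32string_view utf32) {
        for (char32_t cp : utf32)
            if (!push(cp)) return false;
        return true;
    }

    std::string_view view() const { return {data_.data(), size_}; }

private:
    // Encode one code point as UTF-8. Lone surrogates and values above
    // U+10FFFF are replaced with U+FFFD. Fails if the buffer is full.
    bool push(char32_t cp) {
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) cp = 0xFFFD;
        char buf[4];
        std::size_t n;
        if (cp < 0x80) {
            buf[0] = static_cast<char>(cp); n = 1;
        } else if (cp < 0x800) {
            buf[0] = static_cast<char>(0xC0 | (cp >> 6));
            buf[1] = static_cast<char>(0x80 | (cp & 0x3F)); n = 2;
        } else if (cp < 0x10000) {
            buf[0] = static_cast<char>(0xE0 | (cp >> 12));
            buf[1] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = static_cast<char>(0x80 | (cp & 0x3F)); n = 3;
        } else {
            buf[0] = static_cast<char>(0xF0 | (cp >> 18));
            buf[1] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = static_cast<char>(0x80 | (cp & 0x3F)); n = 4;
        }
        if (n > N - size_) return false;
        for (std::size_t k = 0; k < n; ++k) data_[size_++] = buf[k];
        return true;
    }

    std::array<char, N> data_{};
    std::size_t size_ = 0;
};
```

Once the buffer is filled, the caller decides what to do with view(): fwrite it, convert it to UTF-16 for WriteConsoleW, or write it to a file.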
If you want, I can show you some ideas on how I have been handling this in my library: https://github.com/tmiguelf/logger