On 3/30/25 6:39 AM, Tiago Freire via Std-Proposals wrote:

I know that this is something that looks like a good idea, but isn't, once you understand the underlying details.

 

This would not make for portable code; it would at most only work on Windows, and even then not universally.

That depends on how the feature is defined. There is reasonable behavior that can be defined that is portable and useful; see below.

I’m assuming that what you are trying to use is the wide (W) versions of Windows APIs such as “WriteConsoleW”.

I would guess otherwise since WriteConsoleW() doesn't work with redirected output.

 

First things first: this would never work on Linux, because the only means of output is an 8-bit-per-character interface and wchar_t can be 32 bits per character (or 16 bits, depending on compiler options), so you would need to convert a text sequence from wchar_t to a char type.

What is the encoding of text in a “wchar_t” sequence?

The encoding of “char” is not even consistent other than being the “system encoding” (whatever that means), and the encoding of “wchar_t” is definitely not defined. Good luck sorting that one out!

Transcoding is required. The question is from which encoding to which encoding. There are reasonable options for each.

Formally, the encoding of wide text is locale dependent according to the C++ standard. In practice, I'm only aware of two platforms that take advantage of locale dependence for data stored in wchar_t. Per AIX 7.2 documentation, Unicode (presumed to be UCS-2 for 16-bit wchar_t and UTF-32 for 32-bit wchar_t) is used for all locales except for zh_TW which uses IBM-eucTW. The other platform is z/OS and it uses DBCS variants of EBCDIC for wchar_t. All other platforms use UCS-2/UTF-16/UTF-32 depending on the size of wchar_t.

I've yet to hear of a platform that has file streams that are not char-based. Even Windows, despite its predilection for wchar_t, uses (8-bit) char streams for all I/O with the exception of direct writes and reads to and from the console by applications that have an attached console. It is straightforward to check if a file handle corresponds to a console on Windows. std::print() implementations perform such checks today and, if a handle does correspond to a console, call WriteConsoleW() directly to bypass the (usually incorrect) console encoding so that characters are rendered correctly. A std::wprint() implementation can do likewise.
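
As a rough sketch of the console check described above (not taken from any particular std::print() implementation), something like the following would do. The function name try_write_console is hypothetical; the Windows calls used are _fileno(), _get_osfhandle(), GetConsoleMode(), and WriteConsoleW(). On non-Windows platforms the sketch simply reports that no console write happened.

```cpp
#include <cstdio>
#include <string_view>
#ifdef _WIN32
#  include <io.h>
#  include <windows.h>
#endif

// Hypothetical helper: returns true only if `text` was written directly to a
// Windows console. Returns false when `stream` is a file or pipe (or when not
// on Windows), in which case the caller must transcode and write char-based
// output instead.
inline bool try_write_console(std::FILE* stream, std::wstring_view text) {
#ifdef _WIN32
    HANDLE h = reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(stream)));
    DWORD mode = 0;
    // GetConsoleMode() fails for handles that do not refer to a console.
    if (h == INVALID_HANDLE_VALUE || !GetConsoleMode(h, &mode))
        return false;
    // Flush buffered char output first so the two kinds of output stay ordered.
    std::fflush(stream);
    DWORD written = 0;
    return WriteConsoleW(h, text.data(), static_cast<DWORD>(text.size()),
                         &written, nullptr) != 0;
#else
    (void)stream;
    (void)text;
    return false;
#endif
}
```

A wide print function built on this would call try_write_console() first and fall back to transcoding plus an ordinary std::fwrite() when it returns false.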

That leaves just the question of what encoding to transcode to for non-Windows platforms or for when the output file handle does not correspond to a console on Windows. The C++ standard recently introduced the concept of an "environment" encoding as exposed by std::text_encoding::environment(). Per [text.encoding.members]p14, there is a portable description of what this encoding corresponds to for POSIX systems. The standard doesn't say anything specifically for Windows (perhaps it should), but the environment encoding there would be the Active Code Page (ACP) encoding. The ACP isn't what is always desired, but it is the right default as it is most likely the encoding that the next process in a command pipeline will use to interpret piped output. For applications that are intended to always and only produce UTF-8, a manifest can be used to set the ACP to UTF-8 at program startup (see Microsoft documentation). Other situations will require the application to explicitly transcode as needed.
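
For the non-console path, a minimal transcoding sketch might look like the following. It is only an approximation of the above: std::text_encoding is a C++26 facility, so this uses std::wcrtomb() and the current C locale's multibyte encoding as a stand-in for the environment encoding (the program would need to call std::setlocale(LC_ALL, "") at startup for the two to match). The function name narrow_for_output is hypothetical. Note also that std::wcrtomb() converts one wchar_t at a time, so with a 16-bit wchar_t it cannot be assumed to handle surrogate pairs.

```cpp
#include <climits>
#include <cstddef>
#include <cwchar>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: converts wide text to the current C locale's multibyte
// encoding. Returns std::nullopt if a character is not representable.
inline std::optional<std::string> narrow_for_output(std::wstring_view wide) {
    std::mbstate_t state{};
    std::string out;
    char buf[MB_LEN_MAX];
    for (wchar_t wc : wide) {
        std::size_t n = std::wcrtomb(buf, wc, &state);
        if (n == static_cast<std::size_t>(-1))
            return std::nullopt;  // not representable in the target encoding
        out.append(buf, n);
    }
    // Converting L'\0' emits any shift sequence needed to return to the
    // initial state, followed by a null byte that is dropped here.
    std::size_t n = std::wcrtomb(buf, L'\0', &state);
    if (n == static_cast<std::size_t>(-1))
        return std::nullopt;
    out.append(buf, n - 1);
    return out;
}
```

Whether an unrepresentable character should fail the whole write, as here, or be replaced with a substitution character is a design choice a proposal would have to make.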

 

You would think that the encoding of wchar_t would at least be more defined on Windows; surely it must be UTF-16, right? Wrong! It’s not! Not only is it not consistent from system to system, it is not consistent between applications running on the same machine. Even worse, an application could change at runtime what “wchar_t” means when you pass it to “WriteConsoleW”. It’s a feature; please see “SetConsoleCP”.

SetConsoleCP() doesn't affect the encoding of wchar_t; it affects what encoding is used by the console to convert char-based console input to wchar_t. Data in wchar_t-based storage can be assumed to be encoded in UTF-16 everywhere on Windows (though it cannot be assumed that all such data is validly encoded; lone surrogates and out-of-order surrogates are always a possibility).
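
To illustrate the caveat about validity, here is a small sketch of a well-formedness check for UTF-16 code units. The function names are hypothetical, and char16_t is used instead of wchar_t so that the example behaves the same on platforms where wchar_t is 32 bits wide; the same logic would apply to wchar_t data on Windows.

```cpp
#include <cstddef>
#include <string_view>

constexpr bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Returns true if every high surrogate is immediately followed by a low
// surrogate and no low surrogate appears on its own.
constexpr bool is_well_formed_utf16(std::u16string_view s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i])) {
            if (i + 1 == s.size() || !is_low_surrogate(s[i + 1]))
                return false;  // lone high surrogate
            ++i;               // skip the low half of a valid pair
        } else if (is_low_surrogate(s[i])) {
            return false;      // lone or out-of-order low surrogate
        }
    }
    return true;
}
```

A transcoder can use a check of this kind to decide whether to substitute U+FFFD or report an error when it encounters ill-formed input.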

 

Now, why is it named “Write”-“Console” and not just “Write”-“Out”? And how is it possible that “WriteConsoleW” is able to pass 16-bit-per-character streams when cout is definitely an 8-bit-per-character stream?

Because they are actually 2 separate streams bundled in the same output interface! (cout)

“WriteConsoleW” only works when the parent process has created a special “Console” output buffer (which a console does) that has the twin streams; when you use “WriteConsoleW”, it asks to use the alternative 16-bit stream bundled on the output channel.

If you tried to use a pipe (for example, to write out to a file) and the application used “WriteConsoleW”, the call would fail (because the twin 16-bit stream doesn’t exist) and none of those messages would be seen.

I would quibble with the way this is described above (there aren't two streams, there is a handle which might be associated with a stream or might be associated with a console), but this distinction is just not particularly important.

 

“WriteConsoleW” is only meant to work if an application is attached to a console (or a console-like application that specifically provides console-like features).

It is not meant for general use applications.

 

This is a mess. Too much operating-system-specific behavior. You don’t want to touch it.

I disagree with this characterization. While there is some messiness, it isn't as bad as has been portrayed. What has historically been the most troublesome is 1) the continued use of legacy DOS code pages as the default console encoding (for backward compatibility) and its deviation from the ACP, and 2) the lack of support for UTF-8 as the ACP; something that has improved, but unfortunately, is still not on track to become the default.

Adding std::wprint() is quite doable; most of the hard work was already done for std::print(). There is no real technical challenge.

I think some programmers would likely benefit from a standardized std::wprint(). I don't think the numbers are large though, so I wouldn't consider this a high priority addition relative to other things WG21 could be working on. But I also don't think this would require a lot of WG21 time; most of the effort would happen in SG16.

Tom.

 

 

 


From: Std-Proposals <std-proposals-bounces@lists.isocpp.org> on behalf of Tymi via Std-Proposals <std-proposals@lists.isocpp.org>
Sent: Sunday, March 30, 2025 11:30:31 AM
To: std-proposals@lists.isocpp.org <std-proposals@lists.isocpp.org>
Cc: Tymi <tymi.cpp@gmail.com>
Subject: Re: [std-proposals] std::wprint/std::wprintln

 

I mean, I am up for making std::print just accept std::wformat_string, but I am aware many members would not like that...

 

On Sun, Mar 30, 2025 at 11:19AM Tymi <tymi.cpp@gmail.com> wrote:

wide string support for std::print/std::println would be neat. I need it in my projects.

I can write a proposal & possible implementation if enough people support this!

 

tymi.