Date: Tue, 1 Apr 2025 00:54:04 -0400
On 3/30/25 6:39 AM, Tiago Freire via Std-Proposals wrote:
>
> I know that this is something that looks like a good idea, but isn't,
> once you understand the underlying details.
>
> This would not make for portable code, it would at most only work on
> Windows and even at that not universally.
>
That depends on how the feature is defined. There is reasonable behavior
that can be defined that is portable and useful; see below.
>
> I’m assuming that what you are trying to use is the (W)ide versions of
> Windows API’s such as “WriteConsoleW”.
>
I would guess otherwise since WriteConsoleW() doesn't work with
redirected output.
>
> First things first this would never work on Linux because the only
> means of output is an 8bit per character interface, and wchar_t can be
> 32bit per character (or 16bit depending on compiler options), so you
> would need to convert a text sequence from wchar_t to a char type.
>
> What is the encoding of text in “wchar_t” sequence?
>
> The encoding of “char” is not even consistent other than “system
> encoding” (whatever that means), “wchar_t” is definitely not defined.
> Good luck sorting that one out!
>
Transcoding is required. The question is from which encoding to which
encoding. There are reasonable options for each.

Formally, the encoding of wide text is locale dependent according to the
C++ standard. In practice, I'm only aware of two platforms that take
advantage of locale dependence for data stored in wchar_t. Per AIX 7.2
documentation
<https://www.ibm.com/docs/hr/aix/7.2?topic=representation-wide-character-data>,
Unicode (presumed to be UCS-2 for 16-bit wchar_t and UTF-32 for 32-bit
wchar_t) is used for all locales except for zh_TW, which uses IBM-eucTW
<https://www.ibm.com/docs/sr/aix/7.2?topic=sets-euctw>. The other
platform is z/OS, which uses DBCS variants of EBCDIC for wchar_t. All
other platforms use UCS-2, UTF-16, or UTF-32 depending on the size of wchar_t.
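
As a rough illustration (a sketch, not part of any proposal), code that needs to know which of these cases applies can consult the __STDC_ISO_10646__ predefined macro and sizeof(wchar_t) at compile time. The name likely_wide_encoding() is hypothetical, and a compile-time check like this cannot detect the locale-dependent AIX and z/OS cases described above.

```cpp
#include <cwchar>  // wchar_t facilities; glibc predefines __STDC_ISO_10646__ via <stdc-predef.h>

// Hypothetical helper: a coarse, compile-time guess at the encoding of wchar_t data.
// It cannot detect locale-dependent schemes such as IBM-eucTW on AIX
// or the EBCDIC DBCS variants used on z/OS.
constexpr const char* likely_wide_encoding() {
#if defined(__STDC_ISO_10646__)
    // wchar_t values are Unicode code points; a 16-bit wchar_t can then
    // only hold BMP characters, i.e. UCS-2.
    return sizeof(wchar_t) >= 4 ? "UTF-32" : "UCS-2";
#elif defined(_WIN32)
    // Windows: 16-bit wchar_t holding UTF-16 code units.
    return "UTF-16";
#else
    return "unknown (possibly locale-dependent)";
#endif
}
```

With glibc and the default 32-bit wchar_t this typically reports "UTF-32"; with MSVC it reports "UTF-16".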

I've yet to hear of a platform that has file streams that are not
char-based. Even Windows, despite its predilection for wchar_t, uses
(8-bit) char streams for all I/O with the exception of direct writes and
reads to and from the console by applications that have an attached
console. It is straightforward to check whether a file handle corresponds
to a console on Windows. std::print() implementations perform such checks
today and, if a handle does correspond to a console, call
WriteConsoleW() directly to bypass the (usually incorrect) console
encoding so that characters are rendered correctly. A std::wprint()
implementation can do likewise.
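
A minimal sketch of that console check, assuming the Win32 functions GetConsoleMode() and WriteConsoleW() and the CRT function _get_osfhandle(). The name try_write_console() is hypothetical, and this only approximates what shipping std::print() implementations do. When it returns false, the caller falls back to transcoding and writing bytes.

```cpp
#include <cstdio>
#include <string_view>
#if defined(_WIN32)
#  include <io.h>       // _get_osfhandle
#  include <windows.h>  // GetConsoleMode, WriteConsoleW
#endif

// Hypothetical helper: write wide text straight to a Windows console when
// `stream` refers to one, bypassing the console code page. Returns false when
// the stream is redirected to a file or pipe, and always on non-Windows
// platforms; the caller must then transcode to char and write bytes instead.
bool try_write_console(std::FILE* stream, std::wstring_view text) {
#if defined(_WIN32)
    std::fflush(stream);  // keep ordering with any buffered narrow output
    HANDLE h = reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(stream)));
    DWORD mode = 0;
    if (h == INVALID_HANDLE_VALUE || !GetConsoleMode(h, &mode))
        return false;  // the handle is not associated with a console
    DWORD written = 0;
    return WriteConsoleW(h, text.data(), static_cast<DWORD>(text.size()),
                         &written, nullptr) != 0;
#else
    (void)stream;
    (void)text;
    return false;  // no console API here; wide text must be transcoded to char
#endif
}
```

A handle for a temporary file or a pipe fails the GetConsoleMode() check, which is exactly the redirected-output case where WriteConsoleW() cannot be used.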

That leaves just the question of what encoding to transcode to for
non-Windows platforms or for when the output file handle does not
correspond to a console on Windows. The C++ standard recently introduced
the concept of an "environment" encoding as exposed by
std::text_encoding::environment(). Per [text.encoding.members]p14
<https://eel.is/c++draft/text.encoding.members#14>, there is a portable
description of what this encoding corresponds to for POSIX systems. The
standard doesn't say anything specifically for Windows (perhaps it
should), but the environment encoding there would be the Active Code
Page (ACP) encoding. The ACP isn't always what is desired, but it is the
right default as it is most likely the encoding that the next process in
a command pipeline will use to interpret piped output. For applications
that are intended to always and only produce UTF-8, a manifest can be
used to set the ACP to UTF-8 at program startup (see Microsoft
documentation
<https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activecodepage>).
Other situations will require the application to explicitly transcode as
needed.
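
For toolchains that do not yet provide std::text_encoding (C++26), the environment encoding can be approximated as below. This sketch rests on two assumptions. First, on POSIX the relevant encoding is taken to be the one associated with the locale named by the empty string (our reading of the wording linked above), queried with the POSIX.1-2008 newlocale()/nl_langinfo_l() functions so the program's global C locale is left untouched. Second, on Windows it is the ACP as reported by GetACP(). The name environment_encoding_name() is hypothetical.

```cpp
#include <string>
#if defined(_WIN32)
#  include <windows.h>   // GetACP, CP_UTF8
#else
#  include <locale.h>    // newlocale, freelocale, LC_CTYPE_MASK (POSIX.1-2008)
#  include <langinfo.h>  // nl_langinfo_l, CODESET
#endif

// Hypothetical helper approximating std::text_encoding::environment().name().
std::string environment_encoding_name() {
#if defined(_WIN32)
    // On Windows the environment encoding is the Active Code Page.
    const UINT acp = GetACP();
    return acp == CP_UTF8 ? std::string("UTF-8") : "CP" + std::to_string(acp);
#else
    // Build the locale named by "" (from LC_ALL / LC_CTYPE / LANG) without
    // calling setlocale(), so the global C locale is not modified.
    locale_t loc = newlocale(LC_CTYPE_MASK, "", static_cast<locale_t>(0));
    if (loc == static_cast<locale_t>(0))
        return "unknown";  // the environment names a locale that is not installed
    std::string name = nl_langinfo_l(CODESET, loc);  // copy before freeing
    freelocale(loc);
    return name;
#endif
}
```

With LANG=en_US.UTF-8 on a typical glibc system this yields "UTF-8"; in the "C" locale glibc reports "ANSI_X3.4-1968".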
> You would think that the encoding of wchar_t would at least be more
> defined on Windows, surely it must be UTF-16 right? Wrong! It’s not!
> It is not only not consistent from system to system, it is not
> consistent with application running in the same machine, even worse;
> an application could change at runtime what “wchar_t” means when you
> pass it to “WriteConsoleW”, it’s a feature, please see “SetConsoleCP”.
>
SetConsoleCP() doesn't affect the encoding of wchar_t; it affects what
encoding is used by the console to convert char-based console input to
wchar_t. Data in wchar_t-based storage can be assumed to be encoded in
UTF-16 everywhere on Windows (though it cannot be assumed that all such
data is validly encoded; lone surrogates and out-of-order surrogates are
always a possibility).
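
Because lone and out-of-order surrogates can occur, any transcoder from UTF-16 has to decide what to do with them. The sketch below (hypothetical names) substitutes U+FFFD REPLACEMENT CHARACTER for each ill-formed code unit rather than emitting ill-formed UTF-8. It takes char16_t so that it has the same meaning on platforms where wchar_t is 32 bits; on Windows, wchar_t data would first be converted to char16_t code units (both are 16 bits there).

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Append the UTF-8 encoding of `cp`, which must be a Unicode scalar value.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Transcode UTF-16 to UTF-8, replacing each unpaired or out-of-order
// surrogate with U+FFFD instead of producing ill-formed output.
std::string utf16_to_utf8(std::u16string_view in) {
    constexpr char32_t replacement = 0xFFFD;
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {  // high (leading) surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                const char32_t low = in[++i];  // consume the trailing surrogate
                append_utf8(out, 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00));
            } else {
                append_utf8(out, replacement);  // lone high surrogate
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            append_utf8(out, replacement);  // low surrogate with no preceding high one
        } else {
            append_utf8(out, u);  // BMP character outside the surrogate range
        }
    }
    return out;
}
```

For example, the pair D83D DE00 becomes the four bytes F0 9F 98 80 (U+1F600), while the reversed sequence DE00 D83D yields two U+FFFD characters.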
>
> Now, why is it named “Write”-“Console” and not just “Write”-“Out”? And
> how is it possible that “WriteConsoleW” is able to pass 16bit per
> character streams when cout is definitely 8bit per character stream?
>
> Because they are actually 2 separate streams bundled in the same
> output interface! (cout)
>
> “WriteConsoleW” only works when the parent process has created a
> special “Console” buffer output (which a console does) which has the
> twin streams, and when you use “WriteConsoleW” it asks to use the
> alternative 16bit stream bundled on the output channel.
>
> If you tried to use a pipe (for example to write out to a file) and
> the application “WriteConsoleW” is used, “WriteConsoleW” will fail
> (because the twin 16bit stream doesn’t exist) and none of those
> messages will be seen.
>
I would quibble with the way this is described above (there aren't two
streams, there is a handle which might be associated with a stream or
might be associated with a console), but this distinction is just not
particularly important.
>
> “WriteConsoleW” is only meant to work if an application is attached to
> a console (or a console like application, that specifically provides
> console like features).
>
> It is not meant for general use applications.
>
> This is a mess. Too much operating system specific behavior. You don’t
> want to touch it.
>
I disagree with this characterization. While there is some messiness, it
isn't as bad as has been portrayed. What has historically been most
troublesome is 1) the continued use of legacy DOS code pages as the
default console encoding (for backward compatibility) and its deviation
from the ACP, and 2) the lack of support for UTF-8 as the ACP, something
that has improved but, unfortunately, is still not on track to become
the default.

Adding std::wprint() is quite doable; most of the hard work was already
done for std::print(). There is no real technical challenge.

I think some programmers would likely benefit from a standardized
std::wprint(). I don't think the numbers are large, though, so I wouldn't
consider this a high-priority addition relative to other things WG21
could be working on. But I also don't think this would require a lot of
WG21 time; most of the effort would happen in SG16.

Tom.
> ------------------------------------------------------------------------
>
> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> on
> behalf of Tymi via Std-Proposals <std-proposals_at_[hidden]>
> *Sent:* Sunday, March 30, 2025 11:30:31 AM
> *To:* std-proposals_at_[hidden] <std-proposals_at_[hidden]>
> *Cc:* Tymi <tymi.cpp_at_[hidden]>
> *Subject:* Re: [std-proposals] std::wprint/std::wprintln
>
> I mean, I am up to making std::print just accept std::wformat_string
> but I am aware many members would not like that...
>
> On Sun, Mar 30, 2025 at 11:19 AM Tymi <tymi.cpp_at_[hidden]> wrote:
>
> wide string support for std::print/std::println would be neat. I
> need it in my projects.
>
> I can write a proposal & possible implementation if enough people
> support this!
>
> tymi.
>
>
Received on 2025-04-01 04:54:07