Date: Tue, 20 Feb 2024 11:04:17 +0100
Hi Tiago,
On Tue, Feb 20, 2024 at 10:28 AM Tiago Freire <tmiguelf_at_[hidden]> wrote:
>
>
> > That LWG4044 reads like somebody wrote it thinking of Windows first and
> adding the rest as an afterthought. I'd agree with Jonathan on the
> resolution but would like to adjust the approach:
>
> > - For platforms that have separate methods for outputting Unicode
> and non-Unicode text, it should determine if the output is to a Unicode
> terminal and use the appropriate API, flushing the other API if necessary.
>
> > - For platforms that have a single Unicode-compatible output, just
> use the output.
>
> > but in legalese. Splitting platforms on whether or not they are Windows
> (except in a non-Windows way) first, and only then adding complexity
> required for those platforms, seems like the best way to help implementers
> avoid the complexity if it's not necessary. As with Corentin's email (that
> just came in), help platforms other than Windows avoid all complexity, and
> give Windows the space to do its runtime debugging hooks and required
> conversions for Unicode so it will work properly.
>
>
>
> Actually, the problem is much, much worse; this is a major defect.
>
> The way it is written seems to imply that one of them works with Unicode
> and the other doesn’t, when in reality both of them, or neither of them,
> may do so.
>
>
>
> You can change the way code points are interpreted on your Windows
> terminal.
>
>
>
> You “can” use the command chcp 65001 to change the console to interpret
> the stream as UTF-8.
>
> I mean, you “used to” be able to, until a couple of months ago when
> Windows rolled out an update. Previously the console code page was
> inherited by a newly launched application,
>
> and the update broke that by always reverting to the default code page
> when a new application is created, effectively breaking all apps that
> worked by relying on this feature (why, Windows? Why?).
>
> So, depending on the Windows version you are using, you may or may not be
> able to see this.
>
> But if you have the update, you can still change the code page for your
> application using the function SetConsoleOutputCP(65001);.
>
> Or you can enable the “Use Unicode UTF-8 for worldwide language support”
> setting hidden deep in the Regional/Language settings of your OS.
>
>
>
> WriteConsoleW doesn’t write to a Unicode stream, nor does it write to the
> same stream after converting the input to Unicode; it writes to a “console
> screen buffer”, if one is available (you can create one yourself:
> https://learn.microsoft.com/en-us/windows/console/createconsolescreenbuffer).
> One is not necessarily available, so if you are using that API you had
> better be testing for that.
>
>
>
> If I’m creating my own custom console application, I can decide not to
> provide you with one, and if I do decide to provide one, there’s no
> guarantee that the buffer is interpreted as UTF-16; it is totally legal
> for me to interpret it as a different two-byte encoding. The API doesn’t
> care, and doesn’t validate that your “UTF-16” data stream is actually
> valid UTF-16 (the file system has the exact same problem).
>
>
>
> The point is, the encoding of the stream only becomes a thing at the very
> last moment, when your console application decides to print your sequence
> of bytes/doublets to the screen, and never before.
>
> And there’s no guarantee anywhere (as far as C++ can control) that either
> of them is Unicode, or is not Unicode.
>
>
>
> This was a major pain point in an application I was working on. The
> current references to Unicode in the standard are just flat-out wrong;
> things don’t actually work this way.
>
Again, I'm happy to defer to Windows experts on this subject; the issue as
filed objects to the described handling being overly complex for
non-Windows systems. On that part I agree, and my solution was to split
them into the "simple" systems on one hand and Windows on the other.
If you have input that would allow the standard to enable developers to
output Unicode cleanly on Windows, I'd be very happy to add anything that's
required for it to the standard. But we should keep in mind that we only
want to specify behavior that is not specific to a single operating system
- so if there is a lot of work to be done on Windows to make this work
reliably, that's a quality-of-implementation matter for Windows platforms,
and we should only specify what's required to get the platform integrator
to do that work.
What do you suggest for the Windows side of this?
Received on 2024-02-20 10:04:29