Date: Sun, 24 Aug 2025 16:31:46 +0200
> On Aug 24, 2025, at 4:21 AM, Thiago Macieira via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> On Saturday, 23 August 2025 19:55:14 Central Daylight Time Tom Honermann
> wrote:
>
>> * char is assumed to be UTF-8. This is very common on Linux and macOS.
>> Support for this has improved in the Windows ecosystem, but rough
>> corners remain. Note that compiling with MSVC's /utf-8 option is not
>> sufficient by itself to enable an assumption of UTF-8 for text held
>> in char-based storage; the environment encoding, console encoding,
>> and locale region settings also have to be considered.
>
> Indeed and Microsoft's slow uptake on this is annoying, even if completely
> understandable due to the massive legacy it is dealing with.
>>
The real shame is that Microsoft has everything in place already. With a simple call to setlocale(LC_ALL, ".utf8"); the C runtime functions work with UTF-8 (https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170#:~:text=To%20enable%20UTF%2D8%20mode,8%20for%20the%20code%20page.). For console output we also have to set the appropriate code page. One annoying thing is that we need to use the Windows functions to get the command-line arguments as UTF-16 and then convert them to UTF-8.

I work this way to make my code portable: UTF-8 everywhere. However, as you have mentioned, right now I still use char for this and not char8_t. As these functions already work with the .utf8 locale, it would be easy to provide support for char8_t as well. (Maybe they would need to change functions under the hood to take the locale explicitly instead of relying on the global locale.)
I even have the code page globally set to UTF-8 on Windows.
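
For illustration, here is a rough sketch of the setup described above; it is not taken from any real code base. The helper names enable_utf8, utf8_args and utf16_to_utf8 are made up for this example. The Windows calls used (setlocale with ".utf8", SetConsoleOutputCP, GetCommandLineW, CommandLineToArgvW) are the documented ones, and the sketch assumes a UCRT recent enough to accept the ".utf8" locale. On other platforms it simply assumes char is already UTF-8.

```cpp
#include <clocale>
#include <cstddef>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW; link with Shell32.lib
#endif

// Hypothetical helper: converts UTF-16 code units to UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: switches the CRT locale to UTF-8 and, on Windows,
// also sets the console output code page. Returns whether setlocale succeeded.
bool enable_utf8() {
#ifdef _WIN32
    const bool ok = std::setlocale(LC_ALL, ".utf8") != nullptr;
    SetConsoleOutputCP(CP_UTF8);  // may fail if no console is attached
    return ok;
#else
    return true;  // assumption: the environment already uses UTF-8
#endif
}

// Hypothetical helper: returns the command-line arguments as UTF-8 strings.
std::vector<std::string> utf8_args(int argc, char** argv) {
    std::vector<std::string> args;
#ifdef _WIN32
    (void)argc;
    (void)argv;  // ignored: re-read the command line as UTF-16 instead
    int wargc = 0;
    LPWSTR* wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
    if (wargv != nullptr) {
        for (int i = 0; i < wargc; ++i) {
            const std::wstring w(wargv[i]);  // wchar_t is a UTF-16 code unit here
            args.push_back(utf16_to_utf8(std::u16string(w.begin(), w.end())));
        }
        LocalFree(wargv);
    }
#else
    args.assign(argv, argv + argc);  // assumption: already UTF-8
#endif
    return args;
}
```

A program would call enable_utf8() at the top of main and then work with utf8_args(argc, argv) instead of argv directly. The conversion is written by hand only to keep the sketch self-contained; on Windows, WideCharToMultiByte with CP_UTF8 would be the more usual choice.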
Received on 2025-08-24 14:32:00