Re: [isocpp-sg16] Agenda for the 2024-10-09 SG16 meeting

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 23 Oct 2024 21:14:45 +0200

On Wed, Oct 23, 2024, 20:43 Jens Maurer via SG16 <sg16_at_[hidden]> wrote:

>
>
> On 23/10/2024 20.05, Victor Zverovich via SG16 wrote:
> > This gives mojibake on Windows with Belarusian localization.
>
> Could you please complete the picture here?
> You said your example uses UTF-8 literal encoding.
> What's the C locale encoding for the execution character set
> under the Windows Belarusian localization?
> What's the expected input encoding for "mbstowcs" in that localization?
> What's the generated output encoding for "mbstowcs" in that localization?
>
> (We already know that a literal encoding that is incompatible
> with the locale's encoding is hard to program for.)
>
> > Neither pthread_setname_np nor pthread_getname_np assumes the C locale
> > encoding. For example, quoting
> > https://docs.oracle.com/cd/E88353_01/html/E37843/pthread-setname-np-3c.html:
> >
> > The thread name is a string of length 31 bytes or less, UTF-8 encoded.
>
> Note that pthread_setname_np is not in POSIX. What you quoted is the way
> this function operates on Solaris.
>
> In contrast, my Linux man page says:
>
> The thread name is a meaningful C language string,
> whose length is restricted to 16 characters, including the
> terminating null byte ('\0').
>
> (No, I don't know what "meaningful" means.)
>
> > This means that the POSIX implementation in P2019R7 is actually
> > incorrect and doesn’t match the wording.
>
> The paper says "on most POSIX implementation". Apparently, Solaris is
> different here.
> Are there any non-UTF environments on Solaris these days?
> Would Solaris transcode from UTF-8 to the encoding of that other
> environment?
> I doubt it. Thread names are just a few bytes in a special memory area where
> the usual tools can find them; I can't imagine any Unix doing any sort of
> encoding recognition/translation when setting a name.
>
> Do you have a more complete survey of Unix-like operating systems?
>
> > SetThreadDescription always uses UTF-16 on Windows and there is no C
> > locale or ACP version.
>
> What is "ACP version"?
>
> We already know we have to transcode for Windows (because wchar_t),
> but all the standard tools such as mbstowcs use the encoding specified by
> the C locale (input and output), so they're unsuitable to reliably output
> UTF-16.
>
> Maybe we want our set_thread_name to simply take char8_t (in UTF-8 encoding)
> and be done with it. And no, I'm not worried about copying / transcoding
> 16-32 bytes even on Unix.
>

If that increases consensus, sure.
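
For illustration only, here is a minimal sketch of the locale-independent transcoding that a UTF-8-only interface would need on Windows. Nothing in it comes from P2019R7: the names utf8_view and utf8_to_utf16 are hypothetical, and replacing malformed input with U+FFFD is an assumption, not a proposed behaviour. It uses char8_t where the compiler provides it and falls back to char otherwise.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

#ifdef __cpp_char8_t
using utf8_view = std::u8string_view;  // C++20: char8_t-based view
#else
using utf8_view = std::string_view;    // C++17 fallback: plain char bytes
#endif

// Hypothetical helper: converts UTF-8 to UTF-16 without consulting the
// C locale (unlike mbstowcs). Each malformed byte becomes U+FFFD.
inline std::u16string utf8_to_utf16(utf8_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const auto lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;  // sequence length implied by the lead byte
        char32_t cp = 0;      // code point being assembled
        char32_t min = 0;     // smallest code point valid for this length
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min = 0x10000; }

        // The sequence must fit in the input and use 10xxxxxx continuation bytes.
        bool ok = len != 0 && len <= in.size() - i;
        for (std::size_t k = 1; ok && k < len; ++k) {
            const auto c = static_cast<unsigned char>(in[i + k]);
            ok = (c & 0xC0) == 0x80;
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogate code points, and values past U+10FFFF.
        ok = ok && cp >= min && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
        if (!ok) {
            out.push_back(static_cast<char16_t>(0xFFFD));
            ++i;  // resynchronise on the next byte
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

On Windows the result could presumably be passed to SetThreadDescription as a wchar_t string, since wchar_t is 16 bits wide there. That call is omitted to keep the sketch portable.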

>
> Jens

Received on 2024-10-23 19:15:03