ISOCPP sg16 List: [isocpp-sg16] Agenda for the 2025-01-22 SG16 meeting

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 22 Jan 2025 00:33:04 -0500

SG16 will hold a meeting *today*, Wednesday, January 22nd, at 19:30 UTC
(timezone conversion
<https://www.timeanddate.com/worldclock/converter.html?iso=20250122T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>).
That is 11:30am PST, 12:30pm MST, 1:30pm CST, 2:30pm EST, and 20:30 CET.

Sorry for the delayed scheduling. I've almost unburied myself to the
point that I'll be able to devote a reasonable amount of time to SG16 again.

I added this meeting to the shared calendar just a few minutes ago. If
you need a .ics file to import into your calendar, you can download it
here
<https://documents.isocpp.org/remote.php/dav/public-calendars/R7imgS2LJD9xfeWN/EE81D0AF-4216-4365-89BD-181D7A0D6DB2.ics?export>.

The agenda follows.

  * Decide on a meeting schedule for before/after the Hagenberg meeting.
  * P2019R7: Thread attributes <https://wg21.link/p2019>

I normally schedule SG16 meetings for the 2nd and 4th Wednesdays of each
month. The Hagenberg meeting is the 2nd week of February (the 10th
through the 15th). I'm not planning to hold an in-person SG16 meeting
during the Hagenberg meeting because it is too difficult to get quorum.
I'd like to get one more meeting scheduled before Hagenberg, so either
January 29th or February 5th. I have a slight preference for February
5th and will schedule for that date unless there are requests for the
January 29th date instead. Following Hagenberg, we can resume our normal
meeting cadence starting on February 26th. If you have thoughts on this,
please share.

We last discussed P2019 during the 2024-09-25 SG16 meeting
<https://github.com/sg16-unicode/sg16-meetings/tree/master#september-25th-2024>
where we had polls that demonstrated consensus for two different options
for the encoding of thread names, but with neither option being a clear
winner. I still haven't published proper minutes for that meeting, but
see the summary posted in the github tracker
<https://github.com/cplusplus/papers/issues/817#issuecomment-2482513214>.
In close calls like this, we usually defer to the author. Corentin and
Victor discussed offline and reached an agreement on use of the ordinary
literal encoding. Corentin will present and we'll hopefully find
agreement on a solution and forward the paper.

I intentionally kept the agenda rather light for this meeting given the
late notice and the expectation that there will still be plenty to
discuss regarding literal encodings, NTBS, NTMBS, and C vs C++ vs
system/environment locales.
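
For anyone who wants a refresher on the C vs C++ locale distinction, here
is a minimal sketch. The helper names are hypothetical and are not taken
from P2019. It relies only on std::setlocale changing the C locale while
leaving the global C++ locale alone, which is how the two can diverge.

```cpp
#include <cassert>
#include <clocale>
#include <locale>
#include <string>

// Hypothetical helper: name of the C locale's LC_CTYPE category.
// Passing nullptr queries the C locale without changing it.
inline std::string c_ctype_locale_name() {
    const char* name = std::setlocale(LC_CTYPE, nullptr);
    return name ? name : "";
}

// Hypothetical helper: name of the current global C++ locale.
inline std::string cxx_global_locale_name() {
    return std::locale().name();
}

// Switches only the C locale to whatever the environment requests.
// Returns true if the C library accepted the request. The global C++
// locale is not touched, so the two may disagree afterwards.
inline bool adopt_environment_c_locale() {
    return std::setlocale(LC_CTYPE, "") != nullptr;
}

// Usage sketch, assuming nothing has called std::locale::global() yet.
inline void demonstrate_divergence() {
    adopt_environment_c_locale();
    // The C locale may now name the environment's locale (for example a
    // UTF-8 one), while the global C++ locale is still the classic "C".
    assert(cxx_global_locale_name() == "C");
}
```

Whether the C locale actually changes depends on the environment; on a
system with no other locales installed both names remain "C".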

My rough notes from the 2024-09-25 discussion of P2019R7 are below for
reference.

    - P2019R7: Thread attributes:
       - Corentin introduced the paper.
         - The thread name is provided to the OS so that it is available
    for display in an OS monitor, debugger, process thread list, etc.
         - On POSIX systems, the name is stuffed into a small string
    buffer provided by the pthreads library and is generally interpreted
    as execution encoding.
         - Prior to Windows 10, there were unofficial ways to associate
    a name with a thread.
         - On Windows 10 there is a public interface.
         - On Windows, the name must be provided in wchar_t; there is no
    "ANSI" version of the interface.
         - A previous revision of this paper supported both char and
    wchar_t.
         - A copy of the string will always be needed, so transcoding
    costs aren't significant.
         - We should support char8_t where we want to support Unicode,
    but that isn't proposed in this paper; can add that later.
         - The name is interpreted as an NTBS in the execution encoding.
         - The name can be transcoded to wchar_t on Windows.
         - If mojibake happens, it doesn't affect users of the software.
       - Victor: Execution encoding isn't defined in the standard; we
    have execution character set. Is this locale encoding?
       - Victor: I think we should use a well-defined encoding like the
    literal encoding.
       - Corentin: We need a solution that works with
    MultiByteToWideChar() on Windows; that only works with execution
    encoding.
       - Corentin: We don't require a relationship between literal
    encoding and execution encoding; that technically means that
    conversions don't work.
       - Victor: This should be implementation detail.
       - Tom: I'm torn, I agree with Victor, but I also think the
    proposal reflects existing implementations.
       - Tom: On POSIX systems, the string is likely to be interpreted
    by other tools using the execution encoding.
       - Tom: Should that NTBS be NTMBS?
       - Corentin: Probably.
       - Corentin: The proposal is consistent with behavior elsewhere in
    the standard library.
       - Tom: Is the name exposed in the thread class interface?
       - Corentin: No; not all platforms expose it; we would have to
    store it internally.
       - Corentin: On some platforms, retrieving thread properties
    requires a thread ID.
       - Steve: If the name is not in the right encoding, you'll get
    broken behavior in a predictable way.
       - Victor: On POSIX, I think we should do what path does and
    prohibit transcoding.
       - Victor: On Windows, this is just broken because it uses C locale.
       - Tom: Does NTBS imply C locale to you?
       - Victor: Yes.
       - Jens: Execution encoding is not a term in the C++ standard; we
    have in [character.seq.general] "the encodings of the execution
    character sets ... are locale specific"; we should use this wording.
       - Corentin: We probably should define "execution encoding".
       - Jens: Not in this paper.
       - Jens: We should specify which locale we mean here.
       - Corentin: No, not in each place where we refer to the execution
    encoding.
       - Jens: In the recent exception class discussions, we determined
    that the encodings correspond to the C locale; we should be more
    specific here.
       - Corentin: If we just say NTMBS, then we get the right result.
       - Tom: And that gets us C locale.
       - Jens: For exception classes we say NTBS with a carve out for
    NTMBS; do we want that here?
       - Tom: If we could, I think we would prefer to require NTMBS for
    the exception classes.
       - Jens: <referring to exception class wording>; wording directs
    to codecvt and thus C++ locale.
       - Jens: So, do we want C or C++ locale here? It looks like NTMBS
    comes in two flavors; we should be clear on the semantics.
       - Jens: <referring to a link provided by Victor> regarding
    SetThreadDescription(); there is a technique that involves use of a
    char string.
       - Corentin: That only works when a debugger is attached;
    implementors won't use that technique.
       - Jens: We need to decide what exactly we want this interface to
    be compatible with.
       - Jens: The level of complexity here is much less than for paths.
       - Corentin: Can the C and C++ locales diverge?
       - Jens: Yes.
       - Eddie: This is similar to path; why not have the name_hint
    constructor behave like path where we can provide the string in
    multiple encodings.
       - Corentin: Path is unique since filesystem encodings may differ
    from the other encodings; it is more complicated.
       - Corentin: We should avoid exposing programmers to encoding
    concerns where we don't need to.
       - Corentin: I removed char8_t support because I thought it would
    increase consensus; if SG16 wants to re-introduce char8_t or require
    literal encoding, I'm ok with that.
       - Eddie: What I meant is that we could use the native()
    implementation-defined character type.
       - Corentin: I want to be able to pass an ordinary string literal
    and have it work everywhere.
       - Victor: I agree that we don't want most of the complexity of
    path. However, path is a good abstract model of what we want here.
       - Victor: On POSIX, the bytes should just be passed as is; a
    binary identifier could be used if desired.
       - Victor: NTMBS means you can't use std::format to produce the
    thread name.
       - Corentin: Would it increase consensus to require the ordinary
    literal encoding for C++26?
       - Victor: Yes.
       - Jens: What does sprintf do?
       - Tom: It uses the execution encoding so that special characters
    in trailing code units are not misinterpreted.
       - Jens: We don't require the literal encoding to match the
    execution encoding though that might be a design bug.
       - Jens: Though format is different, I don't think we should
    spring a new encoding requirement on programmers here.
       - Jens: We could specify an NTBS for POSIX and prohibit
    conversions, add wchar_t for Windows, and then have portability issues.
       - Corentin: I think printf() is the model to follow.
       - Jens: Then make it an NTMBS and reference the C locale.
       - Jens: The only conversion concern we have is which conversion
    function is to be called on Windows.
       - Corentin: The separation of the C and C++ locales is new
    information to me.
       - Tom: We could restrict the characters used to the basic literal
    character set.
       - Victor: I agree with the use of NTBS on POSIX, but I think
    NTMBS is wrong for Windows as it is incompatible with everything.
       - Poll 2: P2019R7: Name hint should be provided in the ordinary
    literal encoding.
         - Attendees: 8
         - SF F N A SA
            3 1 3 1 0
         - Weak consensus.
       - Poll 3: P2019R7: Name hint should be provided as an NTMBS in
    the C locale encoding.
         - Attendees: 8
         - SF F N A SA
            1 5 1 0 1
         - Consensus.
       - Victor: I don't think we're done with this paper; use of
    string_view might be problematic.
       - Jens: It is called name "hint" for a reason.
       - Jens: If there are embedded nulls or the name gets truncated,
    tough luck.
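
To make the Poll 3 option and the closing string_view remarks concrete,
here is a minimal sketch, not the wording or implementation proposed in
P2019R7. The function names and the 15-byte limit are assumptions for
illustration (15 bytes plus the terminator is the Linux limit; other
systems differ). The hint is cut at the first embedded null, truncated
bytewise, and then interpreted as an NTMBS in the current C locale via
std::mbsrtowcs, roughly what a Windows implementation might do before
handing a wchar_t string to SetThreadDescription().

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical limit: Linux thread names hold 15 bytes plus the null.
inline constexpr std::size_t kMaxNameBytes = 15;

// Turns a string_view hint into a null-terminated byte string: stop at
// the first embedded null, then truncate to max_bytes. Truncation is
// bytewise, so it may split a multibyte character ("tough luck").
inline std::string clamp_name_hint(std::string_view hint,
                                   std::size_t max_bytes = kMaxNameBytes) {
    hint = hint.substr(0, hint.find('\0'));
    return std::string(hint.substr(0, max_bytes));
}

// Interprets the bytes as an NTMBS in the current C locale and converts
// them to wchar_t. Returns std::nullopt if the bytes are not a valid
// multibyte sequence in that locale.
inline std::optional<std::wstring> widen_name_hint(const std::string& ntmbs) {
    std::mbstate_t state{};
    const char* src = ntmbs.c_str();
    // A null destination only measures the required length.
    const std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) return std::nullopt;

    std::wstring wide(len, L'\0');
    state = std::mbstate_t{};
    src = ntmbs.c_str();
    const std::size_t written = std::mbsrtowcs(wide.data(), &src, len, &state);
    assert(written == len);
    (void)written;  // silence the unused warning when NDEBUG is defined
    return wide;
}
```

Under the Poll 2 option (ordinary literal encoding), the conversion step
would instead have to assume a fixed encoding chosen at compile time
rather than consult the locale.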

Tom.

Received on 2025-01-22 05:33:07