ISOCPP sg16 List: Re: [isocpp-sg16] Agenda for the 2025-01-21 SG16 meeting

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 22 Jan 2025 13:28:12 -0500

Reminder that this meeting is happening in one hour.

Tom.

On 1/22/25 12:33 AM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a meeting *today*, Wednesday, January 22nd, at 19:30
> UTC (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20250122T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>).
> That is 11:30am PST, 12:30pm MST, 1:30pm CST, 2:30pm EST, and 20:30 CET.
>
> Sorry for the delayed scheduling. I've almost unburied myself to the
> point that I'll be able to devote a reasonable amount of time to SG16
> again.
>
> I added this meeting to the shared calendar just a few minutes ago. If
> you need a .ics file to import into your calendar, you can download it
> here
> <https://documents.isocpp.org/remote.php/dav/public-calendars/R7imgS2LJD9xfeWN/EE81D0AF-4216-4365-89BD-181D7A0D6DB2.ics?export>.
>
> The agenda follows.
>
> * Decide on a meeting schedule for before/after the Hagenberg meeting.
> * P2019R7: Thread attributes <https://wg21.link/p2019>
>
> I normally schedule SG16 meetings for the 2nd and 4th Wednesdays of
> each month. The Hagenberg meeting is the 2nd week of February (the
> 10th through the 15th). I'm not planning to hold an in-person SG16
> meeting during the Hagenberg meeting because it is too difficult to
> get quorum. I'd like to get one more meeting scheduled before
> Hagenberg, so either January 29th or February 5th. I have a slight
> preference for February 5th and will schedule for that date unless
> there are requests for the January 29th date instead. Following
> Hagenberg, we can resume our normal meeting cadence starting on
> February 26th. If you have thoughts on this, please share.
>
> We last discussed P2019 during the 2024-09-25 SG16 meeting
> <https://github.com/sg16-unicode/sg16-meetings/tree/master#september-25th-2024>
> where we had polls that demonstrated consensus for two different
> options for the encoding of thread names, but with neither option
> being a clear winner. I still haven't published proper minutes for
> that meeting, but see the summary posted in the github tracker
> <https://github.com/cplusplus/papers/issues/817#issuecomment-2482513214>.
> In close calls like this, we usually defer to the author. Corentin and
> Victor discussed offline and reached an agreement on use of the
> ordinary literal encoding. Corentin will present and we'll hopefully
> find agreement on a solution and forward the paper.
>
> I intentionally kept the agenda rather light for this meeting given
> the late notice and anticipation that there will still be plenty to
> discuss regarding literal encodings, NTBS, NTMBS, and C vs C++ vs
> system/environment locales.
>
> My rough notes from the 2024-09-25 discussion of P2019R7 are below for
> reference.
>
> - P2019R7: Thread attributes:
> - Corentin introduced the paper.
> - The thread name is provided to the OS so that it is
> available for display in an OS monitor, debugger, process thread
> list, etc.
> - On POSIX systems, the name is stuffed into a small string
> buffer provided by the pthreads library and is generally
> interpreted as execution encoding.
> - Prior to Windows 10, there were unofficial ways to associate
> a name with a thread.
> - On Windows 10 there is a public interface.
> - On Windows, the name must be provided in wchar_t; there is
> no "ANSI" version of the interface.
> - A previous revision of this paper supported both char and
> wchar_t.
> - A copy of the string will always be needed, so transcoding
> costs aren't significant.
> - We should support char8_t where we want to support Unicode,
> but that isn't proposed in this paper; can add that later.
> - The name is interpreted as an NTBS in the execution encoding.
> - The name can be transcoded to wchar_t on Windows.
> - If mojibake happens, it doesn't affect users of the software.
> - Victor: Execution encoding isn't defined in the standard; we
> have execution character set. Is this locale encoding?
> - Victor: I think we should use a well-defined encoding like the
> literal encoding.
> - Corentin: We need a solution that works with
> MultiByteToWideChar() on Windows; that only works with execution
> encoding.
> - Corentin: We don't require a relationship between literal
> encoding and execution encoding; that technically means that
> conversions don't work.
> - Victor: This should be implementation detail.
> - Tom: I'm torn, I agree with Victor, but I also think the
> proposal reflects existing implementations.
> - Tom: On POSIX systems, the string is likely to be interpreted
> by other tools using the execution encoding.
> - Tom: Should that NTBS be NTMBS?
> - Corentin: Probably.
> - Corentin: The proposal is consistent with behavior elsewhere
> in the standard library.
> - Tom: Is the name exposed in the thread class interface?
> - Corentin: No; not all platforms expose it; we would have to
> store it internally.
> - Corentin: On some platforms, retrieving thread properties
> requires a thread ID.
> - Steve: If the name is not in the right encoding, you'll get
> broken behavior in a predictable way.
> - Victor: On POSIX, I think we should do what path does and
> prohibit transcoding.
> - Victor: On Windows, this is just broken because it uses C locale.
> - Tom: Does NTBS imply C locale to you?
> - Victor: Yes.
> - Jens: Execution encoding is not a term in the C++ standard; we
> have in [character.seq.general] "the encodings of the execution
> character sets ... are locale specific"; we should use this wording.
> - Corentin: We probably should define "execution encoding".
> - Jens: Not in this paper.
> - Jens: We should specify which locale we mean here.
> - Corentin: No, not in each place where we refer to the
> execution encoding.
> - Jens: In the recent exception class discussions, we determined
> that the encodings correspond to the C locale; we should be more
> specific here.
> - Corentin: If we just say NTMBS, then we get the right result.
> - Tom: And that gets us C locale.
> - Jens: For exception classes we say NTBS with a carve out for
> NTMBS; do we want that here?
> - Tom: If we could, I think we would prefer to require NTMBS for
> the exception classes.
> - Jens: <referring to exception class wording>; wording directs
> to codecvt and thus C++ locale.
> - Jens: So, do we want C or C++ locale here? It looks like NTMBS
> comes in two flavors; we should be clear on the semantics.
> - Jens: <referring to a link provided by Victor> regarding
> SetThreadDescription(); there is a technique that involves use of a
> char string.
> - Corentin: That only works when a debugger is attached;
> implementors won't use that technique.
> - Jens: We need to decide what exactly we want this interface to
> be compatible with.
> - Jens: The level of complexity here is much less than for paths.
> - Corentin: Can the C and C++ locales diverge?
> - Jens: Yes.
> - Eddie: This is similar to path; why not have the name_hint
> constructor behave like path where we can provide the string in
> multiple encodings.
> - Corentin: Path is unique since filesystem encodings may differ
> from the other encodings; it is more complicated.
> - Corentin: We should avoid exposing programmers to encoding
> concerns where we don't need to.
> - Corentin: I removed char8_t support because I thought it would
> increase consensus; if SG16 wants to re-introduce char8_t or
> require literal encoding, I'm ok with that.
> - Eddie: What I meant is that we could use the native()
> implementation-defined character type.
> - Corentin: I want to be able to pass an ordinary string literal
> and have it work everywhere.
> - Victor: I agree that we don't want most of the complexity of
> path. However, path is a good abstract model of what we want here.
> - Victor: On POSIX, the bytes should just be passed as is; a
> binary identifier could be used if desired.
> - Victor: NTMBS means you can't use std::format to produce the
> thread name.
> - Corentin: Would it increase consensus to require the ordinary
> literal encoding for C++26?
> - Victor: Yes.
> - Jens: What does sprintf do?
> - Tom: It uses the execution encoding so that special characters
> in trailing code units are not misinterpreted.
> - Jens: We don't require the literal encoding to match the
> execution encoding though that might be a design bug.
> - Jens: Though format is different, I don't think we should
> spring a new encoding requirement on programmers here.
> - Jens: We could specify an NTBS for POSIX and prohibit
> conversions, add wchar_t for Windows, and then have portability
> issues.
> - Corentin: I think printf() is the model to follow.
> - Jens: Then make it an NTMBS and reference the C locale.
> - Jens: The only conversion concern we have is which conversion
> function is to be called on Windows.
> - Corentin: The separation of the C and C++ locales is new
> information to me.
> - Tom: We could restrict the characters used to the basic
> literal character set.
> - Victor: I agree with the use of NTBS on POSIX, but I think
> NTMBS is wrong for Windows as it is incompatible with everything.
> - Poll 2: P2019R7: Name hint should be provided in the ordinary
> literal encoding.
> - Attendees: 8
> - SF  F  N  A  SA
>    3  1  3  1   0
> - Weak consensus.
> - Poll 3: P2019R7: Name hint should be provided as an NTMBS in
> the C locale encoding.
> - Attendees: 8
> - SF  F  N  A  SA
>    1  5  1  0   1
> - Consensus.
> - Victor: I don't think we're done with this paper; use of
> string_view might be problematic.
> - Jens: It is called name "hint" for a reason.
> - Jens: If there are embedded nulls or the name gets truncated, tough luck.
>
> Tom.
>
>
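
For readers following the NTMBS option from Poll 3, here is a minimal sketch
of what "interpret the name hint as an NTMBS in the C locale encoding and
convert it to wchar_t" could look like. It is not taken from P2019R7 or from
any implementation. The helper name widen_name_hint is hypothetical, and
std::mbsrtowcs is used only as a portable stand-in for whichever conversion
function an implementation would actually call on Windows before handing the
result to SetThreadDescription(). The sketch also assumes the hint arrives as
a string_view and is cut at the first embedded null.

```cpp
#include <cstddef>
#include <cwchar>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, for illustration only (not part of P2019R7).
// Treats the hint as a multibyte string in the current C locale and
// converts it to a wide string; returns nullopt on an invalid sequence.
std::optional<std::wstring> widen_name_hint(std::string_view hint) {
    // A string_view need not be null-terminated and may contain embedded
    // nulls. Copy it and keep only the part before the first null.
    std::string ntmbs(hint.substr(0, hint.find('\0')));

    // First pass: count the wide characters needed (no output buffer).
    std::mbstate_t state{};
    const char* src = ntmbs.c_str();
    std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        return std::nullopt;  // not valid in the C locale encoding
    }

    // Second pass: perform the conversion into a buffer of that size.
    std::wstring wide(len, L'\0');
    state = std::mbstate_t{};
    src = ntmbs.c_str();
    std::mbsrtowcs(wide.data(), &src, len, &state);
    return wide;
}
```

With the default "C" locale, widen_name_hint("worker") yields L"worker", and
a hint with an embedded null such as std::string_view("io\0pool", 7) yields
L"io". Which names outside the basic character set convert successfully
depends on the C locale in effect, which is the portability concern raised in
the discussion.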

Received on 2025-01-22 18:28:14