Date: Wed, 22 Jan 2025 00:33:04 -0500
SG16 will hold a meeting *today*, Wednesday, January 22nd, at 19:30 UTC
(timezone conversion
<https://www.timeanddate.com/worldclock/converter.html?iso=20250122T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>).
That is 11:30am PST, 12:30pm MST, 1:30pm CST, 2:30pm EST, and 20:30 CET.
Sorry for the delayed scheduling. I've almost unburied myself to the
point that I'll be able to devote a reasonable amount of time to SG16 again.
I added this meeting to the shared calendar just a few minutes ago. If
you need a .ics file to import into your calendar, you can download it
here
<https://documents.isocpp.org/remote.php/dav/public-calendars/R7imgS2LJD9xfeWN/EE81D0AF-4216-4365-89BD-181D7A0D6DB2.ics?export>.
The agenda follows.
* Decide on a meeting schedule for before/after the Hagenberg meeting.
* P2019R7: Thread attributes <https://wg21.link/p2019>
I normally schedule SG16 meetings for the 2nd and 4th Wednesdays of each
month. The Hagenberg meeting is the 2nd week of February (the 10th
through the 15th). I'm not planning to hold an in-person SG16 meeting
during the Hagenberg meeting because it is too difficult to get quorum.
I'd like to get one more meeting scheduled before Hagenberg, so either
January 29th or February 5th. I have a slight preference for February
5th and will schedule for that date unless there are requests for the
January 29th date instead. Following Hagenberg, we can resume our normal
meeting cadence starting on February 26th. If you have thoughts on this,
please share.
We last discussed P2019 during the 2024-09-25 SG16 meeting
<https://github.com/sg16-unicode/sg16-meetings/tree/master#september-25th-2024>
where we had polls that demonstrated consensus for two different options
for the encoding of thread names, but with neither option being a clear
winner. I still haven't published proper minutes for that meeting, but
see the summary posted in the GitHub tracker
<https://github.com/cplusplus/papers/issues/817#issuecomment-2482513214>.
In close calls like this, we usually defer to the author. Corentin and
Victor discussed offline and reached an agreement on use of the ordinary
literal encoding. Corentin will present and we'll hopefully find
agreement on a solution and forward the paper.
I intentionally kept the agenda rather light for this meeting given the
late notice and anticipation that there will still be plenty to discuss
regarding literal encodings, NTBS, NTMBS, and C vs C++ vs
system/environment locales.
My rough notes from the 2024-09-25 discussion of P2019R7 are below for
reference.
- P2019R7: Thread attributes:
- Corentin introduced the paper.
- The thread name is provided to the OS so that it is available
for display in an OS monitor, debugger, process thread list, etc...
- On POSIX systems, the name is stuffed into a small string
buffer provided by the pthreads library and is generally interpreted
as execution encoding.
- Prior to Windows 10, there were unofficial ways to associate
a name with a thread.
- On Windows 10 there is a public interface.
- On Windows, the name must be provided in wchar_t; there is no
"ANSI" version of the interface.
- A previous revision of this paper supported both char and
wchar_t.
- A copy of the string will always be needed, so transcoding
costs aren't significant.
- We should support char8_t where we want to support Unicode,
but that isn't proposed in this paper; can add that later.
- The name is interpreted as an NTBS in the execution encoding.
- The name can be transcoded to wchar_t on Windows.
- If mojibake happens, it doesn't affect users of the software.
- Victor: Execution encoding isn't defined in the standard; we
have execution character set. Is this locale encoding?
- Victor: I think we should use a well-defined encoding like the
literal encoding.
- Corentin: We need a solution that works with
MultiByteToWideChar() on Windows; that only works with execution
encoding.
- Corentin: We don't require a relationship between literal
encoding and execution encoding; that technically means that
conversions don't work.
- Victor: This should be implementation detail.
- Tom: I'm torn, I agree with Victor, but I also think the
proposal reflects existing implementations.
- Tom: On POSIX systems, the string is likely to be interpreted
by other tools using the execution encoding.
- Tom: Should that NTBS be NTMBS?
- Corentin: Probably.
- Corentin: The proposal is consistent with behavior elsewhere in
the standard library.
- Tom: Is the name exposed in the thread class interface?
- Corentin: No; not all platforms expose it; we would have to
store it internally.
- Corentin: On some platforms, retrieving thread properties
requires a thread ID.
- Steve: If the name is not in the right encoding, you'll get
broken behavior in a predictable way.
- Victor: On POSIX, I think we should do what path does and
prohibit transcoding.
- Victor: On Windows, this is just broken because it uses C locale.
- Tom: Does NTBS imply C locale to you?
- Victor: Yes.
- Jens: Execution encoding is not a term in the C++ standard; we
have in [character.seq.general] "the encodings of the execution
character sets ... are locale specific"; we should use this wording.
- Corentin: We probably should define "execution encoding".
- Jens: Not in this paper.
- Jens: We should specify which locale we mean here.
- Corentin: No, not in each place where we refer to the execution
encoding.
- Jens: In the recent exception class discussions, we determined
that the encodings correspond to the C locale; we should be more
specific here.
- Corentin: If we just say NTMBS, then we get the right result.
- Tom: And that gets us C locale.
- Jens: For exception classes we say NTBS with a carve out for
NTMBS; do we want that here?
- Tom: If we could, I think we would prefer to require NTMBS for
the exception classes.
- Jens: <referring to exception class wording>; wording directs
to codecvt and thus C++ locale.
- Jens: So, do we want C or C++ locale here? It looks like NTMBS
comes in two flavors; we should be clear on the semantics.
- Jens: <referring to a link provided by Victor> regarding
SetThreadDescription(); there is a technique that involves use of a
char string.
- Corentin: That only works when a debugger is attached;
implementors won't use that technique.
- Jens: We need to decide what exactly we want this interface to
be compatible with.
- Jens: The level of complexity here is much less than for paths.
- Corentin: Can the C and C++ locales diverge?
- Jens: Yes.
- Eddie: This is similar to path; why not have the name_hint
constructor behave like path where we can provide the string in
multiple encodings.
- Corentin: Path is unique since filesystem encodings may differ
from the other encodings; it is more complicated.
- Corentin: We should avoid exposing programmers to encoding
concerns where we don't need to.
- Corentin: I removed char8_t support because I thought it would
increase consensus; if SG16 wants to re-introduce char8_t or require
literal encoding, I'm ok with that.
- Eddie: What I meant is that we could use the native()
implementation-defined character type.
- Corentin: I want to be able to pass an ordinary string literal
and have it work everywhere.
- Victor: I agree that we don't want most of the complexity of
path. However, path is a good abstract model of what we want here.
- Victor: On POSIX, the bytes should just be passed as is; a
binary identifier could be used if desired.
- Victor: NTMBS means you can't use std::format to produce the
thread name.
- Corentin: Would it increase consensus to require the ordinary
literal encoding for C++26?
- Victor: Yes.
- Jens: What does sprintf do?
- Tom: It uses the execution encoding so that special characters
in trailing code units are not misinterpreted.
- Jens: We don't require the literal encoding to match the
execution encoding though that might be a design bug.
- Jens: Though format is different, I don't think we should
spring a new encoding requirement on programmers here.
- Jens: We could specify an NTBS for POSIX and prohibit
conversions, add wchar_t for Windows, and then have portability issues.
- Corentin: I think printf() is the model to follow.
- Jens: Then make it an NTMBS and reference the C locale.
- Jens: The only conversion concern we have is which conversion
function is to be called on Windows.
- Corentin: The separation of the C and C++ locales is new
information to me.
- Tom: We could restrict the characters used to the basic literal
character set.
- Victor: I agree with the use of NTBS on POSIX, but I think
NTMBS is wrong for Windows as it is incompatible with everything.
- Poll 2: P2019R7: Name hint should be provided in the ordinary
literal encoding.
- Attendees: 8
- SF  F  N  A  SA
   3  1  3  1   0
- Weak consensus.
- Poll 3: P2019R7: Name hint should be provided as an NTMBS in
the C locale encoding.
- Attendees: 8
- SF  F  N  A  SA
   1  5  1  0   1
- Consensus.
- Victor: I don't think we're done with this paper; use of
string_view might be problematic.
- Jens: It is called name "hint" for a reason.
- Jens: If there are embedded nulls or the name gets truncated, tough luck.
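For reference, the platform behavior discussed in the notes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not wording or an implementation from P2019R7. The helper names clamp_name_hint and set_current_thread_name_hint are hypothetical, the 15 byte limit reflects the Linux pthread_setname_np() limit of 16 bytes including the terminator, and the use of CP_ACP with MultiByteToWideChar() is an assumption about which code page an implementation might pick (which is exactly the open question in the discussion). It requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

#if defined(_WIN32)
#  include <windows.h>
#elif defined(__linux__)
#  include <pthread.h>
#endif

// Hypothetical helper: cut the hint at the first embedded NUL and then
// at max_bytes. The cut is byte-based, so it may split a multibyte
// character.
std::string clamp_name_hint(std::string_view hint, std::size_t max_bytes) {
    hint = hint.substr(0, hint.find('\0'));  // npos keeps the whole string
    hint = hint.substr(0, max_bytes);        // count is clamped to size()
    return std::string(hint);
}

// Hypothetical helper: pass the hint for the calling thread to the OS.
// Returns true if the OS accepted the name.
bool set_current_thread_name_hint(std::string_view hint) {
#if defined(_WIN32)
    // Assumption: the bytes are in the active ANSI code page (CP_ACP).
    std::string name = clamp_name_hint(hint, std::string_view::npos);
    int len = MultiByteToWideChar(CP_ACP, 0, name.c_str(), -1, nullptr, 0);
    if (len <= 0) return false;
    // len counts the terminating null because cbMultiByte is -1.
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    if (MultiByteToWideChar(CP_ACP, 0, name.c_str(), -1,
                            wide.data(), len) <= 0) {
        return false;
    }
    return SUCCEEDED(SetThreadDescription(GetCurrentThread(), wide.c_str()));
#elif defined(__linux__)
    // Linux allows at most 16 bytes including the terminator; the bytes
    // are passed through as is, with no transcoding.
    std::string name = clamp_name_hint(hint, 15);
    assert(name.size() <= 15);
    return pthread_setname_np(pthread_self(), name.c_str()) == 0;
#else
    (void)hint;  // no known interface on this platform in this sketch
    return false;
#endif
}
```

Calling set_current_thread_name_hint("worker") from a running thread would then label that thread in tools that display thread names. Truncating at a byte count can split a multibyte character, which is the "tough luck" case noted above.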
Tom.
Received on 2025-01-22 05:33:07