Date: Wed, 24 Jan 2024 11:29:54 -0500
SG16 will hold a meeting on Wednesday, January 24th, at 19:30 UTC
(timezone conversion
<https://www.timeanddate.com/worldclock/converter.html?iso=20240124T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>).
*That is today!* Yes, I continue to struggle to keep pace with the
world. No, I still have not published the minutes from the last meeting.
The agenda follows.
* P3045R0: Quantities and units library <https://wg21.link/p3045r0>
* CWG 2843: Undated reference to Unicode makes C++ a moving target
<https://cplusplus.github.io/CWG/issues/2843.html>
We discussed a draft of P3045 during the 2023-11-29 SG16 meeting
<https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2023.md#november-29th-2023>.
No polls were taken as that discussion was mostly an introductory
presentation. Section 13 (Text output)
<https://wg21.link/p3045r0#text-output> discusses formatting and
character encoding considerations. The motivation and proposal for a
fixed_string type has been moved to a new paper that is yet to be
published: P3094 (std::basic_fixed_string) <https://wg21.link/p3094>.
Section 13.6 (Text output open questions)
<https://wg21.link/p3045r0#text-output-open-questions> lists the
following questions, which are what discussion will focus on today:
1. Which C++ character type should be used for symbols in Unicode encoding?
2. Are we OK with the usage of '_' for denoting a subscript identifier?
3. Are we OK with no text output support of quantity types?
4. Which character type should basic_symbol_text be used in a
single-argument constructor?
5. How to name a non-Unicode accessor member function (e.g., .ascii())?
The same name should consistently be used in text_encoding and in
the formatting grammar.
6. Should unit_symbol() return std::string_view or basic_fixed_string?
7. Do we care about ostreams enough to introduce custom manipulators to
format units?
8. What about the localization for units? Will we get something like
ICU in the C++ standard?
9. std::chrono::duration uses 'Q' and 'q' for a number and a unit. In
the grammar above, we proposed using 'N' and 'U' for them,
respectively. We also introduced 'D' for dimensions. Are we OK with
this?
10. Should we provide support for quantity points?
The 1st and 4th questions are, I think, the most important ones as they
directly impact both the user interface and the implementation. We need
to determine how to:
* Specify both default/preferred symbols (e.g., non-ASCII) and
compatibility/fallback symbols (e.g., text limited to the basic
literal character set). For example, "Ω" as a default/preferred
symbol for ohm with "ohm" as a compatibility/fallback. The paper has
a number of such examples (see dim_thermodynamic_temperature, ohm,
micro_, and hyperfine_structure_transition_frequency_of_cs in
section 13.1.1 (Symbol definition examples)
<https://wg21.link/p3045r0#symbol-definition-examples>)
* Specify these sets of symbols for each of the ordinary, wide, and
UTF character encodings.
o Should it be required to explicitly provide symbol text for each
of these encodings? Perhaps only when characters outside of the
basic literal character set are used? Perhaps:
    named_unit<"s", ...>
    // Ok, uses "s" transcoded as necessary for each of the encodings.

    named_unit<{"u", L"u", u8"Ω", u"Ω", U"Ω"}, ...>
    // Ok, uses "u" transcoded as necessary for the
    // compatibility/fallback symbol and the provided text as the
    // default/preferred symbol otherwise. This variant would prohibit
    // use of characters outside the basic literal character set with
    // the ordinary character encoding thus ensuring portability.

    named_unit<{"u", "Ω", L"Ω", U"Ω"}, ...>
    // Ok, uses "u" transcoded as necessary for the
    // compatibility/fallback symbol and the provided text as the
    // default/preferred symbol otherwise with the UTF-32 text
    // converted to UTF-8 and UTF-16 as necessary. This requires that
    // the ordinary literal encoding be UTF-8 for the code to be
    // well-formed (see P1854 <https://wg21.link/p1854>).
The other questions will likely require a little introductory discussion
to better understand the context for the question.
If time permits, we'll continue discussion of CWG 2843 from the
2024-01-10 SG16 meeting (for which minutes are not yet published). I
believe there are three questions yet to be answered:
1. The version of the Unicode Standard to be specified as the minimum
version.
2. Whether implementations are allowed to use different
implementation-defined Unicode versions for the core language and
the standard library.
3. Whether the implementation-defined Unicode version should be exposed
via a new feature test macro (perhaps two new feature test macros
depending on the previous item).
Tom.
Received on 2024-01-24 16:29:56