Date: Wed, 24 Jan 2024 12:51:46 -0500
With respect to unit symbols whose dedicated Unicode code points have
equivalents among ordinary Greek or Latin letters, this was previously brought up in
the telecon on November 29, 2023 (minutes
<https://github.com/sg16-unicode/sg16-meetings/blob/340862b721050dbae5d35c96d1e62ecde7525206/README-2023.md#november-29th-2023>),
where I pointed out that the existing precedent in the standard is to use
the unit version, since iostream formatting of std::chrono::duration uses
U+00B5 (MICRO SIGN) rather than U+03BC (GREEK SMALL LETTER MU) for
microseconds. (See [time.duration.io]p(1.5)
<http://eel.is/c++draft/time.duration.io#1.5>.)
Given that precedent, I think we should be consistent and use U+212B
(ANGSTROM SIGN) rather than U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE),
and U+2126 (OHM SIGN) rather than U+03A9 (GREEK CAPITAL LETTER OMEGA).
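To make the practical difference concrete, here is a minimal sketch (my own
illustration, not code from P3045 or the standard) showing that each unit sign
and its letter counterpart are distinct code points that produce different
UTF-8 bytes in formatted output. The helper to_utf8 and the symbol_pair table
are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode scalar value as UTF-8.
// Illustrative helper only; not part of any proposal.
std::string to_utf8(char32_t cp) {
    // Surrogates and values above U+10FFFF are not scalar values.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The three pairs discussed in this thread (hypothetical table layout).
struct symbol_pair {
    const char* unit_name;
    char32_t unit_sign;  // dedicated unit code point
    char32_t letter;     // ordinary letter counterpart
};

constexpr symbol_pair pairs[] = {
    {"micro",    0x00B5, 0x03BC},  // MICRO SIGN vs GREEK SMALL LETTER MU
    {"angstrom", 0x212B, 0x00C5},  // ANGSTROM SIGN vs LATIN CAPITAL LETTER A WITH RING ABOVE
    {"ohm",      0x2126, 0x03A9},  // OHM SIGN vs GREEK CAPITAL LETTER OMEGA
};
```

Whichever member of each pair we pick, a byte-wise comparison of output
against text that uses the other member will fail unless the text is
normalized first, so the choice is visible to users.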
Does anyone know where we could find the minutes where the decision was
made about which code points to use for std::chrono::duration microsecond
formatting? That way we could get more insight into the original reasoning
behind it. (Some cursory grepping through the sg16-meetings repo didn’t
turn anything up.)
On Wed, Jan 24, 2024 at 12:24 PM Alisdair Meredith via SG16 <
sg16_at_[hidden]> wrote:
> I will not be able to attend today.
>
> My only feedback would be that I do want feature macros to query for which
> version of Unicode is in effect at translation time, and I believe that is
> quite important rather than nice-to-have.
>
> AlisdairM
>
> > On 24 Jan 2024, at 11:29, Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
> >
> > SG16 will hold a meeting on Wednesday, January 24th, at 19:30 UTC
> > (timezone conversion).
> > That is today! Yes, I continue to struggle to keep pace with the world.
> > No, I still have not published the minutes from the last meeting.
> > The agenda follows.
> > • P3045R0: Quantities and units library
> > • CWG 2843: Undated reference to Unicode makes C++ a moving target
> > We discussed a draft of P3045 during the 2023-11-29 SG16 meeting. No
> > polls were taken as that discussion was mostly introductory presentation.
> > Section 13 (Text output) discusses formatting and character encoding
> > considerations. The motivation and proposal for a fixed_string type has
> > been moved to a new paper that is yet to be published; P3094
> > (std::basic_fixed_string). Section 13.6 (Text output open questions) has
> > the following list of questions and is what discussion will focus on today:
> > • Which C++ character type should be used for symbols in Unicode
> >   encoding?
> > • Are we OK with the usage of '_' for denoting a subscript identifier?
> > • Are we OK with no text output support of quantity types?
> > • Which character type should basic_symbol_text be used in a
> >   single-argument constructor?
> > • How to name a non-Unicode accessor member function (e.g., .ascii())?
> >   The same name should consistently be used in text_encoding and in the
> >   formatting grammar.
> > • Should unit_symbol() return std::string_view or basic_fixed_string?
> > • Do we care about ostreams enough to introduce custom manipulators to
> >   format units?
> > • What about the localization for units? Will we get something like
> >   ICU in the C++ standard?
> > • std::chrono::duration uses 'Q' and 'q' for a number and a unit. In
> >   the grammar above, we proposed using 'N' and 'U' for them,
> >   respectively. We also introduced 'D' for dimensions. Are we OK with
> >   this?
> > • Should we provide support for quantity points?
> > The 1st and 4th questions are, I think, the most important ones as they
> > directly impact both the user interface and the implementation. We need to
> > determine how to:
> > • Specify both default/preferred symbols (e.g., non-ASCII) and
> >   compatibility/fallback symbols (e.g., text limited to the basic literal
> >   character set). For example, "Ω" as a default/preferred symbol for ohm
> >   with "ohm" as a compatibility/fallback. The paper has a number of such
> >   examples (see dim_thermodynamic_temperature, ohm, micro_, and
> >   hyperfine_structure_transition_frequency_of_cs in section 13.1.1
> >   (Symbol definition examples)).
> > • Specify these sets of symbols for each of the ordinary, wide, and
> >   UTF character encodings.
> >   • Should it be required to explicitly provide symbol text for each of
> >     these encodings? Perhaps only when characters outside of the basic
> >     literal character set are used? Perhaps:
> >       named_unit<"s", ...>
> >         // Ok, uses "s" transcoded as necessary for each of the
> >         // encodings.
> >       named_unit<{"u", L"u", u8"Ω", u"Ω", U"Ω"}, ...>
> >         // Ok, uses "u" transcoded as necessary for the
> >         // compatibility/fallback symbol and the provided text as the
> >         // default/preferred symbol otherwise.
> >         // This variant would prohibit use of characters outside the
> >         // basic literal character set with the ordinary character
> >         // encoding thus ensuring portability.
> >       named_unit<{"u", "Ω", L"Ω", U"Ω"}, ...>
> >         // Ok, uses "u" transcoded as necessary for the
> >         // compatibility/fallback symbol and the provided text as the
> >         // default/preferred symbol otherwise with the UTF-32 text
> >         // converted to UTF-8 and UTF-16 as necessary.
> >         // This requires that the ordinary literal encoding be UTF-8
> >         // for the code to be well-formed (see P1854).
> > The other questions will likely require a little introductory discussion
> > to better understand the context for the question.
> > If time permits, we'll continue discussion of CWG 2843 from the
> > 2024-01-10 SG16 meeting (for which minutes are not yet published). I
> > believe there are three questions yet to be answered:
> > • The version of the Unicode Standard to be specified as the minimum
> >   version.
> > • Whether implementations are allowed to use different
> >   implementation-defined Unicode versions for the core language and the
> >   standard library.
> > • Whether the implementation-defined Unicode version should be exposed
> >   via a new feature test macro (perhaps two new feature test macros
> >   depending on the previous item).
> > Tom.
> >
> > --
> > SG16 mailing list
> > SG16_at_[hidden]
> > https://lists.isocpp.org/mailman/listinfo.cgi/sg16
Received on 2024-01-24 17:51:59
