Date: Wed, 24 Jan 2024 13:34:49 -0500
On 1/24/24 12:51 PM, Eddie Nolan via SG16 wrote:
>
> With respect to unit symbols whose Unicode code points as units have
> canonical equivalents as Greek letters, this was previously brought up
> in the telecon on November 29, 2023 (minutes
> <https://github.com/sg16-unicode/sg16-meetings/blob/340862b721050dbae5d35c96d1e62ecde7525206/README-2023.md#november-29th-2023>),
> where I pointed out that the existing precedent in the standard is to
> use the unit version, since iostream formatting of
> |std::chrono::duration| uses |U+00B5 (MICRO SIGN)| rather than |U+03BC
> (GREEK SMALL LETTER MU)| for microseconds. (See
> [time.duration.io]p(1,5) <http://eel.is/c++draft/time.duration.io#1.5>).
>
> Given that precedent, I think we should be consistent with that and
> use |U+212B (ANGSTROM SIGN)| rather than |U+00C5 (LATIN CAPITAL LETTER
> A WITH RING ABOVE)|, and |U+2126 (OHM SIGN)| rather than |U+03A9
> (GREEK CAPITAL LETTER OMEGA)|.
>
> Does anyone know where we could find the minutes where the decision
> was made about which code points to use for |std::chrono::duration|
> microsecond formatting? That way we could get more insight into the
> original reasoning behind it. (Some cursory grep-ing through the
> |sg16-meetings| repo didn’t turn anything up).
>
I don't think SG16 ever discussed the choice of character used for
std::chrono::duration. Some spelunking suggests that the choice of
character first appeared in P0355R1 <https://wg21.link/p0355r1> (see the
reference to U+00B5 in section 20.17.5.10). That paper is from 2016 and
predates formation of SG16 by a couple of years.
Tom.
>
> On Wed, Jan 24, 2024 at 12:24 PM Alisdair Meredith via SG16
> <sg16_at_[hidden]> wrote:
>
> I will not be able to attend today.
>
> My only feedback would be that I do want feature macros to query
> for which version of Unicode is in effect at translation time, and
> I believe that is quite important rather than nice-to-have.
>
> AlisdairM
>
> > On 24 Jan 2024, at 11:29, Tom Honermann via SG16
> <sg16_at_[hidden]> wrote:
> >
> > SG16 will hold a meeting on Wednesday, January 24th, at 19:30
> UTC (timezone conversion).
> > That is today! Yes, I continue to struggle to keep pace with the
> world. No, I still have not published the minutes from the last
> meeting.
> > The agenda follows.
> > • P3045R0: Quantities and units library
> > • CWG 2843: Undated reference to Unicode makes C++ a moving
> target
> > We discussed a draft of P3045 during the 2023-11-29 SG16
> meeting. No polls were taken as that discussion was mostly
> introductory presentation. Section 13 (Text output) discusses
> formatting and character encoding considerations. The motivation
> and proposal for a fixed_string type has been moved to a new paper
> that is yet to be published; P3094 (std::basic_fixed_string).
> Section 13.6 (Text output open questions) has the following list
> of questions and is what discussion will focus on today:
> > • Which C++ character type should be used for symbols in
> Unicode encoding?
> > • Are we OK with the usage of '_' for denoting a subscript
> identifier?
> > • Are we OK with no text output support of quantity types?
> > • Which character type should basic_symbol_text use in a
> single-argument constructor?
> > • How to name a non-Unicode accessor member function (e.g.,
> .ascii())? The same name should consistently be used in
> text_encoding and in the formatting grammar.
> > • Should unit_symbol() return std::string_view or
> basic_fixed_string?
> > • Do we care about ostreams enough to introduce custom
> manipulators to format units?
> > • What about the localization for units? Will we get
> something like ICU in the C++ standard?
> > • std::chrono::duration uses 'Q' and 'q' for a number and a
> unit. In the grammar above, we proposed using 'N' and 'U' for
> them, respectively. We also introduced 'D' for dimensions. Are we
> OK with this?
> > • Should we provide support for quantity points?
> > The 1st and 4th questions are, I think, the most important ones
> as they directly impact both the user interface and the
> implementation. We need to determine how to:
> > • Specify both default/preferred symbols (e.g., non-ASCII)
> and compatibility/fallback symbols (e.g., text limited to the
> basic literal character set). For example, "Ω" as a
> default/preferred symbol for ohm with "ohm" as a
> compatibility/fallback. The paper has a number of such examples
> (see dim_thermodynamic_temperature, ohm, micro_, and
> hyperfine_structure_transition_frequency_of_cs in section 13.1.1
> (Symbol definition examples))
> > • Specify these sets of symbols for each of the ordinary,
> wide, and UTF character encodings.
> > • Should it be required to explicitly provide symbol
> text for each of these encodings? Perhaps only when characters
> outside of the basic literal character set are used? Perhaps:
> >     named_unit<"s", ...>
> >         // Ok, uses "s" transcoded as necessary for each of the
> >         // encodings.
> >     named_unit<{"u", L"u", u8"Ω", u"Ω", U"Ω"}, ...>
> >         // Ok, uses "u" transcoded as necessary for the
> >         // compatibility/fallback symbol and the provided text as the
> >         // default/preferred symbol otherwise.
> >         // This variant would prohibit use of characters outside the
> >         // basic literal character set with the ordinary character
> >         // encoding, thus ensuring portability.
> >     named_unit<{"u", "Ω", L"Ω", U"Ω"}, ...>
> >         // Ok, uses "u" transcoded as necessary for the
> >         // compatibility/fallback symbol and the provided text as the
> >         // default/preferred symbol otherwise, with the UTF-32 text
> >         // converted to UTF-8 and UTF-16 as necessary.
> >         // This requires that the ordinary literal encoding be UTF-8
> >         // for the code to be well-formed (see P1854).
> > The other questions will likely require a little introductory
> discussion to better understand the context for the question.
> > If time permits, we'll continue discussion of CWG 2843 from the
> 2024-01-10 SG16 meeting (for which minutes are not yet published).
> I believe there are three questions yet to be answered:
> > • The version of the Unicode Standard to be specified as the
> minimum version.
> > • Whether implementations are allowed to use different
> implementation-defined Unicode versions for the core language and
> the standard library.
> > • Whether the implementation-defined Unicode version should
> be exposed via a new feature test macro (perhaps two new feature
> test macros depending on the previous item).
> > Tom.
> >
> > --
> > SG16 mailing list
> > SG16_at_[hidden]
> > https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
>
Received on 2024-01-24 18:34:51