Date: Wed, 24 Jan 2024 21:34:00 +0000
Hi, sorry for having missed this,
I would have loved to participate in this discussion; I have done some research on some of these topics.
Please take this email with a grain of salt. Some of the criticism may sound harsh, but it is not personal, so please don’t take it personally. I’m really trying to help constructively.
I myself have written a units library (you can find it here: https://github.com/tmiguelf/unit), mostly as a personal project, although it derives from a professional application.
Granted, it is not as flashy or as well documented, and there are some concepts I would change if I were to write it again today...
Anyway, I have looked at many different implementations of “units”-like libraries, and each has its own implementation quirks.
There seems to be a temptation for authors to adopt their personal pet projects as a standard, with all of their quirks, instead of finding a solution that doesn’t have “quirks”.
One of the most common mistakes I have seen made is to treat a unit like degrees centigrade as if it were any other unit.
This is a short list of some of the features of this particular unit:
* You can’t multiply it by a scalar.
* You can’t add it with itself.
* You can’t combine it with any other unit or itself to form a new unit.
* You can subtract two values in degrees Celsius, but the resulting unit is not degrees Celsius; the unit is kelvin.
It is not just a simple quirk if a library fails to account for this; it is a bug. It is actually more than just a bug: it is fundamentally broken.
I am aware of the remark in ISO 80000-5 regarding degrees Celsius, and I quote:
“The unit degree Celsius is a special name for
the kelvin for use in stating values of Celsius
temperature. The unit degree Celsius is by
definition equal in magnitude to the kelvin. A
difference or interval of temperature may be
expressed in kelvin or in degrees Celsius.”
That remark is completely bonkers. Every single statement in it is wrong: it makes absolutely no sense from a dimensional analysis perspective, it makes no physical sense, and it is mathematically incoherent.
The only correct way to deal with a unit like this is to give it its own special type, and if anything needs to be done with it, it first needs to be converted into kelvin. If that is not how it works, I will be able to give you an example of how to get it to do the wrong thing, no matter what you try.
It is not just quirky, it’s wrong!
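As a rough illustration of the rules listed above, here is a minimal C++ sketch. All names in it (Kelvin, Celsius, to_kelvin, to_celsius and the two detection traits) are made up for this example; they are not taken from P3045 or from my own library. It only assumes that a Celsius value gets its own type with no addition and no scaling, that subtracting two Celsius values yields kelvin, and that everything else goes through an explicit conversion.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>
#include <utility>

// Hypothetical types for illustration only.
struct Kelvin { double value; };
struct Celsius { double value; };

// Kelvin behaves like an ordinary unit: it can be added, subtracted and scaled.
constexpr Kelvin operator+(Kelvin a, Kelvin b) { return {a.value + b.value}; }
constexpr Kelvin operator-(Kelvin a, Kelvin b) { return {a.value - b.value}; }
constexpr Kelvin operator*(double s, Kelvin k) { return {s * k.value}; }

// Celsius deliberately has no operator+ and no operator*.
// The only arithmetic allowed is a difference, and its result is in kelvin.
constexpr Kelvin operator-(Celsius a, Celsius b) { return {a.value - b.value}; }

// Anything else has to go through an explicit conversion.
constexpr Kelvin to_kelvin(Celsius c) { return {c.value + 273.15}; }
constexpr Celsius to_celsius(Kelvin k) { return {k.value - 273.15}; }

// Detection traits used to check at compile time which operations exist.
template <typename T, typename = void>
struct is_addable : std::false_type {};
template <typename T>
struct is_addable<T, decltype(void(std::declval<T>() + std::declval<T>()))>
    : std::true_type {};

template <typename T, typename = void>
struct is_scalable : std::false_type {};
template <typename T>
struct is_scalable<T, decltype(void(2.0 * std::declval<T>()))>
    : std::true_type {};

static_assert(is_addable<Kelvin>::value, "kelvin values can be added");
static_assert(is_scalable<Kelvin>::value, "kelvin values can be scaled");
static_assert(!is_addable<Celsius>::value, "Celsius + Celsius must not compile");
static_assert(!is_scalable<Celsius>::value, "scalar * Celsius must not compile");
static_assert(std::is_same<decltype(Celsius{30.0} - Celsius{20.0}), Kelvin>::value,
              "a Celsius difference is expressed in kelvin");
```

With this, “doubling 10 °C” has to be written as to_celsius(2.0 * to_kelvin(Celsius{10.0})), which gives roughly 293.15 °C and not 20 °C. That is the kind of wrong result I mean when the type is treated like any other unit.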
This also makes the concept of “quantity point” kind of broken. A lot of the justification around it uses concepts that are wrong, and it makes a mess of the definition of points and vectors; it is just mathematically wrong.
The author probably meant a “reference frame”, but the way it is handled is not sound.
If you want, we can have a quick call so I can explain this in more detail. Please take this into consideration when setting up the standard.
Unicode is the least of its problems.
Please let’s have a talk.
Br,
Tiago
From: SG16 <sg16-bounces_at_[hidden]p.org> On Behalf Of Tom Honermann via SG16
Sent: Wednesday, 24 January 2024 17:30
To: SG16 <sg16_at_[hidden]g>; Mateusz Pusz <mateusz.pusz_at_[hidden]>
Cc: Tom Honermann <tom_at_[hidden]>
Subject: [SG16] Agenda for the 2024-01-24 SG16 meeting
SG16 will hold a meeting on Wednesday, January 24th, at 19:30 UTC (timezone conversion<https://www.timeanddate.com/worldclock/converter.html?iso=20240124T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>).
That is today! Yes, I continue to struggle to keep pace with the world. No, I still have not published the minutes from the last meeting.
The agenda follows.
* P3045R0: Quantities and units library<https://wg21.link/p3045r0>
* CWG 2843: Undated reference to Unicode makes C++ a moving target<https://cplusplus.github.io/CWG/issues/2843.html>
We discussed a draft of P3045 during the 2023-11-29 SG16 meeting<https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2023.md#november-29th-2023>. No polls were taken as that discussion was mostly introductory presentation. Section 13 (Text output)<https://wg21.link/p3045r0#text-output> discusses formatting and character encoding considerations. The motivation and proposal for a fixed_string type has been moved to a new paper that is yet to be published; P3094 (std::basic_fixed_string)<https://wg21.link/p3094>. Section 13.6 (Text output open questions)<https://wg21.link/p3045r0#text-output-open-questions> has the following list of questions and is what discussion will focus on today:
1. Which C++ character type should be used for symbols in Unicode encoding?
2. Are we OK with the usage of '_' for denoting a subscript identifier?
3. Are we OK with no text output support of quantity types?
4. Which character type should basic_symbol_text be used in a single-argument constructor?
5. How to name a non-Unicode accessor member function (e.g., .ascii())? The same name should consistently be used in text_encoding and in the formatting grammar.
6. Should unit_symbol() return std::string_view or basic_fixed_string?
7. Do we care about ostreams enough to introduce custom manipulators to format units?
8. What about the localization for units? Will we get something like ICU in the C++ standard?
9. std::chrono::duration uses 'Q' and 'q' for a number and a unit. In the grammar above, we proposed using 'N' and 'U' for them, respectively. We also introduced 'D' for dimensions. Are we OK with this?
10. Should we provide support for quantity points?
The 1st and 4th questions are, I think, the most important ones as they directly impact both the user interface and the implementation. We need to determine how to:
* Specify both default/preferred symbols (e.g., non-ASCII) and compatibility/fallback symbols (e.g., text limited to the basic literal character set). For example, "Ω" as a default/preferred symbol for ohm with "ohm" as a compatibility/fallback. The paper has a number of such examples (see dim_thermodynamic_temperature, ohm, micro_, and hyperfine_structure_transition_frequency_of_cs in section 13.1.1 (Symbol definition examples)<https://wg21.link/p3045r0#symbol-definition-examples>)
* Specify these sets of symbols for each of the ordinary, wide, and UTF character encodings.
* Should it be required to explicitly provide symbol text for each of these encodings? Perhaps only when characters outside of the basic literal character set are used? Perhaps:
named_unit<"s", ...> // Ok, uses "s" transcoded as necessary for each of the encodings.
named_unit<{"u", L"u", u8"Ω", u"Ω", U"Ω"}, ...> // Ok, uses "u" transcoded as necessary for the compatibility/fallback symbol and the provided text as the default/preferred symbol otherwise.
// This variant would prohibit use of characters outside the basic literal character set with the ordinary character encoding thus ensuring portability.
named_unit<{"u", "Ω", L"Ω", U"Ω"}, ...> // Ok, uses "u" transcoded as necessary for the compatibility/fallback symbol and the provided text as the default/preferred symbol
// otherwise with the UTF-32 text converted to UTF-8 and UTF-16 as necessary.
// This requires that the ordinary literal encoding be UTF-8 for the code to be well-formed (see P1854<https://wg21.link/p1854>).
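The preferred/fallback idea in the bullets above can be sketched roughly as follows. This is only an illustration under stated assumptions: symbol_pair, is_ascii and choose_symbol are hypothetical names, not the basic_symbol_text interface from P3045, and the sketch covers a single encoding (UTF-8 bytes stored in char) rather than the full set of ordinary, wide, and UTF encodings. ASCII is used here as a stand-in for the basic literal character set, which is a subset of it.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical pair of symbols for one unit (illustration only).
struct symbol_pair {
    std::string_view preferred_utf8;  // may contain non-ASCII, stored as UTF-8 bytes
    std::string_view fallback;        // expected to be portable ASCII text
};

// True when every byte is in the ASCII range.
constexpr bool is_ascii(std::string_view s) {
    for (char c : s) {
        if (static_cast<unsigned char>(c) > 0x7F) return false;
    }
    return true;
}

// Pick the preferred symbol when the output can carry Unicode, else the fallback.
constexpr std::string_view choose_symbol(const symbol_pair& s, bool unicode_ok) {
    return unicode_ok ? s.preferred_utf8 : s.fallback;
}

// "\xCE\xA9" is the UTF-8 encoding of U+03A9 (GREEK CAPITAL LETTER OMEGA).
constexpr symbol_pair ohm_symbol{"\xCE\xA9", "ohm"};

static_assert(is_ascii(ohm_symbol.fallback), "fallback must be portable");
static_assert(!is_ascii(ohm_symbol.preferred_utf8), "preferred symbol is non-ASCII here");
```

A caller would then write choose_symbol(ohm_symbol, false) to get "ohm" on a limited output channel, and choose_symbol(ohm_symbol, true) to get the two UTF-8 bytes of "Ω".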
The other questions will likely require a little introductory discussion to better understand the context for the question.
If time permits, we'll continue discussion of CWG 2843 from the 2024-01-10 SG16 meeting (for which minutes are not yet published). I believe there are three questions yet to be answered:
1. The version of the Unicode Standard to be specified as the minimum version.
2. Whether implementations are allowed to use different implementation-defined Unicode versions for the core language and the standard library.
3. Whether the implementation-defined Unicode version should be exposed via a new feature test macro (perhaps two new feature test macros depending on the previous item).
Tom.
Received on 2024-01-24 21:34:03