Date: Thu, 11 Aug 2022 11:48:42 +0000
Dear SG16 members,
I have been referred to this mailing list to share some concerns regarding the adoption of P1949. Some preliminary discussion can be found in a GitHub issue that I opened: https://github.com/sg16-unicode/sg16/issues/77 . Let me summarize the discussion so far and continue it here.
Initial Post:
In the scientific computing and numerical community, Unicode identifiers have gained some popularity in recent years because they help keep code close to the theory. Take, for example, this formula written in TeX:
$$
\Delta t_{n+1} = \varepsilon_{n+1}^{\beta_1/k} \cdot \varepsilon_{n}^{\beta_2/k} \cdot \varepsilon_{n-1}^{\beta_3/k} \cdot \Delta t_{n}
$$
Before P1949, we were able to implement this as
```
const auto Δtₙ₊₁ = std::pow(εₙ₊₁, β₁/k) * std::pow(εₙ, β₂/k) * std::pow(εₙ₋₁, β₃/k) * Δtₙ;
```
which reads like the formula, thus reducing the mental overhead of writing code. I know this is a simple, short example where the benefit may not be immediately obvious, but once formulas get longer and an implementation spans hundreds of lines of code, this can greatly simplify development, maintenance, and debugging. https://godbolt.org/z/nG7o5K141 is an example. For completeness, I also want to note that MSVC does not seem to have supported such constructs in the first place.
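For comparison, the same update can be written with ASCII-only identifiers, which remain valid under P1949. The sketch below assumes double-precision inputs and a nonzero k; the function name `next_step_size` and its parameter names are hypothetical, chosen only to illustrate the name translation that the Unicode spelling avoids.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical ASCII spelling of the step-size update
//   dt_{n+1} = eps_{n+1}^(beta1/k) * eps_n^(beta2/k) * eps_{n-1}^(beta3/k) * dt_n
// Naming convention (illustrative only):
//   eps_np1 = epsilon_{n+1}, eps_n = epsilon_n, eps_nm1 = epsilon_{n-1}, dt_n = Delta t_n
double next_step_size(double eps_np1, double eps_n, double eps_nm1,
                      double beta1, double beta2, double beta3,
                      double k, double dt_n)
{
    assert(k != 0.0 && "k must be nonzero");  // every exponent divides by k
    return std::pow(eps_np1, beta1 / k)
         * std::pow(eps_n, beta2 / k)
         * std::pow(eps_nm1, beta3 / k)
         * dt_n;
}
```

Every symbol now needs a convention such as `np1` or `nm1` that the reader must map back to the formula, which is exactly the overhead described above.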
I appreciate the time people invest in working on the standard, but I think this proposal is the exact opposite of what we developers in the numerical community need. [...] I also noticed that superscript/subscript letters and some superscript/subscript symbols still seem to be valid, which causes an odd inconsistency. [...]
The current direction also raises further questions on my side:
1. Are there plans to restrict the allowed characters further, especially those used in computational science and basic mathematical notation?
2. Is there a possibility of bringing back the numerical superscripts/subscripts?
3. Related to this, and I know that the Unicode Consortium does not want to hear this, but since it is really useful for the numerical community: is there any possibility of at least allowing the very basic standard letters (Greek and Latin) as superscripts/subscripts, either directly via Unicode or via some extra mechanism in the language or editors? Yes, I have read the opinions on this (and I am absolutely not a fan of them, let alone in agreement with them), but from a user's perspective, having only some of these characters available is really odd.
Jens Maurer and Tom Honermann have answered some of my initial questions in the linked GitHub issue, which I briefly summarize here:
* The definition of identifiers has been outsourced to the Unicode Consortium (Unicode Standard Annex 31 defines a recommendation)
* SG16 is at least aware of the problem mentioned above
* A member of the Unicode Consortium is currently working on improvements to Unicode to support source code as text
* This mailing list is the correct medium for discussing issues concerning SG16 (in contrast to the GitHub organization and its associated repositories)
Now, my understanding of Annex 31 is that it only gives recommendations for a formal grammar describing identifiers, which is fine and, I think, also a necessary step. Also, as I understand it, P1949R7 is the corresponding paper describing how Annex 31 is adopted into the C++ standard. Trying to contribute something constructive, I think it might make sense to extend the alphabet for either XID_Start or XID_Continue with U+2080..U+208E and U+2090..U+209C. Including U+00B2, U+00B3, U+00B9, U+2070, U+2071, and U+2073..U+207E might also make sense, but this may require more discussion of whether such characters are reserved for other purposes. However, my thoughts might be a bit short-sighted from a language evolution perspective, because this is merely a hotfix for what helps when writing math-heavy code (e.g. U+2090..U+209C is an incomplete character range).
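The proposed extension boils down to a handful of code-point ranges. As a rough sketch, a membership check could look like the following; the ranges are copied from the proposal above rather than derived from Unicode data files, and the names `CodePointRange`, `kSubscriptRanges`, `kSuperscriptRanges`, and `in_ranges` are hypothetical, not part of any compiler or standard library.

```cpp
#include <array>
#include <cstddef>

// Inclusive range of Unicode code points.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// Ranges proposed for addition outright (copied from the proposal).
constexpr std::array<CodePointRange, 2> kSubscriptRanges{{
    {0x2080, 0x208E},  // subscript digits 0-9 and + - = ( )
    {0x2090, 0x209C},  // subscript small Latin letters (incomplete alphabet)
}};

// Ranges the proposal marks as needing further discussion.
constexpr std::array<CodePointRange, 6> kSuperscriptRanges{{
    {0x00B2, 0x00B2},  // superscript two
    {0x00B3, 0x00B3},  // superscript three
    {0x00B9, 0x00B9},  // superscript one
    {0x2070, 0x2070},  // superscript zero
    {0x2071, 0x2071},  // superscript small i
    {0x2073, 0x207E},  // superscript 4-9 and + - = ( ), range copied as written
}};

// Returns true if cp falls inside any of the given inclusive ranges.
template <std::size_t N>
constexpr bool in_ranges(char32_t cp, const std::array<CodePointRange, N>& ranges)
{
    for (std::size_t i = 0; i < N; ++i) {
        if (cp >= ranges[i].first && cp <= ranges[i].last) {
            return true;
        }
    }
    return false;
}
```

For instance, `in_ranges(char32_t{0x2081}, kSubscriptRanges)` (subscript one) evaluates to true, while U+2072 lies outside every listed range.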
Thank you for your time and best regards,
Dennis
Received on 2022-08-11 11:48:46