ISOCPP sg16 List: Re: P1949 negative impact on math heavy code

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 11 Aug 2022 11:50:06 -0400

Thank you for sharing your experience here, Dennis.

The responses shared by Peter Brett and Corentin Jabot reflect the SG16
consensus on these concerns; I'll try not to be repetitive here.

Mark Davis and Robin Leroy (both CC'd) have been leading the Unicode
Source Code Ad-Hoc Working Group that was formed earlier this year in
the wake of the Trojan Source <https://trojansource.codes/>
vulnerabilities reported last year. Status reports from this group can
be found here
<https://www.unicode.org/L2/L2022/22088-source-code-wg-report.pdf>
(April, 2022) and here
<https://www.unicode.org/L2/L2022/22161-source-code-wg-rept.pdf> (July,
2022). While the group has discussed the issue of which characters
should be re-considered as candidates for use in identifiers, a proposal
to modify UAX #31 <https://unicode.org/reports/tr31> has not yet been
made. I'm sure the group would appreciate hearing more from you. I'll
leave it to Mark and Robin to let you know how best to follow up with
their group.

Mark and Robin, perhaps we can discuss this issue again at an upcoming
meeting with the intent to create a plan to address it? Please note that
I'll miss today's meeting as I am on vacation and would face certain
annihilation were I to tell my wife that I was going to skip out on part
of our vacation to attend a Unicode meeting ;)

Tom.

On 8/11/22 7:48 AM, Ogiermann, Dennis via SG16 wrote:
> Dear SG16 members,
>
> I have been referred to this mailing list to share some concerns regarding the adoption of P1949. Some preliminary discussion can be found in a GitHub issue that I opened: https://github.com/sg16-unicode/sg16/issues/77 . Let me summarize the discussion so far and respond here.
>
>
> Initial Post:
>
> In the scientific computing and numerical community, Unicode identifiers have gained some popularity in recent years because they help keep the code close to the theory. Take, for example, this piece of LaTeX math:
>
> $$
> \Delta t_{n+1} = \varepsilon_{n+1}^{\beta_1/k} \cdot \varepsilon_{n}^{\beta_2/k} \cdot \varepsilon_{n-1}^{\beta_3/k} \cdot \Delta t_{n}
> $$
>
> Pre-P1949, we were able to implement this as
>
> ```
> const auto Δtₙ₊₁ = std::pow(εₙ₊₁, β₁/k) * std::pow(εₙ, β₂/k) * std::pow(εₙ₋₁, β₃/k) * Δtₙ;
> ```
>
> which reads like the formula and thus reduces mental overhead when writing code. I know this is a short and simple example where the benefit may not be immediately clear, but once formulas get longer and the implementation spans hundreds of lines of code, this can really simplify development, maintenance, and debugging. https://godbolt.org/z/nG7o5K141 is an example. For completeness, I also want to note that MSVC seems not to have supported such constructs in the first place.
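
For comparison, here is a rough sketch of how the same update might be spelled so that it still compiles after P1949. It assumes a compiler that accepts UTF-8 source files; Greek letters have the XID_Start property, so only the subscripts need an ASCII stand-in. The function name `next_step_size`, the `_np1`/`_nm1` suffixes, and the parameter list are illustrative choices, not something proposed in this thread.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (name and signature are assumptions for illustration).
// Computes  Δt_{n+1} = ε_{n+1}^(β1/k) * ε_n^(β2/k) * ε_{n-1}^(β3/k) * Δt_n
// using Greek base letters and ASCII suffixes in place of Unicode subscripts:
//   _np1 stands for the subscript n+1, _nm1 for the subscript n-1.
inline double next_step_size(double Δt_n,
                             double ε_np1, double ε_n, double ε_nm1,
                             double β1, double β2, double β3,
                             double k)
{
    assert(k != 0.0); // every exponent divides by k

    const auto Δt_np1 = std::pow(ε_np1, β1 / k)
                      * std::pow(ε_n,   β2 / k)
                      * std::pow(ε_nm1, β3 / k)
                      * Δt_n;
    return Δt_np1;
}
```

The structure of the expression is unchanged; what is lost is the visual match between `ε_np1` and the typeset subscript, which is the readability cost described above.
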
>
> I appreciate the time spent working on the standard, but I think this proposal is the exact opposite of what we developers in the numerical community need. [...] I also noticed that super-/subscript letters and some super-/subscript symbols still seem to be valid, which causes a weird inconsistency. [...]
>
> The current direction also raises more questions from my side:
>
> 1. Are there plans to restrict allowed characters further, especially the ones used in the computational sciences/basic math notation?
> 2. Is there the possibility to bring back the numerical super-/subscripts?
> 3. Related to this, and I know that the Unicode Consortium does not want to hear this, but since this is really useful for the numerical community: is there any possibility to at least allow the very basic standard letters (Greek and Latin) in super-/subscripts, either directly via Unicode or via some extra mechanism in the language/editors? Yes, I read the opinions on this (and I am absolutely not a fan of them, much less do I agree), but from a user's perspective, having just some of the characters available is really weird.
>
>
> Jens Maurer and Tom Honermann have answered some of my initial questions in the linked GitHub issue, which I will also quickly summarize:
> * Identifier definition has been outsourced to the Unicode Consortium (Unicode Standard Annex 31 defines a recommendation)
> * SG16 is at least aware of the problem mentioned above
> * A member of the Unicode Consortium is currently working on improvements to Unicode to support source code as text
> * This mailing list is the correct medium to discuss issues concerning SG16 (in contrast to the Github org and associated repos)
>
>
> Now, my understanding of Annex 31 is that it only gives recommendations for a formal grammar describing identifiers, which is fine and, I think, also a necessary step. Also, as I understand it, P1949R7 is the paper describing how Annex 31 is adopted into the C++ standard. To offer something constructive: I think it might make sense to extend the alphabet of either XID_Start or XID_Continue with 2080..208E and 2090..209C. Including 00B2, 00B3, 00B9, 2070, 2071, 2073..207E might also make sense, but this may require more discussion as to whether such characters might be reserved for other purposes. However, my thoughts may be a bit short-sighted from a language evolution perspective, because this is merely a hotfix for what is helpful when writing math-heavy code (e.g. 2090..209C is an incomplete character range).
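
To make the suggested ranges concrete, the following is a minimal sketch of a membership test over exactly the code points listed above. It is only an illustration: the type and function names are hypothetical, and this is not how UAX #31 or any compiler decides identifier validity (that is driven by the XID_Start and XID_Continue properties in the Unicode Character Database).

```cpp
#include <cstddef>

// Hypothetical types and names, for illustration only.
struct CodePointRange {
    char32_t first;
    char32_t last; // inclusive
};

// Subscript ranges as listed in the message: 2080..208E and 2090..209C.
constexpr CodePointRange subscript_ranges[] = {
    {0x2080, 0x208E},
    {0x2090, 0x209C},
};

// Superscript code points as listed: 00B2, 00B3, 00B9, 2070, 2071, 2073..207E.
constexpr CodePointRange superscript_ranges[] = {
    {0x00B2, 0x00B3},
    {0x00B9, 0x00B9},
    {0x2070, 0x2071},
    {0x2073, 0x207E},
};

// Returns true if c lies inside any of the given inclusive ranges.
template <std::size_t N>
constexpr bool in_any_range(char32_t c, const CodePointRange (&ranges)[N])
{
    for (std::size_t i = 0; i < N; ++i) {
        if (c >= ranges[i].first && c <= ranges[i].last) {
            return true;
        }
    }
    return false;
}

constexpr bool is_suggested_subscript(char32_t c)
{
    return in_any_range(c, subscript_ranges);
}

constexpr bool is_suggested_superscript(char32_t c)
{
    return in_any_range(c, superscript_ranges);
}

// A few compile-time spot checks.
static_assert(is_suggested_subscript(0x2081), "U+2081 SUBSCRIPT ONE is in the list");
static_assert(is_suggested_superscript(0x00B2), "U+00B2 SUPERSCRIPT TWO is in the list");
static_assert(!is_suggested_subscript(0x208F), "U+208F lies between the two subscript ranges");
```

Writing the list out this way also makes the gaps visible, such as the hole between 208E and 2090.
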
>
>
> Thank you for your time and best regards,
> Dennis

Received on 2022-08-11 15:50:14