ISOCPP sg16 List: Re: P1949 negative impact on math heavy code

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 11 Aug 2022 12:07:38 -0400

On 8/11/22 9:43 AM, Corentin Jabot via SG16 wrote:
>
>
> On Thu, Aug 11, 2022, 14:59 Ogiermann, Dennis via SG16
> <sg16_at_[hidden]> wrote:
>
> Dear SG16 members,
>
> I have been referred to this mailing list to share some concerns
> regarding the adoption of P1949. Some preliminary discussion can
> be found at a Github issue that I have opened
> https://github.com/sg16-unicode/sg16/issues/77 . Let me summarize
> this so far and respond here.
>
>
> Initial Post:
>
> In the scientific computing and numerical community the
> utilization of Unicode identifiers has gained a bit of popularity
> over the recent years, because it helps in keeping the code close
> to theory. Let us take for example this piece of math tex:
>
> $$
> \Delta t_{n+1} = \varepsilon_{n+1}^{\beta_1/k} \cdot
> \varepsilon_{n}^{\beta_2/k} \cdot \varepsilon_{n-1}^{\beta_3/k}
> \cdot \Delta t_{n}
> $$
>
> pre-P1949 we were able to implement this as
>
> ```
> const auto Δtₙ₊₁ = std::pow(εₙ₊₁, β₁/k) * std::pow(εₙ, β₂/k) *
> std::pow(εₙ₋₁, β₃/k) * Δtₙ;
> ```
>
> which reads like the formulas, thus reducing mental overhead when
> writing code. I know, this is a simple and short example where it
> may not be directly clear that this is helpful, but once formulas
> start to get longer and the implementation spans on the order of
> hundreds of lines of code, then this can really simplify
> development, maintenance and debugging a lot.
> https://godbolt.org/z/nG7o5K141 is an example. I also want to
> note for completeness that MSVC seems not to have supported such
> constructs in the first place.
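
For comparison, here is a minimal sketch of the same update written with ASCII-only identifiers, which are accepted both before and after P1949. The function name and all parameter names are hypothetical, chosen only for illustration; they are not taken from the linked Godbolt example.

```cpp
#include <cmath>

// ASCII spelling of the step-size update from the formula above:
//   dt_{n+1} = eps_{n+1}^(beta1/k) * eps_n^(beta2/k) * eps_{n-1}^(beta3/k) * dt_n
// All names here are illustrative assumptions, not an existing API.
double next_step_size(double eps_np1, double eps_n, double eps_nm1,
                      double beta1, double beta2, double beta3,
                      double k, double dt_n)
{
    return std::pow(eps_np1, beta1 / k)
         * std::pow(eps_n,   beta2 / k)
         * std::pow(eps_nm1, beta3 / k)
         * dt_n;
}
```

The suffixes `_np1`, `_n` and `_nm1` stand in for the subscripts n+1, n and n-1. This is the kind of manual transliteration the original post argues adds mental overhead in longer formulas.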
>
> I appreciate the time to work on the standard, but I think this
> proposal is the exact opposite of what we developers in the
> numerical community need. [...] I also noticed that
> super-/subscript letters and some super-/subscript symbols seem to
> be still valid, causing some weird inconsistency. [...]
>
> The current direction also raises more questions from my side:
>
> 1. Are there plans to restrict allowed characters further,
> especially the ones used in the computational sciences/basic math
> notation?
>
>
> No. The current set of characters is stable and can be extended over
> time but not reduced. This is a very important property of the current
> design.
>
> The issue is that mathematical symbols used to work by accident as the
> allowed ranges were huge and not necessarily well understood, and as
> such were affected by the evolution of Unicode.
> Whereas now the characters allowed will always be considered valid in
> identifiers by Unicode.
>
>
>
> 2. Is there the possibility to bring back the numerical
> super-/subscripts?
> 3. Related to this, and I know that the Unicode Consortium
> does not want to hear this, but since this is really useful for the
> numerical community, is there any possibility to at least allow
> the very basic standard letters (Greek and Latin) in
> super-/subscripts, either directly via Unicode or through some extra
> mechanism in the language/editors? Yes, I read the opinions on
> this (and I am absolutely not a fan of them, nor do I agree), but
> viewing this from a user's perspective, having just some characters
> available is really weird.
>
>
> I do not think C++ should consider proposals for hastily subsetted ranges
> of Unicode because someone can find them useful.
> It's what landed us in this situation in the first place. We simply do
> not have the expertise to try to do the Unicode Consortium's job.
>
> The current state of things also allows us to have portability across
> C and C++, Python, Rust, etc.
> This is an important property that we should do our best to keep.
>
> So if you can convince the Unicode Consortium of the benefit of doing
> that, it's work that will automatically be applied across the industry.
>
> With the caveat that languages which support custom Unicode operators
> may reserve mathematical symbols for these, and so something that is
> currently a mathematical symbol probably can't become an identifier.
> That is another thing to consider: by extending what is considered an
> identifier, and saying that in particular mathematical symbols are
> identifiers, we would lose the design space of custom operators,
> which cannot be done lightly.
>
> And we certainly would lose portability between C and Swift, which
> seems undesirable.

Note that Swift has not (yet) switched from UAX #31 Immutable
Identifiers
<https://unicode.org/reports/tr31/#Immutable_Identifier_Syntax> to
Default Identifiers
<https://unicode.org/reports/tr31/#Default_Identifier_Syntax>. C has for
C23 via the adoption of N2836 (C Identifier Syntax using Unicode
Standard Annex 31)
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2836.pdf>.

Swift also allows programmers to declare custom operators and a range of
Unicode code points has been defined for that purpose
<https://docs.swift.org/swift-book/ReferenceManual/LexicalStructure.html#grammar_operator-head>.
I haven't researched how those ranges might overlap with characters that
might also be desirable for use in identifiers. I'm not sure if someone
else has, but support for these characters in custom operators might
complicate extending characters allowed in identifiers. Ideally,
programming languages would agree on what characters are and are not
allowed in both identifiers and operators.

Tom.

>
>
> In short, as Peter suggested, I would recommend you reach out to the
> Unicode Consortium; they have more expertise than C++ in regard to
> extending the set of identifiers
> https://unicode.org/consortium/distlist.html
>
>
>
> Jens Maurer and Tom Honermann have answered some of my initial
> questions in the linked Github issue, which I also quickly summarize:
> * Identifier definition has been outsourced to the Unicode
> Consortium (Unicode Standard Annex 31 defines a recommendation)
> * SG16 is at least aware of the problem mentioned above
> * A member of the Unicode Consortium is currently working on
> improvements to Unicode to support source code as text
> * This mailing list is the correct medium to discuss issues
> concerning SG16 (in contrast to the Github org and associated repos)
>
>
> Now, my understanding of Annex 31 is that it gives only
> recommendations for a formal grammar to describe identifiers,
> which is fine and I think also a necessary step. Also from my
> understanding P1949R7 is the corresponding paper describing how
> Annex 31 is adopted into the C++ standard. Trying to throw in
> something constructive, I think it might make sense to extend the
> alphabet for either XID_Start or XID_Continue with 2080..208E and
> 2090..209C. Also, including 00B2,00B3,00B9,2070,2071,2073..207E
> might make sense, but this may require more discussion on whether
> such characters might be blocked for other purposes. However, my
> thoughts might be a bit short-sighted from a language evolution
> perspective, because it is merely a hotfix for what is helpful for
> writing math heavy code (e.g. 2090..209C is an incomplete
> character range).
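
To make the suggestion above concrete, here is a minimal sketch of a membership test for exactly the code points listed in the message. The names `CodePointRange`, `proposed_ranges` and `in_proposed_extension` are hypothetical. A real implementation would presumably derive XID_Start and XID_Continue from the Unicode Character Database rather than hard-code a table, and nothing here reflects how any compiler actually implements P1949.

```cpp
// Inclusive range of Unicode code points.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// The code points named in the message, copied as written there.
// This table is an illustration only, not part of P1949 or UAX #31.
constexpr CodePointRange proposed_ranges[] = {
    {0x2080, 0x208E},  // subscript digits and subscript operators
    {0x2090, 0x209C},  // Latin subscript small letters
    {0x00B2, 0x00B3},  // superscript two and three
    {0x00B9, 0x00B9},  // superscript one
    {0x2070, 0x2071},  // superscript zero, superscript small letter i
    {0x2073, 0x207E},  // range as given in the message
};

// Returns true if `c` falls inside one of the ranges listed above.
constexpr bool in_proposed_extension(char32_t c)
{
    for (const auto& r : proposed_ranges) {
        if (c >= r.first && c <= r.last) {
            return true;
        }
    }
    return false;
}

// U+2081 SUBSCRIPT ONE is covered; ASCII 'n' is not.
static_assert(in_proposed_extension(0x2081), "subscript one is in the list");
static_assert(!in_proposed_extension(U'n'), "plain letters are not in the list");
```

A check like this only answers whether a character is on the proposed list. Whether such characters should be allowed at the start of an identifier or only in continuation position is the XID_Start versus XID_Continue question raised in the message, and the sketch does not address it.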
>
>
> Thank you for your time and best regards,
> Dennis
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
>

Received on 2022-08-11 16:07:47