Date: Thu, 11 Aug 2022 15:43:44 +0200
On Thu, Aug 11, 2022, 14:59 Ogiermann, Dennis via SG16 <
sg16_at_[hidden]> wrote:
> Dear SG16 members,
>
> I have been referred to this mailing list to share some concerns regarding
> the adoption of P1949. Some preliminary discussion can be found at a Github
> issue that I have opened https://github.com/sg16-unicode/sg16/issues/77 .
> Let me summarize this so far and respond here.
>
>
> Initial Post:
>
> In the scientific computing and numerical community the utilization of
> Unicode identifiers has gained a bit of popularity over the recent years,
> because it helps in keeping the code close to theory. Let us take for
> example this piece of math tex:
>
> $$
> \Delta t_{n+1} = \varepsilon_{n+1}^{\beta_1/k} \cdot
> \varepsilon_{n}^{\beta_2/k} \cdot \varepsilon_{n-1}^{\beta_3/k} \cdot
> \Delta t_{n}
> $$
>
> pre-P1949 we were able to implement this as
>
> ```
> const auto Δtₙ₊₁ = std::pow(εₙ₊₁, β₁/k) * std::pow(εₙ, β₂/k) *
> std::pow(εₙ₋₁, β₃/k) * Δtₙ;
> ```
>
> which reads like the formulas, thus reducing mental overhead when writing
> code. I know, this is a simple and short example where it may not be
> directly clear that this is helpful, but once formulas start to get longer
> and the implementation spans on the order of hundreds of lines of code,
> then this can really simplify development, maintenance and debugging a lot.
> https://godbolt.org/z/nG7o5K141 is an example - I also want to note for
> completeness that MSVC seem to have not supported such constructs in first
> place.
>
> I appreciate the time to work on the standard, but I think this proposal
> is the exact opposite of what we developers in the numerical community
> need. [...] I also noticed that super-/subscript letters and some
> super-/subscript symbols seem to be still valid, causing some weird
> inconsistency. [...]
>
> The current direction also raises more questions from my side:
>
> 1. Are there plans to restrict allowed characters further, especially
> the ones used in the computational sciences/basic math notation?
>
No. The current set of characters is stable: it can be extended over time
but not reduced. This is a very important property of the current design.
The issue is that mathematical symbols used to work by accident, because the
allowed ranges were huge and not necessarily well understood, and as such
were affected by the evolution of Unicode.
The characters allowed now, by contrast, will always be considered valid in
identifiers by Unicode.
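As a rough illustration (a sketch only, not taken from P1949 or from the
linked Compiler Explorer example), the quoted formula can be spelled with
Greek letters, which have the XID_Start property, and plain ASCII suffixes
in place of the subscript digits and signs, which are not XID_Continue.
The function name next_step and the identifiers ε_next, ε_prev, β1, β2 and
β3 are made up for this example, and it assumes a compiler that reads the
source file as UTF-8.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: computes the next step size from three error
// estimates, mirroring the formula in the quoted message.
// Δ, ε and β are ordinary letters (XID_Start); the subscripts of the
// original notation are replaced by ASCII suffixes.
double next_step(double Δt, double ε_next, double ε, double ε_prev,
                 double β1, double β2, double β3, double k)
{
    return std::pow(ε_next, β1 / k) * std::pow(ε, β2 / k) *
           std::pow(ε_prev, β3 / k) * Δt;
}

// Sanity check: with all error estimates equal to 1 the step is unchanged.
inline void check_next_step()
{
    assert(next_step(0.5, 1.0, 1.0, 1.0, 0.6, -0.2, 0.1, 4.0) == 0.5);
}
```

This keeps most of the visual link to the formula while staying inside the
set of characters that Unicode guarantees to remain valid in identifiers.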
> 2. Is there the possibility to bring back the numerical
> super-/subscripts?
> 3. Related to this, and I know that the unicode consortium does want
> to hear this, but since this is really useful for the numerical community,
> is there any possibility to at least allow the very basic standard letters
> (greek and latin) in super-/subscripts - either directly via unicode or
> some extra mechanism in the language/editors? Yes, I read the opinions on
> this (and I am absolutely not a fan of it, less am I agreeing), but viewing
> this from a user perspective, having just some characters available is
> really weird.
>
I do not think C++ should consider proposals for hastily subsetted ranges of
Unicode just because someone finds them useful.
That is what landed us in this situation in the first place. We simply do not
have the expertise to do the Unicode Consortium's job.
The current state of things also gives us portability of identifiers across
C, C++, Python, Rust, etc.
This is an important property that we should do our best to keep.
So if you can convince the Unicode Consortium of the benefit of such a change,
that work will automatically apply across the industry.
One caveat: languages that support custom Unicode operators may
reserve mathematical symbols for them, so something that is currently a
mathematical symbol probably cannot become an identifier. That is another
thing to consider: by extending what is considered an identifier, and
saying that mathematical symbols in particular are identifiers, we would
lose the design space for custom operators, which cannot be done lightly.
And we would certainly lose portability between C and Swift, which seems
undesirable.
In short, as Peter suggested, I would recommend you reach out to the
Unicode Consortium; they have more expertise than C++ in regard to
extending the set of identifiers:
https://unicode.org/consortium/distlist.html
>
> Jens Maurer and Tom Honermann have answered some of my initial questions in
> the linked Github issue, which I also quickly summarize:
> * Identifier definition has been outsourced to the Unicode consortium
> (Unicode Standard Annex 31 defines a recommendation)
> * SG16 is at least aware of the problem mentioned above
> * A member of the Unicode Consortium is currently working on improvements
> to Unicode to support source code as text
> * This mailing list is the correct medium to discuss issues concerning
> SG16 (in contrast to the Github org and associated repos)
>
>
> Now, my understanding of Annex 31 is that it gives only recommendations
> for a formal grammar to describe identifiers, which is fine and I think
> also a necessary step. Also from my understanding P1949R7 is the
> corresponding paper describing how Annex 31 is adopted into the C++
> standard. Trying to throw in something constructive, I think it might make
> sense to extend the alphabet for either XID_Start or XID_Continue with
> 2080..208E and 2090..209C. Also, including
> 00B2,00B3,00B9,2070,2071,2073..207E might make sense, but this may require
> more discussion on whether such characters might be blocked for other
> purposes. However, my thoughts might be a bit short-sighted from a language
> evolution perspective, because it is merely a hotfix for what is helpful
> for writing math heavy code (e.g. 2090..209C is an incomplete character
> range).
>
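To make the suggested ranges concrete, here is a minimal sketch that only
encodes the code points listed above as membership tests. The function
names are hypothetical; nothing here is part of P1949, UAX #31 or any
library, and it makes no statement about what XID_Start or XID_Continue
actually contain.

```cpp
// First suggestion: the subscript ranges 2080..208E and 2090..209C.
constexpr bool in_suggested_subscripts(char32_t c)
{
    return (c >= 0x2080 && c <= 0x208E) || (c >= 0x2090 && c <= 0x209C);
}

// Second suggestion ("may require more discussion"):
// 00B2, 00B3, 00B9, 2070, 2071 and 2073..207E.
constexpr bool in_suggested_superscripts(char32_t c)
{
    return c == 0x00B2 || c == 0x00B3 || c == 0x00B9 ||
           c == 0x2070 || c == 0x2071 || (c >= 0x2073 && c <= 0x207E);
}

// U+2081 SUBSCRIPT ONE and U+2099 LATIN SUBSCRIPT SMALL LETTER N are covered.
static_assert(in_suggested_subscripts(0x2081), "U+2081 is in 2080..208E");
static_assert(in_suggested_subscripts(0x2099), "U+2099 is in 2090..209C");
// U+208F lies between the two subscript ranges and is not covered.
static_assert(!in_suggested_subscripts(0x208F), "U+208F is in neither range");
// U+00B2 SUPERSCRIPT TWO is covered; U+2072 is not in the list.
static_assert(in_suggested_superscripts(0x00B2), "U+00B2 is listed");
static_assert(!in_suggested_superscripts(0x2072), "U+2072 is not listed");
```

A test like this shows what a per-language exception list would look like.
That is the kind of hand-maintained subset the reply above argues should be
decided by the Unicode Consortium rather than by C++.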
>
> Thank you for your time and best regards,
> Dennis
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2022-08-11 13:43:54