Date: Thu, 15 Apr 2021 21:14:36 +0200
Tom,
on Thu, 15 Apr 2021 09:47:05 -0700 you ("Tom Scogland"
<scogland1_at_[hidden]>) wrote:
> Trying to stick to strictly technical issues, or at least challenges
> to that implementation, there are some source codebases that do (or
> at least have) used characters in these sets in ways that likely
> would no longer function with translations such as the one you
> propose for code point x2237. The first example that comes to mind,
> that’s public and easy to reference, is in the bootstrap path for go
> from c through it’s original toolchain where the middle dot character
> was used throughout the code to allow c to look like it has module
> namespacing:
> https://github.com/golang/go/blob/402d3590b54e4a0df9fb51ed14b2999e85ce0b76/src/pkg/runtime/chan.c#L155
>
> If the middle dot becomes a period, or anything other than a valid
> identifier character, that code will break. This is not a common
> practice, but I’ve also seen the Pa (ᐸ) and Po (ᐳ) symbols made to
> make generated function names “look like generics.”
Yes, for the characters that are non-mathematical and "pure" language
syntax there would certainly be a debate to have about which ones
would be appropriate choices. For attributes and templates, all code
points that Unicode specifies as punctuation forming opening and
closing pairs could perhaps qualify.
The choices I use for these are personal and I am well aware of
this. My criterion was merely to have a glyph that comes close to the
current usage, but that is probably also debatable.
> That’s not to say necessarily that something shouldn’t be done here,
> but sadly existing code does exist that could be broken by decisions
> in this space. If it’s something the committees decide we want to
> do, learning from previous (somewhat successful, somewhat painful)
> experiences from Fortress and more successfully and recently from
> Julia which allows unicode characters almost arbitrarily, but which
> does assign meanings to a good number of symbols through it’s parser
> here:
> https://github.com/JuliaLang/julia/blob/4996445df37e526dac2772e333caf82f1ea987f0/src/julia-parser.scm#L6
>
> I was surprised to find it doesn’t include anything for Pa, Po or
> middle dot actually. It does however define the
> Proportion character “∷” as a comparison operator, possibly
> because it’s from the mathematical block or possibly because the
> classic use along with Ratio “∶” would suggest its use in
> expressions like a∶b ∷ c∶d to express, or perhaps test,
> proportional ratios rather than as a separator or otherwise
> equivalent to two colons.
Right, we have to look into existing usage, extensions to C or C++
where they exist, and also what other programming languages have
established.
> Honestly I think it’s things like that which make this a harder
> problem more than the technical challenge of implementing it.
> Deciding what all of the characters should mean is not a trivial
> task, and frequently results in differing opinions.
My hope would be that for the code points where Unicode assigns a
semantic that corresponds to our operators, the use (if we choose to
do so) should be relatively clear. In this category I see
≤ ≥ ≡ ≠ ¬ ∧ ∨ × ∪ ∩ ⊖
where one might not like × or ∪ because their glyphs are too close to
alphabetic characters, and be confused by ⊖ because it is not so
commonly used for symmetric difference, aka bitwise xor.
For the non-mathematical operator `->` I generally use `→`.
For the grouping characters, as said, I currently opt for
⟦ ∷ ⟧ for attributes
and if I were programming a lot of C++ I'd probably use single
guillemets for templates
‹ ›
But all of that is certainly questionable and mostly reflects just my
personal preferences.
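A minimal sketch of how such substitutions could be applied as a
source-to-source pass, written in Python for brevity. The ASCII
spelling chosen for each code point (for example ∧ as `&&` rather than
`&`, or ¬ as `!` rather than `~`) is an assumption made for
illustration, and the names `OPERATOR_MAP` and `to_ascii_operators`
are hypothetical. The pass is deliberately naive: it does not skip
string literals or comments.

```python
# Hypothetical table: Unicode code point -> existing C/C++ spelling.
# The right-hand sides are assumptions for illustration only.
OPERATOR_MAP = {
    "≤": "<=",
    "≥": ">=",
    "≡": "==",
    "≠": "!=",
    "¬": "!",
    "∧": "&&",
    "∨": "||",
    "×": "*",
    "∪": "|",
    "∩": "&",
    "⊖": "^",   # symmetric difference, i.e. bitwise xor
    "→": "->",
    "⟦": "[[",  # attribute opening
    "⟧": "]]",  # attribute closing
    "∷": "::",  # U+2237, used here as the attribute separator
    "‹": "<",   # template opening (C++ only)
    "›": ">",   # template closing (C++ only)
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(OPERATOR_MAP)


def to_ascii_operators(source: str) -> str:
    """Replace every mapped code point; leave all other characters alone."""
    return source.translate(_TABLE)


if __name__ == "__main__":
    print(to_ascii_operators("if (a ≤ b ∧ ¬c) p→x = u ⊖ v;"))
    print(to_ascii_operators("⟦gnu∷unused⟧ int runtime·chansend;"))
```

Because the middle dot is not in the table, an identifier such as
`runtime·chansend` passes through unchanged, which is the property
needed to keep code like the Go bootstrap sources working.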
Thanks
Jens
-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536 ::
:: :::::::::::::::::::::: gsm France : +33 651400183 ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::
Received on 2021-04-15 14:15:09