sg16: Re: [SG16-Unicode] Ideas for the future

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Date: Tue, 30 Jul 2019 10:57:53 -0400

     The "hello world" example with lots of languages looks nice to me.
That can obviously be a goal of ours, in some form.

TL;DR of below: the premise of the rest of what you're asking for is nice,
but it requires far more complexity in the compiler and in the language
than is currently needed.

     We already have Unicode identifiers, and we are already working on
making text like that just work. My only problem with supporting keywords in
every language is that you need to carve those keywords out of every single
other language as well, reducing the number of valid identifiers in the
program by quite a bit for every language. The committee already struggles
with adding even context-sensitive keywords to the standard: doing so for
all languages, and writing standard library function names, concept names,
variable names and class names in other languages that make sense and are
not just Google Translated specifics will be a herculean effort.

     Your example is also lacking in examples from other languages: some of
those languages are Right-To-Left, rather than Left-To-Right. Do the braces
invert and show up on the other side of keywords? Do we require that a
compiler perform full Bidi processing and localization handling for each and
every program? These are the hard parts of Unicode that aren't just "oh,
well the encoding was wrong", and requiring everyone to be mildly familiar
with that so they can troubleshoot their programs and fix their compilers
is probably not something that flies in the short term. We don't even have
a portable "char" right now.

     At the moment, achieving language-specific keywords could be done as a
translation layer just before the compiler actually grabs the source. That
might be a worthwhile endeavor -- and something actually programmable in
standard C++, come C++26 -- that will enable people of different languages
to start in their native language when working with C++. And it could be
accommodated in a similar fashion in the Standard itself: "translation" of
Unicode characters in your source program into a \U-escaped basic character
set source blob already happens. The compiler runs on this basic character
set blob (theoretically; compilers are allowed to skip this step if they
"understand" the characters anyhow), which keeps the processing portable.
A "{language specific keywords} -> basic source character set keywords"
conversion could be one more thing included in that step.
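
     To make that concrete, here is a rough sketch of what such a
pre-compiler pass might look like. Everything in it is an assumption made
for illustration: the names KeywordTable, translate_keywords and
escape_non_ascii are hypothetical, the input is assumed to be valid UTF-8,
and string literals and comments are not skipped, so this is a naive
find-replace and not a real parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical table: localized keyword (UTF-8) -> standard C++ keyword.
using KeywordTable = std::unordered_map<std::string, std::string>;

// Treat ASCII letters, digits, '_' and every non-ASCII byte as part of an
// identifier-like token. A simplification, not the real identifier grammar.
bool is_word_byte(unsigned char c) {
    return c >= 0x80 || c == '_' || (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Step 1: replace whole tokens found in `table`; copy everything else.
std::string translate_keywords(std::string_view source,
                               const KeywordTable& table) {
    std::string out;
    out.reserve(source.size());
    std::size_t pos = 0;
    while (pos < source.size()) {
        if (!is_word_byte(static_cast<unsigned char>(source[pos]))) {
            out.push_back(source[pos++]);
            continue;
        }
        std::size_t end = pos;
        while (end < source.size() &&
               is_word_byte(static_cast<unsigned char>(source[end]))) {
            ++end;
        }
        assert(end > pos);  // the scan always consumes at least one byte
        const std::string token(source.substr(pos, end - pos));
        const auto it = table.find(token);
        out += (it != table.end()) ? it->second : token;
        pos = end;
    }
    return out;
}

// Step 2: rewrite each remaining non-ASCII code point as \UXXXXXXXX.
// Assumption: valid UTF-8; malformed input is not diagnosed.
std::string escape_non_ascii(std::string_view source) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    std::size_t i = 0;
    while (i < source.size()) {
        const unsigned char lead = static_cast<unsigned char>(source[i]);
        if (lead < 0x80) {
            out.push_back(source[i++]);
            continue;
        }
        // Sequence length from the lead byte: 110xxxxx=2, 1110xxxx=3, 11110xxx=4.
        const std::size_t len = lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : 2;
        char32_t cp = lead & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len && i + k < source.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(source[i + k]) & 0x3F);
        }
        out += "\\U";
        for (int shift = 28; shift >= 0; shift -= 4) {
            out.push_back(hex[(cp >> shift) & 0xF]);
        }
        i += len;
    }
    return out;
}
```

     Running translate_keywords first and escape_non_ascii second mirrors the
order described above: localized keywords become the standard ones, and
whatever non-ASCII identifiers remain are reduced to the basic character
set. Anything beyond whole-token replacement (RTL layout, localized library
names) is out of reach for a pass this simple.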

     This would be difficult for the standard library, however. We already
have severe problems with argument order (memcpy, anyone?): do RTL
languages get them in reverse order? The same order? Is the function call
on the left or the right of the function name? C++ already has a cramped
parsing space. I'm all for the non-English speakers having a vastly easier
time, but we don't even have Named Parameters in the language to help make
this less of a problem for them, let alone parsers capable of handling
more than a find-replace of keywords or function names.
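
     As a small illustration of the argument-order problem, even within
today's English-only library the destination moves around: std::memcpy
takes it first, std::copy takes it last. The two wrapper function names
below are made up for this sketch.

```cpp
#include <algorithm>
#include <array>
#include <cstring>

// std::memcpy: destination comes FIRST, then source, then a byte count.
std::array<int, 3> copy_with_memcpy(const std::array<int, 3>& src) {
    std::array<int, 3> dest{};
    std::memcpy(dest.data(), src.data(), sizeof(int) * src.size());
    return dest;
}

// std::copy: the source range comes first, the destination comes LAST.
std::array<int, 3> copy_with_std_copy(const std::array<int, 3>& src) {
    std::array<int, 3> dest{};
    std::copy(src.begin(), src.end(), dest.begin());
    return dest;
}
```

     Both produce the same result, but nothing at the call site says which
argument is which, and that is the part a localized or RTL spelling would
have to define.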

Sincerely,
JeanHeyd

Received on 2019-07-30 16:58:07