sg16: Re: [SG16-Unicode] Ideas for the future

From: Lev Minkovsky <lminkovsky_at_[hidden]>
Date: Tue, 30 Jul 2019 15:30:29 +0000

Alas, speakers of RTL languages would probably find it more convenient to use standard English C++.

I never meant for C++ to be translatable into all 200+ world languages that have a writing system. Each and every such translation would indeed be a major undertaking. The language keywords would actually be the easy part. Translating the library would be more difficult. I suspect the only realistic solution would be to create a set of “national” headers that map onto the established definitions. For example, the стдвв.г header I used in the example could include the following definition

inline int печать(const char* format, ...)
{
    // Pseudocode: a C variadic argument list cannot be forwarded like this.
    return printf(format, ...);
}

except that the parameter lists would probably have to be handled via parameter packs.
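A minimal sketch of the parameter-pack variant mentioned above, assuming a compiler that accepts UTF-8 identifiers in source files. The name печать is taken from the example; the body is only an illustration and is not taken from any actual стдвв.г header. Note that a forwarding wrapper like this loses the format-string checking that compilers perform on direct printf calls.

```cpp
#include <cstdio>
#include <utility>

// Hypothetical "national" alias for std::printf (illustrative only).
template <typename... Args>
inline int печать(const char* format, Args&&... args)
{
    // Expand the pack into printf's C variadic argument list.
    return std::printf(format, std::forward<Args>(args)...);
}
```

A call such as печать("%d\n", 42) would then behave like the equivalent printf call and return the number of characters written.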

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Sent: Tuesday, July 30, 2019 10:58 AM
To: keld_at_[hidden]
Cc: Lev Minkovsky <lminkovsky_at_[hidden]>; unicode_at_[hidden] <unicode_at_[hidden]>
Subject: Re: [SG16-Unicode] Ideas for the future

     The "hello world" example with lots of languages looks nice to me. That can obviously be a goal of ours, in some form.

TL;DR of below: the premise of what you're asking for is nice, but the rest of it requires far more complexity in the compiler and in the language than is currently needed.

     We already have Unicode identifiers, and we are already working on making text like that just work. My only problem with supporting keywords in every language is that you need to carve those keywords out of every single other language as well, reducing the number of valid identifiers in the program by quite a bit for every language. The committee already struggles with adding even context-sensitive keywords to the standard: doing so for all languages, and writing standard library function names, concept names, variable names and class names in other languages that make sense and are not just Google Translated specifics, will be a herculean effort.
     Your example also lacks cases from other languages: some of those languages are Right-To-Left, rather than Left-To-Right. Do the braces invert and show up on the other side of keywords? Do we require that a compiler do full Bidi processing and localization handling for each and every program? These are the hard parts of Unicode that aren't just "oh, well the encoding was wrong", and requiring everyone to be mildly familiar with them so they can troubleshoot their programs and fix their compilers is probably not something that flies in the short term. We don't even have a portable "char" right now.
     At the moment, language-specific keywords could be achieved with a translation layer that runs just before the compiler actually grabs the source. That might be a worthwhile endeavor -- and something actually programmable in standard C++, come C++26 -- that would enable people who speak different languages to start in their native language when working with C++. And it could be accommodated in a similar fashion in the Standard itself: a "translation" of the Unicode characters in your source program into a \U-escaped basic character set source blob already happens. The compiler runs on this basic character set blob (theoretically; compilers are allowed to just skip this if they "understand" the characters anyhow), which allows the processing to be portable. A "{language specific keywords} -> basic source character set keywords" conversion could be one of the things included in that step.
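     To make the "\U-escaped blob" step concrete, here is a rough sketch of what such a pass does. The function name to_ucn_escaped is hypothetical, and the sketch treats plain ASCII as a stand-in for the basic source character set (which is actually slightly smaller), so it is an approximation rather than what any particular compiler does.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: rewrites every non-ASCII code point as a
// \UXXXXXXXX universal-character-name and copies ASCII through unchanged.
// Assumption: ASCII approximates the basic source character set.
inline std::string to_ucn_escaped(std::u32string_view source)
{
    std::string out;
    for (char32_t cp : source) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else {
            char buf[11]; // backslash, 'U', 8 hex digits, terminating NUL
            std::snprintf(buf, sizeof buf, "\\U%08lX",
                          static_cast<unsigned long>(cp));
            out += buf;
        }
    }
    return out;
}
```

     Under these assumptions, a declaration of a variable named with the single Cyrillic letter U+043F comes out with that name spelled as \U0000043F, and everything else is left alone.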
     This would be difficult for the standard library, however. We already have severe problems with argument order (memcpy, anyone?): do RTL languages get them in reverse order? The same order? Is the function call on the left or the right of the function name? C++ already has a cramped parsing space. I'm all for non-English speakers having a vastly easier time, but we don't even have Named Parameters in the language to help make this less of a problem for them, let alone parsers capable of handling more than a find-replace of keywords or function names.
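     For a sense of what "a find-replace of keywords" amounts to, here is a minimal sketch of such a pre-compilation pass over UTF-8 source text. Everything in it is an assumption made for illustration: the function names, and in particular the Russian spellings in the table, are not a proposed mapping. It only swaps whole words; it does not skip string literals or comments, and it does nothing about argument order or directionality.

```cpp
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical keyword table (illustrative spellings only).
inline const std::unordered_map<std::string_view, std::string_view>& keyword_table()
{
    static const std::unordered_map<std::string_view, std::string_view> table{
        {"цел", "int"},
        {"вернуть", "return"},
        {"если", "if"},
    };
    return table;
}

// Bytes that may belong to an identifier-like word in UTF-8 text:
// ASCII letters, digits, '_', and any byte of a multi-byte sequence.
inline bool is_word_byte(unsigned char c)
{
    return c >= 0x80 || c == '_' || (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Naive whole-word replacement; non-word bytes are copied unchanged.
inline std::string translate_keywords(std::string_view source)
{
    std::string out;
    std::size_t i = 0;
    while (i < source.size()) {
        if (!is_word_byte(static_cast<unsigned char>(source[i]))) {
            out.push_back(source[i++]);
            continue;
        }
        std::size_t start = i;
        while (i < source.size() &&
               is_word_byte(static_cast<unsigned char>(source[i]))) {
            ++i;
        }
        std::string_view word = source.substr(start, i - start);
        auto it = keyword_table().find(word);
        out += (it != keyword_table().end()) ? it->second : word;
    }
    return out;
}
```

     Because matching is on whole words, an identifier that merely starts with a translated keyword is left untouched, but a keyword spelling that appears inside a string literal would still be rewritten, which is exactly the kind of limitation described above.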

Sincerely,
JeanHeyd

Received on 2019-07-30 17:30:34