On 7/29/19 7:02 PM, Lev Minkovsky wrote:

First is the ↑ character (Alt-24 with NumLock on). We had a discussion a while back with Bjarne and a few other C++ luminaries regarding a possible exponentiation operator. None of the more conventional alternatives appeared to be a good candidate, while ↑ is a symbol used for that purpose by Donald Knuth (see https://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation) and would be excellent for readability. Perhaps we can add it at some point to the basic character set. I am not at all worried about its absence from the keyboard; math folks will quickly get used to Alt-24.

I would imagine the right approach for this to happen is to ask ourselves: what are the specific characters that we wish were in the basic character set? My initial list would be: $, @, ↑, • or ·, ÷. $ is already in the Microsoft basic character set, see https://docs.microsoft.com/en-us/cpp/cpp/character-sets?view=vs-2019, so perhaps this would be low-hanging fruit. The middle dot symbol and the obelus could be used as alternative multiplication and division operators. Swift already has user-defined operators; if we ever get them, it would be awesome to have something like

long long operator ·(long m, long n) { return (long long)m * (long long)n; }

Extending the basic source character set is something that is on my mind. My first priority would be to get '@' added so that email addresses can be used in portable programs :). Next up would be '$'. These two are ubiquitous, would not be problematic in practice (they are present in ASCII and common EBCDIC code pages), and we could introduce digraphs for them.

Characters like ↑ are more problematic since they effectively require a Unicode encoding (we could, of course, specify digraphs for it as well). I think the battle for the first non-ASCII character in the basic source character set will be hard (I'm not sure I want to take it on). I can only imagine the number of papers that will come afterward proposing new operators for all kinds of interesting purposes! We may want to consider a core language facility for defining new operators; that would enable adding operators without extending the basic source character set.

The second, far more impactful idea would be to unicodize the entire language and let users write keywords in their national languages. Programmers outside the USA (surprise, surprise) often think in their native languages and often prefer to write comments in them. For example, I know that the SAP codebase is full of comments in German. A source file is a specialized text, and every language switch is a disorienting experience, especially if these languages are not related. The Algol 68 designers already understood this and translated the language into Russian, German, French, Bulgarian, Chinese and Japanese, including of course the keywords. This could facilitate teaching and studying the language as well.

As an illustration, let us consider three variants of Hello World: first the canonical version with comments, second the same program with comments in Russian, and third a hypothetical Hello world/Привет мир in C++ with Russian keywords:

//This is needed for printf
#include <stdio.h>

//Program entry
int main()
{
   //Let's greet the world
   printf("Hello world!\n");
}

//Это требуется для printf
#include <stdio.h>

//Вход в программу
int main()
{
   //Приветствуем мир
   printf("Привет мир!\n");
}

//Это требуется для печати
#включить <стдвв.г>

//Вход в программу
цел главная()
{
   //Приветствуем мир
   печать("Привет мир!\n");
}

I would imagine that for most if not all of you, the third example looks like gibberish. I can assure you that, for young future programmers from countries where English isn’t widely spoken, the first Hello World looks just as much like gibberish. Some of them may even be reluctant to enter a career where they would have to deal with pages and pages of such stuff on a daily basis.

I've had this thought as well. The obvious downside is that it could make sharing code more difficult. But translation would be relatively easy as well, so perhaps not a problem in practice. We don't have a lot of keywords, so I'm not sure how impactful this is, and I lack the non-native language experience to draw on. Thanks for the Algol 68 reference; I wasn't aware of this prior experience!

Finally, I wanted to show you a couple of additional “hello-world”s. The first is valid C++ that stress-tests the system it runs on by using English, Russian, Georgian and Chinese words in the same sentence:

#include <stdio.h>

main()
{
printf(u8"Hello-привет-გამარჯობა-你好, world!\n");
}

Note that passing a u8 string literal to printf produces mojibake unless the execution encoding happens to be UTF-8. And, of course, this won't compile in C++20.

The second is something I put together as a 21st-century version of Hello world. Alas, only a very small fraction of it is now well-formed.

/*
The first program to write is the same for all languages:
Print the words
hello, world

#include <stdio.h>
int main()
{
printf("hello, world\n");
}
*/

import std.ui;     //future UI module
import std.core;

int main()
{
   static std::map<std::language_id_t, std::u8string> hellos{ //language_id_t comes from std.ui
       { "English"lid, "Hello, world" }, //a literal produces the right language type
       { "Chinese"lid, "你好，世界" },
       { "Hindi"lid, "नमस्ते दुनिया" },
       { "Spanish"lid, "Hola Mundo" },
       { "French"lid, "Bonjour le monde" },
       { "Arabic"lid, "مرحبا بالعالم" },
       { "Bengali"lid, "ওহে বিশ্ব" },
       { "Russian"lid, "Привет, мир" },
       { "Portuguese"lid, "Olá Mundo" },
       { "Indonesian"lid, "Halo Dunia" },
       { "Urdu"lid, "ہیلو، دنیا" },
       { "German"lid, "Hallo Welt" },
       { "Japanese"lid, "こんにちは世界" },
       { "Swahili"lid, "Salamu, Dunia" },
       { "Punjabi"lid, "ਸਤਿ ਸ੍ਰੀ ਅਕਾਲ ਦੁਨਿਆ" },
       { "Telugu"lid, "హలో, ప్రపంచం" },
       { "Javanese"lid, "Hello, donya" },
       { "Marathi"lid, "हॅलो, जग" },
       { "Turkish"lid, "Selam Dünya" },
   };

   //More than 75 % of the world population would be able to read and understand its greeting.
   std::post_notification ( //this also comes from std.ui
      // if we can define variables in an if statement, why can't we in a ternary operator?
      (std::optional<std::u8string> hello = hellos[std::get_language_id()]) //get the default system language
      ? *hello
      : *hellos["English"lid]
   );
}

For me, the most interesting part of this is the post_notification interface presumably targeting a post-terminal world :)

Tom.

Thank you –

Lev Minkovsky

_______________________________________________
SG16 Unicode mailing list
Unicode@isocpp.open-std.org
http://www.open-std.org/mailman/listinfo/unicode