[SG16-Unicode] Ideas for the future

From: Lev Minkovsky <lminkovsky_at_[hidden]>
Date: Mon, 29 Jul 2019 23:02:41 +0000

Tom Honermann encouraged me to share with you several ideas that might become proposals at some point in the future.

First is the ↑ character (Alt-24 with NumLock on). A while back we had a discussion with Bjarne and a few other C++ luminaries about a possible exponentiation operator. None of the more conventional alternatives appeared to be a good candidate, whereas ↑ is the symbol Donald Knuth uses for that purpose (see https://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation) and would be excellent for readability. Perhaps we can add it to the basic character set at some point. I am not at all worried about its absence from the keyboard; math folks will quickly get used to Alt-24.

I would imagine the right way to make this happen is to ask ourselves: which specific characters do we wish were in the basic character set? My initial list would be: $, @, ↑, • or ·, and ÷. $ is already in the Microsoft basic character set (see https://docs.microsoft.com/en-us/cpp/cpp/character-sets?view=vs-2019), so perhaps this would be low-hanging fruit. The middle dot and the obelus could be used as alternative multiplication and division operators. Swift already has user-defined operators; if we ever get them, it would be awesome to have something like

long long operator ·(long m, long n) { return (long long)m * (long long)n; }
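Until such operators exist, the same two operations can at least be spelled as ordinary functions. The following is only a sketch in today's C++ (C++14 or later); the names mul_wide and ipow are made up for illustration and are not part of any proposal:

```cpp
// Hypothetical helper names, chosen for this sketch only.

// Widening multiply: what the `·` operator above would compute.
constexpr long long mul_wide(long m, long n) {
    return static_cast<long long>(m) * static_cast<long long>(n);
}

// Integer exponentiation by squaring: one thing `↑` could mean.
// Overflow is not checked; the caller must keep the result in range.
constexpr long long ipow(long long base, unsigned exp) {
    long long result = 1;
    while (exp != 0) {
        if (exp & 1u) result *= base;   // multiply in the current bit
        exp >>= 1;
        if (exp != 0) base *= base;     // square only if more bits remain
    }
    return result;
}
```

With these, mul_wide(m, n) stands in for m · n and ipow(x, 3) for x ↑ 3, at the cost of the readability the operator syntax is meant to buy.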

The second, far more impactful idea would be to unicodize the entire language and let users write keywords in their national languages. Programmers outside the USA (surprise, surprise) often think in their native languages and often prefer to write comments in them. For example, I know that the SAP codebase is full of comments in German. A source file is a specialized text, and every language switch is a disorienting experience, especially if the languages are not related. The Algol 68 designers already understood this and translated the language into Russian, German, French, Bulgarian, Chinese and Japanese, including of course the keywords. This could facilitate teaching and studying the language as well.
As an illustration, let us consider three variants of Hello World: first the canonical version with comments, second the same program with the comments in Russian, and third a hypothetical Hello world/Привет мир in C++ with Russian keywords:

//This is needed for printf
#include <stdio.h>

//Program entry
int main()
{
   //Let's greet the world
   printf("Hello world!\n");
}

//Это требуется для printf
#include <stdio.h>

//Вход в программу
int main()
{
   //Приветствуем мир
   printf("Привет мир!\n");
}

//Это требуется для печати
#включить <стдвв.г>

//Вход в программу
цел главная()
{
   //Приветствуем мир
   печать("Привет мир!\n");
}

I would imagine that for most if not all of you, the third example looks like gibberish. I can assure you that, for young future programmers from countries where English isn’t widely spoken, the first Hello World looks just as much like gibberish. Some of them may even be reluctant to enter a career where they would have to deal with pages and pages of such stuff on a daily basis.
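Something in the spirit of the third example can be approximated today with the preprocessor, since identifiers and macro names may already contain non-ASCII letters. This is only a rough sketch, and it assumes a compiler that reads the source as UTF-8 and accepts such identifiers (recent GCC, Clang and MSVC do, as far as I know). The macro and function names are my own choice, not part of any proposal:

```cpp
// Assumption: UTF-8 source file and compiler support for non-ASCII identifiers.
// Russian spellings of two keywords, mapped by plain macros.
#define цел int          // "integer"
#define вернуть return   // "return"

// сумма = "sum"; the parameters а and б are Cyrillic letters.
constexpr цел сумма(цел а, цел б) { вернуть а + б; }
```

Macros only rename tokens, of course: diagnostics, tooling and the standard library would all stay in English, which is exactly the gap a real language feature would have to close.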

               Finally, I wanted to show you a couple of additional “hello-world”s. The first is valid C++ that stress-tests the system it runs on by using English, Russian, Georgian and Chinese words in the same sentence:

#include <stdio.h>

int main()
{
   printf(u8"Hello-привет-გამარჯობა-你好, world!\n");
}
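One caveat: in C++20 a u8 literal has type const char8_t[], which does not convert to the const char* that printf expects, so the program above is valid only up to C++17. A minimal sketch of a version that compiles in both modes follows; the helpers as_bytes and code_points are hypothetical names introduced here for illustration:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: view UTF-8 code units as plain chars for byte-oriented C APIs.
// Works whether u8 literals are char (C++17) or char8_t (C++20).
template <typename Char>
const char* as_bytes(const Char* utf8) {
    return reinterpret_cast<const char*>(utf8);
}

// Hypothetical helper: count code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumes the input is well-formed UTF-8.
template <typename Char>
constexpr std::size_t code_points(const Char* s) {
    std::size_t n = 0;
    for (; *s != 0; ++s)
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    return n;
}

void greet() {
    std::printf("%s", as_bytes(u8"Hello-привет-გამარჯობა-你好, world!\n"));
}
```

Whether the greeting then displays correctly still depends on the terminal or console interpreting the bytes as UTF-8, which is the part that really stress-tests the system.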

The second is something I put together as a 21st-century version of Hello world. Alas, only a very small fraction of it is now well-formed.


The first program to write is the same for all languages:

Print the words

hello, world

#include <stdio.h>

int main()
{
    printf("hello, world\n");
}


import std.ui; //future UI module
import std.core;

int main()
{
   static std::map<std::language_id_t, std::u8string> hellos{ //language_id_t comes from std.ui
       { "English"lid, "Hello, world" }, //a literal produces the right language type
       { "Chinese"lid, "你好,世界" },
       { "Hindi"lid, "नमस्ते दुनिया" },
       { "Spanish"lid, "Hola Mundo" },
       { "French"lid, "Bonjour le monde" },
       { "Arabic"lid, "مرحبا بالعالم" },
       { "Bengali"lid, "ওহে বিশ্ব" },
       { "Russian"lid, "Привет, мир" },
       { "Portuguese"lid, "Olá Mundo" },
       { "Indonesian"lid, "Halo Dunia" },
       { "Urdu"lid, "ہیلو، دنیا" },
       { "German"lid, "Hallo Welt" },
       { "Japanese"lid, "こんにちは世界" },
       { "Swahili"lid, "Salamu, Dunia" },
       { "Punjabi"lid, "ਸਤਿ ਸ੍ਰੀ ਅਕਾਲ ਦੁਨਿਆ" },
       { "Telugu"lid, "హలో, ప్రపంచం" },
       { "Javanese"lid, "Hello, donya" },
       { "Marathi"lid, "हॅलो, जग" },
       { "Turkish"lid, "Selam Dünya" },
   };

   //More than 75% of the world population would be able to read and understand its greeting.

   std::post_notification ( //this also comes from std.ui
      // if we can define variables in an if statement, why can't we in a ternary operator?
      (std::optional<std::u8string> hello = hellos[std::get_language_id()]) //get the default system language
      ? *hello
      : *hellos["English"lid]);
}
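For comparison, the lookup-with-fallback part can already be written in C++17, using a declaration in an if condition where the ternary operator does not allow one. This is only a sketch under my own assumptions: the language is passed in as a plain string because there is no std::get_language_id, the table is shortened to three entries, and Greeting, find_greeting and greeting_for are hypothetical names:

```cpp
#include <array>
#include <string_view>

// Hypothetical record type; the real design above uses std::language_id_t.
struct Greeting {
    std::string_view language;
    std::string_view text;
};

// Shortened table for the sketch; English comes first and serves as the fallback.
constexpr std::array<Greeting, 3> greetings{{
    {"English", "Hello, world"},
    {"Russian", "Привет, мир"},
    {"German", "Hallo Welt"},
}};

// Returns a pointer to the matching entry, or nullptr if the language is unknown.
constexpr const Greeting* find_greeting(std::string_view language) {
    for (const Greeting& g : greetings)
        if (g.language == language) return &g;
    return nullptr;
}

constexpr std::string_view greeting_for(std::string_view language) {
    // A declaration is allowed in an `if` condition, unlike in `?:`.
    if (const Greeting* g = find_greeting(language)) return g->text;
    return greetings.front().text;  // unknown language: fall back to English
}
```

Obtaining the user's language and actually showing a notification remain platform-specific, which is what the imagined std.ui module would cover.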

Thank you –

            Lev Minkovsky

Received on 2019-07-30 01:02:45