sg16: Re: [SG16-Unicode] Ideas for the future

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 30 Jul 2019 11:06:34 -0400

On 7/29/19 7:02 PM, Lev Minkovsky wrote:
>
> First is the ↑ character (Alt-24 with NumLock on). We had a discussion
> a while back with Bjarne and a few other C++ luminaries in regards to
> a possible exponentiation operator. None of the more conventional
> alternatives appeared to be a good candidate, while ↑ is a symbol used
> for that purpose by Donald Knuth, see
> https://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation, and would
> be excellent for readability. Perhaps we can add it at some point to
> the basic character set. I am not at all worried about its absence on
> the keyboard; math folks will quickly get used to Alt-24.
>
> I would imagine the right approach for this to happen is to ask
> ourselves: what are the specific characters that we wish were in
> the basic character set? My initial list would be: $,@,↑,• or ·,÷ . $
> is already in Microsoft basic character set, see
> https://docs.microsoft.com/en-us/cpp/cpp/character-sets?view=vs-2019,
> so perhaps this would be low-hanging fruit. The middle dot symbol
> and the obelus could be used as alternative multiplication and
> division operators. Swift already has user-defined operators; if we
> ever get them, it would be awesome to have something like
>
> long long operator ·(long m, long n) { return (long long)m * (long long)n; }
>
Extending the basic source character set is something that is on my
mind. My first priority would be to get '@' added so that email addresses
can be used in portable programs :). Next up would be '$'. These two
are ubiquitous, would not be problematic in practice (they are present
in ASCII and common EBCDIC code pages), and we could introduce digraphs
for them.
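
For context, C++ already has digraphs for characters that are missing from
some legacy character sets, which is the precedent a digraph for '@' or '$'
would follow. A small sketch using the existing ones (%: for #, <% and %> for
braces, <: and :> for brackets); the function names are made up for
illustration, and no digraph for '@' or '$' exists today.

```cpp
%:include <cassert>

// Same as: int first_plus_last(const int (&a)[3]) { return a[0] + a[2]; }
int first_plus_last(const int (&a)<:3:>) <%
    return a<:0:> + a<:2:>;
%>

// Same as: void self_check() { int v[3] = {1, 2, 3}; assert(...); }
void self_check() <%
    int v<:3:> = <%1, 2, 3%>;
    assert(first_plus_last(v) == 4);
%>
```

The digraphs are replaced token for token, so the compiler sees ordinary
brackets and braces; a new digraph for '@' would presumably work the same way.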

Characters like ↑ are more problematic since they effectively require a
Unicode encoding (we could, of course, specify digraphs for it as
well). I think the battle for the first non-ASCII character in the
basic source character set will be hard (I'm not sure I want to take it
on). I can only imagine the number of papers that will come afterward
proposing new operators for all kinds of interesting purposes! We may
want to consider a core language facility for defining new operators;
that would enable adding operators without extending the basic source
character set.

> The second, far more impactful idea would be to unicodize the entire
> language and let the users use keywords in their national languages.
> Programmers outside the USA (surprise, surprise) often think in their
> native languages and often prefer to write comments in them. For
> example, I know that the SAP codebase is full of comments in German. A
> source file is a specialized text, and every language switch is a
> disorienting experience, especially if these languages are not
> related. Algol 68 designers already understood this and translated the
> language into Russian, German, French, Bulgarian, Chinese and
> Japanese, including of course the keywords. This could facilitate
> teaching/studying the language as well.
>
> As an illustration, let us consider 3 variants of Hello-World, first
> the canonical version with comments, second with the same comments in
> Russian and third a hypothetical Hello world/Привет мир in C++ with
> Russian keywords:
>
> //This is needed for printf
> #include <stdio.h>
> //Program entry
> int main()
> {
>     //Let's greet the world
>     printf("Hello world!\n");
> }
>
> //Это требуется для printf
> #include <stdio.h>
> //Вход в программу
> int main()
> {
>     //Приветствуем мир
>     printf("Привет мир!\n");
> }
>
> //Это требуется для печати
> #включить <стдвв.г>
> //Вход в программу
> цел главная()
> {
>     //Приветствуем мир
>     печать("Привет мир!\n");
> }
>
> I would imagine that for most if not all of you, the third example
> looks like gibberish. I can assure you that, for young future
> programmers from the countries where English isn’t widely spoken, the
> first Hello World looks just as much like gibberish. Some of them may
> even be reluctant to enter a career where they would have to deal with
> pages and pages of such stuff on a daily basis.
>
I've had this thought as well. The obvious downside is that it could
make sharing code more difficult. But, translation would be relatively
easy as well, so perhaps not a problem in practice. We don't have a lot
of keywords, so I'm not sure how impactful this is; and I lack the
non-native language experience to draw on. Thanks for the Algol 68
reference; I wasn't aware of this prior experience!
>
> Finally, I wanted to show you a couple of additional “hello-world”s.
> The first is valid C++ that stress-tests the system it runs on by
> using English, Russian, Georgian and Chinese words in the same sentence:
>
> #include <stdio.h>
> int main()
> {
>     printf(u8"Hello-привет-გამარჯობა-你好, world!\n");
> }
>
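Note that passing a u8 string literal to printf produces mojibake unless
the execution encoding happens to be UTF-8. And, of course, this won't
compile in C++20, where u8 string literals have type const char8_t[]
and no longer convert to the const char* that printf expects.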
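
A minimal sketch of one way to keep such a program compiling across language
versions: copy the UTF-8 code units out of the literal byte for byte. The
helper name utf8_bytes is hypothetical, not a standard facility, and the
output is still only readable if whatever consumes it expects UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies the UTF-8 code units of a u8 string literal into a std::string.
// CharT is deduced as char before C++20 and as char8_t from C++20 on.
template <typename CharT, std::size_t N>
std::string utf8_bytes(const CharT (&lit)[N]) {
    static_assert(sizeof(CharT) == 1, "expected a one-byte code unit type");
    assert(lit[N - 1] == CharT{});  // string literals end with a NUL
    // Reading the object representation through const char* is permitted.
    return std::string(reinterpret_cast<const char*>(lit), N - 1);
}
```

With this, std::fputs(utf8_bytes(u8"...").c_str(), stdout) compiles in both
C++17 and C++20, but it does nothing about the execution encoding problem.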

> The second is something I put together as a 21st-century version of
> Hello world. Alas, only a very small fraction of it is now well-formed.
>
> /*
> The first program to write is the same for all languages:
> Print the words
>     hello, world
>
> #include <stdio.h>
>
> int main()
> {
>     printf("hello, world\n");
> }
> */
>
> import std.ui; //future UI module
> import std.core;
>
> int main()
> {
>     static std::map<std::language_id_t, std::u8string> hellos{ //language_id_t comes from std.ui
>         { "English"lid, "Hello, world"}, //a literal produces the right language type
>         { "Chinese"lid, "你好，世界"},
>         { "Hindi"lid, "नमस्तेदुनिया"},
>         { "Spanish"lid, "Hola Mundo"},
>         { "French"lid, "Bonjour le monde"},
>         { "Arabic"lid, "مرحبابالعالم"},
>         { "Bengali"lid, "ওহেবিশ্ব"},
>         { "Russian"lid, "Привет, мир"},
>         { "Portuguese"lid, "Olá Mundo"},
>         { "Indonesian"lid, "Halo Dunia"},
>         { "Urdu"lid, "ہیلو،دنیا"},
>         { "German"lid, "Hallo Welt"},
>         { "Japanese"lid, "こんにちは世界"},
>         { "Swahili"lid, "Salamu, Dunia"},
>         { "Punjabi"lid, "ਸਤਿਸ੍ਰੀਅਕਾਲਦੁਨਿਆ"},
>         { "Telugu"lid, "హలో, ప్రపంచం"},
>         { "Javanese"lid, "Hello, donya"},
>         { "Marathi"lid, "हॅलो, जग"},
>         { "Turkish"lid, "Selam Dünya"},
>     };
>
>     //More than 75% of the world population would be able to read and
>     //understand its greeting.
>     std::post_notification( //this also comes from std.ui
>         // if we can define variables in an if statement, why can't we in a
>         // ternary operator?
>         (std::optional<std::u8string> hello = hellos[std::get_language_id()]) //get the default system language
>             ? *hello
>             : *hellos["English"lid]
>     );
> }
>
For me, the most interesting part of this is the post_notification
interface presumably targeting a post-terminal world :)

Tom.

> Thank you –
>
> Lev Minkovsky
>
>

Received on 2019-07-30 17:06:39