sg16: Re: [SG16-Unicode] Ideas for the future

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 30 Jul 2019 11:12:53 -0400

On 7/30/19 10:14 AM, keld_at_[hidden] wrote:
> hi all
>
> I would like C++ programs in the future to be as portable
> as possible and also as adaptable to cultures as possible, so that when you write
> a program it is easy to provide it to as many users as possible.
I strongly agree.
>
> The first thing - portability - is what we have been aiming at for many years
> and involves a basic character set as we do it now. That is basically ASCII.
> No funny characters like · and ± and ÷.
I tend to agree. I don't recall specific proposals for core language
features that would allow defining new operators, but I like that
approach as opposed to extending the basic source character set and
current fixed set of operators. Keld, what would you think of that
approach?
>
> The second - cultural adaptability - is about having all input and
> output in a fashion that feels natural to users. We go a long way
> with the locale stuff we have, but I would like the language to support strings
> being marked as translatable, and an ecosystem to support it. Most serious programs
> today are written for translation. So some syntax for strings
> like g"translatable text" could be good. And then maybe some notion for voice too
> - and other possible outputs - e.g. for disabled people.

I agree that having a (de facto) standard way of specifying translatable
strings would be very helpful. Is anyone aware of prior proposals or
widely adopted alternatives to POSIX/GNU gettext? I have some
experience with Microsoft's resource DLLs for providing string
translations (and have implemented a similar system on non-Windows
platforms), but lack other experience. Anyone interested in working on this?
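
For concreteness, here is a minimal sketch of what library-level marking
of translatable strings can look like in today's C++, in the spirit of
gettext's convention of using the source-language text as the lookup key.
Nothing here is a proposal or an existing API: the Translatable, Messages
and Catalog types, the _tr literal suffix, and the translate() and
sample_catalog() helpers are hypothetical names made up for illustration,
and the sketch assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// Hypothetical wrapper: marks a string literal as translatable.
struct Translatable {
    std::string_view msgid;  // the source-language text doubles as the key
};

// Hypothetical literal suffix, a library-level stand-in for g"...".
constexpr Translatable operator""_tr(const char* text, std::size_t length)
{
    return Translatable{std::string_view(text, length)};
}

// Hypothetical catalog: language tag -> (source text -> translated text).
// std::less<> enables lookup by std::string_view without a temporary string.
using Messages = std::map<std::string, std::string, std::less<>>;
using Catalog = std::map<std::string, Messages, std::less<>>;

// Returns the translation for `language`, or the source text when the
// language or the message is missing from the catalog.
inline std::string translate(const Catalog& catalog,
                             std::string_view language,
                             Translatable message)
{
    assert(!message.msgid.empty());  // this sketch treats an empty key as a caller error
    const auto lang = catalog.find(language);
    if (lang != catalog.end()) {
        const auto entry = lang->second.find(message.msgid);
        if (entry != lang->second.end())
            return entry->second;
    }
    return std::string(message.msgid);
}

// Small in-memory catalog for illustration; a real system would load
// this from resource files.
inline Catalog sample_catalog()
{
    Catalog catalog;
    catalog["de"]["Hello, world"] = "Hallo Welt";
    catalog["es"]["Hello, world"] = "Hola Mundo";
    return catalog;
}
```

With this sketch, translate(sample_catalog(), "de", "Hello, world"_tr)
yields "Hallo Welt", while an unknown language such as "fr" falls back to
the untranslated "Hello, world". The literal suffix only marks the string;
an extraction tool could scan for it in much the way gettext tooling scans
for marked strings.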

Tom.

>
> Keld
>
> On Mon, Jul 29, 2019 at 11:02:41PM +0000, Lev Minkovsky wrote:
>> All,
>>
>> Tom Honermann encouraged me to share with you several ideas that at some point in the future may become proposable.
>>
>> First is the ↑ character (Alt-24 with NumLock on). We had a discussion a while back with Bjarne and a few other C++ luminaries regarding a possible exponentiation operator. None of the more conventional alternatives appeared to be a good candidate, while ↑ is a symbol used for that purpose by Donald Knuth, see https://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation, and would be excellent for readability. Perhaps we can add it at some point to the basic character set. I am not at all worried about its absence on the keyboard; math folks will quickly get used to Alt-24.
>>
>> I would imagine the right approach for this to happen is to ask ourselves: what are the specific characters that we wish were in the basic character set? My initial list would be: $, @, ↑, ??? or ·, ÷. $ is already in Microsoft's basic character set, see https://docs.microsoft.com/en-us/cpp/cpp/character-sets?view=vs-2019, so perhaps this would be low-hanging fruit. The middle dot symbol and the obelus could be used as alternative multiplication and division operators. Swift already has user-defined operators; if we ever get them, it would be awesome to have something like
>>
>> long long operator ·(long m, long n) { return (long long)m * (long long)n; }
>>
>> The second, far more impactful idea would be to unicodize the entire language and let the users use keywords in their national languages. Programmers outside the USA (surprise, surprise) often think in their native languages and often prefer to write comments in them. For example, I know that the SAP codebase is full of comments in German. A source file is a specialized text, and every language switch is a disorienting experience, especially if these languages are not related. Algol 68 designers already understood this and translated the language into Russian, German, French, Bulgarian, Chinese and Japanese, including of course the keywords. This could facilitate teaching/studying the language as well.
>> As an illustration, let us consider 3 variants of Hello-World: first the canonical version with comments, second the same program with the comments in Russian, and third a hypothetical Hello world/Привет мир in C++ with Russian keywords:
>>
>>
>> //This is needed for printf
>> #include <stdio.h>
>>
>> //Program entry
>> int main()
>> {
>> //Let's greet the world
>> printf("Hello world!\n");
>> }
>>
>>
>> //Это требуется для printf
>> #include <stdio.h>
>>
>> //Вход в программу
>> int main()
>> {
>> //Приветствуем мир
>> printf("Привет мир!\n");
>> }
>>
>>
>> //?????? ?????????????????? ?????? ????????????
>> #???????????????? <??????????.??>
>>
>> //???????? ?? ??????????????????
>> ?????? ??????????????()
>> {
>> //???????????????????????? ??????
>> ????????????("???????????? ??????!\n");
>> }
>>
>>
>> I would imagine that for most if not all of you, the third example looks like gibberish. I can assure you that, for young future programmers from countries where English isn't widely spoken, the first Hello World looks just as much like gibberish. Some of them may even be reluctant to enter a career where they would have to deal with pages and pages of such stuff on a daily basis.
>>
>> Finally, I wanted to show you a couple of additional "hello-world"s. The first is valid C++ that stress-tests the system it runs on by using English, Russian, Georgian and Chinese words in the same sentence:
>>
>>
>> #include <stdio.h>
>>
>> int main()
>> {
>> printf(u8"Hello-Привет-გამარჯობა-你好, world!\n");
>> }
>>
>> The second is something I put together as a 21st-century version of Hello world. Alas, only a very small fraction of it is now well-formed.
>>
>>
>> /*
>>
>> The first program to write is the same for all languages:
>>
>> Print the words
>>
>> hello, world
>>
>> #include <stdio.h>
>>
>> int main()
>> {
>> printf("hello, world\n");
>> }
>>
>> */
>>
>> import std.ui; //future UI module
>> import std.core;
>>
>> int main()
>> {
>> static std::map<std::language_id_t, std::u8string> hellos{ //language_id_t comes from std.ui
>> { "English"lid, "Hello, world" }, //a literal produces the right language type
>> { "Chinese"lid, "???????????????" },
>> { "Hindi"lid, "?????????????????? ??????????????????" },
>> { "Spanish"lid, "Hola Mundo" },
>> { "French"lid, "Bonjour le monde" },
>> { "Arabic"lid, "?????????? ??????????????" },
>> { "Bengali"lid, "????????? ???????????????" },
>> { "Russian"lid, "????????????, ??????" },
>> { "Portuguese"lid, "Olá Mundo" },
>> { "Indonesian"lid, "Halo Dunia" },
>> { "Urdu"lid, "?????????? ????????" },
>> { "German"lid, "Hallo Welt" },
>> { "Japanese"lid, "?????????????????????" },
>> { "Swahili"lid, "Salamu, Dunia" },
>> { "Punjabi"lid, "????????? ???????????? ???????????? ???????????????" },
>> { "Telugu"lid, "?????????, ?????????????????????" },
>> { "Javanese"lid, "Hello, donya" },
>> { "Marathi"lid, "????????????, ??????" },
>> { "Turkish"lid, "Selam Dünya" },
>> };
>>
>> //More than 75 % of the world population would be able to read and understand its greeting.
>>
>> std::post_notification ( //this also comes from std.ui
>> // if we can define variables in an if statement, why can't we in a ternary operator?
>> (std::optional<std::u8string> hello = hellos[std::get_language_id()]) //get the default system language
>> ? *hello
>> : *hellos["English"lid]
>> );
>> }
>>
>>
>> Thank you,
>>
>> Lev Minkovsky
>>
>>
>>
>> _______________________________________________
>> SG16 Unicode mailing list
>> Unicode_at_[hidden]
>> http://www.open-std.org/mailman/listinfo/unicode

Received on 2019-07-30 17:12:56