
Re: [SG16] P1629 and replacement code units vs replacement code points

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Date: Thu, 6 Feb 2020 15:55:49 -0500
On Thu, Feb 6, 2020 at 3:45 PM JeanHeyd Meneide <phdofthehouse_at_[hidden]>
wrote:

> ...
>
>      Is this a bit more clear?
>
> Sincerely,
> JeanHeyd
>

     Another thing which I forgot to mention is that we need customization.
The (fatal?) flaw of the C API is that it has a fixed number of encodings.
Every time someone needs support for something, they need to write their
own functions to get to and from Unicode, and they cannot participate in
the ecosystem without generating a potentially obscene amount of
boilerplate.

      This design is intentionally extensible and operates on Encoding
objects (ascii, utf8, utf_ebcdic, my_weird_company_encoding, etc.) because
we need this to scale. Not in terms of the future, but the past: there is
an exceptionally large amount of non-Unicode text in the world. While the
Web has transitioned beautifully, government databases, financial market
data, records, and others are by no means the 90%+ UTF-8 utopia I would
even remotely pray they were.
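To make "operates on Encoding objects" concrete, here is a rough C++17 sketch. The shape is an assumption made for illustration and is NOT the interface P1629 actually proposes: each hypothetical encoding type (`ascii_encoding`, and `latin1_encoding` standing in for something like my_weird_company_encoding) names its `code_unit` type and provides a static `decode_one`, and a generic algorithm such as the hypothetical `count_code_points` is written once against that shape, so a new encoding slots in without touching it.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical encoding-object shape (illustrative only, not P1629's API):
// an encoding names its code unit type and decodes one code point at a time.
struct ascii_encoding {
    using code_unit = char;
    // Decodes the first code point of a non-empty `in`; returns nullopt if the
    // byte is outside ASCII. `consumed` receives the code units read.
    static std::optional<char32_t> decode_one(std::basic_string_view<code_unit> in,
                                              std::size_t& consumed) {
        consumed = 1;
        unsigned char c = static_cast<unsigned char>(in[0]);
        if (c > 0x7F) return std::nullopt;
        return static_cast<char32_t>(c);
    }
};

// A user-supplied encoding (ISO-8859-1 here) slots in with the same shape:
// every byte maps directly to the code point with the same value.
struct latin1_encoding {
    using code_unit = char;
    static std::optional<char32_t> decode_one(std::basic_string_view<code_unit> in,
                                              std::size_t& consumed) {
        consumed = 1;
        return static_cast<char32_t>(static_cast<unsigned char>(in[0]));
    }
};

// Generic algorithm written once against the hypothetical shape: counts code
// points, or returns nullopt if the encoding rejects part of the input.
template <typename Encoding>
std::optional<std::size_t> count_code_points(
        std::basic_string_view<typename Encoding::code_unit> in) {
    std::size_t count = 0;
    while (!in.empty()) {
        std::size_t consumed = 0;
        if (!Encoding::decode_one(in, consumed)) return std::nullopt;
        in.remove_prefix(consumed);
        ++count;
    }
    return count;
}
```

With these definitions, `count_code_points<latin1_encoding>("caf\xE9")` yields 4, while the same call with `ascii_encoding` yields an empty optional because 0xE9 is not an ASCII byte.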

     But we cannot -- and I seriously mean, we CANNOT -- afford to
standardize every (to put it frankly) whack and weird encoding in the
world. That is an undue burden on implementers, and it also adds SEVERE lag
time between "time when I need it" and "time I get to actually use the damn
thing". We standardize utf8/16/32, ascii, locale_execution,
wide_locale_execution, literal, and wide_literal, and then let everyone
else slot whatever random encoding they need into the system.

     We need to be able to /absorb/ everyone else's encoding into the
system and provide an ultimately frictionless path to Unicode. In this
case, you write the Encoding Object Type once, use it to get into Unicode,
and then never touch the gross details of that encoding ever again. The
standard speaks, works with, and handles Unicode data, and we keep the
legacy data -- marked specifically with a strong object type -- at the
fringes of a program. This makes the core of our programs Unicode and makes
the standard the place for Unicode, while preventing people from having to
hack up Yet Another Solution to the problem.
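A rough sketch of the "write it once, get into Unicode" path, again under assumed names rather than P1629's actual API: a hypothetical `latin1_encoding` object only knows how to decode its own bytes into code points, and a hypothetical generic `to_utf8` reuses that single decoder to produce UTF-8, so everything downstream of the call only ever sees Unicode.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical legacy encoding object (illustrative only): ISO-8859-1,
// where every byte maps directly to the code point with the same value.
struct latin1_encoding {
    using code_unit = char;
    static std::optional<char32_t> decode_one(std::string_view in, std::size_t& consumed) {
        consumed = 1;
        return static_cast<char32_t>(static_cast<unsigned char>(in[0]));
    }
};

// Appends the UTF-8 form of a valid Unicode scalar value `cp` to `out`.
inline void append_utf8(char32_t cp, std::string& out) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Written once for any encoding object with a static decode_one: decode to
// code points, then re-encode as UTF-8. Returns nullopt if the source
// encoding rejects its input.
template <typename Encoding>
std::optional<std::string> to_utf8(std::basic_string_view<typename Encoding::code_unit> in) {
    std::string out;
    while (!in.empty()) {
        std::size_t consumed = 0;
        std::optional<char32_t> cp = Encoding::decode_one(in, consumed);
        if (!cp) return std::nullopt;
        append_utf8(*cp, out);
        in.remove_prefix(consumed);
    }
    return out;
}
```

For example, `to_utf8<latin1_encoding>("caf\xE9")` produces the UTF-8 bytes `"caf\xC3\xA9"`. Supporting another legacy encoding would mean writing only its `decode_one`; the UTF-8 side never changes.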

     Success is determined by our ability to not destroy the other
encodings, but devour them body and soul into the Unicode world without
users ever realizing that's what we're doing.

Sincerely,
JeanHeyd Meneide

Received on 2020-02-06 14:58:39