Re: [SG16-Unicode] [isocpp-lib-ext] The "Let's Stop Ascribing Meaning to Code Points" blog post

From: Titus Winters <titus_at_[hidden]>
Date: Wed, 13 Nov 2019 05:56:34 +0000
SD-8 is *appropriate* if we want to tell the public "The committee probably
won't consider anything like X a breaking change, if your code gets in the
way of that you may have a difficult time upgrading."

It's never *necessary*, nor does it *limit* us - we might still decide to
do things that are outside of that scope. It's just trying to set general
expectations.

(This doesn't sound like a case that falls into that category.)

On Tue, Nov 12, 2019 at 10:15 PM Billy O'Neal (VC LIBS) <bion_at_[hidden]>
wrote:

> Sorry, I added Titus to ask if we need to talk about this in SD-8 somehow.
>
>
>
> Billy3
>
>
>
> *From: *Billy O'Neal (VC LIBS) via Lib-Ext <lib-ext_at_[hidden]>
> *Sent: *Tuesday, November 12, 2019 1:14 PM
> *To: *Tom Honermann <tom_at_[hidden]>; lib-ext_at_[hidden];
> Corentin <corentin.jabot_at_[hidden]>; Titus Winters <titus_at_[hidden]>
> *Cc: *Billy O'Neal (VC LIBS) <bion_at_[hidden]>; Victor Zverovich
> <victor.zverovich_at_[hidden]>; lib_at_[hidden]; SG16
> <unicode_at_[hidden]>
> *Subject: *Re: [isocpp-lib-ext] The "Let's Stop Ascribing Meaning to Code
> Points" blog post
>
>
>
> I haven’t seen enough of how customers will use this API to go so far as to
> make the statement “implementers aren’t going to be willing to change […]”
> at this time. It is certainly a possibility. Changes to that table are
> breaking changes. Whether we’re going to be willing to make such changes is
> a value judgement weighing the potential breaks against whatever benefit
> might be gained from them.
>
>
>
> > I take it your concern is regarding code that calls std::format_to with
> an assumption that the provided output buffer is large enough?
>
>
>
> More or less, yes. Certainly we see people do that with sprintf today.
>
>
>
> Billy3
>
>
>
> *From: *Tom Honermann <tom_at_[hidden]>
> *Sent: *Tuesday, November 12, 2019 1:09 PM
> *To: *Billy O'Neal (VC LIBS) <bion_at_[hidden]>;
> lib-ext_at_[hidden]; Corentin <corentin.jabot_at_[hidden]>
> *Cc: *lib_at_[hidden]; SG16 <unicode_at_[hidden]>; Victor Zverovich
> <victor.zverovich_at_[hidden]>
> *Subject: *Re: [isocpp-lib-ext] The "Let's Stop Ascribing Meaning to Code
> Points" blog post
>
>
>
> If implementors aren't going to be willing to change these tables once we
> ship, then I think we have a fairly serious issue.
>
>
>
> Some have adamantly stated that these widths are estimates only and should
> not be counted on to remain stable. Code that is sensitive to the
> formatted size of the output should be calling std::formatted_size and
> allocating appropriately. I take it your concern is regarding code that
> calls std::format_to with an assumption that the provided output buffer is
> large enough? (or, code that calls std::format and assumes the size of the
> resulting std::string).
>
>
>
> Tom.
>
>
>
> On 11/12/19 8:58 PM, Billy O'Neal (VC LIBS) wrote:
>
> My only point was that the specified behavior gives grapheme clusters a
> width of 1 or 2, but there exist characters like U+FDFD that are wider than
> 2 (and many that have a width of 0). I would be very nervous about changing
> the constants used after std::format ships because that could introduce
> unexpected buffer overruns or underruns in user programs. This is the kind
> of thing that becomes contractual very quickly (which is one of the reasons
> I was weakly against trying to open this can of worms).
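
To make the "width of 1 or 2" point concrete, here is a rough sketch of a range-based width estimate. The ranges are the double-width code point ranges as recalled from the C++20 wording for `std::format`; they are an assumption for illustration and may not match the proposed wording under discussion in this thread exactly. The names `Range`, `wide_ranges`, and `estimated_width` are hypothetical.

```cpp
// Illustrative table (assumption): code point ranges treated as width 2.
struct Range {
    char32_t first;
    char32_t last;
};

constexpr Range wide_ranges[] = {
    {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
    {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
    {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
    {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
    {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
};

// Returns 2 if the code point falls in a listed range, otherwise 1.
// A scheme like this can never return 0 or anything above 2.
constexpr int estimated_width(char32_t cp) {
    for (const Range& r : wide_ranges) {
        if (cp >= r.first && cp <= r.last) {
            return 2;
        }
    }
    return 1;
}

// U+FDFD falls between the F900-FAFF and FE10-FE19 ranges, so it is
// estimated at width 1 even though it renders much wider.
static_assert(estimated_width(0xFDFD) == 1);
// A combining mark such as U+0301 is also estimated at 1, not 0.
static_assert(estimated_width(0x0301) == 1);
// A CJK ideograph such as U+4E2D is estimated at 2.
static_assert(estimated_width(0x4E2D) == 2);
```

Any change to a table like this changes how much padding a formatted field receives, which is the stability concern raised above.
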
>
>
>
> Billy3
>
>
>
> *From: *Tom Honermann <tom_at_[hidden]>
> *Sent: *Tuesday, November 12, 2019 12:53 PM
> *To: *lib-ext_at_[hidden]; Corentin <corentin.jabot_at_[hidden]>
> *Cc: *Billy O'Neal (VC LIBS) <bion_at_[hidden]>; lib_at_[hidden];
> SG16 <unicode_at_[hidden]>; Victor Zverovich <victor.zverovich_at_[hidden]>
> *Subject: *Re: [isocpp-lib-ext] The "Let's Stop Ascribing Meaning to Code
> Points" blog post
>
>
>
> On 11/12/19 6:11 PM, Billy O'Neal (VC LIBS) via Lib-Ext wrote:
>
> It came up in the context of that width thing in format and I was asking
> if I had permission to make wider-than-2 characters format properly, and
> the forwarded text doesn’t seem to allow that (which is OK, I just wanted
> to understand at the time); I was thinking of U+FDFD (﷽).
>
> Can you elaborate? My understanding of the forwarded wording is that the
> assumed encoding for the input text is implementation defined (though not
> locale sensitive) and that implementors are encouraged to use the Unicode
> code point ranges indicated in the wording, but are not required to (that
> is my interpretation of the use of the word "should" in the proposed
> wording).
>
> It does look like the provided code point ranges don't handle U+FDFD
> correctly.
>
> I don't know how much confidence should be placed on the listed code point
> ranges. But I think it is important that we consider them amenable to
> change. I suspect that U+FDFD is not the last code point we'll find that
> is not correctly handled.
>
> Tom.
>
>
>
> Billy3
>
>
>
> *From: *Corentin <corentin.jabot_at_[hidden]>
> *Sent: *Tuesday, November 12, 2019 8:42 AM
> *To: *C++ Library Evolution Working Group <lib-ext_at_[hidden]>
> *Cc: *lib_at_[hidden]; Billy O'Neal (VC LIBS) <bion_at_[hidden]>;
> SG16 <unicode_at_[hidden]>
> *Subject: *Re: [isocpp-lib-ext] The "Let's Stop Ascribing Meaning to Code
> Points" blog post
>
>
>
>
>
>
>
> On Tue, 12 Nov 2019 at 16:58, Billy O'Neal (VC LIBS) via Lib-Ext <
> lib-ext_at_[hidden]> wrote:
>
> During review of some Unicode stuff in LWG we had a mini discussion among
> some folks about grapheme clusters, and I mentioned that everyone who
> touches this stuff might understand the complexities better if they read
> this:
>
>
>
>
> https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/
>
>
>
> +1
>
> FYI SG16 is aware of that blog post and I think there is pretty strong
> agreement with it.
>
> Codepoints have some use (notably the Unicode Character Database is really
> the Unicode Codepoint Database, and most Unicode algorithms work on
> codepoints), but any kind of user-facing UX should deal with EGCs
> (extended grapheme clusters).
>
> That is not always what applications choose to do, for a variety of
> reasons. Notably, Twitter's character count deals in codepoints, web
> browsers' search functions use codepoints so as to ignore diacritics, and
> comparisons can be done on (normalized) codepoint sequences.
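
As a small illustration of why bytes, code points, and user-perceived characters are different counts, the sketch below counts code points in UTF-8 text by skipping continuation bytes. The function name `count_code_points` is hypothetical, and the approach assumes well-formed UTF-8 input. It does not attempt grapheme cluster segmentation, which requires the Unicode segmentation rules and data tables.

```cpp
#include <cstddef>
#include <string_view>

// Counts code points in well-formed UTF-8 by counting every byte that is
// not a continuation byte (continuation bytes have the form 10xxxxxx).
constexpr std::size_t count_code_points(std::string_view utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: 3 bytes, 2 code points,
// but a reader sees a single character (one grapheme cluster).
static_assert(std::string_view("e\xCC\x81").size() == 3);
static_assert(count_code_points("e\xCC\x81") == 2);

// U+FDFD encoded in UTF-8: 3 bytes, 1 code point.
static_assert(count_code_points("\xEF\xB7\xBD") == 1);
```
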
>
>
>
> There is also not always a 1-1 mapping between what people understand as
> a "character", grapheme clusters, and glyphs.
>
>
>
>
>
> Billy3
>
> _______________________________________________
> Lib-Ext mailing list
> Lib-Ext_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
> Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13606.php
>
> _______________________________________________
> Lib-Ext mailing list
> Lib-Ext_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
> Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13609.php

Received on 2019-11-13 06:56:49