On Sun, 8 Sep 2019 at 19:30, Tom Honermann <tom@honermann.net> wrote:
On 9/8/19 12:40 PM, Corentin wrote:
It isn't relevant in determining how we resolve this issue. If the resolution is that field widths are measured in EGCs, then we've already decided that the width is locale dependent and tailoring becomes an implementation detail.
On Sun, 8 Sep 2019 at 18:12, Tom Honermann <tom@honermann.net> wrote:
On 9/8/19 6:00 AM, Corentin via Lib wrote:
On Sun, 8 Sep 2019 at 11:17, Corentin <corentin.jabot@gmail.com> wrote:
On Sun, 8 Sep 2019 at 09:52, Billy O'Neal (VC LIBS) <bion@microsoft.com> wrote:
> I agree that EGCS is the best option. That doesn't drag locale
Because we don’t get to assume that we’re talking about Unicode at all, it absolutely drags in locale.
Sorry, I should have been more specific. There is a non-tailored Unicode EGCS boundary algorithm (but it can be tailored). I didn't mean to imply that text manipulation can be done without knowing its encoding, and I never use "locale" to mean encoding.
EGCS are only defined for text whose character repertoire is Unicode; other encodings deal with codepoints.
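
To make the encoding dependence concrete, here is a minimal sketch (not anything proposed for std::format) showing that the same bytes give different codepoint counts depending on the encoding you assume. The helper names count_utf8_code_points and count_single_byte_code_points are hypothetical, and the UTF-8 counter assumes well-formed input.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: count codepoints assuming the bytes are UTF-8.
// Every byte that is not a continuation byte (10xxxxxx) starts a new
// codepoint. Assumes well-formed input; no validation is performed.
std::size_t count_utf8_code_points(std::string_view bytes) {
    std::size_t count = 0;
    for (unsigned char b : bytes) {
        if ((b & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: count codepoints assuming a single-byte encoding
// such as ISO-8859-1, where each byte is exactly one codepoint.
std::size_t count_single_byte_code_points(std::string_view bytes) {
    return bytes.size();
}
```

For the two bytes "\xC3\xA9", the first function reports 1 (U+00E9 in UTF-8) and the second reports 2 (two separate characters in ISO-8859-1). Neither answer is an EGC count; that needs the Unicode segmentation rules on top.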
To be clear, whether the EGC algorithm is required to be tailored matters because tailoring, for all intents and purposes, requires ICU or something with CLDR, which restricts the platforms on which this can be implemented.
Tailoring is not relevant to this discussion.
It is - see https://unicode.org/reports/tr29/ "ch" is 2 EGCS in most locales but in Slovak it's 1. I don't make the rules :D
No, format decided to be locale-independent (for good reason) and applying locale-specific behavior implicitly would be against that.
I'm arguing for encoding-specific behavior.
You seem to be missing the point that, for char and wchar_t, the encoding can’t be known (in general) without consulting the locale. Again, LANG=C vs LANG=C.UTF-8.
Tom.
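
For readers following along, the LANG=C vs LANG=C.UTF-8 point can be checked with a small POSIX-only sketch. It asks a named locale which codeset it uses for char strings via nl_langinfo(CODESET). The helper name codeset_of is hypothetical, and the sketch assumes a system that provides <langinfo.h>; the strings returned vary by C library, so no particular output is promised here.

```cpp
#include <clocale>
#include <langinfo.h>  // POSIX only; not available on every platform
#include <string>

// Hypothetical helper: return the codeset name that the given locale
// reports for char strings, or an empty string if the locale cannot
// be selected on this system.
std::string codeset_of(const char* locale_name) {
    // Save the current LC_CTYPE setting so it can be restored afterwards.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    std::string saved = current ? current : "C";

    std::string result;
    if (std::setlocale(LC_CTYPE, locale_name) != nullptr) {
        result = nl_langinfo(CODESET);
    }

    // Restore the previous locale so callers are not affected.
    std::setlocale(LC_CTYPE, saved.c_str());
    return result;
}
```

Comparing codeset_of("C") with codeset_of("C.UTF-8") on a system that has both locales installed would typically show an ASCII-like codeset for the former and UTF-8 for the latter, which is why the encoding of a char string can't be assumed without asking the locale. Note that setlocale changes process-wide state, so this is not safe to call concurrently from multiple threads.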