Re: [SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

From: Thiago Macieira <thiago_at_[hidden]>
Date: Wed, 11 Sep 2019 18:08:21 -0700
On Wednesday, 11 September 2019 17:53:06 PDT Victor Zverovich wrote:
> > Is it really too much to ask that it be decoded according to the
> > locale-specified encoding?
> Locale-independence by default is a good property to have. In particular, it
> guarantees that std::formatted_size will return the size that is sufficient
> for a buffer passed in a subsequent call to std::format_to regardless of
> any locale shenanigans that happen in between (possibly in another thread).

Right, that's the other big use for specifying field sizes: limiting the
storage. That's definitely a count of code units, whatever code unit type the
content being formatted uses.

That means a mismatch of character types, when one of them is char or
wchar_t, already implies locale use:

  std::format("{}{}", L"foo", u"bar");

There need to be calls to wcstombs and a c16rtomb-based equivalent (there is
no standard c16stombs) behind the scenes.
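
To make the locale dependency concrete, here is a minimal sketch of the kind of conversion that would have to happen for the wchar_t argument. The helper name narrow is hypothetical and is not part of std::format or any proposal; it only wraps std::wcstombs, whose output encoding is whatever the current C locale says it is. The char16_t side would need the analogous std::c16rtomb from <cuchar>, one code unit at a time.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper (an assumption for illustration, not a proposed API):
// convert a wide string to the narrow multibyte encoding of the current
// C locale. Returns std::nullopt if some character cannot be represented.
std::optional<std::string> narrow(const std::wstring& ws) {
    // MB_CUR_MAX is the most bytes one character can need in this locale,
    // so this buffer is always large enough.
    std::string out(ws.size() * MB_CUR_MAX, '\0');
    std::size_t n = std::wcstombs(out.data(), ws.c_str(), out.size());
    if (n == static_cast<std::size_t>(-1))
        return std::nullopt;  // not representable in the locale's encoding
    out.resize(n);            // n counts bytes written, excluding any terminator
    return out;
}
```

The same call can yield different bytes, or fail, after a std::setlocale call elsewhere in the program, which is exactly the dependency Victor's formatted_size guarantee is trying to avoid.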

> However, one might argue that by explicitly specifying the width a user
> opted into encoding-aware behavior for strings (whether determined by
> locale or not), so maybe it's not as big of a problem as I initially
> thought. There is still a question of how to express the concept of
> perceived width in standardese wording. Thiago, do you have any
> suggestions?

No, sorry.

But I think that we clearly have two distinct uses: a code unit count for
storage purposes and a cell grid (monospace font) count for alignment
purposes. Note how maxima and minima are inverted: usually, if you're trying
to align you need to specify a minimum, but if you're trying to ensure
something fits in the available storage, you specify a maximum.
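
A rough sketch of the two uses, assuming UTF-8 in a char string. All four function names are hypothetical, and counting code points is only a crude stand-in for a cell count: wide and combining characters would need Unicode data.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Storage view: the number of char code units.
std::size_t code_units(std::string_view s) { return s.size(); }

// Approximate alignment view: the number of code points, assuming s is
// valid UTF-8. Every byte that is not a continuation byte (10xxxxxx)
// starts a new code point.
std::size_t code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80) ++n;
    return n;
}

// Alignment wants a minimum: pad on the right until the text occupies
// at least min_cells (approximated here by code points).
std::string pad_to_min(std::string_view s, std::size_t min_cells) {
    std::string out(s);
    std::size_t have = code_points(s);
    if (have < min_cells) out.append(min_cells - have, ' ');
    return out;
}

// Storage wants a maximum: keep at most max_units code units, backing up
// so that the cut never lands in the middle of a code point.
std::string_view clip_to_max(std::string_view s, std::size_t max_units) {
    if (s.size() <= max_units) return s;
    std::size_t end = max_units;
    while (end > 0 && (static_cast<unsigned char>(s[end]) & 0xC0) == 0x80)
        --end;
    return s.substr(0, end);
}
```

For "caf\xC3\xA9" ("café" in UTF-8) the storage count is 5 code units while the approximate cell count is 4, so padding to 6 cells adds two spaces, whereas clipping to 4 code units has to drop the whole two-byte "é".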

We may want to reflect that by using different terms to be clear: maybe size
and width. Or length and width. Or another term: iostreams and printf use
"field width" to mean the storage size count.

PS: if we go for length and width, we'll only be missing the mythical string
height.

PPS: Joking aside, I can see a use-case for that for line-wrapped text, but
that requires solving the text boundary problem first and that definitely
requires the Unicode database.

Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

Received on 2019-09-12 03:08:26