On 1/14/20 5:56 PM, Steven R. Loomis via SG16 wrote:

On Jan 13, 2020, at 5:46 PM, Corentin Jabot <corentinjabot@gmail.com> wrote:



C++20 (which is on course to be approved next month) will provide a new feature named std::format, derived from the popular fmt library (https://fmt.dev/), which is itself heavily inspired by, and shares the syntax of, Python's format function.

std::format("Hello {}", "World") -> "Hello World";
std::format("{2} + {1} = {0}", 3, 1.0, 2) -> "2 + 1 = 3";

Of interest to Unicode and localization:
  • For now, this function is mostly byte-based, in that it is encoding-agnostic.
  • However, we made the interesting decision that padding is based on display width (which is fuzzily specified), as we realized the primary use case for padding was the creation of console interfaces.

Display width is complex. Unicode’s East Asian Width is often used for character width, but there is more to it than that (see <https://www.unicode.org/reports/tr11/#Scope>). See, for example, https://github.com/nodejs/node/blob/b0a762157793b0d9143eaa7c270da91932f2a64f/src/node_i18n.cc#L729 in Node.js, which goes beyond wcwidth and similar functions that do not reflect many terminal emulators’ behavior.

Thank you for sharing this, Steven.

The std::format facility standardized for C++20 via P0645, and expected to be modified as described in P1868, specifies display width in terms of a hard-coded set of Unicode code points.  See the wording section.  The set of code points and associated display widths were taken from Markus Kuhn's wcswidth() implementation.  We know that the current list of code points is incomplete.  For example, no code points are assigned a width of 0, and handling of outliers like U+FDFD {ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM} is absent.
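
To make the approach concrete, here is a minimal sketch of a range-based width estimate.  The names (WideRange, kWideRanges, code_point_width, string_width) are hypothetical and not part of any proposal, and the ranges are transcribed from my recollection of the P1868 wording, so please check them against the paper before relying on them.  The sketch works on already-decoded code points and assumes C++17.

```cpp
#include <cstddef>
#include <string_view>

// Inclusive range of code points assumed to occupy two columns.
struct WideRange {
    char32_t first;
    char32_t last;
};

// ASSUMPTION: recalled from the P1868 wording; verify against the paper.
constexpr WideRange kWideRanges[] = {
    {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
    {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
    {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
    {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
    {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
};

// Hypothetical helper: estimated column width of a single code point.
constexpr int code_point_width(char32_t cp) {
    for (const WideRange& r : kWideRanges) {
        if (cp >= r.first && cp <= r.last) {
            return 2;
        }
    }
    // Everything else counts as 1, including combining marks and
    // outliers such as U+FDFD; no code point is ever given width 0.
    return 1;
}

// Hypothetical helper: sum of the per-code-point estimates.
constexpr std::size_t string_width(std::u32string_view text) {
    std::size_t total = 0;
    for (char32_t cp : text) {
        total += static_cast<std::size_t>(code_point_width(cp));
    }
    return total;
}
```

With this sketch, string_width(U"Hello") is 5 and string_width(U"\u65E5\u672C\u8A9E") is 6, which matches what most terminals display.  The gaps show up immediately, though: string_width(U"e\u0301") is 2 even though the combining acute accent normally occupies no column of its own, and U+FDFD is counted as 1 column.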

I'm not sufficiently educated to evaluate the relative merits of Markus Kuhn's implementation and the implementation in your Node.js link above.  If you have more information to share on the subject, it would be appreciated.

Wouldn't it be nice if Unicode were to offer an Extended-Grapheme-Cluster-width-in-monospace-font algorithm? :)

Tom.