Subject: Re: [SG16-Unicode] [ #embed_str ] Unicode Input
From: JeanHeyd Meneide (phdofthehouse_at_[hidden])
Date: 2019-11-07 17:48:19

On Wed, Nov 6, 2019 at 10:28 PM Thiago Macieira <thiago_at_[hidden]> wrote:

> On Wednesday, 6 November 2019 12:34:23 PST JeanHeyd Meneide wrote:
> > It is not exactly trivial for #embed or #embed_str. #embed generates a
> > brace-delimited list of the bytes. It's as if the contents are directly
> > replaced by:
> >
> > { 102, 111, 111 }
> >
> > You cannot "just append" a null terminator in there, so it would
> > require a copy. If that's okay (copying things), then we can throw
> > #embed_str out the window. As far as requiring bytes, you would need to
> > generate a brace-delimited list with all of the entries cast to the right
> > type, because none of those entries is implicitly convertible to a
> > std::byte: https://godbolt.org/z/NRkSfK
> It's easy to add the terminating null with constexpr. And that function
> should be provided. Similarly, it should be easy to concatenate such arrays.
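
A minimal sketch of the constexpr copies being discussed in this thread,
assuming C++17 and a plain unsigned char array as a stand-in for #embed
output. The names embedded, append_null, concat and to_bytes are
hypothetical and not part of any proposal. Each helper builds a new
std::array, so every one of them is a copy, which is exactly the trade-off
at issue.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the brace-delimited list #embed would produce.
constexpr unsigned char embedded[] = { 102, 111, 111 };

// Copy `src` into a new array with one extra zero element (the null terminator).
template <typename T, std::size_t N>
constexpr std::array<T, N + 1> append_null(const T (&src)[N]) {
    std::array<T, N + 1> out{};  // value-initialized: every element starts as T{}
    for (std::size_t i = 0; i < N; ++i)
        out[i] = src[i];
    return out;                  // out[N] is still T{}, i.e. 0
}

// Copy two arrays, one after the other, into a new array.
template <typename T, std::size_t N, std::size_t M>
constexpr std::array<T, N + M> concat(const std::array<T, N>& a,
                                      const std::array<T, M>& b) {
    std::array<T, N + M> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = a[i];
    for (std::size_t i = 0; i < M; ++i) out[N + i] = b[i];
    return out;
}

// std::byte is a scoped enum, so `constexpr std::byte b[] = { 102, 111, 111 };`
// is ill-formed; each element needs an explicit conversion.
template <std::size_t N>
constexpr std::array<std::byte, N> to_bytes(const unsigned char (&src)[N]) {
    std::array<std::byte, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = static_cast<std::byte>(src[i]);
    return out;
}

constexpr auto terminated = append_null(embedded);           // {102, 111, 111, 0}
constexpr auto doubled    = concat(terminated, terminated);  // 8 elements
constexpr auto as_bytes   = to_bytes(embedded);              // std::array<std::byte, 3>
```

All three run at compile time, but none of them can splice into the
#embed expansion itself; they only produce new arrays from it.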

Arrays in C++ (and C) have no syntax or behavior for compile-time
concatenation. String literals get away with it because adjacent literals
such as "foo" "bar" are concatenated, so someone could append a null
terminator with "\0" for #embed_str, but not for #embed.
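
To illustrate the asymmetry, a small example assuming any C++11 or later
compiler (the name text is illustrative only). Each literal's escape
sequences are interpreted separately before the literals are joined, so
"foo" "\0" behaves like "foo\0":

```cpp
// Adjacent string literals are joined at translation time, so an explicit
// "\0" can be tacked on without writing any copying code.
constexpr char text[] = "foo" "\0";

// Elements: 'f', 'o', 'o', the explicit '\0', and the implicit terminating '\0'.
static_assert(sizeof(text) == 5, "3 characters + explicit null + implicit null");

// A brace-delimited list has no equivalent rule; this does not compile:
//   constexpr unsigned char bytes[] = { 102, 111, 111 } { 0 };
```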

> It should be easy to import non-terminated byte data, null-terminated byte
> data and UTF-8 text.
> SG16 should also provide a way to constexpr-time convert UTF-8 text to
> UTF-16 or UTF-32.

That is something I am already working on (in a separate proposal); all of
the UTF-8/16/32 encoding objects are constexpr, and one of Corentin's
upcoming papers proposes a consteval way to detect the compile-time literal
encoding. That should be enough.
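
For illustration only, a minimal sketch of a constexpr UTF-8 to UTF-32
conversion plus a common heuristic for checking the literal encoding,
assuming C++17 and well-formed UTF-8 input. The names utf32_buffer,
utf8_to_utf32 and literal_encoding_is_utf8 are hypothetical: this is not
the encoding-object API from the separate proposal, and the heuristic is
not the facility in Corentin's paper.

```cpp
#include <array>
#include <cstddef>

// Fixed-capacity result: decoded code points plus how many are in use.
template <std::size_t N>
struct utf32_buffer {
    std::array<char32_t, N> data{};
    std::size_t size = 0;
};

// Minimal constexpr UTF-8 -> UTF-32 decoder (hypothetical helper).
// Assumes well-formed UTF-8: continuation bytes, overlong forms and
// surrogates are NOT validated. N counts the literal's terminating '\0',
// which is not decoded; each code point uses at least one byte, so N
// slots are always enough.
template <std::size_t N>
constexpr utf32_buffer<N> utf8_to_utf32(const char (&s)[N]) {
    utf32_buffer<N> out{};
    std::size_t i = 0;
    while (i + 1 < N) {                        // stop before the final '\0'
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        if (lead < 0x80)       { cp = lead; }                 // 0xxxxxxx
        else if (lead >= 0xF0) { cp = lead & 0x07; len = 4; } // 11110xxx
        else if (lead >= 0xE0) { cp = lead & 0x0F; len = 3; } // 1110xxxx
        else                   { cp = lead & 0x1F; len = 2; } // 110xxxxx
        for (std::size_t k = 1; k < len; ++k)  // append 6 bits per 10xxxxxx byte
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.data[out.size++] = cp;
        i += len;
    }
    return out;
}

// Heuristic: if ordinary literals are UTF-8, U+00E9 is stored as the two
// code units 0xC3 0xA9 (plus the terminating '\0').
constexpr bool literal_encoding_is_utf8() {
    constexpr char probe[] = "\u00E9";
    return sizeof(probe) == 3
        && static_cast<unsigned char>(probe[0]) == 0xC3
        && static_cast<unsigned char>(probe[1]) == 0xA9;
}
```

For example, utf8_to_utf32("a\xC3\xA9\xE2\x82\xAC") yields three code
points: U+0061, U+00E9 and U+20AC. The input is spelled with hex escapes so
the bytes are UTF-8 regardless of the compiler's literal encoding.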

I think this highlights that #embed is the only thing we need: the only
real benefit of #embed_str is a null-terminating code unit, and there
should be better ways to provide that to the user.

SG16 list run by herb.sutter at gmail.com