
Subject: Feedback on p1629r0
From: Lyberta (lyberta_at_[hidden])
Date: 2019-06-23 03:20:00

This paper also confuses code points and scalar values.

In std::text::encoding_errc it says:

// sequence can be encoded but resulting
// code point is invalid (e.g., encodes a lone surrogate)
invalid_output = 0x05

No, lone surrogates are fully valid code points, but they are invalid
scalar values.
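
As a minimal sketch of the distinction, following the Unicode definitions
(the helper names below are illustrative and do not come from P1629): a
code point is any integer in [0, 0x10FFFF], and a scalar value is any code
point outside the surrogate range [0xD800, 0xDFFF].

```cpp
#include <cstdint>

// Hypothetical helper names, used for illustration only; not from P1629.

// A Unicode code point is any integer in [0, 0x10FFFF].
constexpr bool is_code_point(std::uint32_t v) noexcept {
    return v <= 0x10FFFF;
}

// Surrogate code points occupy [0xD800, 0xDFFF].
constexpr bool is_surrogate(std::uint32_t v) noexcept {
    return v >= 0xD800 && v <= 0xDFFF;
}

// A scalar value is any code point that is not a surrogate.
constexpr bool is_scalar_value(std::uint32_t v) noexcept {
    return is_code_point(v) && !is_surrogate(v);
}

// A lone surrogate is a valid code point but not a valid scalar value.
static_assert(is_code_point(0xD800), "surrogates are code points");
static_assert(!is_scalar_value(0xD800), "surrogates are not scalar values");
static_assert(is_scalar_value(0x1F600), "U+1F600 is a scalar value");
static_assert(!is_code_point(0x110000), "0x110000 is out of range");
```

Under these definitions, the comment on invalid_output would read
"resulting scalar value is invalid" rather than "code point is invalid".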

I don't think converting to scalar values is "decoding", especially if
the code uses dumb string types.

Where it talks about assuming that text is valid, this can be
enforced by strong types such as scalar_value_sequence.
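
A rough sketch of how a strong type could enforce validity is below. The
class layout and the from() factories are assumptions made for
illustration; they are not the interface of scalar_value_sequence in any
paper or library. The idea is that validation happens once, at
construction, so code that receives the type may assume valid text.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch; the interfaces here are assumed, not taken from a proposal.
class scalar_value {
public:
    // The checked factory is the only way to build a scalar_value from a
    // raw integer, so every instance holds a valid Unicode scalar value.
    static std::optional<scalar_value> from(std::uint32_t v) noexcept {
        if (v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF)) {
            return std::nullopt;
        }
        return scalar_value{v};
    }

    constexpr std::uint32_t value() const noexcept { return value_; }

private:
    constexpr explicit scalar_value(std::uint32_t v) noexcept : value_{v} {}
    std::uint32_t value_;
};

class scalar_value_sequence {
public:
    // Validates every element once; fails on the first invalid one.
    static std::optional<scalar_value_sequence>
    from(const std::vector<std::uint32_t>& raw) {
        scalar_value_sequence result;
        result.values_.reserve(raw.size());
        for (std::uint32_t v : raw) {
            const auto checked = scalar_value::from(v);
            if (!checked) {
                return std::nullopt;
            }
            result.values_.push_back(*checked);
        }
        return result;
    }

    const std::vector<scalar_value>& values() const noexcept { return values_; }

private:
    scalar_value_sequence() = default;
    std::vector<scalar_value> values_;
};
```

A function taking scalar_value_sequence then has no need for an
"assume valid" mode, because an invalid sequence cannot be constructed.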

In 3.2.3 using char32_t directly may be a bad idea. I think we should
focus on strong types instead... Oh, it doesn't require Unicode...

If it provides ASCII, then it had better provide an ASCII character type.
We don't want to continue abusing "char".

I don't like basic_utf8 providing an encode_lone_surrogates parameter.
Then it's not UTF-8 at all.

For scalar value and grapheme cluster containers we would need iterator
or range functions. My code uses next_scalar_value and
previous_scalar_value so I can iterate scalar values inside the code
unit range.
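
To illustrate the kind of function meant here, below is a sketch for UTF-8
code units. Only the names next_scalar_value and previous_scalar_value
come from this message; the signatures, the use of std::string_view with
an index, and the error handling are assumptions for the example.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Assumed signature. Decodes the scalar value starting at index `pos` of a
// UTF-8 code unit range and advances `pos` past it. Returns nullopt, leaving
// `pos` unchanged, at the end of the range or on ill-formed input.
inline std::optional<char32_t> next_scalar_value(std::string_view units,
                                                 std::size_t& pos) {
    if (pos >= units.size()) return std::nullopt;
    const auto byte = [&](std::size_t i) {
        return static_cast<unsigned char>(units[i]);
    };
    const unsigned char lead = byte(pos);
    std::size_t length = 0;
    char32_t value = 0;
    char32_t minimum = 0;  // smallest value for this length; rejects overlong forms
    if (lead < 0x80)                { length = 1; value = lead;        minimum = 0; }
    else if ((lead & 0xE0) == 0xC0) { length = 2; value = lead & 0x1F; minimum = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; value = lead & 0x0F; minimum = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; value = lead & 0x07; minimum = 0x10000; }
    else return std::nullopt;  // stray continuation byte or invalid lead byte
    if (units.size() - pos < length) return std::nullopt;  // truncated sequence
    for (std::size_t i = 1; i < length; ++i) {
        const unsigned char cont = byte(pos + i);
        if ((cont & 0xC0) != 0x80) return std::nullopt;
        value = (value << 6) | (cont & 0x3F);
    }
    // Surrogates are code points but not scalar values, so they are rejected.
    const bool surrogate = value >= 0xD800 && value <= 0xDFFF;
    if (value < minimum || value > 0x10FFFF || surrogate) return std::nullopt;
    pos += length;
    return value;
}

// Assumed signature. Decodes the scalar value that ends just before `pos`
// and moves `pos` back to its first code unit.
inline std::optional<char32_t> previous_scalar_value(std::string_view units,
                                                     std::size_t& pos) {
    if (pos == 0 || pos > units.size()) return std::nullopt;
    std::size_t start = pos - 1;
    std::size_t steps = 0;
    // Back up over at most three continuation bytes to find the lead byte.
    while (start > 0 && steps < 3 &&
           (static_cast<unsigned char>(units[start]) & 0xC0) == 0x80) {
        --start;
        ++steps;
    }
    std::size_t end = start;
    const auto value = next_scalar_value(units.substr(0, pos), end);
    if (!value || end != pos) return std::nullopt;
    pos = start;
    return value;
}
```

With these two functions a caller can walk scalar values in either
direction while the underlying storage stays a plain code unit range. Note
that an encoded lone surrogate such as ED A0 80 is reported as ill-formed.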

SG16 list run by sg16-owner@lists.isocpp.org