ISOCPP sg16 List: [isocpp-sg16] Comments on P3263R0 Encoding annotated char

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Sat, 16 May 2026 09:49:54 +0200

This paper is currently in SG16's inbox.

Here are a few comments.

Assuming the class and concept definitions presented in section 3 "The proposed
solution" are the meat of the proposal:

- Please drop all "inline" annotations for the member functions.
This is purely quality-of-implementation.

- Please drop all "const" annotations on non-reference parameter types;
those have no meaning except inside the function definition (which isn't
really part of the interface).
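
A minimal sketch of why the top-level "const" is not part of the interface
(the function name twice is hypothetical and does not come from the paper):

```cpp
#include <type_traits>

// Hypothetical function; the name is illustrative only.
constexpr int twice(const int x);  // top-level const on a by-value parameter
constexpr int twice(int x);        // redeclares the same function, not an overload

// The const only matters here, inside the definition, where it keeps
// the body from modifying its local copy of x.
constexpr int twice(const int x) { return 2 * x; }

// The function type ignores the top-level const, so callers see no difference.
static_assert(std::is_same_v<int(const int), int(int)>);
static_assert(std::is_same_v<decltype(twice), int(int)>);
```

Both declarations name one function, which is why the qualifier is noise in a
synopsis.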

- When we have tag types in the standard (such as text_encoding here),
we don't go to extra lengths to delete their constructors. I don't think
there's any active harm done when the user creates an object of an empty
class type that has no real use.

- There already exists a facility to represent text encodings in the
standard; see [text.encoding.class]. And the name text_encoding is
taken. Please update your paper to reflect that reality. In particular,
is the existing std::text_encoding a suitable base class for your
facility? If not, please tell us why (in the paper) and rename your
proposed "text_encoding" class.
Also, please discuss how the facilities in your paper should or
should not interact with the id enumeration in std::text_encoding::id.

- I understand that char_enc_t<T> should be "just like a character type",
except a distinct type. The term "character type" is reserved to core
language-defined types. And, for another reading, the type cannot
actually represent (Unicode) characters in all specializations; it
represents code units (cf. char_enc_t with an underlying type char8_t).
I'd suggest reviewing the terminology in your paper: add cross-references
to [basic.fundamental] where you mean the core language meaning of
"character type", and otherwise take care to refer to code units.

- Requiring a template parameter to be a class derived from a
prescribed class is novel. If you have a template parameter,
users should have the full flexibility of plugging in any type,
as long as that type models the respective concept.
The present design needs additional rationale to survive.

- The paper should discuss whether / how char_enc_t satisfies the
constraints for the charT template parameter of std::basic_ostream
and friends.

- A number of comparison operations are now auto-generated by the
language. Please remove those from the class definition.

- I disagree with the design direction of overloading every operator
under the sun for this new type. Bitwise operations in particular
(and possibly arithmetic ones) appear to be misguided for a character-
like type. People who want to do that should explicitly cast
to the underlying integral type.
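
As a sketch of the explicit-cast style (the names ascii_unit and
to_upper_ascii are hypothetical, and the example assumes an ASCII-compatible
execution character set):

```cpp
// Hypothetical wrapper that deliberately offers no bitwise operators.
struct ascii_unit {
    char value;
    constexpr explicit operator char() const { return value; }
};

// Only meaningful for lowercase ASCII letters; shown purely to illustrate
// where the bit manipulation happens.
constexpr ascii_unit to_upper_ascii(ascii_unit c) {
    // Drop to the underlying integer explicitly, do the bit trick there...
    const auto raw = static_cast<unsigned char>(static_cast<char>(c));
    // ...and wrap the result again.
    return ascii_unit{static_cast<char>(raw & ~0x20u)};
}

static_assert(static_cast<char>(to_upper_ascii(ascii_unit{'a'})) == 'A');
```

The casts make it visible at the call site that the code has left the
"text" domain and is operating on plain integers.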

- Also, non-explicit conversion to "bool" needs more rationale.
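
For comparison, a sketch of an explicit conversion to bool; the type name
unit is hypothetical and the null-check semantics are an assumption, not
something the paper specifies:

```cpp
#include <type_traits>

// Hypothetical wrapper; "true" is assumed to mean "not the null code unit".
struct unit {
    char value;
    constexpr explicit operator bool() const { return value != 0; }
};

// Contextual conversions such as if (u) or !u still work...
static_assert(!unit{'\0'});
static_assert(std::is_constructible_v<bool, unit>);

// ...but implicit conversions, including accidental arithmetic via bool,
// are rejected.
static_assert(!std::is_convertible_v<unit, bool>);
static_assert(!std::is_convertible_v<unit, int>);
```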

- Do not use brace-initialization if regular parentheses do the same
thing.

- A conversion operator to a modifiable reference seems to totally
break any encapsulation we might have here. Please reconsider.
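
A sketch of the concern, using a hypothetical class leaky_unit whose
conversion operator is only assumed to resemble the one in the paper:

```cpp
// Hypothetical class with a private member and a reference-returning
// conversion operator.
class leaky_unit {
public:
    constexpr explicit leaky_unit(char v) : value_(v) {}
    constexpr operator char&() { return value_; }
    constexpr char get() const { return value_; }
private:
    char value_;
};

constexpr char demo() {
    leaky_unit u('a');
    char& raw = u;  // implicit conversion hands out a reference to value_
    raw = 'z';      // the private member is modified from outside the class
    return u.get();
}

static_assert(demo() == 'z');
```

Once such a reference escapes, the class can no longer enforce any invariant
on its stored value.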

- The paper is sorely missing specific use cases. What code
would I write without this paper, and what would the code look
like with this paper? Why is it beneficial to have this library
type in the standard in the first place, in lieu of customers
rolling their own? After all, this seems extremely mechanical.

My current thinking is that we have these classes of programs:
  - ignorant of encodings, just go with whatever happens
  - implicitly assuming ASCII-only, sort-of works with other single-byte encodings
  - implicitly assuming ASCII-only, sort-of works with UTF-8
  - explicitly supporting multiple encodings; those are runtime-variable
(e.g. discovered from an HTTP header)

This paper seems to address a situation where we want to differentiate
at compile-time multiple, yet pre-known encodings.
"Oh, now I'm reading this config file, and I compile-time-know it's EBCDIC.
And then, I'm reading this data file, and I compile-time-know it's UTF-16.
And I won't bother using iconv to get me consistent UTF-8,
but I instead spend lots of effort duplicating my class specializations
for each of these types."

Frankly, that feels exactly like a niche the standard shouldn't try to
cover. At least not before we have standardized runtime encoding
conversions for the general case, and somewhat better standard library
support for the existing charN_t types.

- Is there any implementation experience? Publicly-available source code
using such a facility?

Thanks,
Jens

Received on 2026-05-16 07:49:57