Re: [isocpp-sg16] Comments on P3263R0 Encoding annotated char

From: Tiago Freire <tmiguelf_at_[hidden]>
Date: Sat, 16 May 2026 15:12:53 +0000

Hi Jens.

That's all well and good, but the paper aims to do two things:
1. Support non-standardized encodings.
2. Facilitate a common transcoding and formatting language.

From the discussion of the paper, a point that was raised is that the paper would not be properly backed by motivation unless a transcoding library was also provided.
I agree with that point, and I would be more than happy to provide an implementation of a transcoding library.
But that's a lot of work, especially considering that, even if I were to provide the complete implementation plus the transcoding library, the general feeling was that this would not be of interest to SG16.


To answer one of your questions:
> Also, please discuss how the facilities in your paper should or should not interact with the id enumeration in std::text_encoding::id.

The answer is: it doesn't! Intentionally, by design. That's the whole point. "std::text_encoding::id" is a design dead end.

One field where I used to work with this was airplane display units. Take, for example, a B737 CDU: the characters that you see on screen are not in any encoding found in any "standard". The manufacturer of the device provides a CODE PAGE (a table listing each character and the number that corresponds to it), so if I wanted to print a specific character on screen, I would have to look up the character in the code page, see what number it was assigned to, and send that number (i.e. transcoding).
(Not just B737 CDUs, but different types of Boeing and Airbus units, each with their own code page, plus third-party mockups for flight simulators, or even virtual display units that output characters that are nowhere to be found in Unicode.)
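As a minimal sketch of the lookup described above: the table values, the names CodePageEntry, kDemoCodePage, to_device and transcode, and the choice of a single-byte device code are all invented for illustration and are not taken from any real CDU code page.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical code page entry: a Unicode code point paired with the
// number the device expects for that character.
struct CodePageEntry {
    char32_t character;
    std::uint8_t code;
};

// Invented values for illustration only; a real table would be copied
// from the manufacturer's documentation.
inline constexpr std::array<CodePageEntry, 4> kDemoCodePage{{
    {U'A', 0x01},
    {U'B', 0x02},
    {U'\u00B0', 0x7E},  // degree sign
    {U'\u25A1', 0x7F},  // white square
}};

// Look up one character; an empty optional means the device cannot show it.
constexpr std::optional<std::uint8_t> to_device(char32_t ch) {
    for (const auto& entry : kDemoCodePage) {
        if (entry.character == ch) return entry.code;
    }
    return std::nullopt;
}

// Transcode a whole string, substituting `fallback` for unmapped characters.
std::vector<std::uint8_t> transcode(std::u32string_view text, std::uint8_t fallback) {
    std::vector<std::uint8_t> out;
    out.reserve(text.size());
    for (char32_t ch : text) {
        out.push_back(to_device(ch).value_or(fallback));
    }
    return out;
}
```

The result here is a plain sequence of std::uint8_t, so nothing in the type system records which code page the bytes belong to.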

I want to have the same expressiveness that char8_t provides for UTF-8, but for encodings that are never going to be standard.
I want to be able to integrate tomorrow some weird encoding a display manufacturer sent me yesterday, without having to petition the committee to add an extra std::text_encoding::id and a new character type for a weird encoding that some manufacturer just made up, and then wait years for a standard update.
I have the code page from the manufacturer, and that's all I should ever need. I can put in all the work required to do the encoding myself; the standard just needs to provide the tooling required to express it.
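To illustrate the kind of expressiveness meant here, a rough sketch of a code unit type tagged with a user-defined encoding follows. This is not the interface proposed in P3263R0; code_unit, acme_display_encoding, other_vendor_encoding and send_to_display are hypothetical names made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical tags naming vendor encodings; nothing has to be registered
// with the standard for them to exist.
struct acme_display_encoding {};
struct other_vendor_encoding {};

// Hypothetical code unit type: one distinct type per encoding tag,
// wrapping a plain integer.
template <class Encoding, class Underlying = std::uint8_t>
struct code_unit {
    Underlying value{};

    friend constexpr bool operator==(code_unit a, code_unit b) { return a.value == b.value; }
    friend constexpr bool operator!=(code_unit a, code_unit b) { return !(a == b); }
};

using acme_char = code_unit<acme_display_encoding>;
using other_char = code_unit<other_vendor_encoding>;

// The two encodings are different types, much as char8_t differs from char.
static_assert(!std::is_same_v<acme_char, other_char>);

// Accepts only text already in the ACME encoding; passing a
// std::vector<other_char> would fail to compile.
std::size_t send_to_display(const std::vector<acme_char>& text) {
    return text.size();  // stand-in for writing the code units to the device
}
```

With something like this, mixing up two vendor encodings becomes a compile-time error rather than garbage on the display.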


Unless there is a realization that "std::text_encoding::id is a dead end", and as long as there is no appetite to make room for user-provided encodings, I feel that there is no point in continuing to spend time on this.
Yes, I can update the paper. Yes, I can provide a formatting/transcoding library using this. But only if there's someone else who thinks that at least in principle there's a problem there that is worth solving.

Br,
Tiago



-----Original Message-----
From: SG16 <sg16-bounces_at_lists.isocpp.org> On Behalf Of Jens Maurer via SG16
Sent: Saturday, May 16, 2026 09:50
To: SG16 <sg16_at_lists.isocpp.org>
Cc: Jens Maurer <jens.maurer_at_[hidden]x.net>; Tiago Freire <cpp_at_kaotic.software>
Subject: [isocpp-sg16] Comments on P3263R0 Encoding annotated char

This paper is currently in SG16's inbox.

Here are a few comments.

Assuming the class and concept definitions presented in section 3 "The proposed solution" are the meat of the proposal:

 - Please drop all "inline" annotations for the member functions.
This is purely quality-of-implementation.

 - Please drop all "const" annotations on non-reference parameter types; those have no meaning except inside the function definition (which isn't really part of the interface).

 - When we have tag types in the standard (such as text_encoding here), we don't go to extra lengths to delete their constructors. I don't think there's any active harm done when the user creates an object of an empty class type that hasn't really any use.

 - There already exists a facility to represent text encodings in the standard; see [text.encoding.class]. And the name text_encoding is taken. Please update your paper to reflect that reality. In particular, is the existing std::text_encoding a suitable base class for your facility? If not, please tell us why (in the paper) and rename your proposed "text_encoding" class.

 - I understand that char_enc_t<T> should be "just like a character type", except a distinct type. The term "character type" is reserved to core language-defined types. And, for another reading, the type cannot actually represent (Unicode) characters in all specializations; it represents code units (cf. char_enc_t with an underlying type char8_t).
I'd suggest to review the terminology in your paper, add cross-references to [basic.fundamental] where you mean the core language meaning of "character type", and otherwise take care to refer to code units.

 - Requiring a template parameter to be a class derived from a prescribed class is novel. If you have a template parameter, users should have the full flexibility of plugging in any type, as long as that type models the respective concept.
The present design needs additional rationale to survive.

 - The paper should discuss whether / how char_enc_t satisfies the constraints for the charT template parameter of std::basic_ostream and friends.

 - A number of comparison operations are now auto-generated by the language. Please remove those from the class definition.

 - I disagree with the design direction of overloading every operator under the sun for this new type. Bitwise operations in particular (and possibly arithmetic ones) appear to be misguided for a character-like type. People who want to do that should explicitly cast to the underlying integral type.

 - Also, non-explicit conversion to "bool" needs more rationale.

 - Do not use brace-initialization if regular parentheses do the same thing.

 - A conversion operator to a modifiable reference seems to totally break any encapsulation we might have here. Please reconsider.

 - The paper is sorely missing specific use cases. What code would I write without this paper, and what would the code look like with this paper? Why is it beneficial to have this library type in the standard in the first place, in lieu of customers rolling their own? After all, this seems extremely mechanical.

My current thinking is that we have these classes of programs:
  - ignorant to encodings, just go with whatever happens
  - implicitly assuming ASCII-only, sort-of works with other single-byte encodings
  - implicitly assuming ASCII-only, sort-of works with UTF-8
  - explicitly supporting multiple encodings; those are runtime-variable (e.g. discovered from an HTTP header)

This paper seems to address a situation where we want to differentiate at compile-time multiple, yet pre-known encodings.
"Oh, now I'm reading this config file, and I compile-time-know it's EBCDIC.
And then, I'm reading this data file, and I compile-time-know it's UTF-16.
And I won't bother using iconv to get me consistent UTF-8, but I instead spend lots of effort duplicating my class specializations for each of these types."

Frankly, that feels exactly like a niche the standard shouldn't try to cover. At least not before we got standardized runtime encoding conversions for the general case. And a bit more coverage on the standard library support for the existing charN_t.


 - Is there any implementation experience? Publicly-available source code using such a facility?

Thanks,
Jens

--
SG16 mailing list
SG16_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/sg16
Link to this post: http://lists.isocpp.org/sg16/2026/05/4708.php

Received on 2026-05-16 15:12:58