SG9 discussed P4030R0 (Endian Views) in Brno (notes, GH issue with polls) this week and polled to forward it to LEWG. I was the sole dissenter on the poll for reasons described below. SG16 has not yet reviewed the proposal.
To be clear, I support the paper, but would like to see additional analysis completed to gain more confidence in the design.
My concerns are listed below. Some of these were not discussed in SG9 because they didn't occur to me until later.
P3477 (There are exactly 8 bits in a byte) was extensively discussed in EWG and LEWG in 2024-2025 but failed to gain consensus across both groups. The status quo is therefore that bytes may have more than 8 bits and such implementations are known to exist (though none are supported by Clang, GCC, or MSVC; at least not in their upstream repositories). std::byteswap() ([bit.byteswap]) does not have differentiated behavior based on CHAR_BIT, but parts of the std::text_encoding identification library do (see [locale.members]p6 and [text.encoding.members]p11,p13,p17). We could follow suit and add CHAR_BIT == 8 mandates to the proposed endian views. If we did, I would encourage adding a similar mandate to std::byteswap() as a DR.
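For illustration, a CHAR_BIT == 8 mandate could look roughly like the following. This is only a sketch: byteswap_requiring_octets is a hypothetical name, not something proposed in P4030R0 or present in the standard library, and it is written on values rather than on the object representation so that it compiles in C++17 without std::byteswap(). It assumes a non-bool integer type without padding bits.

```cpp
#include <climits>
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not from P4030R0 or the standard library): a byte swap
// that refuses to compile unless bytes are octets.
template <class T>
constexpr T byteswap_requiring_octets(T value) noexcept {
    static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                  "sketch handles non-bool integer types only");
    static_assert(CHAR_BIT == 8, "mandate: bytes must have exactly 8 bits");
    using U = std::make_unsigned_t<T>;
    U in = static_cast<U>(value);
    U out = 0;
    // Move the lowest octet of 'in' into 'out', once per byte of T.
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out = static_cast<U>((out << 8) | (in & 0xFF));
        in = static_cast<U>(in >> 8);
    }
    return static_cast<T>(out);
}
```

On an implementation with CHAR_BIT != 8, any instantiation of this template is ill-formed, which is the effect a Mandates: element would have.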
std::byteswap() constrains the types it operates on to those that model integral ([bit.byteswap]p1) with an additional mandate for the lack of padding bits ([bit.byteswap]p2, but see LWG 4583 (std::byteswap can make sense for some types with padding bytes)). It makes no accommodation for type aliases like std::uint_least16_t that may alias a type whose range of values exceeds that required by the alias. Since char16_t has the same underlying type as std::uint_least16_t, given a char16_t object C with value 0xFEFF, an implementation for which CHAR_BIT == 8 and std::uint_least16_t aliases a 32-bit type will produce a value of 0xFFFE0000 rather than 0xFFFE for the expression std::byteswap(C). This is clearly not the desired behavior for UTF-16 endian conversions. There are several ways this can be addressed:
Note that, for UTF endian conversions, both CHAR_BIT and the number of bytes in the object representation must be correctly handled in order for byte swapping to produce expected results. An implementation with CHAR_BIT == 9 and sizeof(char16_t) == 4 should still be expected to convert 0xFEFF to 0xFFFE to satisfy Unicode expectations. This result differs from a generic byte endian conversion (which is what std::byteswap() implements and what is proposed by P4030R0 for endian views) and therefore implies that UTF endian conversion should use a distinct algorithm that is independent of the value of CHAR_BIT, the presence of padding bits, and excess value range representation.
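The difference between the two algorithms can be sketched as follows. Both function names are hypothetical and appear in neither P4030R0 nor the standard library. generic_swap32 is a hand-written stand-in for what std::byteswap() does to a 32-bit object when CHAR_BIT == 8 (written out so the example does not need C++23), and swap_utf16_code_unit is one possible value-based definition that only ever looks at the low 16 bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for a generic byte swap of a 32-bit object, i.e. what
// std::byteswap() yields for a 32-bit type when CHAR_BIT == 8.
constexpr std::uint_least32_t generic_swap32(std::uint_least32_t v) noexcept {
    v &= 0xFFFFFFFFu;
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// Hypothetical UTF-16 code unit swap defined on values: exchange the two
// octets of the 16-bit code unit value, regardless of CHAR_BIT, padding bits,
// or how wide char16_t actually is.
constexpr char16_t swap_utf16_code_unit(char16_t cu) noexcept {
    const std::uint_least32_t v = cu;
    return static_cast<char16_t>(((v & 0x00FFu) << 8) | ((v >> 8) & 0x00FFu));
}

// A BOM held in a 32-bit object is moved out of the 16-bit range by the
// generic swap, while the value-based swap gives the Unicode-expected result.
static_assert(generic_swap32(0xFEFFu) == 0xFFFE0000u, "generic byte swap");
static_assert(swap_utf16_code_unit(char16_t{0xFEFF}) == char16_t{0xFFFE},
              "UTF-16 code unit swap");
```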
File streams and network interfaces typically provide a sequence of bytes as opposed to a sequence of (possibly endian swapped) 16-bit or 32-bit values. The primary motivation offered in P4030R0 is to support UTF encodings, but it doesn't address sequences of bytes other than in its (non-UTF) examples involving cipher suites. The Unicode Standard specifies three encoding forms and seven encoding schemes. Encoding forms are never byte swapped and never contain a BOM; they correspond to text in memory that is ready to be operated on as Unicode text (e.g., text held in sequences of char8_t, char16_t, or char32_t). Encoding schemes are byte oriented and may contain a BOM; they correspond to data read from a file, network, or other device that is in an interchange format and not necessarily ready to be directly operated on as text (e.g., data in sequences of unsigned char, std::byte, or std::uint16_t). The endian views proposed by P4030R0 are useful for the UTF-16 and UTF-32 encoding schemes (when a BOM isn't present or is handled separately), but do not provide direct support for UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE; at least not without a separate transform that aggregates bytes into exact-width code unit values before applying an endian view.
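As a rough sketch of the kind of separate transform meant here, the following assembles a UTF-16BE byte sequence into char16_t code units. The function name is hypothetical and not part of P4030R0. The sketch assumes no BOM is present, uses only the low 8 bits of each byte, and silently drops a trailing odd byte where a real decoder would report an error.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from P4030R0): aggregate UTF-16BE bytes into
// UTF-16 code units. The result is built from values, so no endian view or
// byte swap is needed afterwards.
inline std::u16string utf16be_bytes_to_code_units(
        const std::vector<unsigned char>& bytes) {
    std::u16string out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned hi = bytes[i] & 0xFFu;      // most significant octet first
        const unsigned lo = bytes[i + 1] & 0xFFu;  // then least significant octet
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

A UTF-16LE counterpart would differ only in which of the two bytes is treated as the most significant octet.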
Lack of support for byte oriented streams and a BOM implies an incomplete solution for the primary motivation stated in P4030R0. I think it could be reasonable to provide such support in a different paper, but I think we should have a clear vision for how that support would work with the proposed endian views before we proceed to standardize them. Per comments above, it seems that generic endian views using the C++ definition of a byte and its code unit types are at odds with the intent of the Unicode Standard with its expectations of 8-bit bytes and exact-width code unit types.
Tom.