Date: Mon, 18 May 2026 23:08:47 +0800
On Mon, May 18, 2026 at 4:38 PM Jan Schultke via Std-Proposals <std-proposals_at_[hidden]> wrote:
> You should also consider that it's probably not good to deprecate the
> case where you append char to std::u8string and append char8_t to
> std::string.
>
> I imagine the warnings would be too noisy to pull that off in existing code
> bases that make heavy use of these types and don't catch such conversions
> already.
>
> There is also the case of appending unsigned char to std::string I suppose.
Thanks, that is a good point.
I agree that `char`/`char8_t` and `unsigned char` cases should probably not
be lumped together with obviously suspicious scalar conversions such as
`bool`, floating point, arbitrary `int`, or enum values.
In particular, an exact-`CharT` rule (accept a single-character argument only
if its type is exactly the string's `CharT`) is attractive as a clean model,
but it would also diagnose cases like:
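```cpp
#include <string>

void examples() {
    std::u8string u;
    u += 'x';                // char -> char8_t
    std::string s;
    s += u8'x';              // char8_t -> char
    unsigned char b = 0xE2;  // any byte value; 0xE2 is only illustrative
    s += b;                  // unsigned char -> char
}
```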
Those are likely to be common in code that treats `std::string` as a byte
or UTF-8 code-unit sequence, or in code gradually migrating to `char8_t`.
So I think the corpus/checker work should classify these cases separately,
rather than treating all non-`CharT` conversions as one bucket. A useful
initial checker might have at least two modes:
1. a conservative/default mode that diagnoses high-confidence cases such as
`bool`, floating point, enum, and perhaps non-character integer literals;
2. a strict mode that also diagnoses non-exact character-code-unit
conversions such as `char` <-> `char8_t` and `unsigned char` -> `char`.
That would let us measure how noisy the exact-`CharT` model would be before
proposing any standard wording.
Thanks again.
-- Qirong Zhang
Received on 2026-05-18 15:08:54
