std-discussion

Re: Character literals

From: Thiago Macieira <thiago_at_[hidden]>
Date: Thu, 14 Aug 2025 10:07:12 -0700
On Thursday, 14 August 2025 03:27:47 Pacific Daylight Time Russell Shaw via
Std-Discussion wrote:
> The line at AA is an unbounded sequence.

Yes. It's annoying but perfectly valid, as Jens' example shows. Why hex was
left unbounded while octal was restricted to 3 digits (9 bits) is lost to the
mists of time. Unbounded hex does allow encoding larger values for wchar_t,
but if that was the goal, why isn't it allowed for octal too?
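
A minimal sketch of that difference, using wide literals so the large hex
value fits. The helper names wide_length and check_wide_escapes are made up
for illustration, and the sketch assumes wchar_t is at least 16 bits wide:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of elements in a wide string literal,
// not counting the terminating NUL.
template <std::size_t N>
constexpr std::size_t wide_length(const wchar_t (&)[N]) { return N - 1; }

// Hypothetical helper: runtime checks of how the escapes are parsed.
void check_wide_escapes() {
    // Hex escapes consume every hex digit that follows: one element, 0x170.
    assert(wide_length(L"\x170") == 1);
    assert(L"\x170"[0] == 0x170);

    // Octal escapes stop after three digits: \027, then the character '0'.
    assert(wide_length(L"\0270") == 2);
    assert(L"\0270"[0] == 027);
    assert(L"\0270"[1] == L'0');

    // Three octal digits carry at most 9 bits, so 0777 is the largest value.
    assert(L"\777"[0] == 0777);
}
```

The same "\x170" in a narrow literal is rejected or diagnosed by compilers
where char is 8 bits, because the value does not fit.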

It's annoying for encoders, which must adopt one of these strategies:
* encode the next character as hex too
* look at the next character, close, and reopen the string
* encode as octal, either in 3 digits or using the above again
* encode as a universal character name with exactly 4 digits (C++11)
* use braces, i.e. a delimited escape sequence (C++23)
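
As a rough sketch of the "close and reopen" strategy: the function name
encode_literal is made up for illustration, and the code assumes the "C"
locale and an ASCII-compatible encoding. It writes non-printable bytes as
two-digit hex escapes and splits the literal whenever the next character
would otherwise be absorbed into the escape:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: render `bytes` as the source text of a C++ string
// literal, using \xHH for non-printable bytes.
std::string encode_literal(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out = "\"";
    bool after_hex = false;  // was the previous item emitted a \x escape?
    for (unsigned char c : bytes) {
        if (c == '"' || c == '\\') {
            // A backslash ends any preceding hex escape, so no split needed.
            out += '\\';
            out += static_cast<char>(c);
            after_hex = false;
        } else if (std::isprint(c)) {
            // A hex digit here would extend the preceding \x escape, so
            // close the literal and reopen it; adjacent string literals
            // are concatenated by the compiler.
            if (after_hex && std::isxdigit(c)) out += "\" \"";
            out += static_cast<char>(c);
            after_hex = false;
        } else {
            out += "\\x";
            out += digits[c >> 4];
            out += digits[c & 0xF];
            after_hex = true;
        }
    }
    out += '"';
    return out;
}
```

For the bytes 0x17, '0' this yields "\x17" "0". An encoder that prefers a
single literal could emit three-digit octal instead and skip the lookahead.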

For string { 0x17, '0', 0 }, valid string forms are:
  "\x17" "0"
  "\x17\x30"
  "\27" "0"
  "\0270"
  "\u00170"
  "\x{17}0"

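A quick way to convince yourself that these forms are equivalent is to compare
each against the raw bytes. This is only a sketch: is_expected and check_forms
are made-up names, and it assumes an ASCII-compatible ordinary literal
encoding (so '0' is 0x30 and U+0017 encodes as the single byte 0x17). The
"\x{17}0" form is left out so the code still compiles before C++23:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if the literal `s` (N counts the terminating
// NUL) holds exactly the bytes { 0x17, '0', 0 }.
template <std::size_t N>
bool is_expected(const char (&s)[N]) {
    static const char expected[] = { 0x17, '0', 0 };
    return N == sizeof expected && std::memcmp(s, expected, N) == 0;
}

// Hypothetical helper: check each pre-C++23 form from the list above.
void check_forms() {
    assert(is_expected("\x17" "0"));   // close and reopen after the hex escape
    assert(is_expected("\x17\x30"));   // next character as hex too
    assert(is_expected("\27" "0"));    // short octal, then reopen
    assert(is_expected("\0270"));      // octal padded to three digits
    assert(is_expected("\u00170"));    // universal character name, 4 digits
}
```
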
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Platform & System Engineering

Received on 2025-08-14 17:07:17