Date: Wed, 17 Sep 2025 14:59:13 +0000
> On Sep 17, 2025, at 00:21, Yongwei Wu <wuyongwei_at_[hidden]> wrote:
>
> I can hardly imagine it is "common". Who would have written such code, especially when it never terminates? I would argue that an infinite loop is better, in that it would alert the programmer that something is broken. To me (and I believe to most C++ programmers), this is a surprising optimization. I really have difficulty imagining it is truly useful.
A very common reason for *humans* to write code that computes a value which is then never used: the code that would use the value has been turned off for this build.
Here's an example that I think is quite plausible, though not taken from actual code. Think of this as low-level embedded code for a freestanding implementation (no standard library).
constexpr bool is_debug_build = false; // could be true in other builds

void output_to_serial_port(const char *buf, unsigned len);

void debug_output_buffer(const char *buf, unsigned len) {
    if (is_debug_build) {
        output_to_serial_port(buf, len);
    }
}

unsigned my_strlen(const char *s) {
    unsigned i;
    for (i = 0; s[i]; i++) { }
    return i;
}

void debug_output_string(const char *s) {
    debug_output_buffer(s, my_strlen(s));
}
Suppose that, in the entire program, there are no strings of length `std::numeric_limits<unsigned>::max()` (henceforth UINT_MAX for brevity) or greater. The programmer is well aware of this, and is willing to promise it under penalty of UB. Also suppose that all functions shown here can be inlined into each other, but that `debug_output_string` is not inlined into its callers, so the compiler knows nothing statically about `s`.
Under [intro.progress p1] as it stands (either C++23 or C++26), the defined observable semantics of `debug_output_string()` are "do nothing". This is obviously what the programmer intended, and they would like it to be done as quickly as possible. And indeed, compilers will compile it to a flat return (https://godbolt.org/z/5sjTYd6dx), optimizing out the loop from `my_strlen`. Great. (Of course, the programmer does not want to manually rewrite `debug_output_string()` as { }, because in a debug build, it should actually do something.)
But without [intro.progress p1], the defined observable semantics of `debug_output_string()` are not simply "do nothing". Rather, they are "do nothing, unless s is a string longer than UINT_MAX characters, in which case loop forever and do not proceed with the rest of the program". (Recall that unsigned arithmetic is not UB on overflow; it is defined to wrap around, so `i` goes from UINT_MAX back to 0 before the terminator is ever reached.) If the compiler must provide those semantics, then it must actually iterate over the string, just in case it happens to be that long. Again, you can see this in practice by using `-fno-finite-loops`: https://godbolt.org/z/GqqePv8Th.
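A 4 GiB string would be needed to hit the wraparound for real. This sketch instead shrinks the counter to 8 bits so the effect shows up after 256 characters. `strlen_u8` and its `max_steps` cap are hypothetical and exist only for this illustration. The cap is there only so the demo itself terminates; the real `my_strlen` has no such cap.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical variant of my_strlen with an 8-bit counter, so the index
// wraps after 256 characters instead of after UINT_MAX + 1.
// max_steps is an artificial cap so this demonstration always returns.
std::optional<unsigned> strlen_u8(const char *s, unsigned long max_steps) {
    std::uint8_t i = 0;
    for (unsigned long step = 0; step < max_steps; ++step) {
        if (s[i] == '\0') {
            return static_cast<unsigned>(i);  // reached the terminator
        }
        ++i;  // 255 + 1 wraps back to 0 (well defined for unsigned types)
    }
    return std::nullopt;  // still scanning: without the cap this never ends
}
```

With the 8-bit counter, a 255-character string still returns 255, because the index reaches 255 and finds the terminator there. Only strings of 256 or more characters make the index wrap to 0 and rescan the same bytes forever. The same reasoning applies to `unsigned`, with UINT_MAX in place of 255.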
The programmer already knows that the latter case will never happen, but AFAIK there is no simple way for them to communicate this to the compiler. So without [intro.progress p1], the compiler is required to waste a lot of runtime pointlessly looping over strings.
(I know the programmer could avoid this in other ways, e.g. by putting `if (is_debug_build)` around the body of `debug_output_string()`. But we could suppose that `debug_output_buffer()` is called from several different functions: `debug_output_int()`, `debug_output_struct_foo()`, etc. By the principle of DRY, the programmer should prefer to put the test of `is_debug_build` in just one place.)
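For concreteness, here is a sketch of the non-DRY alternative just mentioned, with the flag tested again inside `debug_output_string()`. The counting stub for `output_to_serial_port` is hypothetical and only makes the sketch runnable on a hosted build; in the example above, that function is just declared and lives elsewhere.

```cpp
constexpr bool is_debug_build = false; // could be true in other builds

// Hypothetical stub that counts calls, standing in for the real serial driver.
unsigned serial_calls = 0;
void output_to_serial_port(const char *, unsigned) { ++serial_calls; }

void debug_output_buffer(const char *buf, unsigned len) {
    if (is_debug_build) {
        output_to_serial_port(buf, len);
    }
}

unsigned my_strlen(const char *s) {
    unsigned i;
    for (i = 0; s[i]; i++) { }
    return i;
}

// Duplicated test: my_strlen is only reached when its result is used,
// so the loop is dead code regardless of [intro.progress].
void debug_output_string(const char *s) {
    if (is_debug_build) {
        debug_output_buffer(s, my_strlen(s));
    }
}
```

Every other wrapper (`debug_output_int()`, `debug_output_struct_foo()`, ...) would need the same extra `if`, which is the duplication DRY argues against.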
. . . . . . . . .
Having said all this, let's step back. It seems to me that, among all loops with no side effects, there are two types: (A) intended by the programmer to always terminate; (B) intended by the programmer to possibly loop forever and prevent further execution.
The approach of C++23 was to assume all loops are of type A. If the programmer desires a loop of type B, they must manually add a side effect.
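As a sketch of what "manually add a side effect" can look like: [intro.progress] p1 lists an access through a volatile glvalue as one of the things a thread may be assumed to eventually do. A loop that reads a volatile object on every iteration is therefore not covered by the "assume it terminates" rule. The flag name and the interrupt-handler scenario are hypothetical.

```cpp
// Hypothetical flag, e.g. set by an interrupt handler on embedded hardware.
// In a hosted multithreaded program, std::atomic<bool> would be the
// idiomatic choice, since volatile is not a synchronization mechanism.
volatile bool stop_requested = false;

// Intended as a type B loop: it may legitimately spin forever.
// The volatile read in the condition counts as progress, so the
// compiler may not delete the loop on the assumption that it ends.
void wait_for_stop() {
    while (!stop_requested) { }
}
```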
C++26 adds the exception for "trivially empty loops" (https://eel.is/c++draft/stmt.iter.general#3) which are assumed to be of type B. This does not cover your Fermat program, though I don't think it's a good example: as long as memory is finite, the search *must* terminate somehow after a finite number of iterations, so it can't really be an infinite loop in any case. But I'm willing to concede that there exist other type B loops that are not "trivially empty".
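To make the distinction concrete, here is my reading of the C++26 draft wording, so treat the comments as an interpretation rather than a quotation. `my_strlen`'s loop is not trivially empty, because its `for` has an iteration expression. A bare `for (;;) { }` is a trivial infinite loop. The `halt` function is hypothetical.

```cpp
// Not trivially empty: the for-statement has an iteration expression (i++),
// so the C++23 rule still applies and the loop may be assumed to terminate.
unsigned my_strlen(const char *s) {
    unsigned i;
    for (i = 0; s[i]; i++) { }
    return i;
}

// Trivially empty with an omitted (hence true) condition: a "trivial
// infinite loop", which C++26 no longer lets the compiler assume ends.
[[noreturn]] void halt() {
    for (;;) { }
}
```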
One thought would be to have an attribute or something, to allow the programmer to specify what type of loop is desired. And I can see an argument that B ought to be the default, so that a program always "does what it says". The programmer could then "opt in" to A with [[finite_loop]] or something like that, in cases where they can prove statically that the loop always terminates.
Received on 2025-09-17 14:59:16