On Wed, 17 Sept 2025 at 22:59, Nate Eldredge <nate@thatsmathematics.com> wrote:


On Sep 17, 2025, at 00:21, Yongwei Wu <wuyongwei@gmail.com> wrote:

I can hardly imagine it is "common". Who would have written such code, especially when it never terminates? I would argue that an infinite loop is better, in that it would alert the programmer that something is broken. To me (and I believe to most C++ programmers), this is a surprising optimization. I really have difficulty imagining it is truly useful.

A very common reason for *humans* to write code to compute a value that is then unused: when the code that would use the value has been turned off for this build.

I'll reply to this post to describe my points.

First, programmers in general do not want to rely on compiler optimizations to get rid of unused code. Do not forget it is also difficult for compilers to prove something is unused, especially when the compiler does not see all code at once.

Another issue is unexpected results. Programmers often want to measure how long a calculation takes. However, when no volatile variables, locks/atomics, or I/O operations are involved, the calculation is sometimes eliminated (and sometimes not). This looks to me like a burden when teaching the language.
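To illustrate the measurement problem, here is a minimal sketch. The names `sum_below` and `time_sum_below` are made up for illustration. Storing the result in a volatile object is one way to keep the result from being discarded, though it does not stop the compiler from computing the sum in closed form, nor does it strictly guarantee that the work happens between the two clock reads.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical workload: sums the integers 0..n-1. It has no side effects,
// so a compiler may delete a call whose result is never used.
std::uint64_t sum_below(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

// Times one call and returns the elapsed nanoseconds. The write to the
// volatile `sink` is a side effect, so the result cannot simply be dropped.
std::int64_t time_sum_below(std::uint64_t n, std::uint64_t& result_out) {
    volatile std::uint64_t sink = 0;
    auto start = std::chrono::steady_clock::now();
    sink = sum_below(n);
    auto stop = std::chrono::steady_clock::now();
    result_out = sink;  // read the value back for the caller
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Without the volatile store, and with the result otherwise unused, the timed region may well contain no work at all, which is exactly the "sometimes eliminated, sometimes not" behaviour.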

Here's an example that I think is quite plausible, though not taken from actual code.  Think of this as low-level embedded code for a freestanding implementation (no standard library).

constexpr bool is_debug_build = false;    // could be true in other builds

void output_to_serial_port(const char *buf, unsigned len);

void debug_output_buffer(const char *buf, unsigned len) {
    if (is_debug_build) {
        output_to_serial_port(buf, len);
    }
}

unsigned my_strlen(const char *s) {
    unsigned i;
    for (i = 0; s[i]; i++) { }
    return i;
}

void debug_output_string(const char *s) {
    debug_output_buffer(s, my_strlen(s));
}

Suppose that, in the entire program, there are no strings of length `std::numeric_limits<unsigned>::max()` (henceforth UINT_MAX for brevity) or greater.  The programmer is well aware of this, and is willing to promise it under penalty of UB.  Also suppose that all functions shown here can be inlined into each other, but that `debug_output_string` is not inlined into its callers, so the compiler knows nothing statically about `s`.

Under [intro.progress p1] as it stands (either C++23 or C++26), the defined observable semantics of `debug_output_string()` are "do nothing".  This is obviously what the programmer intended, and they would like it to be done as quickly as possible.  And indeed, compilers will compile it to a flat return (https://godbolt.org/z/5sjTYd6dx), optimizing out the loop from `my_strlen`.  Great.  (Of course, the programmer does not want to manually rewrite `debug_output_string()` as { }, because in a debug build, it should actually do something.)

But without [intro.progress p1], the defined observable semantics of `debug_output_string()` are not simply "do nothing".  Rather, they are "do nothing, unless s is a string of length UINT_MAX or greater, in which case loop forever and do not proceed with the rest of the program".  (Recall that unsigned integer overflow is not UB and is defined to simply wrap around.) If the compiler must provide those semantics, then it must actually iterate over the string, just in case it should happen to have length UINT_MAX or greater.  Again, you can see this in practice by using `-fno-finite-loops`: https://godbolt.org/z/GqqePv8Th.
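As a small check of the wrap-around rule this argument relies on, the following sketch shows that incrementing the largest `unsigned` value yields 0. The helper name `next_index` is hypothetical and is not part of the example above.

```cpp
#include <limits>

// Returns the index that follows `i`. Unsigned arithmetic is performed
// modulo 2^N, so incrementing the maximum value gives 0 rather than UB.
constexpr unsigned next_index(unsigned i) {
    return i + 1u;
}

static_assert(next_index(std::numeric_limits<unsigned>::max()) == 0u,
              "unsigned increment wraps around to zero");
```

So in `my_strlen`, an `i` that reaches UINT_MAX without finding a terminator goes back to 0 and the scan starts over, which is why the loop has defined (non-terminating) behaviour in that case.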

The programmer already knows that the latter case will never happen, but AFAIK there is no simple way for them to communicate this to the compiler.  So without [intro.progress p1], the compiler is required to waste a lot of runtime pointlessly looping over strings.

(I know the programmer could avoid this in other ways, e.g. by putting `if (is_debug_build)` around the body of `debug_output_string()`.  But we could suppose that `debug_output_buffer()` is called from several different functions: `debug_output_int()`, `debug_output_struct_foo()`, etc.  By the principle of DRY, the programmer should prefer to put the test of `is_debug_build` in just one place.)

First, it is a valid example, and you have already provided the counterarguments, so I need not repeat things you already know. I just want to emphasize that there are already a lot of things C++ programmers need to be careful about. For example, we should try to declare string/vector variables outside a loop, as the existence of global allocation/deallocation functions makes it generally impossible for the compiler to optimize away allocations and deallocations. (And I truly want deterministic behaviour and do not like magical optimizations here.)
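The string/vector point can be sketched as follows. Both function names are made up for illustration, and whether the first version actually allocates on every iteration depends on the library (short strings often fit in a small internal buffer).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Declares the string inside the loop: a new object is constructed and
// destroyed on every iteration, which may allocate and free each time.
std::size_t total_length_fresh(const std::vector<std::string>& names) {
    std::size_t total = 0;
    for (const auto& name : names) {
        std::string line = "name=";
        line += name;
        total += line.size();
    }
    return total;
}

// Declares the string once, outside the loop: assign() reuses whatever
// capacity earlier iterations have already acquired.
std::size_t total_length_reused(const std::vector<std::string>& names) {
    std::size_t total = 0;
    std::string line;
    for (const auto& name : names) {
        line.assign("name=");
        line += name;
        total += line.size();
    }
    return total;
}
```

Both versions compute the same result; the second just does not depend on the optimizer to remove the repeated allocations.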

. . . . . . . . .

Having said all this, let's step back.  It seems to me that, among all loops with no side effects, there are two types: (A) intended by the programmer to always terminate; (B) intended by the programmer to possibly loop forever and prevent further execution.

The approach of C++23 was to assume all loops are of type A.  If the programmer desires a loop of type B, they must manually add a side effect.
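A minimal sketch of what "manually add a side effect" could look like, using a Collatz-style iteration as a stand-in for a type B loop. The function name `steps_to_one` and the variable `progress_marker` are hypothetical; the volatile write is one of the operations [intro.progress] lists, so the loop may no longer be assumed to terminate.

```cpp
#include <cstdint>

// Counts the steps of the 3n+1 iteration until n reaches 1.
// For n == 0 the loop never terminates (0 is even and 0 / 2 == 0).
std::uint64_t steps_to_one(std::uint64_t n) {
    volatile char progress_marker = 0;  // hypothetical: exists only for its side effect
    std::uint64_t steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;  // unsigned arithmetic wraps, no UB
        ++steps;
        progress_marker = 1;  // volatile access in every iteration
    }
    return steps;
}
```

Without the volatile write, a caller that ignores the result would allow the compiler to remove the loop entirely, even for `n == 0`.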

C++26 adds the exception for "trivially empty loops" (https://eel.is/c++draft/stmt.iter.general#3) which are assumed to be of type B.  This does not cover your Fermat program, though I don't think it's a good example: as long as memory is finite, the search *must* terminate somehow after a finite number of iterations, so it can't really be an infinite loop in any case.  But I'm willing to concede that there exist other type B loops that are not "trivially empty".

One thought would be to have an attribute or something, to allow the programmer to specify what type of loop is desired.  And I can see an argument that B ought to be the default, so that a program always "does what it says".  The programmer could then "opt in" to A with [[finite_loop]] or something like that, in cases where they can prove statically that the loop always terminates.

For the sake of programmer friendliness (by the way, UB is extremely programmer-unfriendly), I would like the forward-progress assumption to be opt-in, rather than the default as it is now. Do not surprise your users (programmers here). Do not expect every programmer to read the C++ standard (just as most people avoid reading user manuals wherever possible).

I am in the process of preparing a lecture on UB, and it is really a pain to find that there are so many surprises in the C++ language. I concur that UB is unavoidable at present, but I would argue it is better to have less UB where possible, especially when making the behaviour defined does not bring real harm.

While I agree that an optimization like the one you mentioned may really exist, I still can hardly imagine that people will rely on it. It is too fragile: a single memory allocation can make the optimization impossible.

--
Yongwei Wu
URL: http://wyw.dcweb.cn/