Hi Jonathan,

On Wed, Oct 21, 2020 at 6:10 AM Jonathan Wakely <cxx@kayari.org> wrote:
Re http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2234r0.pdf

Thanks for working on this, Scott. I approve of the overall goal, but have some comments on specific parts.

Thanks for your careful reading of the paper.  I appreciate your feedback on the suggested approaches.  I believe every one of them.  I'm happy to fix those examples which are fixable and remove any suggested approaches that are simply not workable.  Would you be willing to help with that?

And, yes, I completely agree that participation and review by optimizer writers is critical to the successful outcome of the audit.  I'm delighted that you seem to have a sense for who these folks would be for gcc.  Of course we'll need clang and Visual Studio optimizer writers involved with the review as well.  Yup, that will certainly influence how quickly the work can proceed.

I'm quite happy that you approve of the overall goal.  Motivating the audit (or something like it) was really the point of the paper.

Scott Schurr
S.Scott.Schurr@gmail.com
 
Re User Specializations of Certain Templates in Namespace std

Regarding a new attribute or compiler feature to disallow specializing templates defined in namespace std. Sometimes the std::lib implementation wants (or needs) to specialize a template, even though users can't. Either the attribute needs to have a way to disable it on certain specializations, or the compiler needs to be taught about every special case to know if a given specialization is OK. For the former case, users can just cheat and use the same mechanism to disable the decoration on their own specializations, although I don't think it's reasonable to defend against sufficiently motivated "attackers" so maybe that doesn't matter. The latter case (teaching the compiler about every special case) will not work if e.g. libstdc++ adds a new specialization of a template in std, and somebody tries to compile it with a version of Clang or EDG released last year, which doesn't yet know about the extra specialization it needs to allow. I think in practice any attempt to **require** such specializations to be ill-formed is hard to get right.

Unless I missed it, the paper only talks about specializations for the handful of templates where **any** program-defined specialization is forbidden. It's also undefined to specialize something like std::allocator<int> or std::basic_string<signed char>, only specializations that depend on a program-defined type are allowed (C++20 16.4.5.2.1 [namespace.std] p2). That's another huge class of UB (every template in the std::lib is affected!) which I think is also impractical to make ill-formed. It might seem that a compiler could issue a diagnostic for violations of the rule by requiring every specialization of a template in std to depend on a type defined outside std. But there are types which are part of the implementation which are defined outside std (e.g. ::mbstate_t or __gnu_cxx::__some_internal_detail) so the compiler would need to be taught about every one of those as well.

Re Flowing Off the End of a Non-void Returning Function

Requiring the addition of a call to abort() or similar doesn't only affect code ordering. It can increase the code size if the compiler can't prove the abort is unreachable. It's not just some extra text in the source code, it's extra instructions that aren't needed, which some users of C++ will say is unacceptable. To extend the mask analogy, it's not necessary for athletes to wear a mask while competing in a 100m sprint. Masks are good, but requiring them in all situations without exception causes other problems. So I think a new decoration is necessary instead of requiring a call to a [[noreturn]] function.

Some compilers provide extensions to inform the compiler that a certain code path (such as flowing off the end of a non-void function) is unreachable, e.g __builtin_unreachable. That could be standardized. Or the [[noreturn]] attribute could be reused. Instead of the more general __builtin_unreachable (which can appear anywhere in a function) the [[noreturn]] attribute could be allowed on an empty statement at the end of a function, to indicate that omitting the return statement is intentional because it isn't reachable. It doesn't remove the potential for undefined behaviour, because the user who adds that decoration could be wrong, and if control flow does actually get there it's still undefined. But the common case of accidentally forgetting the return statement or failing to handle a conditional case could be turned into a required diagnostic. If the user explicitly adds [[unreachable]] or [[noreturn]] we've reduced the incidence of UB to the cases where they're explicitly wrong, not just accidentally careless.

Re memcpy

The paper says "It would probably be undesirable to have different implementations for ::memcpy and std::memcpy." More than undesirable, it would be non-conforming. The standard requires that &::memcpy == &std::memcpy i.e. there is only one memcpy. I am very strongly opposed to removing that guarantee.

Re Infinite Loops

Any given implementation is allowed to decide not to optimize away infinite loops, if they serve a purpose in that environment. That would be a QoI matter, so I don't think intermediate programmers on embedded systems is a good motivation to prevent all optimisers from relying on the existing rule (although maybe it could be made dependent on freestanding/hosted implementations).

This whole section makes the solution sound very simple, but I think there needs to be very careful consideration of all the ways that current implementations make use of the existing rule.


Re Refine Proscription Wording Within the Standard

Although I'm not opposed to the rephrasing of [namespace.std] p6, it  seems like a poor example to choose. What is the practical benefit of postponing the unspecified behaviour "until the pointer is actually used"? How many programmers are taking the address of std::lib functions, but not using the result? Why do we care? Why do we want to allow it? In all practical cases, if you do $BADTHING then the program's behaviour is unspecified. Rephrasing it doesn't seem to change anything.

Similarly in the section talking about "unspeciformed", you say "As Joshua says, we would make the world a safer place". Well as already stated, in practice you never get a chicken. It

The suggested change for addressable functions and the introduction of a new term of art would remove hypothetical danger that doesn't affect anybody, intermediate programmer or not.


re A Process to Identify Useful Changes to the Standard

An audit of how optimising compilers actually rely on some of the rules is essential before changing them. We don't necessarily have all the right people in WG21, e.g. for GCC the "middle-end" experts who work on the optimisers are not involved with WG21, and unlikely to be part of the "small team" assembled for step 3b. I don't think that needs to be said in the paper, but just FYI there could be delays in the process due to soliciting external feedback.

Anyway, sign me up for team 3c, but I'll be surprised if UB in the library can be significantly reduced beyond what was already done by the following (and others I can't find right now). LWG already went to A LOT of effort making unnecessary UB ill-formed.

Mandating the Standard Library: Clause 16 - Language support library <https://wg21.link/p1458r1>
Mandating the Standard Library: Clause 18 - Diagnostics library <https://wg21.link/p1459r1>
Mandating the Standard Library: Clause 20 - Strings library <https://wg21.link/p1462r1>
Mandating the Standard Library: Clause 21 - Containers library <https://wg21.link/p1463r1>
Mandating the Standard Library: Clause 22 - Iterators library <https://wg21.link/p1464r1>
Mandating the Standard Library: Clause 25 - Algorithms library <https://wg21.link/p1718r2>
Mandating the Standard Library: Clause 26 - Numerics library <https://wg21.link/p1719r2>
Mandating the Standard Library: Clause 27 - Time library <https://wg21.link/p1686r2>
Mandating the Standard Library: Clause 28 - Localization library <https://wg21.link/p1720r2>
Mandating the Standard Library: Clause 29 - Input/Output library <https://wg21.link/p1721r2>
Mandating the Standard Library: Clause 30 - Regular Expression library <https://wg21.link/p1722r2>
Mandating the Standard Library: Clause 31 - Atomics library <https://wg21.link/p1723r2>
Mandating the Standard Library: Clause 32 - Thread support library <https://wg21.link/p1622r3>