Date: Wed, 21 Oct 2020 14:10:34 +0100
Re http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2234r0.pdf
Thanks for working on this, Scott. I approve of the overall goal, but have
some comments on specific parts.
Re User Specializations of Certain Templates in Namespace std
Regarding a new attribute or compiler feature to disallow specializing
templates defined in namespace std. Sometimes the std::lib implementation
wants (or needs) to specialize a template, even though users can't. Either
the attribute needs to have a way to disable it on certain specializations,
or the compiler needs to be taught about every special case to know if a
given specialization is OK. For the former case, users can just cheat and
use the same mechanism to disable the decoration on their own
specializations, although I don't think it's reasonable to defend against
sufficiently motivated "attackers" so maybe that doesn't matter. The latter
case (teaching the compiler about every special case) will not work if e.g.
libstdc++ adds a new specialization of a template in std, and somebody
tries to compile it with a version of Clang or EDG released last year,
which doesn't yet know about the extra specialization it needs to allow. I
think in practice any attempt to **require** such specializations to be
ill-formed is hard to get right.
Unless I missed it, the paper only talks about specializations for the
handful of templates where **any** program-defined specialization is
forbidden. It's also undefined to specialize something like
std::allocator<int> or std::basic_string<signed char>, only specializations
that depend on a program-defined type are allowed (C++20 16.4.5.2.1
[namespace.std] p2). That's another huge class of UB (every template in the
std::lib is affected!) which I think is also impractical to make
ill-formed. It might seem that a compiler could issue a diagnostic for
violations of the rule by requiring every specialization of a template in
std to depend on a type defined outside std. But there are types which are
part of the implementation which are defined outside std (e.g. ::mbstate_t
or __gnu_cxx::__some_internal_detail) so the compiler would need to be
taught about every one of those as well.
Re Flowing Off the End of a Non-void Returning Function
Requiring the addition of a call to abort() or similar doesn't only affect
code ordering. It can increase the code size if the compiler can't prove
the abort is unreachable. It's not just some extra text in the source code,
it's extra instructions that aren't needed, which some users of C++ will
say is unacceptable. To extend the mask analogy, it's not necessary for
athletes to wear a mask while competing in a 100m sprint. Masks are good,
but requiring them in all situations without exception causes other
problems. So I think a new decoration is necessary instead of requiring a
call to a [[noreturn]] function.
Some compilers provide extensions to inform the compiler that a certain
code path (such as flowing off the end of a non-void function) is
unreachable, e.g __builtin_unreachable. That could be standardized. Or the
[[noreturn]] attribute could be reused. Instead of the more general
__builtin_unreachable (which can appear anywhere in a function) the
[[noreturn]] attribute could be allowed on an empty statement at the end of
a function, to indicate that omitting the return statement is intentional
because it isn't reachable. It doesn't remove the potential for undefined
behaviour, because the user who adds that decoration could be wrong, and if
control flow does actually get there it's still undefined. But the common
case of accidentally forgetting the return statement or failing to handle a
conditional case could be turned into a required diagnostic. If the user
explicitly adds [[unreachable]] or [[noreturn]] we've reduced the incidence
of UB to the cases where they're explicitly wrong, not just accidentally
careless.
Re memcpy
The paper says "It would probably be undesirable to have different
implementations for ::memcpy and std::memcpy." More than undesirable, it
would be non-conforming. The standard requires that &::memcpy ==
&std::memcpy i.e. there is only one memcpy. I am very strongly opposed to
removing that guarantee.
Re Infinite Loops
Any given implementation is allowed to decide not to optimize away infinite
loops, if they serve a purpose in that environment. That would be a QoI
matter, so I don't think intermediate programmers on embedded systems is a
good motivation to prevent all optimisers from relying on the existing rule
(although maybe it could be made dependent on freestanding/hosted
implementations).
This whole section makes the solution sound very simple, but I think there
needs to be very careful consideration of all the ways that current
implementations make use of the existing rule.
Re Refine Proscription Wording Within the Standard
Although I'm not opposed to the rephrasing of [namespace.std] p6, it seems
like a poor example to choose. What is the practical benefit of postponing
the unspecified behaviour "until the pointer is actually used"? How many
programmers are taking the address of std::lib functions, but not using the
result? Why do we care? Why do we want to allow it? In all practical cases,
if you do $BADTHING then the program's behaviour is unspecified. Rephrasing
it doesn't seem to change anything.
Similarly in the section talking about "unspeciformed", you say "As Joshua
says, we would make the world a safer place". Well as already stated, in
practice you never get a chicken. It
The suggested change for addressable functions and the introduction of a
new term of art would remove hypothetical danger that doesn't affect
anybody, intermediate programmer or not.
re A Process to Identify Useful Changes to the Standard
An audit of how optimising compilers actually rely on some of the rules is
essential before changing them. We don't necessarily have all the right
people in WG21, e.g. for GCC the "middle-end" experts who work on the
optimisers are not involved with WG21, and unlikely to be part of the
"small team" assembled for step 3b. I don't think that needs to be said in
the paper, but just FYI there could be delays in the process due to
soliciting external feedback.
Anyway, sign me up for team 3c, but I'll be surprised if UB in the library
can be significantly reduced beyond what was already done by the following
(and others I can't find right now). LWG already went to A LOT of effort
making unnecessary UB ill-formed.
Mandating the Standard Library: Clause 16 - Language support library <
https://wg21.link/p1458r1>
Mandating the Standard Library: Clause 18 - Diagnostics library <
https://wg21.link/p1459r1>
Mandating the Standard Library: Clause 20 - Strings library <
https://wg21.link/p1462r1>
Mandating the Standard Library: Clause 21 - Containers library <
https://wg21.link/p1463r1>
Mandating the Standard Library: Clause 22 - Iterators library <
https://wg21.link/p1464r1>
Mandating the Standard Library: Clause 25 - Algorithms library <
https://wg21.link/p1718r2>
Mandating the Standard Library: Clause 26 - Numerics library <
https://wg21.link/p1719r2>
Mandating the Standard Library: Clause 27 - Time library <
https://wg21.link/p1686r2>
Mandating the Standard Library: Clause 28 - Localization library <
https://wg21.link/p1720r2>
Mandating the Standard Library: Clause 29 - Input/Output library <
https://wg21.link/p1721r2>
Mandating the Standard Library: Clause 30 - Regular Expression library <
https://wg21.link/p1722r2>
Mandating the Standard Library: Clause 31 - Atomics library <
https://wg21.link/p1723r2>
Mandating the Standard Library: Clause 32 - Thread support library <
https://wg21.link/p1622r3>
Thanks for working on this, Scott. I approve of the overall goal, but have
some comments on specific parts.
Re User Specializations of Certain Templates in Namespace std
Regarding a new attribute or compiler feature to disallow specializing
templates defined in namespace std. Sometimes the std::lib implementation
wants (or needs) to specialize a template, even though users can't. Either
the attribute needs to have a way to disable it on certain specializations,
or the compiler needs to be taught about every special case to know if a
given specialization is OK. For the former case, users can just cheat and
use the same mechanism to disable the decoration on their own
specializations, although I don't think it's reasonable to defend against
sufficiently motivated "attackers" so maybe that doesn't matter. The latter
case (teaching the compiler about every special case) will not work if e.g.
libstdc++ adds a new specialization of a template in std, and somebody
tries to compile it with a version of Clang or EDG released last year,
which doesn't yet know about the extra specialization it needs to allow. I
think in practice any attempt to **require** such specializations to be
ill-formed is hard to get right.
Unless I missed it, the paper only talks about specializations for the
handful of templates where **any** program-defined specialization is
forbidden. It's also undefined to specialize something like
std::allocator<int> or std::basic_string<signed char>, only specializations
that depend on a program-defined type are allowed (C++20 16.4.5.2.1
[namespace.std] p2). That's another huge class of UB (every template in the
std::lib is affected!) which I think is also impractical to make
ill-formed. It might seem that a compiler could issue a diagnostic for
violations of the rule by requiring every specialization of a template in
std to depend on a type defined outside std. But there are types which are
part of the implementation which are defined outside std (e.g. ::mbstate_t
or __gnu_cxx::__some_internal_detail) so the compiler would need to be
taught about every one of those as well.
Re Flowing Off the End of a Non-void Returning Function
Requiring the addition of a call to abort() or similar doesn't only affect
code ordering. It can increase the code size if the compiler can't prove
the abort is unreachable. It's not just some extra text in the source code,
it's extra instructions that aren't needed, which some users of C++ will
say is unacceptable. To extend the mask analogy, it's not necessary for
athletes to wear a mask while competing in a 100m sprint. Masks are good,
but requiring them in all situations without exception causes other
problems. So I think a new decoration is necessary instead of requiring a
call to a [[noreturn]] function.
Some compilers provide extensions to inform the compiler that a certain
code path (such as flowing off the end of a non-void function) is
unreachable, e.g __builtin_unreachable. That could be standardized. Or the
[[noreturn]] attribute could be reused. Instead of the more general
__builtin_unreachable (which can appear anywhere in a function) the
[[noreturn]] attribute could be allowed on an empty statement at the end of
a function, to indicate that omitting the return statement is intentional
because it isn't reachable. It doesn't remove the potential for undefined
behaviour, because the user who adds that decoration could be wrong, and if
control flow does actually get there it's still undefined. But the common
case of accidentally forgetting the return statement or failing to handle a
conditional case could be turned into a required diagnostic. If the user
explicitly adds [[unreachable]] or [[noreturn]] we've reduced the incidence
of UB to the cases where they're explicitly wrong, not just accidentally
careless.
Re memcpy
The paper says "It would probably be undesirable to have different
implementations for ::memcpy and std::memcpy." More than undesirable, it
would be non-conforming. The standard requires that &::memcpy ==
&std::memcpy i.e. there is only one memcpy. I am very strongly opposed to
removing that guarantee.
Re Infinite Loops
Any given implementation is allowed to decide not to optimize away infinite
loops, if they serve a purpose in that environment. That would be a QoI
matter, so I don't think intermediate programmers on embedded systems is a
good motivation to prevent all optimisers from relying on the existing rule
(although maybe it could be made dependent on freestanding/hosted
implementations).
This whole section makes the solution sound very simple, but I think there
needs to be very careful consideration of all the ways that current
implementations make use of the existing rule.
Re Refine Proscription Wording Within the Standard
Although I'm not opposed to the rephrasing of [namespace.std] p6, it seems
like a poor example to choose. What is the practical benefit of postponing
the unspecified behaviour "until the pointer is actually used"? How many
programmers are taking the address of std::lib functions, but not using the
result? Why do we care? Why do we want to allow it? In all practical cases,
if you do $BADTHING then the program's behaviour is unspecified. Rephrasing
it doesn't seem to change anything.
Similarly in the section talking about "unspeciformed", you say "As Joshua
says, we would make the world a safer place". Well as already stated, in
practice you never get a chicken. It
The suggested change for addressable functions and the introduction of a
new term of art would remove hypothetical danger that doesn't affect
anybody, intermediate programmer or not.
re A Process to Identify Useful Changes to the Standard
An audit of how optimising compilers actually rely on some of the rules is
essential before changing them. We don't necessarily have all the right
people in WG21, e.g. for GCC the "middle-end" experts who work on the
optimisers are not involved with WG21, and unlikely to be part of the
"small team" assembled for step 3b. I don't think that needs to be said in
the paper, but just FYI there could be delays in the process due to
soliciting external feedback.
Anyway, sign me up for team 3c, but I'll be surprised if UB in the library
can be significantly reduced beyond what was already done by the following
(and others I can't find right now). LWG already went to A LOT of effort
making unnecessary UB ill-formed.
Mandating the Standard Library: Clause 16 - Language support library <
https://wg21.link/p1458r1>
Mandating the Standard Library: Clause 18 - Diagnostics library <
https://wg21.link/p1459r1>
Mandating the Standard Library: Clause 20 - Strings library <
https://wg21.link/p1462r1>
Mandating the Standard Library: Clause 21 - Containers library <
https://wg21.link/p1463r1>
Mandating the Standard Library: Clause 22 - Iterators library <
https://wg21.link/p1464r1>
Mandating the Standard Library: Clause 25 - Algorithms library <
https://wg21.link/p1718r2>
Mandating the Standard Library: Clause 26 - Numerics library <
https://wg21.link/p1719r2>
Mandating the Standard Library: Clause 27 - Time library <
https://wg21.link/p1686r2>
Mandating the Standard Library: Clause 28 - Localization library <
https://wg21.link/p1720r2>
Mandating the Standard Library: Clause 29 - Input/Output library <
https://wg21.link/p1721r2>
Mandating the Standard Library: Clause 30 - Regular Expression library <
https://wg21.link/p1722r2>
Mandating the Standard Library: Clause 31 - Atomics library <
https://wg21.link/p1723r2>
Mandating the Standard Library: Clause 32 - Thread support library <
https://wg21.link/p1622r3>
Received on 2020-10-21 08:10:50