sg12: Re: [SG12] p1315 secure

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Sat, 25 Apr 2020 12:12:10 +0100

On 25/04/2020 02:20, Miguel Ojeda wrote:
> On Sat, Apr 25, 2020 at 2:25 AM Richard Smith <richardsmith_at_[hidden]> wrote:
>>
>> There has been no reference implementation of p1315 that has been used in production for any significant amount of time (several years). We do not know that its approach works.
>
> Indeed, there is no reference implementation of exactly P1315, but
> there are many implementations out there of the same
> idea/function/concept, as referenced by the proposal. They have been
> in production use for many years, in systems shipped to billions of
> users. The approach definitely works *for what the projects use them*.
> Whether that usage is good or not is another question that (see next
> point).
>
>> And my own 2c: from what I've heard from security-minded folks, the "secure_clear" approach does not and cannot work.
>
> That should be discussed/solved by them, not us. The projects out
> there, however, want to have a "secure_clear"-like function, as shown
> by the proposal.

I will apologise in advance, Miguel, if the following comes across as
patronising. However, I think it would help you use the appropriate
terminology in the next revision of P1315. I'll also apologise to Core
now for the many mistakes I am likely about to make in describing the
abstract machine.

What 95% of C and C++ programmers write code for is the concrete
machine. In the concrete machine, all object representations have a
well-defined bit layout. Zapping the bits of an object representation to
zero eliminates the value of that object. This is what you want in your
paper. This is what memset_s() does. This is very straightforward to
implement on 99% of compilers out there.

The problem is how to codify the same operation into the abstract
machine. In C++ there are two kinds of object: one is C-ish and is
treated as a bunch of bits; the other is C++-ish and is not treated as
a bunch of bits. Only the former has any notion of being "clearable",
because you cannot treat the latter as bits to be cleared without UB.

C-ish objects, however, have a special property: they can be trivially
copied at any time for any reason, in part or in whole. Therefore, the
value of a C-ish object may currently be strewn throughout memory and
registers, and the compiler may arbitrarily choose any one or more of
the copies it knows exist to use for any operation.

The only current way to make the abstract machine always keep the bit
representation of an object at a single address in memory is by marking
the type of that object volatile. This forces all work with that
object's representation to occur immediately upon that memory address.
Dead store elimination cannot be performed: for example, if you write a
5 and then a 4, the compiler cannot elide the write of the 5.
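
To make the volatile behaviour concrete, here is a minimal sketch. The
names write_5_then_4 and volatile_clear are hypothetical and do not come
from P1315 or any other paper; they only illustrate that every store to
a volatile object is expected to be emitted, including stores that are
overwritten straight away.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration: both stores are accesses to a volatile
// object, so the compiler is not allowed to drop the write of the 5
// even though the 4 overwrites it immediately.
inline void write_5_then_4(volatile int& slot) {
    slot = 5;
    slot = 4;
}

// Hypothetical helper: zero a buffer whose bytes are volatile objects.
// Each byte store is a separate volatile access, which is why this is
// slow compared with an ordinary memset.
inline void volatile_clear(volatile unsigned char* bytes, std::size_t n) {
    assert(bytes != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        bytes[i] = 0;
    }
}
```

Note that the buffer itself has to be declared volatile (for example
"volatile unsigned char key[32];") for this to match the description
above, and that is exactly what makes all other work on it slow as well.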

The response of Core to proposals like secure_clear() (I'd much rather
it were called "always_clear()") is direct: make the type of that object
volatile. Set its bit representation to zero to clear it. No further
standards work needed.

The idea behind my ensure_stores() proposal was a bit of a hack: it is
an intrinsic to tell the compiler "assume stores to this region have
escaped". All compilers already implement this intrinsic; it is better
known as "calling an extern function". Implementation is therefore
trivially easy, and the reference implementation library for
ensure_stores() literally calls an extern function which the compiler
can never Link Time Optimise. ensure_stores() is effectively a "fsync()"
for the
compiler, forcing it to write out the current bit representation to the
escaped region now. The big advantage of this is that the compiler is
free to optimise maximally before ensure_stores(), so performance is
way, way better than volatile.
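
A rough single-file sketch of the idea follows. It is an assumption, not
the actual reference implementation: instead of an extern function in a
separate, non-LTO'd translation unit, it calls through a volatile
function pointer so the optimiser cannot see the call target. All names
in the sketch namespace are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch {

// No-op target. The compiler must not assume this is what gets called,
// because the pointer below is read through a volatile object.
static void sink(const void*, std::size_t) {}

// Volatile function pointer: stands in for "an extern function the
// compiler cannot see into" (an assumption made for this sketch).
static void (*volatile opaque_sink)(const void*, std::size_t) = sink;

// Tell the compiler to assume the region [p, p + n) has escaped, so any
// pending stores to it have to be materialised before the call.
static void ensure_stores(const void* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    opaque_sink(p, n);
}

// Ordinary, fully optimisable clearing followed by one "fsync moment".
static void clear_then_flush(void* p, std::size_t n) {
    assert(p != nullptr);
    std::memset(p, 0, n);
    ensure_stores(p, n);
}

}  // namespace sketch
```

Everything before the call to ensure_stores() stays non-volatile, so the
compiler may combine or reorder those stores as it likes. This sketch
only addresses the stores to the named region; it does nothing about
other copies of the value the compiler may have made elsewhere.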

ensure_stores() made a whole bunch of important people on WG21 queasy
for reasons I don't quite understand, but then I think in terms of
concrete machines, not abstract machines. I suppose that extern
functions don't exist in the abstract machine, nor does escape analysis,
which is an implementation technique. So standardising an intrinsic to
say "assume stores to this region have escaped" is therefore meaningless
to an abstract machine.

Bringing things back to secure_clear()/always_clear(): if Core don't
like a hand-waving magical function of the kind memset_s() is in the C
standard, and they don't like wording based around intrinsic directives
to the optimiser about escape analysis (which was my attempt at this),
then that leaves one remaining option: you need to go tweak the language
in the abstract machine around volatile objects.

There is no problem, in theory, with marking all clear-on-delete objects
as volatile. However, in a concrete machine, that leads to dire
performance. So what we really need here is "constrained volatile
object optimisation", specifically:

- Dead store elimination is permitted, except at certain "fsync moments"

- Write combining is permitted, except at certain "fsync moments"

- Write and read reordering is permitted, except at certain "fsync moments"

I have no idea how to phrase that in a wording acceptable to Core. But
that seems to be the only remaining path forwards which nobody has
attempted yet.

I wish you good luck in your endeavours, and for the record, thank you
for investing your effort to date as this is a problem which is holding
up a ton of other stuff, far, far wider than bit clearing (think shared
memory, object serialisation to mapped memory, etc).

Niall

Received on 2020-04-25 06:15:42