liaison: Re: [wg14/wg21 liaison] (SC22WG14.18841) [Fwd: sudo buffer overlow]

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Sun, 31 Jan 2021 17:37:21 +0000

On 30/01/2021 22:27, Uecker, Martin wrote:

>> That is indeed useful. However if C were to adopt C++ lambdas, that's
>> 95% of you there (plus lambdas solve an absolute ton of other itches in
>> C).
>
> You still need the generic type. This is why
> C++ has std::function, or?

Historically, std::function (really boost::function) came long before
lambdas. So there is a fair bit of use case overlap.

However, for modern code, you ought to only use std::function to type
erase an arbitrary callable into a fixed type and size of callable.

Please allow me to explain.

C++ lambdas can be thought of as transforming one call signature into
another through the _binding_ of state:

```
// Here is some state to capture
int x = 5, y = 6;

// Create a lambda which captures the current values of x and y
// into the lambda's state. The lambda has its own unique closure
// type, and is callable with the signature int(int).
auto f = [x, y](int z) -> int { return x + y + z; };

// The object f is laid out as if it were struct { int x, y; };
// (true on mainstream compilers, though the standard does not promise it)
assert(sizeof(f) == 2 * sizeof(int));

// I can invoke f(), it adds the parameter to the bound values
assert(12 == f(1));
```

Lambdas are great for writing glue and thunk code which binds dissimilar
APIs together, but the price paid is that every lambda gets a unique
type, and they can be arbitrarily large. For example:

```
int arr[1024];
// Body elided; [=] copies arr into the lambda only if the body uses it
auto l = [=](int x) -> int { ... };
assert(sizeof(l) == 1024 * sizeof(int));
```

What std::function does is erase the type and size of its input into a
fixed type and size, so:

```
int arr[1024];
auto l = [=](int x) -> int { ... };
assert(sizeof(l) == 1024 * sizeof(int));

// Erase the type and size of input callable
// into a fixed type and size callable
std::function<int(int)> f(l);
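// The exact size is implementation-specific; this value holds for
// libstdc++, other standard libraries differ
assert(sizeof(f) == 4 * sizeof(void *));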

// But invoking either one gives the same result
assert(l(5) == f(5));
```

Now your extern APIs can exclusively speak in terms of
std::function<int(int)>, and ANY invocable, of any type and any size,
which has the call signature int(int) can be represented by
std::function<int(int)>.

As you can see, this means ABI is fixed and thus binary calling
convention problems go away, yet runtime code can feed such APIs an
arbitrary lambda or callable.
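
As a minimal sketch of that point: the entry point below is hypothetical (the name apply_twice and the demo function are illustrative only), but it shows one fixed signature accepting a plain function, a small lambda and a larger lambda alike.

```cpp
#include <cassert>
#include <functional>

// Hypothetical library entry point: its signature never changes,
// whatever callable the caller supplies.
int apply_twice(const std::function<int(int)>& fn, int v) {
    return fn(fn(v));
}

// A plain function with the matching call signature.
int add_one(int v) { return v + 1; }

// Feeds the same entry point three unrelated callables.
int demo_apply_twice() {
    int x = 5, y = 6;
    // A small lambda holding two captured ints.
    auto small_l = [x, y](int z) -> int { return x + y + z; };
    // A larger lambda holding a copy of a whole array.
    int arr[4] = {1, 2, 3, 4};
    auto large_l = [arr](int i) -> int { return arr[i % 4]; };

    assert(apply_twice(add_one, 0) == 2);   // (0 + 1) + 1
    assert(apply_twice(small_l, 1) == 23);  // 11 + (11 + 1)
    assert(apply_twice(large_l, 0) == 2);   // arr[arr[0]] == arr[1]
    return 0;
}
```

Each call constructs a std::function<int(int)> from its argument at the call site, so apply_twice itself is compiled once against a single type.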

One can apply exactly the same technique to any arbitrary foreign
function interface, including arbitrary foreign calling conventions.
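
To illustrate with a sketch: assume a C-style interface that takes a function pointer plus an opaque context pointer. All of the names below (c_callback, c_api_invoke, thunk, call_through_c_api) are hypothetical, and a real FFI would add its own linkage and calling convention annotations. A small template generates the glue for any callable.

```cpp
#include <cassert>

// Hypothetical C-style interface: a function pointer plus an opaque
// context pointer. Both have a fixed size regardless of the callable.
using c_callback = int (*)(void* ctx, int arg);

int c_api_invoke(c_callback cb, void* ctx, int arg) {
    return cb(ctx, arg);
}

// One thunk is generated per callable type F: it recovers the callable
// from the context pointer and forwards the call to it.
template <class F>
int thunk(void* ctx, int arg) {
    return (*static_cast<F*>(ctx))(arg);
}

// Adapts an arbitrary callable to the fixed interface above.
// Nothing is copied or owned, so f must outlive the call.
template <class F>
int call_through_c_api(F& f, int arg) {
    return c_api_invoke(&thunk<F>, &f, arg);
}

int demo_thunk() {
    int x = 5, y = 6;
    auto l = [x, y](int z) -> int { return x + y + z; };
    assert(call_through_c_api(l, 1) == 12);
    return 0;
}
```

Unlike std::function, this variant does not own the callable; it only borrows it for the duration of the call, which is often all a callback interface needs.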

Hopefully this made sense to you? Just to be clear, C under this
proposal would have most of C++ lambdas, but C's equivalent to
std::function<ret(args...)> would probably be something like
_Function(ret, args...). i.e. _Function would be a compiler built-in.

>> Well that depends on how you'd implement them I think. I, like most C++
>> folk, dislike intensely any notion of runtime-varying type or size of
>> type.
>
> The problem is that if you want to really capture the concept
> of a dynamically bounded array using the type system, you
> automatically end up with runtime-varying types (i.e.
> dependent types as they are called in type theory).

I think you hit the nail exactly on the head here - very few on WG21
like the idea of *directly* capturing anything dynamic into the type
system. On the few occasions we have (e.g. RTTI, C++ exception throws),
we introduced non-deterministic execution behaviour which caused around
40% of our userbase to globally disable RTTI and C++ exception throws. I
don't think there is much appetite to adopt any design which would lead
more of our userbase to globally disable future language features.

What has worked well for us, however, is the same technique as
std::function above - one *indirects* the capture of dynamic information
into the type system i.e. we declare a type, and that type means "this
type indirects to dynamic information, and here is how you do that
indirection".

That keeps our type system 100% static, at the topmost level, and keeps
all dynamic information at one level removed.
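
Applied to the dynamically bounded array case, a minimal sketch might look like the following. The int_view type and its helper functions are hypothetical names made up for illustration; C++20's std::span is the standard library's take on the same idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view type: its own type and size are fixed at compile
// time; the bound it refers to is ordinary runtime data.
struct int_view {
    int* data;
    std::size_t size;
};

// Bounds-checked read: returns false instead of reading out of range.
bool checked_get(int_view v, std::size_t i, int& out) {
    if (i >= v.size) return false;
    out = v.data[i];
    return true;
}

// One function, compiled once, serves arrays of any runtime length.
int sum(int_view v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size; ++i) total += v.data[i];
    return total;
}

int demo_view() {
    int fixed[3] = {1, 2, 3};
    std::vector<int> dynamic = {10, 20, 30, 40};
    int_view a{fixed, 3};
    int_view b{dynamic.data(), dynamic.size()};

    assert(sum(a) == 6);
    assert(sum(b) == 100);

    int out = 0;
    assert(checked_get(a, 2, out) && out == 3);
    assert(!checked_get(a, 3, out));  // index 3 is past the bound
    return 0;
}
```

The type int_view never varies at runtime; only the values stored inside it do.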

I'm not a language person, so I'm the wrong person to say much more. But
from my best understanding, most on EWG would consider runtime varying
types a bad design choice, given the better alternatives available. I
very definitely do not get the impression that anybody on WG21 much
likes RTTI, as currently designed and implemented.

(Note that Boost.TypeIndex implements 98% of RTTI, and all of the RTTI
almost anyone ever uses, but with a 100% deterministic mechanism. Since
that library landed in Boost, it has conclusively proved that C++ RTTI
is suboptimal compared to what it could have been instead. But that ship
has sailed etc etc)

> If C++ rejects dependent types because C++ people "dislike" them
> (as you said above), then your type system will remain limited in
> what you could do with it. Of course, external tools could
> always attach their own more powerful types to C++ terms, but
> the C++ type system could not help.

Thanks for all the exposition I didn't reply to in order to keep this
reply short.

Don't get me wrong here, other C "successors" went the dynamic runtime
type route, Objective-C being the most obvious, but C# is another famous
one. Whilst there is nothing wrong with languages such as Objective-C or
C#, in my personal opinion I would say that their time has passed
relative to better alternatives.

I don't want to make too sweeping a statement, but in my experience
dynamic runtime types never optimise well. If you write your Python with
static types, Pyston will optimise the snot out of it. If you write your
Python with dynamic types, Pyston gives up very easily.

I'm not a language person, nor a compiler person, but it seems to me
that dynamic runtime types do not easily produce bare metal execution
performance. I think indirection of dynamic information whilst keeping
the top level of the type system 100% static is a better design approach.

Niall

Received on 2021-01-31 11:37:26