Date: Sat, 13 Jul 2024 12:57:23 +0200
On 12/07/2024 16:46, Henry Miller wrote:
>> True, but the goal is to address this for future code.
>
> Only if we can predict the future and choose to address it. In the above company politics situation management knew in advance this politics was happening and decided that getting today's features out the door was more important than mitigating the cost of ABI, and so they intentionally did not tell engineering about the politics until it was too late to make changes. As an engineer of course I call this a bad decision, but those managers call it the right decision - who is right is not a knowable exercise.
I've had to deal with this a few times. In all cases the problem wasn't
that someone had consciously chosen to optimize for delivery over ABI,
but rather that they were completely unaware that there was an ABI issue
to begin with. To them, std::string was a vocabulary type, and the idea
that it wouldn't be compatible _with itself_ was just something they
never expected.
> The important take away is if [language/tools/god/whatever] doesn't force you to do the right thing in advance someone will get it wrong and we are back where we are today.
This is why I added the concept of public interfaces. They make it easy
to do the right thing (and come with the alluring bonus of possibly
better performance), and they won't allow unstable types for their
parameters.
The current situation is that the Standard never said that classes would
be stable, but it also never said that classes would not be stable, so
people have an excuse for using them in public interfaces. With a formal
definition, there is no more excuse: if you use an unstable class in a
public interface, that's UB, and the consequences are on you. Neither
the Standard, nor the compiler implementers, will bend over backwards to
accommodate you.
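C++ has no notion of a public interface today, but the rule that such an interface rejects unstable parameter types can be approximated with a compile-time check. This is a rough sketch under stated assumptions: `is_abi_stable`, `stable_string_view`, `is_public_interface_ok` and the two `text_length` functions are hypothetical names, and treating only void, arithmetic types, raw pointers and explicitly opted-in types as stable is an illustrative choice, not part of any proposal text.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical trait: a type may cross a public interface only if its layout
// is frozen. By default only void, arithmetic types and raw pointers qualify.
template <typename T>
struct is_abi_stable
    : std::bool_constant<std::is_void_v<T> || std::is_arithmetic_v<T> ||
                         std::is_pointer_v<T>> {};

// Hypothetical stable type: a pointer + size pair whose layout never changes.
struct stable_string_view {
    const char* data;
    std::size_t size;
};
template <>
struct is_abi_stable<stable_string_view> : std::true_type {};

// Checks the return type and every parameter of a function type R(Params...).
template <typename F>
struct is_public_interface_ok;

template <typename R, typename... Params>
struct is_public_interface_ok<R(Params...)>
    : std::bool_constant<
          is_abi_stable<R>::value &&
          (is_abi_stable<std::remove_cv_t<std::remove_reference_t<Params>>>::value && ...)> {};

// Two candidate "public" functions.
std::size_t text_length(stable_string_view text) { return text.size; }
std::size_t text_length_bad(const std::string& text) { return text.size(); }

// The compiler, rather than a code review, rejects the unstable signature.
static_assert(is_public_interface_ok<decltype(text_length)>::value);
static_assert(!is_public_interface_ok<decltype(text_length_bad)>::value);
```

A real language feature would of course enforce this on the declaration itself instead of relying on someone remembering to write the `static_assert`.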
>> The point of having std::stable classes is that they act as a common
>> interface. They are intended for use on public interfaces, not for use
>> throughout your software. You convert between your internal std::string
>> and std::stable::string whenever you pass through the interface.
>
> isn't that what char * and (struct*, size_t length) in extern "C" is? A standard abi for interfaces. However as above the hard part is realizing in advance where the interfaces are. I wouldn't want to make all functions calls go through C ABI just in case, that would perform badly and we lose all the nice things we get from C++ containers.
It currently is, yes. For C++ it would be great if we could at least
pass proper strings and vectors through public interfaces, instead of
always relying on pointers, sizes, manual deallocation, etc.
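To make the conversion at the boundary concrete, here is a minimal sketch. `stable_string`, `to_stable`, `from_stable` and `make_greeting` are hypothetical names, not proposed API, and the frozen layout (data pointer, size, and a release function supplied by whichever side allocated the buffer) is just one plausible choice. The price is one allocation and copy per crossing, in exchange for std::string being free to change on either side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical frozen-layout string for public interfaces. It carries the
// function that frees its buffer, so ownership can cross a library boundary
// even if the two sides use different allocators or std::string layouts.
struct stable_string {
    char* data;
    std::size_t size;
    void (*release)(char*);
};

namespace {
void release_new_array(char* p) { delete[] p; }
}  // namespace

// Producing side: copy the internal std::string into the stable form.
stable_string to_stable(const std::string& s) {
    char* buf = new char[s.size() + 1];
    std::memcpy(buf, s.data(), s.size() + 1);  // includes the terminating '\0'
    return stable_string{buf, s.size(), &release_new_array};
}

// Consuming side: build a local std::string (whatever layout this side was
// compiled with), then hand the buffer back to the side that allocated it.
std::string from_stable(stable_string s) {
    assert(s.release != nullptr);
    std::string result(s.data, s.size);
    s.release(s.data);
    return result;
}

// A library's public function: only stable types appear in its signature,
// while std::string remains free to use internally.
stable_string make_greeting(const char* name, std::size_t len) {
    std::string internal = "Hello, " + std::string(name, len);
    return to_stable(internal);
}
```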
I can't immediately recall running into a situation where there was
confusion about what the API of a library was. Usually the public API
has its own headers, and is significantly better documented than any
private functions are.
Note that the proposal changes nothing for your internal libraries that
you compile as part of your system. Those aren't public, and thus don't
need to be designed with stability concerns in mind.
>> So if some mythical future CPU works better if strings are implemented
>> as linked lists, you can just change std::string to use linked lists,
>> and if you recompile one of your libraries it won't immediately be
>> incompatible with all your other software, because in its public
>> interface it still presents the data as an array of char, same as
>> before. At that point the conversion from std::string (the linked list
>> version in the library) to std::stable::string will be much more
>> expensive, but at least it will still work.
>
> But I want to use the linked-list string implementation where possible on this platform. Performance matters and I don't want to pay for the costs to convert to the standard ABI and back just because somebody might in the future change the ABI of string or vector. We have a history of the C++ library rarely changing ABI, so I'd even call that unlikely future - but unlikely does not mean never and so as someone who cares about ABI I should do this anyway.
That is a rather far-out hypothetical, isn't it? The definition of
strings or vectors is not going to change. We might get new containers,
and those might even one day supplant strings and vectors, but they'll
have different names.
But we might of course want to change std::string. What would you rather
have: the current situation, where std::string is set in stone, but
transfer through public interfaces is cheap, or a potential future where
std::string performs 20% better, but you pay a small cost when
passing through a public interface?
gcc's std::string currently spends 16 bytes on the character pointer and
the size, and another 16 bytes on the SSO buffer (room for 15 characters
plus the terminating zero). Why is it storing the character pointer at
all when it is in SSO mode? Why does it need 8 bytes for the size in SSO
mode? At the same memory cost, it could have up to 31 characters in its
SSO buffer, using the last byte in the buffer to act as the size, the
terminating zero, _and_ the indication of whether the buffer is an SSO
buffer or a pointer to the heap-allocated buffer! (Store 31 minus the
size in that byte: it becomes zero exactly when the buffer is full.)
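The layout described above can be sketched as follows. This is a simplified illustration, not gcc's implementation: `compact_string` is a hypothetical class, the heap-mode flag value 0x80 is an arbitrary choice, and copying, growth and capacity tracking are left out. Raw bytes plus `std::memcpy` are used instead of a union, to stay clear of union type punning, which C++ does not allow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical 32-byte string. In SSO mode the last byte holds (31 - size):
// it is 0 exactly when all 31 inline characters are used, so it doubles as
// the terminating '\0'. Heap mode is marked by 0x80, which SSO mode never
// produces (its values are 0..31).
class compact_string {
public:
    static constexpr std::size_t kCapacity = 31;  // max inline characters

    explicit compact_string(std::string_view s) {
        if (s.size() <= kCapacity) {
            std::memcpy(raw_, s.data(), s.size());
            raw_[s.size()] = '\0';  // when size == 31 this is raw_[31], set to 0 below anyway
            raw_[kCapacity] = static_cast<unsigned char>(kCapacity - s.size());
        } else {
            char* p = new char[s.size() + 1];
            std::memcpy(p, s.data(), s.size());
            p[s.size()] = '\0';
            std::size_t n = s.size();
            std::memcpy(raw_, &p, sizeof p);             // first bytes: heap pointer
            std::memcpy(raw_ + sizeof p, &n, sizeof n);  // next bytes: size
            raw_[kCapacity] = kHeapFlag;                 // last byte: mode flag
        }
    }
    compact_string(const compact_string&) = delete;
    compact_string& operator=(const compact_string&) = delete;
    ~compact_string() {
        if (!is_small()) delete[] heap_ptr();
    }

    bool is_small() const { return raw_[kCapacity] != kHeapFlag; }

    std::size_t size() const {
        if (is_small()) {
            assert(raw_[kCapacity] <= kCapacity);  // invariant of the SSO encoding
            return kCapacity - raw_[kCapacity];
        }
        std::size_t n;
        std::memcpy(&n, raw_ + sizeof(char*), sizeof n);
        return n;
    }

    const char* c_str() const {
        return is_small() ? reinterpret_cast<const char*>(raw_) : heap_ptr();
    }

private:
    static constexpr unsigned char kHeapFlag = 0x80;

    char* heap_ptr() const {
        char* p;
        std::memcpy(&p, raw_, sizeof p);
        return p;
    }

    alignas(8) unsigned char raw_[32];
};

static_assert(sizeof(compact_string) == 32);
```

With this layout a 31-character string still fits inline, whereas gcc's current layout moves anything longer than 15 characters to the heap.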
I tested with my own string class (many years ago) and found that
increasing the size of the SSO buffer from 16 to 32 bytes significantly
improved the performance of the entire application. But can gcc do this?
Nope, they will never change the ABI of std::string again, even though
quite a bit of performance is left on the table.
> Which is to say I want some sort of MAGIC so that if both sides do use the same ABI they don't go through the optimal API because I don't want to pay for what I'm not using.
That would be great, but I don't see it happening.
>> If the cost of conversion becomes too high to bear a new stable type can
>> be introduced at that time.
>
> Nothing stops us from introducing std2:: or whatever bikeshed name you want to give to the replacement type. ABI is hard because everyone needs to switch to that new type though - some want to switch everything instantly while others have what to them is good reason not to.
There's no need for everyone to switch at the same time:
std::stable::string can coexist with std2::stable::string, and convert
back and forth. That allows coexistence of libraries that use the old
version with libraries that use the new version, at a small performance
price. You can upgrade from the old implementation to the new in small
steps, gaining performance at each step.
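A minimal sketch of such coexistence, assuming two hypothetical frozen layouts: `v1::stable_string` stands in for std::stable::string and `v2::stable_string` for std2::stable::string. The real names and layouts would be for the committee to decide; the `flags` field is invented purely so the two layouts differ. An old library still built against v1 and a caller already on v2 interoperate through a conversion function, without the old library being rebuilt.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins for std::stable::string (v1) and std2::stable::string (v2).
// Each layout is frozen once published; only the conversions are new code.
namespace v1 {
struct stable_string {  // layout: pointer, then size
    const char* data;
    std::size_t size;
};
}  // namespace v1

namespace v2 {
struct stable_string {  // layout: size, pointer, plus a hypothetical flags field
    std::size_t size;
    const char* data;
    unsigned flags;
};

// Conversions shipped alongside v2, so v1 and v2 libraries can keep talking.
inline v1::stable_string to_v1(const stable_string& s) { return {s.data, s.size}; }
inline stable_string from_v1(const v1::stable_string& s) { return {s.size, s.data, 0u}; }
}  // namespace v2

// An old library, still compiled against v1.
std::size_t old_lib_count_spaces(v1::stable_string s) {
    assert(s.data != nullptr || s.size == 0);
    std::size_t n = 0;
    for (char c : std::string_view(s.data, s.size)) {
        if (c == ' ') ++n;
    }
    return n;
}

// A newer caller that has already moved to v2 converts only at the boundary.
std::size_t new_caller(std::string_view text) {
    v2::stable_string s{text.size(), text.data(), 0u};
    return old_lib_count_spaces(v2::to_v1(s));
}
```

Because these particular stable types are non-owning views, the conversion is only a field shuffle; an owning version would add a copy, which is the "small performance price" mentioned above.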
Hans Guijt
Received on 2024-07-13 10:57:30