Date: Wed, 12 Jul 2023 17:34:05 +0300
This is a statement one meets in many variations when trying to understand
C++'s stance on many real world problems. I imagine I'm not the only one
who is deeply uncomfortable with this answer. Even the meaning of this
statement is far from obvious: does it mean that code that is linked to a
shlib is not subject to the standard's restrictions? Perhaps programs
that link against shared libraries have undefined behavior? (Note that's
close to 100% of real programs, as most link to libc dynamically). Truth
is, the standard doesn't even say something definite like 'undefined
behavior': the abstract machine lives in an imaginary world where shared
libs don't exist, so in a very real sense the C++ standard does not apply
to most real world programs.
I'd like to try and explore whether something can be done about that. I'm
not aware of previous such attempts, and would be happy to hear if you are.
---------------------------------------------------
(1) First, a few general descriptive sections should be reviewed and mostly
rephrased, probably introducing a concept like 'linkage-unit':
[lex.phases] : http://eel.is/c++draft/lex.phases#1.9
[lex.separate] http://eel.is/c++draft/lex.separate#2
[basic.link] http://eel.is/c++draft/basic.link
(2) More importantly, I'm aware of 2 clauses that clash with the
Windows PE model:
(a) [expr.eq] http://eel.is/c++draft/expr.eq#3.2, has this to say about
comparison of function-pointers: "... if the pointers ... both point to the
same function ... they compare equal."
This is not so simple if the function in question is implemented in a
shared library: an actual `call` is made to the current binary's PLT (in
ELF systems) or IAT (in PE systems), so the direct call address is
different across binaries.
In ELF systems an elaborate mechanism ensures the comparison succeeds: the
address of a function, even if implemented in a shared lib, is resolved by
the loader and stored as a value in the *executable*'s symbol table (even
if the executable doesn't use it). If the address of the function is taken
anywhere in the process - from code in the executable or any shared library
- it is resolved from the *executable*'s symbol table.
No analogous apparatus is in place for PE, and pointers to the same
function taken from code in different binaries can compare unequal.
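To make the comparison in (a) concrete, here is a minimal sketch. All names
(greet, address_seen_by_executable, address_seen_by_plugin) are hypothetical,
and everything is compiled into one binary, so the comparison trivially
holds here. The case discussed above arises only when the two
address-taking functions are built into different binaries and greet is
exported from a shared library.

```cpp
#include <cassert>

// Hypothetical function; in the scenario discussed above it would be
// exported from a shared library rather than defined in this file.
int greet() { return 42; }

using fn_ptr = int (*)();

// Stand-in for code compiled into the main executable.
fn_ptr address_seen_by_executable() { return &greet; }

// Stand-in for code compiled into a second binary (a plugin DLL or .so).
// In that setup, this is where a PE build may observe a different address
// for the same function than the executable does.
fn_ptr address_seen_by_plugin() { return &greet; }

// The property [expr.eq] requires: two pointers to the same function
// compare equal, wherever the address was taken.
bool same_function_compares_equal() {
    fn_ptr a = address_seen_by_executable();
    fn_ptr b = address_seen_by_plugin();
    assert(a != nullptr && b != nullptr);
    return a == b;
}
```

Within a single binary same_function_compares_equal() returns true on any
conforming implementation; the open question is only what happens once the
two stand-ins live in separate binaries.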
(b) [replacement.functions] http://eel.is/c++draft/replacement.functions#3,
lists flavours of new/delete that can be interposed (==overridden) from
user code:
"The program's definitions are used instead of the default versions
supplied by the implementation."
This does happen in ELF systems, where (unless you build with -Bsymbolic)
the loader searches the executable first while trying to resolve all
symbols, including new/delete.
This doesn't happen in PE DLLs, where the .idata section explicitly says
from which DLL to import each function. So for example `new` will by
default be imported from the Windows VC-Runtime DLLs, or can be resolved
from a static library linked to the DLL - but *NOT* from the main
executable.
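For reference, a replacement of the kind [replacement.functions] describes
might look like the sketch below. The counter g_new_calls and the helper
count_allocations_for_one_int are hypothetical names used only for
illustration. The sketch is a single binary, so the replacement is always
picked up; per the discussion above, the difference between ELF and PE
would show only for allocations made from code inside a separately built
DLL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, present only to make the replacement observable.
static std::size_t g_new_calls = 0;

// Program-supplied replacement for the global allocation function.
void* operator new(std::size_t size) {
    ++g_new_calls;
    if (size == 0) size = 1;  // malloc(0) may return a null pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{};
}

// Matching replacements for the deallocation functions.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Calls the allocation function directly (not through a new-expression)
// and reports how many times the replacement above was entered.
std::size_t count_allocations_for_one_int() {
    const std::size_t before = g_new_calls;
    void* p = ::operator new(sizeof(int));
    assert(p != nullptr);
    ::operator delete(p);
    return g_new_calls - before;
}
```

The helper calls ::operator new directly because a compiler is allowed to
omit the allocation behind a new-expression, which would make the count
unreliable.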
To bridge these gaps, either Microsoft's PE design needs to change or the
standard needs to take a pragmatic approach and relax these clauses. The
PE design cannot change (the world will quite literally break), so the only
way forward I can see is to relax the standard's demands so that they are
applicable to all real world programs on real world operating systems. For
example: "if two pointers both point to the same function *it is
implementation-defined whether* they compare equal".
---------------------------------------------------
Would a paper investigating along these lines be of interest to the
community? Any other opinions and additions are very welcome.
Received on 2023-07-12 14:34:18