Date: Tue, 22 Oct 2024 08:32:04 -0700
On Tuesday 22 October 2024 02:49:20 Pacific Daylight Time Federico Kircheis via
Std-Discussion wrote:
> My example has const... and you have been claiming that it was UB.
> So I assume that it was an oversight.
const on non-trivial types is different from const on trivial ones.
> (By the way; primitives have a lifetime too, starting it and ending it
> more than once is UB too, so my example with int should be as good as
> with string)
The problem is not the lifetime, it's how many live copies of it exist. With
the const, the primitive variable becomes static, in which case each copy is
local to the TU in question and you can have as many different definitions as
you may want.
ODR violations only appear if the variable has external linkage.
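As a minimal sketch of the linkage rule in play (the variable and function
names are made up for illustration): at namespace scope, a const-qualified
variable that is neither extern nor inline has internal linkage, so every TU
containing the definition gets its own private copy. Adding extern gives it
external linkage, and then only one TU may define it.

```cpp
// Sketch only: the variable and function names are hypothetical.

// Internal linkage: behaves as if declared 'static'. Each TU that
// contains this line gets its own copy, so repeating it across TUs
// is not an ODR violation.
const int local_limit = 10;

// External linkage: this is the definition of a global symbol. Only one
// TU in the whole program (shared libraries included) may contain it.
extern const int shared_limit = 20;

// Other TUs would refer to it through a declaration such as:
//   extern const int shared_limit;

int sum_of_limits()
{
    return local_limit + shared_limit;
}
```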
> >> So, what was the problem?
> >> According to you, not the fact that I used libraries...
> >
> > It's the fact that you linked lib0.cpp twice into your executable, by way
> > of the dynamic linking.
>
> Dynamic linking could work differently and not cause the issue.
> The standard does not say that if I use dynamic linking then I need to
> have lib0.cpp twice.
Dynamic linking could work differently but doesn't, the same way that the Sun
could be powered by gravitational collapse instead of nuclear fusion, but
isn't. The whole point of this portion of the thread is that dynamic linking
works in a way that each shared library is basically "another TU" (or set of
TUs) for the purposes of C++ and all of the Standard language works without
fault. You've been pointing to examples that you think disprove this
thesis and I've replied that your examples are ill-formed even in the absence
of any libraries, so their presence does not change the situation.
> > It's the combination of the code and how you compile/link it. Each TU
> > alone is fine and has no UB. But the combination of them into the final
> > executable is ill-formed.
> >
> > Does your CMakeLists.txt count as "your code"?
>
> Not necessarily (I might be the author of lib0 and author of lib1 is
> someone else), but that's not important; it does not count as C++ code.
Then this falls through the cracks: neither library author is "at fault"
and the executable author doesn't know the details of how each library is
implemented. No one is at fault, but the code is faulty.
Collective responsibility implies no one owns the problem.
However, as the interested party and the one that has a problem, the
executable author has an interest in solving this.
> > In order for any of it to happen, you must have used some non-standard
> > feature, such as an __attribute__.
> And since using dlopen is not OK, I showed some examples of (IMHO)
> normal looking and valid C++ code that can be problematic when used in
> shared libraries.
dlopen() is still fine, even if you use RTLD_LOCAL. This creates a situation
that is more difficult to explain in that you have multiple, distinct views of
"global" but no less valid. It violates the C++ Standard no less.
> The whole C++ code of lib0, lib1, lib2 and main has nothing strange, and
> is not using any non-standard feature.
> By that I mean that
>
> * it is not using any low-level technique, like manually changing
> lifetimes, manually calling constructors, casts, ...
> * it is not using compiler extensions (attributes, sections, ...)
> * it is not using non-standard functions (dlopen, ...)
>
> This code, unmodified, works in a given way when compiled as
> application, but changes it's behavior when moved to libraries.
And I've pointed out it does NOT work when compiled as an application if you
compile it in the way that library linking would: the issue is that you're
duplicating lib0.cpp in your executable. This is the thesis here: that
libraries are "just another TU" and all the effects of it apply. And that
includes the ill effects of ODR violations.
When you showed examples of linking to the application, you _solved_ the
problem by linking lib0.cpp only once. You can also solve the same problem in
the case of libraries, but you're not doing it. The problem is not C++, it's
how you're compiling and in particular linking your application.
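To picture the duplication, here is a single-file model. It is only an
analogy under stated assumptions: the two namespaces stand in for the two
copies of lib0.cpp that end up inside lib1 and lib2, and every name is
hypothetical. In the real case both copies carry the same symbol name, which
is what makes it an ODR violation; the sketch only shows that the
initialization code runs once per copy that was linked in.

```cpp
// Sketch only: all names are hypothetical.

// Counts how many times the "global from lib0" gets constructed.
// Constant-initialized to zero before any dynamic initialization runs.
int g_constructions = 0;

struct Tracker
{
    Tracker() { ++g_constructions; }
};

// Stand-in for the copy of lib0.cpp embedded in lib1.
namespace copy_in_lib1 { Tracker global_state; }

// Stand-in for the copy of lib0.cpp embedded in lib2.
namespace copy_in_lib2 { Tracker global_state; }

// One copy linked would report 1; two copies linked report 2.
int constructions()
{
    return g_constructions;
}
```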
> Of course it is the combination of how I compile/link it (especially
> since the code did not change), which is not defined by the standard.
Right.
> If I do not use dynamic libraries then the code of lib0,lib1,lib2,main
> works as expected.
> If dynamic libraries where "something else", then the second and third
> example could work without UB too.
"If I solve the problem, the problem is solved". Circular reasoning.
> So my point still is:
>
> In the first example, I define multiple globals.
> When using libraries, I get 4 instead of 3 without changing any code.
"If I violate ODR, the program becomes ill-formed". Right.
> In the other examples, I define a global only once.
> There is only one instance and it is initialized correctly.
> When using libraries, it is initialized multiple times (when using gcc
> and clang, not msvc).
MSVC doesn't count because no one is claiming that DLLs on Windows operate
"just like other TUs". This discussion is exclusively about how shared
libraries work on (modern) Unix systems.
> I "just" packaged my code in libraries and defined dependencies.
You linked lib0.cpp twice into your executable at runtime. That's an ODR
violation.
Either you accept that dynamic linking does not change the standard or you
consider it "completely out of the standard". In the latter case, we don't
need to discuss anything here, because it would be completely out-of-scope and
"anything could happen".
Therefore, the continued discussion in this list implies that we accept that
dynamic linking behaves in a way compliant with the C++ rules. If you reply to
this email, I will therefore assume you agree with this and that the process
of static and dynamic linking can be understood from the point of view of the
C++ Standard.
> Would I have copied the source code of lib0.cpp, and linked it multiple
> times to main, then I would have admitted that it looks strange and
> possibly problematic.
The fact you did that in an "underhanded", indirect fashion does not make it
different. You did it.
> But in this case, this "copy" of lib0 is both forced and hidden.
Irrelevant.
> I'm not claiming that it is not possible to write c++ code in libraries.
> I'm claiming that
>
> * some code constructs are problematic
> * there are limitations not described by the standard
I agree on the first, in general, but mostly it's "ill-advised" instead of
"broken".
I disagree on the second. There are no limitations to standards-compliant code
not described in the standard, if you accept that shared libraries are "just
another TU". Though this mostly ignores the fact that linking is barely
described.
> > Stop. The variable *can* be reached outside of the TU because they
> > "forgot"
> > the static keyword. Therefore, it is a global symbol and they are claiming
> > that as part of their ABI. They claim it exclusively: no other library can
> > define the same symbol.
>
> ABI is outside of standard.
ODR violations are in the standard. The standard says only one TU can define a
definable item. Each of lib1 and lib2 has chosen to define it (albeit
unconsciously), so they can't both be linked into the same executable.
> >> If you use extern, you cannot use static.
> >
> > Yes, if you want it as a global symbol, you use extern.
>
> And that leads to problem when lib0 is linked in lib1 and lib2
> (constructor called more than once)
Yes.
> > https://www.akkadia.org/drepper/dsohowto.pdf
> >
> > Ulrich Drepper is a former maintainer of glibc, so you should trust him in
> > his expertise. But by necessity this paper is focused on C.
>
> I skimmed the document, it does not discuss the issue of
> "vendoring"/"bundling" and what are possible workarounds; am I wrong?
No, it doesn't. It describes libraries. Uli didn't care who wrote the code
that goes into the library and it really doesn't matter. The point is that the
authors of lib1 and lib2 must adhere to the document, no matter who wrote the
code that goes into their libraries. This includes header-only libraries too.
> Maybe because this issue is more C++ specific?
No, it applies to C too, just in a much more limited fashion because they
don't have some of the causes of the problem. But it could happen with:
lib1:
    char *answer_of_life = NULL;

lib2:
    typedef struct
    {
        size_t size;
        char *ptr;
    } String;
    String answer_of_life = {};
> > Stop vendoring: instead, just build the third-party library using their
> > own
> > build system and install to your target build environment. And never link
> > a
> > dynamic library to a static library (unless that's a "convenience library"
> > that is also part of your project).
>
> lib0 provide the library as source code or as static library.
> lib1 and lib2 wants to provide something precompiled (might even be
> close sourced)
Those are excuses. What I said still applies: do not link a dynamic library to
a static library. I don't care if lib1 and lib2 are closed source: they shall
not include a copy of lib0 inside or they will use techniques not in the
Standard to hide the copy from the dynamic symbol table. This is required for
a quality library.
Failing to do so implies their libraries do not meet "quality". That's not a
surprise with closed-source software.
> > Once per process address space.
>
> It sounds like an issue, especially since most c++ code has the c++
> standard library as dependency.
Which is loaded once per address space because it's a dynamic library. Even if
you load both libstdc++ and libc++, it works because libc++ namespaces itself
so all its symbols are different from libstdc++'s.
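For reference, the mechanism is an inline namespace: libc++ places its
entities in std::__1, so the mangled names differ from libstdc++'s even though
source code spells both as std::. A minimal sketch of the same idea, using a
hypothetical library name and version tag:

```cpp
// Sketch only: 'lib0' and 'abi_v2' are hypothetical names.
namespace lib0 {
inline namespace abi_v2 {

// The mangled symbol includes 'abi_v2', so it cannot collide with a
// copy of lib0 built under a different (or no) version namespace.
int answer_of_life()
{
    return 42;
}

} // namespace abi_v2
} // namespace lib0

// Callers keep writing lib0::answer_of_life(); the inline namespace is
// transparent at the source level.
int call_through_public_name()
{
    return lib0::answer_of_life();
}
```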
Virtually all libraries and applications use the C++ Standard Libraries
properly, following my rules: they build the libraries using their own build
systems and they link dynamically to them.
> And isn't saying I should not do it because it is problematic, the same
> as me saying "if I do not use shared libraries" or "if I do not use some
> c++ constructs"?
> Why the different outcome?
Because shared libraries themselves are not the problem. The point of this
sub-thread is that Unix dynamic library linking is "just another TU" from the
point of view of the C++ standard. You can use all C++ constructs the same way
as if you weren't using libraries, provided you accept that all content in
other libraries is "just another TU".
The problem is that people do things with shared libraries that they wouldn't
if libraries weren't present.
> Personally, I find it easier to ensure that no UB happens even with
> "vendoring"/"bundling" than to have to inspect both the sources and
> build systems (which might not be under the control of the authors of
> lib0), or how the library is used.
I've said this before: the fact that you shouldn't *just* use Standard C++ for
libraries is a point. You MUST use hidden visibility and the Standard won't
help you there. Therefore, you have to step outside of the Standard.
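A minimal sketch of what stepping outside the Standard looks like with GCC
and Clang. The macro and function names are hypothetical; the visibility
attribute itself is a compiler extension, not Standard C++.

```cpp
// Sketch only: macro and function names are hypothetical. The attribute
// is a GCC/Clang extension, not Standard C++.
#if defined(__GNUC__) || defined(__clang__)
#  define LIB1_EXPORT __attribute__((visibility("default")))
#  define LIB1_HIDDEN __attribute__((visibility("hidden")))
#else
#  define LIB1_EXPORT
#  define LIB1_HIDDEN
#endif

// Implementation detail (for example, a bundled copy of lib0): kept out
// of the dynamic symbol table so it cannot clash with other libraries.
LIB1_HIDDEN int lib1_internal_helper(int x)
{
    return x * 2;
}

// Public entry point: the only symbol this library claims as its ABI.
LIB1_EXPORT int lib1_public_api(int x)
{
    return lib1_internal_helper(x) + 1;
}
```

In practice one would usually compile the whole library with
-fvisibility=hidden and mark only the public API as "default", rather than
annotating every internal function.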
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel DCAI Platform & System Engineering
Received on 2024-10-22 15:32:10