std-discussion

Re: Global array of objects over multiple files

From: Lauri Vasama <wg21_at_[hidden]>
Date: Tue, 22 Oct 2024 13:09:39 +0300
I just want to be clear: are you aware that const implies static on
global variables (unless they are also declared extern or inline)?
Because you declared the global variable "instance" const and placed
it in a header, each translation unit that includes the header gets
its own local copy of the "instance" variable with internal linkage.
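
A minimal sketch of the three spellings a header could use, in case it
helps. The names instance_per_tu, instance_shared, instance_defined_once
and the helper functions are made up for illustration and are not taken
from the code in this thread; the inline form assumes C++17. A single
file cannot show several translation units, so the comments say where
each line would live.

```cpp
#include <cassert>

// 1) const at namespace scope, neither extern nor inline: internal linkage.
//    Every translation unit that includes this line gets its own object.
const int instance_per_tu = 42;

// 2) C++17 inline variable: external linkage. The definition may appear in
//    every translation unit, and all of them name one and the same object.
inline const int instance_shared = 42;

// 3) Declaration that would go in the header ...
extern const int instance_defined_once;
// ... and the definition that would go in exactly one .cpp file. It has
// external linkage because of the earlier extern declaration.
const int instance_defined_once = 42;

// Hypothetical helpers: a translation unit could export functions like
// these so that addresses can be compared across translation units.
const int* address_per_tu() { return &instance_per_tu; }
const int* address_shared() { return &instance_shared; }
const int* address_defined_once() { return &instance_defined_once; }

// Within this single file the three variables are simply three objects.
bool three_distinct_objects() {
    return address_per_tu() != address_shared()
        && address_shared() != address_defined_once()
        && address_per_tu() != address_defined_once();
}

// All three spellings carry the same value; only linkage differs.
void check_values() {
    assert(instance_per_tu == 42);
    assert(instance_shared == 42);
    assert(instance_defined_once == 42);
}
```

With form 1, helpers like address_per_tu() written in two different .cpp
files (under different names) would return two different addresses. With
forms 2 and 3 they would return the same address, at least within a
single statically linked program.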

On 22/10/2024 12.49, Federico Kircheis via Std-Discussion wrote:
> On 21/10/2024 18.24, Thiago Macieira via Std-Discussion wrote:
>> On Sunday 20 October 2024 22:33:07 Pacific Daylight Time Federico
>> Kircheis via Std-Discussion wrote:
>>>> Indeed. You're quoting to me what can happen under ODR violations.
>>>
>>> Saying to each other "you are wrong" does not really help.
>>> Could you help me find the part of the standard that says that what I
>>> wrote is UB?
>>
>> Nowhere. It's not UB.
>>
>> It's ill-formed. That's not the same thing. Jens' reply has it. See
>> https://eel.is/c++draft/basic.def.odr#15
>>
>> Your examples match the conditions: the variable in question (a
>> definable item) is not an inline function or variable and is defined
>> in multiple translation units without any of the exclusions. That
>> paragraph says your program is ill-formed, no diagnostic required
>> (IFNDR).
>>
>> I've reduced it so you can see what's wrong. ALL your examples are
>> equivalent to this: https://godbolt.org/z/rYnrcToTn
>
> This is not true.
>
> Change "int instance = 42;" to "const int instance = 42;"
>
> At that point the C++ code is equivalent.
>
>>>> The difference here is that int is a primitive, so nothing gets
>>>> emitted in the first place, unless its address is taken. In that
>>>> case, it's emitted as a global and there is a problem.
>>>
>>> I wrote int to make a minimal example, I should have written
>>> std::string, but it should not make any difference.
>>
>> It DOES make a difference because of the const. See Jens' reply.
>>
>> If you remove it, it reduces to the example I gave above and is IFNDR.
>
> My example has const... and you have been claiming that it was UB.
> So I assume that it was an oversight.
>
> (By the way, primitives have a lifetime too; starting and ending it
> more than once is UB as well, so my example with int should be as good
> as the one with string.)
>
>>>> Yes. But you changed how you compile and how many times you link
>>>> lib0.cpp into your executable. If you fix your problem, then the
>>>> problem is not there any more.
>>>
>>> So, what was the problem?
>>> According to you, not the fact that I used libraries...
>>
>> It's the fact that you linked lib0.cpp twice into your executable, by
>> way of the dynamic linking.
>
> Dynamic linking could work differently and not cause the issue.
> The standard does not say that if I use dynamic linking then I need to
> have lib0.cpp twice.
>
>> My example above is not that, but both lib1.cpp and lib2.cpp have a
>> symbol with the same name linked into the final executable, which is
>> IFNDR. Here's an even simpler example showing "the same file" compiled
>> twice: https://godbolt.org/z/vEE4bqq68
>>
>>>> Yes, I am reading this as "if I remove the problematic build
>>>> solution and fix the issue". I agree that in this case the problem
>>>> is solved.
>>>
>>> So, the problem was not in my *code*?
>>
>> It's the combination of the code and how you compile/link it. Each TU
>> alone is fine and has no UB. But the combination of them into the
>> final executable is ill-formed.
>>
>> Does your CMakeLists.txt count as "your code"?
>
> Not necessarily (I might be the author of lib0 and author of lib1 is
> someone else), but that's not important; it does not count as C++ code.
>
> I wrote
>
>> Dynamic linking already breaks multiple guarantees from the standard;
>> [...], and many things that have to do with global objects [...]
>
> I was talking about C++.
>
> The answer to that statement was
>
>> In order for any of it to happen, you must have used some
>> non-standard feature, such as an __attribute__.
>
> And since using dlopen is not OK, I showed some examples of (IMHO)
> normal looking and valid C++ code that can be problematic when used in
> shared libraries.
>
> The whole C++ code of lib0, lib1, lib2 and main has nothing strange,
> and is not using any non-standard feature.
> By that I mean that
>
> * it is not using any low-level technique, like manually changing
> lifetimes, manually calling constructors, casts, ...
> * it is not using compiler extensions (attributes, sections, ...)
> * it is not using non-standard functions (dlopen, ...)
>
> This code, unmodified, works in a given way when compiled as an
> application, but changes its behavior when moved to libraries.
>
> In the first example, you have three copies of the variable instead of
> four.
> In the others, you have one global variable initialized once instead
> of multiple times (UB).
>
> Of course it is the combination of how I compile/link it (especially
> since the code did not change), which is not defined by the standard.
>
> If I do not use dynamic libraries then the code of lib0,lib1,lib2,main
> works as expected.
> If dynamic libraries were "something else", then the second and third
> example could work without UB too.
>
> So my point still is:
>
> In the first example, I define multiple globals.
> When using libraries, I get 4 instead of 3 without changing any code.
>
> In the other examples, I define a global only once.
> There is only one instance and it is initialized correctly.
> When using libraries, it is initialized multiple times (when using gcc
> and clang, not msvc).
>
>
>
> You might claim I did something very wrong when building the
> application (the bundling you are referring to).
>
> Honestly, it does not look like that to me.
>
> ----
> add_library(lib0 STATIC lib0.hpp lib0.cpp)
>
> add_library(lib1 SHARED lib1.hpp lib1.cpp)
> target_link_libraries(lib1 lib0)
>
> add_library(lib2 SHARED lib2.hpp lib2.cpp)
> target_link_libraries(lib2 lib0)
>
> add_executable(main main.cpp)
> target_link_libraries(main lib1 lib2)
> ----
>
>
> I "just" packaged my code in libraries and defined dependencies.
>
> Had I copied the source code of lib0.cpp and linked it multiple times
> into main, I would have admitted that it looks strange and possibly
> problematic.
>
> But in this case, this "copy" of lib0 is both forced and hidden.
>
> I'm not claiming that it is not possible to write C++ code in libraries.
> I'm claiming that
>
> * some code constructs are problematic
> * there are limitations not described by the standard
>
> And just like you wrote that you should avoid this "bundling"/this
> project structure, you can also change the code and make it work in
> such projects/environments.
>
>>>> Agreed. Strictly speaking, it's the combination of lib1 and lib2
>>>> into one
>>>> executable. You can't do that.
>>>
>>> And how does the author of main know?
>>
>> The same way as any other bug: inspect the symptoms, debug it, and
>> find out. This is an IFNDR: *no diagnostic required* on the toolchain,
>> so it is not required to help you. You must find out how your code is
>> ill-formed from the fact it misbehaves and fix it.
>
> I'm not claiming that the toolchain is not helping me...
>
>>>>> You can obviously report a bug to those libraries to hide their
>>>>> symbols with compiler-specific tools, but according to the standard
>>>>> everything is fine.
>>>>
>>>> Sure. They *chose* to have a global symbol here (there's no "static").
>>>> That means they are claiming "answer_of_life" as part of their ABI.
>>>> Two TUs can't do that at the same time.
>>>
>>> They chose to have a global variable that is not reachable in any way
>>> in C++ outside of its TU.
>>
>> Stop. The variable *can* be reached outside of the TU because they
>> "forgot" the static keyword. Therefore, it is a global symbol and they
>> are claiming that as part of their ABI. They claim it exclusively: no
>> other library can define the same symbol.
>
> ABI is outside of the standard.
>
>>> If you use extern, you cannot use static.
>>
>> Yes, if you want it as a global symbol, you use extern.
>
> And that leads to a problem when lib0 is linked into lib1 and lib2
> (constructor called more than once).
>
>>>> If they didn't mean to claim that, then there are solutions to
>>>> avoid the collision, like namespaces.
>>>
>>> I left the namespace out for brevity; if the author of lib0 had used
>>> a namespace, it wouldn't have made any difference.
>>
>> No, it wouldn't make a difference, because it's the same source file
>> twice. Because lib1 and lib2 include lib0 without hiding its symbols,
>> they necessarily include all of lib0's ABI in their own ABI. That means
>> lib1 and lib2 have clashing ABI. Someone has to resolve that.
>>
>> The fact that lib1 and lib2 authors left it up to you is your problem.
>> You chose to use those two libraries with questionable quality. As I
>> said, using hidden visibility is *mandatory* and they failed to hide
>> the symbols that weren't part of their own ABI.
>>
>>>> Indeed. Like I said, Dynamic Shared Objects 101 is mandatory reading.
>>>
>>> Can you provide a link to me?
>>
>> https://www.akkadia.org/drepper/dsohowto.pdf
>>
>> Ulrich Drepper is a former maintainer of glibc, so you should trust
>> him in his expertise. But by necessity this paper is focused on C.
>
>
> I skimmed the document; it does not discuss the issue of
> "vendoring"/"bundling" or what the possible workarounds are. Am I wrong?
>
> Maybe because this issue is more C++ specific?
>
> Currently, the other issues I have in mind are not caused by
> "vendoring"/"bundling".
>
>>
>> He's also the author of the specification that gave us proper __thread
>> and thread_local variables on Unix systems.
>>
>> Personally, I claim that it's slightly outdated: today, you should use
>> protected visibility for your exported symbols and
>> -mno-direct-extern-access, but that option is still too new for
>> ubiquitous use.
>
> I'll read up on that flag; I had never heard of it before.
>
>>>> The problem in your case is the presence of lib0 inside of lib1 and
>>>> lib2. Stop propagating this horrible practice of bundling copies of
>>>> third-party content.
>>> What do you mean by vendoring and bundling?
>>
>> The copy of a third party source code inside of another is "vendoring".
>> The resulting code is bundled inside of the library.
>>
>> Stop vendoring: instead, just build the third-party library using their
>> own build system and install to your target build environment. And
>> never link a dynamic library to a static library (unless that's a
>> "convenience library" that is also part of your project).
>
> lib0 provides the library as source code or as a static library.
> lib1 and lib2 want to provide something precompiled (might even be
> closed source).
>
> There are surely other use-cases.
>
>>> They are not precompiled, nor do they have a separate copy of lib0 in
>>> their sources.
>>
>> There's a bundled copy of lib0 inside of each of lib1 and lib2. That
>> means you CANNOT load both lib1 and lib2 into memory in the same
>> process because that's ill-formed (no diagnostic required).
>>
>>> The libraries have a common dependency.
>>> It is not that uncommon, or should a library be used at most once
>>> worldwide?
>>
>> Once per process address space.
>
> It sounds like an issue, especially since most C++ code has the C++
> standard library as a dependency.
>
>
> And isn't saying "you should not do it because it is problematic" the
> same as me saying "if I do not use shared libraries" or "if I do not
> use some C++ constructs"?
> Why the different outcome?
>
>
> Personally, I find it easier to ensure that no UB happens even with
> "vendoring"/"bundling" than to have to inspect both the sources and
> build systems (which might not be under the control of the authors of
> lib0), or how the library is used.

Received on 2024-10-22 10:09:47