std-discussion

Re: Global array of objects over multiple files

From: Federico Kircheis <federico_at_[hidden]>
Date: Tue, 22 Oct 2024 20:54:18 +0200
On 22/10/2024 17.32, Thiago Macieira via Std-Discussion wrote:
> On Tuesday 22 October 2024 02:49:20 Pacific Daylight Time Federico Kircheis via
> Std-Discussion wrote:
>> My example has const... and you have been claiming that it was UB.
>> So I assume that it was an oversight.
>
> const on non-trivial types is different from const on trivial ones.
>
>> (By the way; primitives have a lifetime too, starting it and ending it
>> more than once is UB too, so my example with int should be as good as
>> with string)
>
> The problem is not the lifetime, it's how many live copies of it exist. With
> the const, the primitive variable becomes static, in which case each copy is
> local to the TU in question and you can have as many different definitions as
> you may want.
>
> ODR violations only appear if the variable has global linkage.

What is global linkage?
I know only external, internal and weak.
const in the first example implies internal linkage, so why should it be
an ODR violation?
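
A minimal sketch of the linkage rule referred to above, assuming these declarations sit in a header included by several translation units. All variable names are hypothetical and are not taken from the examples in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical names throughout; imagine these lines in a header that
// several translation units (TUs) include.

// const at namespace scope without extern: internal linkage.
// Each including TU gets its own independent object, for fundamental
// types and for class types alike.
const int retry_limit = 3;
const std::string greeting = "hello";

// Adding extern gives the const object external linkage. Defining it in
// more than one TU would then violate the one-definition rule.
extern const int shared_limit = 5;

// A non-const namespace-scope variable has external linkage by default.
int request_count = 0;

// Uses every variable once so the sketch can be exercised from main().
int linkage_demo() {
    assert(greeting.size() == 5);
    ++request_count;
    return retry_limit + shared_limit + request_count;
}
```

Only `shared_limit` and `request_count` would clash if this header were included in two TUs of the same program; `retry_limit` and `greeting` would simply exist once per TU.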

>>>> So, what was the problem?
>>>> According to you, not the fact that I used libraries...
>>>
>>> It's the fact that you linked lib0.cpp twice into your executable, by way
>>> of the dynamic linking.
>>
>> Dynamic linking could work differently and not cause the issue.
>> The standard does not say that if I use dynamic linking then I need to
>> have lib0.cpp twice.
>
> Dynamic linking could work differently but doesn't, the same way that the Sun
> could be powered by gravitational collapse instead of nuclear fusion, but
> isn't.

I'm going to ignore this part because I have no idea what it means.
Dynamic linking, for example for Windows executables, works differently.

>> The whole C++ code of lib0, lib1, lib2 and main has nothing strange, and
>> is not using any non-standard feature.
>> By that I mean that
>>
>> * it is not using any low-level technique, like manually changing
>> lifetimes, manually calling constructors, casts, ...
>> * it is not using compiler extensions (attributes, sections, ...)
>> * it is not using non-standard functions (dlopen, ...)
>>
>> This code, unmodified, works in a given way when compiled as
>> application, but changes its behavior when moved to libraries.
>
> And I've pointed out it does NOT work when compiled as an application if you
> compile it in the way that library linking would: the issue is that you're
> duplicating lib0.cpp in your executable. This is the thesis here: that
> libraries are "just another TU" and all the effects of it apply. And that
> includes the ill effects of ODR violations.

I (the programmer) am not duplicating lib0.cpp !

A translation unit is a source file after preprocessing.
I did not create two translation units with the same code.

The toolchain is duplicating it for making it possible to create the two
shared libraries.
The standard does not say it has to behave that way.

There are implications that I'm not sure about, but using weak linking
would make the second and third example work "as if I compiled
everything together in one executable".
I do not want to say that this is how dynamic libraries should work, but
it is a possible alternate behavior for dynamic libraries on GNU/Linux
systems that is available today.

> When you showed examples of linking to the application, you _solved_ the
> problem by linking lib0.cpp only once. You can also solve the same problem in
> the case of libraries, but you're not doing it. The problem is not C++, it's
> how you're compiling and in particular linking your application.

Where did I say the problem is in C++?
I wrote multiple times it is outside of C++.
It seems to me that we are talking past each other...

>> Of course it is the combination of how I compile/link it (especially
>> since the code did not change), which is not defined by the standard.
>
> Right.
>
>> If I do not use dynamic libraries then the code of lib0,lib1,lib2,main
>> works as expected.
>> If dynamic libraries were "something else", then the second and third
>> example could work without UB too.
>
> "If I solve the problem, the problem is solved". Circular reasoning.

Huh?

>> So my point still is:
>>
>> In the first example, I define multiple globals.
>> When using libraries, I get 4 instead of 3 without changing any code.
>
> "If I violate ODR, the program becomes ill-formed". Right

There is no ODR violation.
See the mail of Jens and Lauri.
The globals are const in different translation units.
They define different objects at different places with internal linkage.
Since you insist there is an ODR violation, can you show me which rule
of the standard I'm breaking?

>> In the other examples, I define a global only once.
>> There is only one instance and it is initialized correctly.
>> When using libraries, it is initialized multiple times (when using gcc
>> and clang, not msvc).
>
> MSVC doesn't count because no one is claiming that DLLs on Windows operate
> "just like other TUs". This discussion is exclusively about how shared
> libraries work on (modern) Unix systems.

This discussion was about shared libraries in general, and I've shown
examples on Linux.
MSVC behaves differently, in particular for the example provided it does
not cause UB.
Why should it be dismissed?
Because it does not "operate like other TU"?
AFAIK it is not mandated in the C++ standard.

Also, TUs are a property of compile time and link time, not of runtime
(happy to be proven wrong).

>> I "just" packaged my code in libraries and defined dependencies.
>
> You linked lib0.cpp twice into your executable at runtime. That's an ODR
> violation.
>
> Either you accept that dynamic linking does not change the standard or you
> consider it "completely out of the standard". In the latter case, we don't
> need to discuss anything here, because it would be completely out-of-scope and
> "anything could happen".

I consider dynamic linking out of the standard, I think I already wrote
it more than once.
Yes, I think many things can happen, but not anything.

> Therefore, the continued discussion in this list implies that we accept that
> dynamic linking behaves in a way compliant with the C++ rules. If you reply to
> this email, I will therefore assume you agree with this and that the process
> of static and dynamic linking can be understood from the point of view of the
> C++ Standard.
>
>> Would I have copied the source code of lib0.cpp, and linked it multiple
>> times to main, then I would have admitted that it looks strange and
>> possibly problematic.
>
> The fact you did that in an "underhanded", indirect fashion does not make it
> different. You did it.
>
>> But in this case, this "copy" of lib0 is both forced and hidden.
>
> Irrelevant.
>
>> I'm not claiming that it is not possible to write c++ code in libraries.
>> I'm claiming that
>>
>> * some code constructs are problematic
>> * there are limitations not described by the standard
>
> I agree on the first, in general, but mostly it's "ill-advised" instead of
> "broken".
>
> I disagree on the second. There are no limitations to standards-compliant code
> not described in the standard, if you accept that shared libraries are "just
> another TU". Though this mostly ignores the fact that linking is barely
> described.
>
>>> Stop. The variable *can* be reached outside of the TU because they
>>> "forgot" the static keyword. Therefore, it is a global symbol and they
>>> are claiming that as part of their ABI. They claim it exclusively: no
>>> other library can define the same symbol.
>>
>> ABI is outside of standard.
>
> ODR violations are in the standard. The standard says only one TU can define
> a definable item. Each of lib1 and lib2 has chosen (albeit unconsciously), so
> they can't both be linked into the same executable.
>
>>>> If you use extern, you cannot use static.
>>>
>>> Yes, if you want it as a global symbol, you use extern.
>>
>> And that leads to problem when lib0 is linked in lib1 and lib2
>> (constructor called more than once)
>
> Yes.
>>> https://www.akkadia.org/drepper/dsohowto.pdf
>>>
>>> Ulrich Drepper is a former maintainer of glibc, so you should trust him in
>>> his expertise. But by necessity this paper is focused on C.
>>
>> I skimmed the document, it does not discuss the issue of
>> "vendoring"/"bundling" and what are possible workarounds; am I wrong?
>
> No, it doesn't. It describes libraries. Uli didn't care who wrote the code
> that goes into the library and it really doesn't matter. The point is that the
> authors of lib1 and lib2 must adhere to the document, no matter who wrote the
> code that goes into their libraries. This includes header-only libraries too.
>
>> Maybe because this issue is more C++ specific?
>
> No, it applies to C too, just in a much more limited fashion because they
> don't have some of the causes of the problem. But it could happen with:
>
> lib1:
>     char *answer_of_life = NULL;
>
> lib2:
>     typedef struct
>     {
>         size_t size;
>         char *ptr;
>     } String;
>     String answer_of_life = {0};

I was writing about "vendoring"/"bundling", not this issue.
Since C has no way to execute code before main is entered, and has no
way to execute code when a dynamic library is loaded, it does not have
the same issue with vendoring and global variables that C++ has.
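
A minimal sketch of the C++ mechanism meant here, assuming a namespace-scope object with a non-trivial constructor. The names `Registry`, `global_registry` and `construction_count` are hypothetical.

```cpp
#include <cassert>

// Counter kept in a function-local static so it is ready no matter in
// which order global objects are initialized.
int& construction_count() {
    static int count = 0;
    return count;
}

// Hypothetical type whose constructor has a visible side effect.
struct Registry {
    Registry() { ++construction_count(); }
};

// Dynamic initialization: the constructor runs during program startup,
// before the body of main() uses anything from this translation unit.
Registry global_registry;

// Can be called as the first statement of main() to confirm that the
// constructor has already run exactly once.
void check_ran_before_main() {
    assert(construction_count() == 1);
}
```

In a single executable the counter is 1 when `main` starts. The thread's concern is what happens to such an object when the code containing it ends up in more than one shared library.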

>>> Stop vendoring: instead, just build the third-party library using their
>>> own build system and install to your target build environment. And never
>>> link a dynamic library to a static library (unless that's a "convenience
>>> library" that is also part of your project).
>>
>> lib0 provides the library as source code or as a static library.
>> lib1 and lib2 want to provide something precompiled (might even be
>> closed source)
>
> Those are excuses.

I believe that delivering a precompiled shared library is a valid and
important use-case, not just an excuse.
I do not think, for example, you can install multiple versions of the
C++ standard library on a Linux system and have it work "out-of-the-box".

> What I said still applies: do not link a dynamic library to
> a static library. I don't care if lib1 and lib2 are closed source: they shall
> not include a copy of lib0 inside or they will use techniques not in the
> Standard to hide the copy from the dynamic symbol table. This is required for
> a quality library.

It does not sound possible to provide a precompiled shared
library without vendoring.

> Failing to do so implies their libraries do not meet "quality". That's not a
> surprise with closed-source software.
>
>>> Once per process address space.
>>
>> It sounds like an issue, especially since most c++ code has the c++
>> standard library as dependency.
>
> Which is loaded once per address space because it's a dynamic library. Even if
> you load both libstdc++ and libc++, it works because libc++ namespaces itself
> so all its symbols are different from libstdc++'s.

That is good to know, but the problem would be two libraries linked
against libc++ (or two linked against libstdc++)

> Virtually all libraries and applications use the C++ Standard Libraries
> properly, following my rules: they built the libraries using their own build
> systems and they link dynamically to them.

We have apparently different experiences (not with the standard library,
but with other libraries).
I'll just note that I've never seen this rule or guideline about
"vendoring"/"bundling" written anywhere.
If you have a source, I'll read it gladly.

>> And isn't saying I should not do it because it is problematic, the same
>> as me saying "if I do not use shared libraries" or "if I do not use some
>> c++ constructs"?
>> Why the different outcome?
>
> Because shared libraries themselves are not the problem. The point of this
> sub-thread is that Unix dynamic library linking is "just another TU" from the
> point of view of the C++ standard. You can use all C++ constructs the same way
> as if you weren't using libraries, provided you accept that all content in
> other libraries are "just another TU".

The last sentence seems like a contradiction to me.
If I "accept that all content in other libraries are "just another TU"",
then I cannot use some constructs; for example, having lib1 and lib2
(dynamic) depend on lib0 (static) with one extern global variable.
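
A rough diagnostic sketch for the construct just mentioned, assuming the effect reported in this thread (the constructor of lib0's global running more than once on the same object under gcc and clang). `Tracked`, `lib0_global` and the helper functions are hypothetical stand-ins, not the code from the original examples.

```cpp
#include <cassert>
#include <map>

// Counts live constructions per object address. Hypothetical helper.
std::map<const void*, int>& constructions_by_address() {
    static std::map<const void*, int> table;
    return table;
}

// Stand-in for the type of lib0's global variable.
struct Tracked {
    Tracked()  { ++constructions_by_address()[static_cast<const void*>(this)]; }
    ~Tracked() { --constructions_by_address()[static_cast<const void*>(this)]; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
};

extern Tracked lib0_global;  // what lib0's header would declare
Tracked lib0_global;         // what lib0.cpp would define

// True if any address has been constructed more than once without an
// intervening destruction, which is how a doubly initialized global
// would show up.
bool any_double_construction() {
    for (const auto& entry : constructions_by_address()) {
        if (entry.second > 1) return true;
    }
    return false;
}

// In a single executable this assertion is expected to hold.
void check_single_construction() {
    assert(!any_double_construction());
}
```

Built as one program, `check_single_construction()` passes. Whether it still passes once lib0 is linked statically into two shared libraries depends on the toolchain, which is the point under discussion.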


Is this "operate like other TU" documented somewhere?
It makes little sense to gather that information by mail.

> The problem is that people do things with shared libraries that they wouldn't
> if libraries weren't present.
>
>> Personally, I find it easier to ensure that no UB happens even with
>> "vendoring"/"bundling" than to have to inspect both the sources and
>> build systems (which might not be under the control of the authors of
>> lib0), or how the library is used.
>
> I've said this before: the fact that you shouldn't *just* use Standard C++ for
> libraries is a point. You MUST use hidden visibility and the Standard won't
> help you there. Therefore, you have to step outside of the Standard.
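
For reference, a minimal sketch of what stepping outside the Standard with hidden visibility commonly looks like. The macro names and the `lib1_*` functions are hypothetical; the attribute and `__declspec` spellings are compiler extensions.

```cpp
#include <cassert>

// Hypothetical export macros for a library called lib1.
#if defined(_WIN32)
#  define LIB1_EXPORT __declspec(dllexport)
#  define LIB1_LOCAL
#elif defined(__GNUC__)
#  define LIB1_EXPORT __attribute__((visibility("default")))
#  define LIB1_LOCAL  __attribute__((visibility("hidden")))
#else
#  define LIB1_EXPORT
#  define LIB1_LOCAL
#endif

// Public entry point: stays in the dynamic symbol table.
LIB1_EXPORT int lib1_answer();

// Internal helper (for example, code bundled from lib0): hidden from
// other shared objects, so it is not resolved against another library's
// copy of the same symbol.
LIB1_LOCAL int lib1_detail_compute();

int lib1_detail_compute() { return 42; }
int lib1_answer() { return lib1_detail_compute(); }

// Small self-check so the sketch can be exercised.
void lib1_self_check() {
    assert(lib1_answer() == 42);
}
```

With GCC and Clang, compiling the library with `-fvisibility=hidden` makes hidden the default, so only symbols marked `LIB1_EXPORT` are exported.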


Federico

Received on 2024-10-22 18:54:25