std-discussion

Re: Global array of objects over multiple files

From: J Decker <d3ck0r_at_[hidden]>
Date: Wed, 23 Oct 2024 03:18:39 -0700
Hello; I'm not picking on or replying to anyone specifically... just my
stray thoughts on the topic(s).

There are already keywords like static and extern whose main effect lies
outside the language proper: they tell the linker where it can (or cannot)
find the named symbols. Drawing hard lines like 'this is a language issue'
versus 'this is a linker issue' can't really be done; at the end of the day
we just want to specify what to do and what to use when doing it.

This propagates all the way through to the system and the
executable/dynamic link services that actually put stuff in memory. AFAIK
there's no linker that combines sections of separate loadable objects;
each maps into its own space, with its own segments. The system would have
to run through all loadable objects, figure out what the total size of a
shared section might be, and then use that size. But that only works on the
first pass: once you throw dynamic linking in there, that space potentially
has to expand dynamically. So this has grown from a few files in a library
to something much more complex; static and extern only refer to the local
scope of a single loadable object.

Many static linkers have a notion of sections, and section names can be
used to group things, but this has always been a compiler-specific
extension: __attribute__ in one case, #pragmas in others.

I have a module in my library that handles deadstart, that is, code that
runs like language runtime init, before main... things like DllMain or
start(), which are only single entry points, but more dynamic. There are a
lot of #ifdefs to support the various methods on the various platforms. GCC
C++ has __attribute__((constructor(priority))), but C only has
__attribute__((constructor)); when a library grows very large and you have
a dynamic init method, sometimes it's nice to specify a number instead of a
specific order of object files in a list. OpenWatcom has a specific
structure, which itself carries a priority, that you can put in
`__based(__segname("XI"))`. MSVC similarly uses some specifically named
sections in which you can declare void (*f)(void) function pointers that
get called by the standard C runtime startup code.
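
As a minimal sketch of the GCC-style priority mechanism (a GCC/Clang
extension, not standard C++; the names init_first, init_second and
startup_order_ok are made up for illustration), lower priority numbers run
first, and the values 0-100 are reserved for the implementation:

```cpp
#include <cassert>

// Plain statics are zero-initialized before any constructor function runs,
// so they are safe to use as a tiny log here.
static int g_order[2];
static int g_count;

// Deliberately declared in the "wrong" textual order: the priority number,
// not the position in the file, decides which one runs first.
__attribute__((constructor(102))) void init_second() { g_order[g_count++] = 2; }
__attribute__((constructor(101))) void init_first()  { g_order[g_count++] = 1; }

// True if both functions ran, in priority order, before this call.
bool startup_order_ok() {
    return g_count == 2 && g_order[0] == 1 && g_order[1] == 2;
}
```

Calling startup_order_ok() from main() should return true on toolchains
that support the attribute, since both functions have already run by then.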

https://github.com/d3x0r/SACK/blob/master/include/deadstart.h (it's pretty
well documented in the header)
https://github.com/d3x0r/SACK/tree/master/src/deadstart (list start_end;
_lib and _program have different ends: the end of _lib is just an end of
list and schedules functions to run, while _prog schedules its functions
and then also runs the whole list of loaded functions. This puts it far
enough into the process that CreateThread on Windows doesn't hang and the
thread will actually run... so many tiny details of borken behavior on
various systems...)

My interface ends up being PRELOAD( UniqueName ) { /* some code to run */
} or PRELOAD_PRIORITY( uniquename, 0-max_uint ) {}, which can be wrapped in
other things. Then you can just have initializers scattered through the
code that add objects to a vector of those objects, maybe instead of an
array?
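
A rough sketch of that idea in portable C++ follows. It is not SACK's
implementation: MY_PRELOAD, MY_PRELOAD_PRIORITY, StartupEntry and the other
names are hypothetical, and unlike the real deadstart code the entries here
only register themselves during static initialization and are run by an
explicit call.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct StartupEntry {
    unsigned priority;  // lower values run first
    void (*fn)();
};

// A function-local static avoids depending on cross-TU initialization order.
inline std::vector<StartupEntry>& startup_registry() {
    static std::vector<StartupEntry> entries;
    return entries;
}

// Constructing one of these at namespace scope adds an entry to the registry.
struct StartupRegistrar {
    StartupRegistrar(unsigned priority, void (*fn)()) {
        startup_registry().push_back({priority, fn});
    }
};

#define MY_PRELOAD_PRIORITY(name, prio)                             \
    static void name##_body();                                      \
    static StartupRegistrar name##_registrar{(prio), &name##_body}; \
    static void name##_body()

#define MY_PRELOAD(name) MY_PRELOAD_PRIORITY(name, 1000u)

// Runs every registered entry, sorted by priority.
inline void run_startup_entries() {
    auto& entries = startup_registry();
    std::stable_sort(entries.begin(), entries.end(),
                     [](const StartupEntry& a, const StartupEntry& b) {
                         return a.priority < b.priority;
                     });
    for (const auto& e : entries) e.fn();
}

// Two initializers "scattered through the code", declared out of order.
static std::vector<int> g_log;
MY_PRELOAD(late_step) { g_log.push_back(2); }
MY_PRELOAD_PRIORITY(early_step, 10u) { g_log.push_back(1); }
```

Calling run_startup_entries() once, early in main(), runs early_step before
late_step regardless of where each one was defined.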

With GCC, though, I end up having to specify a couple of object files that
are listed first and last in the list of things to link; they define a
known symbol at the start and a known symbol at the end, so I know how many
things go together. This use of sections could be what is wanted for
building an array (though it's certainly far from standard, or even
common).
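
For illustration only, here is a sketch of building such an array from a
named section. It swaps in a different technique from the first/last object
files described above: on ELF targets, GNU ld and lld define __start_<name>
and __stop_<name> symbols for any section whose name is a valid C
identifier. The section name demo_ints and the function names are made up,
and none of this is portable to other object formats.

```cpp
#include <cassert>
#include <cstddef>

// Each of these could live in a different source file; the linker
// concatenates all input sections named "demo_ints" into one region.
static int entry_a __attribute__((section("demo_ints"), used)) = 10;
static int entry_b __attribute__((section("demo_ints"), used)) = 20;
static int entry_c __attribute__((section("demo_ints"), used)) = 30;

// Bounds of the combined section, provided by the linker (assumption:
// ELF with GNU ld or lld).
extern "C" int __start_demo_ints[];
extern "C" int __stop_demo_ints[];

// Number of ints the linker gathered into the section.
std::size_t demo_count() {
    return static_cast<std::size_t>(__stop_demo_ints - __start_demo_ints);
}

// Walks the section as if it were one array. The order of entries from
// different object files is up to the linker, so only the sum is computed.
int demo_sum() {
    int sum = 0;
    for (int* p = __start_demo_ints; p != __stop_demo_ints; ++p) sum += *p;
    return sum;
}
```

This only gathers entries within one loadable object; each shared library
gets its own section and its own pair of bounds.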

It might be nice if there were standard language support for such things,
like, I dunno, an extra string after 'extern'... as in 'extern "some-space"
type...' (just spitballing; and maybe it is something even the system might
need to change to support properly someday, if you wanted these arrays to
also support dynamic plugins).

On Tue, Oct 22, 2024 at 4:01 PM Thiago Macieira via Std-Discussion <
std-discussion_at_[hidden]> wrote:

> On Tuesday 22 October 2024 11:54:18 Pacific Daylight Time Federico Kircheis
> via Std-Discussion wrote:
> > > ODR violations only appear if the variable has global linkage.
> >
> > What is global linkage?
> > I know only external, internal and weak.
>
> Imprecise wording on my part. I meant external linkage.
>
> > const in the first example implies internal linkage, so why should it be
> > an ODR violation?
>
> Because, by definition, internal linkage symbols can't have ODR violations.
>
> You can still violate ODR by having the same class or enum defined in two
> different ways, even with no external linkage symbols appearing, but we've
> long ignored those IFNDR problems and lived happily with them.
>
> > I'm going to ignore this part because I have no idea what it means.
> > dynamic linking, for example for windows executables, works differently.
>
> And I've already said this discussion is not about Windows. We're
> exclusively discussing "shared libraries on Unix behave as if 'just
> another TU'".
>
> > > And I've pointed out it does NOT work when compiled as an application
> > > if you compile it in the way that library linking would: the issue is
> > > that you're duplicating lib0.cpp in your executable. This is the thesis
> > > here: that libraries are "just another TU" and all the effects of it
> > > apply. And that includes the ill effects of ODR violations.
> >
> > I (the programmer) am not duplicating lib0.cpp !
>
> Yes, you are. You're responsible for the final linking, therefore you're
> responsible. The fact you don't know you are and there's no diagnostic to
> help you detect it does not absolve you from the responsibility.
>
> We may argue that the tooling should be improved to detect this and/or the
> Standard language or extensions should be improved to make the situation
> less likely and more detectable. I'd welcome those discussions. But that's
> neither here nor there. The standard says "ODR violations are IFNDR" and
> the implementation produced no diagnostic, but it's still ill-formed.
>
> > A translation unit is a source file after preprocessing.
> > I did not create two translation units with the same code.
>
> "You" did by compiling and linking everything together. The standard
> doesn't care who typed "make" or when, only about the well-formedness of
> the final content. The fact you obtained some content compiled by others
> does not absolve you from the need to observe the One Definition Rule and
> not violate it.
>
> > The toolchain is duplicating it for making it possible to create the two
> > shared libraries.
> > The standard does not say it has to behave that way.
>
> No, it doesn't. But the fact is that this is how it behaves and because
> it's your responsibility to ensure ODR is not violated, you must know
> whether any duplication happened.
>
> > I do not want to say that this is how dynamic libraries should work, but
> > it is a possible alternate behavior for dynamic libraries on GNU/Linux
> > systems that is available today.
>
> And in a different universe, stars are powered by gravitational collapse.
> It's an interesting theoretical exercise but irrelevant because it's not
> the universe we live in.
>
> Unlike the universe, we can change dynamic library linking on Unix
> systems. But the barrier to doing that is nearly as high. So for all
> intents and purposes, it's immutable law.
>
> > Where did I say the problem is in C++?
> > I wrote multiple times it is outside of C++.
> > It seems to me that we are talking past each other...
>
> I'm arguing against your assertion that "shared libraries break C++". They
> by themselves do not and all the problems that we have with them are
> either in C++ already or are caused by things outside of C++.
>
> One common argument is that dynamic_cast or exceptions don't work across
> library boundaries. Yes, they do, if you stick to pure C++ and don't apply
> hidden visibility. But you *should* apply hidden visibility.
>
> > >> If I do not use dynamic libraries then the code of lib0,lib1,lib2,main
> > >> works as expected.
> > >> If dynamic libraries were "something else", then the second and third
> > >> example could work without UB too.
> > >
> > > "If I solve the problem, the problem is solved'. Circular reasoning.
> >
> > Huh?
>
> The point is that if you do things that solve the problem, then the
> problem is solved and there's nothing to discuss. And specifically, the
> same solution applies whether you're using shared libraries or not. That's
> what I am arguing: the problems you're relating are inside of C++ because
> you violated the One Definition Rule.
>
> > >> So my point still is:
> > >>
> > >> In the first example, I define multiple globals.
> > >> When using libraries, I get 4 instead of 3 without changing any code.
> > >
> > > "If I violate ODR, the program becomes ill-formed". Right
> >
> > There is no ODR violation.
>
> I've shown multiple times how you're violating it. You've argued against
> it by changing the build system to remove the violation. That's great, it
> means you know how to solve the problem. But the fact you *can* solve the
> problem does not mean the problem didn't exist in the original case.
>
> > See the mail of Jens and Lauri.
>
> They seem to agree with me.
>
> > The globals are const in different translation units.
>
> const implies internal linkage, as Lauri pointed out. Internally-linked
> symbols don't have ODR violations.
>
> > They define different objects at different places with internal linkage.
>
> Correct and that's fine.
>
> How hard is it to understand that you are allowed to have two variables of
> different types named the same thing if they are static but not if they
> are extern?
>
> > Since you insist there is an ODR violation, can you show me according to
> > what rule from the standard I'm breaking?
>
> I have. https://eel.is/c++draft/basic.def.odr#15
>
> > > MSVC doesn't count because no one is claiming that DLLs on Windows
> > > operate "just like other TUs". This discussion is exclusively about how
> > > shared libraries work on (modern) Unix systems.
> >
> > This discussion was about shared libraries in general, and I've shown
> > examples on Linux.
> > MSVC behaves differently, in particular for the example provided it does
> > not cause UB.
> > Why should it be dismissed?
> > Because it does not "operate like other TU"?
> > AFAIK it is not mandated in the C++ standard.
>
> This discussion is exclusively about Unix systems because someone claimed
> they are "just another TU" and I agreed, but you didn't. No one is
> claiming Windows obeys the same rules.
>
> We could be having a discussion on how to do shared libraries properly
> everywhere, but that's not the discussion we're having. We could be
> discussing what Physics mechanism is responsible for making stars bright,
> but that's not the discussion we're having. Those things are out of scope
> for this discussion, but we can switch to discussing them if you want.
>
> > Also TU are a property of compile and link-time, not runtime (happy to
> > be proven wrong).
>
> Runtime linking is still linking.
>
> > I consider dynamic linking out of the standard, I think I already wrote
> > it more than once.
>
> Then I don't think we need to discuss anything any more.
>
> BTW, I also suggest you not post anything about your solving problems with
> code compiled by others that you need to patch to make work. Obviously
> that is an out-of-standard problem, as the standard only deals with source
> code and no-library single-application linking.
>
> > > No, it applies to C too, just in a much more limited fashion because
> > > they don't have some of the causes of the problem. But it could happen
> > > with:
> > >
> > > lib1:
> > > char *answer_of_life = NULL;
> > >
> > > lib2:
> > > typedef struct
> > > {
> > >     size_t size;
> > >     char *ptr;
> > > } String;
> > > String answer_of_life = {};
> >
> > I was writing about "vendoring"/"bundling", not this issue.
> > Since C has no way to execute code before main is entered, and has no
> > way to execute code when a dynamic library is loaded, it does not have
> > the same issue with vendoring and global variables that C++ has.
>
> It has the very same issue, as exemplified in the code above. The fact
> that some content reads or writes to answer_of_life after main() is not
> the point. Your problem wasn't the initialisation or order thereof, but
> the ODR violation. The code above has exactly the same violation.
>
> > >> lib0 provide the library as source code or as static library.
> > >> lib1 and lib2 want to provide something precompiled (might even be
> > >> closed source)
> > >
> > > Those are excuses.
> >
> > I believe that delivering a precompiled shared library is a valid and
> > important use-case, not just an excuse.
>
> It's an important use-case, but irrelevant to the problem at hand. The
> violation happened, so it's irrelevant how the code was compiled.
>
> Similarly, overriding some memory allocations is a valid use-case, but if
> a library overrides the global operator new() in ways that don't work for
> other libraries or the main application, it's a problem and it's
> irrelevant how this library was compiled.
>
> > I do not think, for example, you can install multiple versions of the
> > C++ standard library on a Linux system and have it work "out-of-the-box".
>
> Yes, you can. libstdc++ and libc++ are designed to be loadable in the same
> process address space and work without stomping over each other's symbols.
> Both of their low-level C++ support libraries (libsupc++ and libc++abi)
> are designed to be exactly compatible with each other and interchangeable.
>
> You can't have C++ code linking to both at the same time, but you can
> exchange some data via a C glue layer. This allows, for example, a C++
> application to dynamically load a plugin using a different C++ standard
> library, provided the application only accesses its C entry function and
> passes C types to it.
>
> C API libraries can also do it.
>
> > > What I said still applies: do not link a dynamic library to
> > > a static library. I don't care if lib1 and lib2 are closed source: they
> > > shall not include a copy of lib0 inside or they will use techniques not
> > > in the Standard to hide the copy from the dynamic symbol table. This is
> > > required for a quality library.
> >
> > It does not sound like a possible thing to provide a precompiled shared
> > library without vendoring.
>
> They shall do it if they want the label of "quality library". Or,
> conversely, those libraries get the label of "poor quality library".
>
> Better yet, don't provide precompiled. Open-source it and let others
> compile.
>
> > > Which is loaded once per address space because it's a dynamic library.
> > > Even if you load both libstdc++ and libc++, it works because libc++
> > > namespaces itself so all its symbols are different from libstdc++'s.
> >
> > That is good to know, but the problem would be two libraries linked
> > against libc++ (or two linked against libstdc++)
>
> Which, as I said above and I quote, "is loaded once per process because
> it's a dynamic library".
>
> Unless you linked a dynamic library to libstdc++.a or libc++.a. But that
> violates the rule of "do not link dynamic libraries to static libraries".
>
> > We have apparently different experiences (not with the standard library,
> > but with other libraries).
>
> I'm neither disputing nor even doubting that. I do know the quality of
> closed-source libraries and how poor their implementations are. I've had
> the experience with teams who didn't want to open-source their code
> because they were ashamed of the quality of it. And to be clear: that's a
> good team that recognises they took shortcuts in the name of expediency.
> Most closed-source others would be oblivious to this.
>
> Yet ignorance of the fact you've produced low-quality content does not
> raise said quality.
>
> > I'll just note that I've never seen this rule or guideline about
> > "vendoring"/"bundling" written anywhere.
> > If you have a source, I'll read it gladly.
>
> More like experience and my personal pet peeve.
>
> Just think about how many projects copied zlib into their source code for
> expediency's sake as they needed to decompress something. How do downstream
> users of said libraries go about fixing the copies of zlib because a new
> security issue has been reported?
>
> Now imagine this was liblzma.
>
> > > Because shared libraries themselves are not the problem. The point of
> > > this sub-thread is that Unix dynamic library linking is "just another
> > > TU" from the point of view of the C++ standard. You can use all C++
> > > constructs the same way as if you weren't using libraries, provided you
> > > accept that all content in other libraries are "just another TU".
> >
> > The last sentence seems like a contradiction to me.
> > If I "accept that all content in other libraries are "just another TU"",
> > then I cannot use some constructs; for example having lib1 and lib2
> > (dynamic) depending on lib0(static) with one extern global variable.
>
> What I said works just fine. You accept lib1 and lib2 as just other TUs.
> As they both have the same "defined item" with external linkage (as per
> [basic.def.odr]), linking to both implies a violation of ODR, making your
> application ill-formed, no diagnostic required.
>
> How they came about that defined item is irrelevant. Only the fact that
> they do have that symbol is a problem.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Principal Engineer - Intel DCAI Platform & System Engineering
>
>
>
> --
> Std-Discussion mailing list
> Std-Discussion_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-discussion
>

Received on 2024-10-23 10:18:55