
std-proposals


Re: [std-proposals] Providing information about data structures to the compiler

From: Sebastian Wittmeier <wittmeier_at_[hidden]>
Date: Sat, 15 Mar 2025 16:35:43 +0100
Very worthwhile!

-----Original Message-----
From: Henning Meyer via Std-Proposals <std-proposals_at_[hidden]>
Sent: Sat 15.03.2025 16:28
Subject: Re: [std-proposals] Providing information about data structures to the compiler
To: Sebastian Wittmeier via Std-Proposals <std-proposals_at_[hidden]>;
CC: Henning Meyer <hmeyer.eu_at_[hidden]>;

My focus is on data structures that can be traversed linearly via begin() and end(), like std::vector or std::map (which is a tree internally). These can be nested, like a std::vector<std::set<int>>, in which case I follow them recursively.

If you have a very simple tree-like data structure

struct Node { std::vector<Node> children; };

that exposes its children as a range, then no special handling is necessary. My hope is that more complicated cases can be handled via zero-cost proxies and views.

Because all the work is in the implementation, I decided to start working on GNU/Linux with the Itanium ABI, ELF binaries with DWARF debug information, the clang compiler and the libstdc++ standard library, and, because that means I am in LLVM already, the lld linker and the lldb debugger. But the same could be done in the GNU toolchain or on other operating systems. The C++ part (annotating functions in libraries to allow object discovery and type recovery) can be done in an implementation-agnostic way.

I don't think I can write down a full working spec before trying out an implementation, though. I could write a proposal for the parts that I have figured out so far.

I am not interested in interactive code at the moment. I want to do automated analysis of program snapshots for evidence of memory corruption. For example, you have a linked list, and due to a race condition or use-after-free the very last node is corrupted and will lead to undefined behavior when used. This could lie dormant and not cause a crash until the program actually iterates over that list to the end.

If a tool is able to recursively follow containers to discover contained objects, then we will be able to diagnose these problems (you can determine whether a pointer points to valid memory). I think there is an under-used opportunity, beyond compile-time and run-time checks, in snapshot analysis/core dump analysis. The necessary infrastructure for that does not exist at the moment, but it is possible with "non-virulent" annotations to existing code bases (mostly libraries) and improvements to tooling. It would be useful for other things as well, including interactive debugging.

On 15.03.25 15:44, Sebastian Wittmeier via Std-Proposals wrote:
> Re: [std-proposals] Providing information about data structures to the compiler
>
> Do you plan to support an interface for advanced data structures like
> trees or graphs?
>
> Or even interactive code?
>
> Are you focusing on ELFs and DWARFs (dwarves?) for now? Or would the
> implementation be independent of Unix and of the attribute system?
>
>     -----Original Message-----
>     From: Henning Meyer via Std-Proposals <std-proposals_at_[hidden]>
>     Sent: Sat 15.03.2025 15:11
>     Subject: Re: [std-proposals] Providing information about data structures to the compiler
>     To: Sebastian Wittmeier via Std-Proposals <std-proposals_at_[hidden]>;
>     CC: Henning Meyer <hmeyer.eu_at_[hidden]>;
>
>     Functions and methods used only by the debugger would be eliminated
>     completely by the optimizer in production builds. You would need an
>     attribute to tell the compiler to keep them around even if they are
>     not used.
>
>     For example, in the case of ELF binaries with DWARF debug information,
>     you would want the compiler to emit unoptimized, non-inlined functions
>     for begin(), end(), size(), and probably to_string() as well. These
>     could go into a new, separate .debug_text section that can be stripped
>     from binaries.
>     This requires cooperation from libraries ([[debug]] annotations), the
>     compiler, the linker and the debugger. I am working on a prototype
>     implementation.
>
>     If the only purpose is for code to be run in the debugger, these
>     functions could even be emitted as DWARF opcodes instead of machine
>     code, but for now I am lifting the generated machine code back to DWARF
>     opcodes (for pure functions like std::vector begin() and end()) or
>     running it in a VM (for to_string() methods and functions that might
>     allocate and free).
>
>     Containers aren't so bad; std::variant and std::any require more
>     effort (if you want to support them generically, i.e. Boost's and
>     everyone else's re-implementations as well).
>
>
>     On 15.03.25 14:45, Sebastian Wittmeier via Std-Proposals wrote:
>     > Re: [std-proposals] Providing information about data structures to the compiler
>     >
>     > What one could do besides attributes is to have functions just for
>     > use by the debugger.
>     >
>     > Like an integrated debug interface.
>     >
>     > Anything similar already out there for C++?
>     >
>     > Other languages:
>     >
>     > Elixir: Inspect protocol
>     > https://hexdocs.pm/elixir/Inspect.html
>     >
>     > Rust: #[derive(Debug)] attribute
>     > https://doc.rust-lang.org/rust-by-example/hello/print/print_debug.html
>     >
>     > Julia: overloading Base.show
>     > https://docs.julialang.org/en/v1/manual/types/#man-custom-pretty-printing
>     >
>     > C#: DebuggerDisplay attribute
>     > https://learn.microsoft.com/en-us/visualstudio/debugger/using-the-debuggerdisplay-attribute?view=vs-2022
>     >
>     > Python: __repr__ method
>     > https://docs.python.org/3/library/functions.html#repr
>     >
>     >     -----Original Message-----
>     >     From: Henning Meyer via Std-Proposals <std-proposals_at_[hidden]>
>     >      - For ranges, the language has a way of marking them, that is, via
>     >     begin()/end() methods or free functions.
>     >
>     --
>     Std-Proposals mailing list
>     Std-Proposals_at_[hidden]
>     https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
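Henning's 15:11 message mentions lifting pure accessors such as std::vector begin() and end() back to DWARF opcodes. The toy evaluator below is a rough illustration of that idea, not the prototype's code. It handles three real DWARF expression opcodes (DW_OP_push_object_address, DW_OP_plus_uconst, DW_OP_deref) against a simulated snapshot. The Memory map, the evaluate() and read_uleb128() helpers, and the 64-bit libstdc++-style vector layout (three pointers: start, finish, end of storage) used in the usage note are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Real DWARF expression opcode values (DWARF 5, section 7.7.1).
constexpr std::uint8_t DW_OP_deref = 0x06;
constexpr std::uint8_t DW_OP_plus_uconst = 0x23;
constexpr std::uint8_t DW_OP_push_object_address = 0x97;

// Hypothetical stand-in for a core dump: 8-byte words keyed by address.
using Memory = std::map<std::uint64_t, std::uint64_t>;

// Decodes an unsigned LEB128 operand starting at ops[pc] and advances pc.
std::uint64_t read_uleb128(const std::vector<std::uint8_t>& ops, std::size_t& pc) {
    std::uint64_t result = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (pc >= ops.size()) throw std::runtime_error("truncated ULEB128 operand");
        const std::uint8_t byte = ops[pc++];
        result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    throw std::runtime_error("ULEB128 operand too long");
}

// Evaluates a DWARF expression for the object at `object_address`,
// reading memory only from the snapshot. Returns the top of the stack.
std::uint64_t evaluate(const std::vector<std::uint8_t>& ops,
                       std::uint64_t object_address, const Memory& memory) {
    std::vector<std::uint64_t> stack;
    std::size_t pc = 0;
    while (pc < ops.size()) {
        const std::uint8_t op = ops[pc++];
        if (op == DW_OP_push_object_address) {
            stack.push_back(object_address);
            continue;
        }
        if (stack.empty()) throw std::runtime_error("stack underflow");
        if (op == DW_OP_plus_uconst) {
            stack.back() += read_uleb128(ops, pc);
        } else if (op == DW_OP_deref) {
            // A dangling pointer surfaces here as an address absent from the snapshot.
            const auto it = memory.find(stack.back());
            if (it == memory.end()) throw std::runtime_error("address not in snapshot");
            stack.back() = it->second;
        } else {
            throw std::runtime_error("unsupported opcode");
        }
    }
    if (stack.empty()) throw std::runtime_error("empty result stack");
    return stack.back();
}
```

Under the assumed layout, begin() of a std::vector<int> becomes {DW_OP_push_object_address, DW_OP_deref} and end() becomes {DW_OP_push_object_address, DW_OP_plus_uconst, 8, DW_OP_deref}. Because evaluation only reads the snapshot, a corrupted pointer is reported as an error instead of crashing the analysis tool.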

Received on 2025-03-15 15:40:57