sg15

Re: A Different Approach To Compiling C++

From: Hassan Sajjad <hassan.sajjad069_at_[hidden]>
Date: Sun, 24 Sep 2023 09:06:57 +0500
Thank you for highlighting these points. These limitations can be
addressed, and they should have no impact under ideal conditions, such as
having sufficient RAM and a stable compiler.


> 2. Would you require the compiler shared library to be able to perform
> a compilation without use of global state (so that multiple compilations
> can be performed concurrently)? All C++ compilers that I'm aware of use
> global state today and cannot handle multiple compilations within the same
> process.
>
>
The build system may use one process per compilation with inter-process
communication. For the in-process model, however, the compiler shared
library must be able to run multiple compilations concurrently in one
process. I think the required modifications could be made in the compiler:
the global state needs to be converted to local state that is created by
the newCompile call and cleared by the last resumeCompile call.
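
A minimal sketch of that idea follows. Only the names newCompile and
resumeCompile come from the proposal; the signatures, the CompileState
struct, and the handle type are assumptions made for illustration, not the
API from the paper. Two compilations are kept in flight at once and their
calls are interleaved on one thread so the outcome is deterministic.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-compilation state: everything a compiler would otherwise
// keep in globals (symbol tables, diagnostics, options) would live here.
struct CompileState {
    std::string source;                 // translation unit being compiled
    std::vector<std::string> imported;  // modules supplied so far
    int pendingImports = 0;             // imports the compiler still waits for
};

// Opaque handle handed to the build system (an assumption of this sketch).
using CompileHandle = std::unique_ptr<CompileState>;

// Assumed signature: starts a compilation and creates its local state.
CompileHandle newCompile(std::string source, int importCount) {
    auto state = std::make_unique<CompileState>();
    state->source = std::move(source);
    state->pendingImports = importCount;
    return state;
}

// Assumed signature: supplies one requested module. Returns true when the
// compilation has finished; the local state is destroyed on that last call.
bool resumeCompile(CompileHandle &handle, const std::string &moduleName) {
    assert(handle && handle->pendingImports > 0);
    handle->imported.push_back(moduleName);
    if (--handle->pendingImports == 0) {
        handle.reset();  // state cleared with the last resumeCompile call
        return true;
    }
    return false;
}

// Two compilations in flight at the same time. Because no state is shared,
// interleaving their calls cannot corrupt either one.
bool interleavedCompilesStayIndependent() {
    CompileHandle a = newCompile("a.cpp", 2);
    CompileHandle b = newCompile("b.cpp", 1);
    bool aDoneEarly = resumeCompile(a, "std");  // a still waits for one import
    bool bDone = resumeCompile(b, "fmt");       // b finishes, its state is freed
    bool aDone = resumeCompile(a, "fmt");       // a finishes
    return !aDoneEarly && bDone && aDone && !a && !b;
}
```

Since each handle owns all of its state, the same calls could be issued from
separate build-system threads, provided the real compiler state is equally
free of shared globals.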


> 1. Modern build systems manage multiple concurrently running jobs. How
> would concurrency be handled in your model? Would all compilations occur
> within the (single?) build system process? If so, how would you expect the
> increased memory and CPU utilization to affect administrative resource
> limitations imposed by POSIX ulimits or Windows job restrictions? What
> would the effect be on the Linux OOM process killer? How would the build
> system handle a compiler crash that occurs during the build? How would an
> administrator be expected to diagnose a hung or exceptionally long running
> compilation job when they can't rely on process listing to spot individual
> compiler processes.
>
>
In this model, the build system cannot recover from a compiler crash or
from excessive RAM usage in the build-system process. Either of these can
terminate the whole build. The build system can, however, detect an
exceptionally long compilation job: for example, it can record the thread
id and a timestamp before each newCompile or resumeCompile call and inform
the user if that entry is not cleared soon enough.
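
The timestamp idea could be sketched roughly as below. The CompileWatchdog
class, its method names, and the overdueAfter helper are hypothetical and
exist only for this illustration; timestamps are passed in explicitly so
the behaviour can be checked without waiting.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical registry of in-flight compiler calls, keyed by thread id.
class CompileWatchdog {
public:
    // Call just before newCompile or resumeCompile.
    void enter(std::thread::id id, Clock::time_point now) {
        std::lock_guard<std::mutex> lock(mutex_);
        started_[id] = now;
    }

    // Call right after the compiler call returns.
    void leave(std::thread::id id) {
        std::lock_guard<std::mutex> lock(mutex_);
        started_.erase(id);
    }

    // Returns the threads whose current call has run longer than `limit`.
    std::vector<std::thread::id> overdue(Clock::time_point now,
                                         std::chrono::seconds limit) const {
        assert(limit.count() >= 0);
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::thread::id> result;
        for (const auto &entry : started_)
            if (now - entry.second > limit) result.push_back(entry.first);
        return result;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::thread::id, Clock::time_point> started_;
};

// Usage sketch: registers the calling thread, then asks which jobs are
// overdue once `elapsed` has passed on an injected clock.
std::size_t overdueAfter(std::chrono::seconds elapsed,
                         std::chrono::seconds limit) {
    CompileWatchdog dog;
    Clock::time_point start = Clock::now();
    dog.enter(std::this_thread::get_id(), start);
    return dog.overdue(start + elapsed, limit).size();
}
```

A monitoring thread in the build system could call overdue periodically and
print the affected translation units, which partly replaces what an
administrator would otherwise read from the process list.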

In some cases, the configuration that controls the resource limits for the
build process might need to be adjusted as well.

The build system should support both models, so the user has the option to
fall back.


> 3. Coverity and similar tools that rely on functionality like that
> provided by Bear <https://github.com/rizsotto/Bear> would not work
> with your model without some protocol that enables them to interact with
> the build system as jobs start and end.
>
>
The build system can generate a file with compile commands based on
discovered module dependencies and header units.
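
As a rough illustration, assuming the generated file follows the JSON
Compilation Database layout that tools such as Bear produce (an assumption;
the message does not name a format), the build system could serialize its
resolved jobs like this. CompileJob, jsonEscape, and toCompileCommands are
hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record the build system keeps for each translation unit once
// its module and header-unit dependencies are known.
struct CompileJob {
    std::string directory;               // working directory of the job
    std::string file;                    // main source file
    std::vector<std::string> arguments;  // full command line, compiler first
};

// Escapes quotes and backslashes; control characters are not handled in
// this sketch.
std::string jsonEscape(const std::string &text) {
    std::string out;
    for (char c : text) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Renders the jobs as a JSON array of objects with the keys "directory",
// "file", and "arguments".
std::string toCompileCommands(const std::vector<CompileJob> &jobs) {
    std::ostringstream out;
    out << "[\n";
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        const CompileJob &job = jobs[i];
        assert(!job.arguments.empty());  // a command needs at least a compiler
        out << "  {\n"
            << "    \"directory\": \"" << jsonEscape(job.directory) << "\",\n"
            << "    \"file\": \"" << jsonEscape(job.file) << "\",\n"
            << "    \"arguments\": [";
        for (std::size_t a = 0; a < job.arguments.size(); ++a) {
            if (a != 0) out << ", ";
            out << '"' << jsonEscape(job.arguments[a]) << '"';
        }
        out << "]\n  }" << (i + 1 < jobs.size() ? "," : "") << "\n";
    }
    out << "]\n";
    return out.str();
}
```

Because the build system already knows which module and header-unit files
each translation unit needs, the arguments it writes can include them, so
an analysis tool can replay each command without observing the build.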

Updated paper:
https://htmlpreview.github.io/?https://github.com/HassanSajjad-302/iso-papers/blob/main/generated/my-paper.html#what-are-the-tradeoffs
Diff:
https://github.com/HassanSajjad-302/iso-papers/commit/6efe44fc2e4b74046856f3edb0e84bea4d00123c?short_path=97ede4f#diff-97ede4f82ab7b62ba59e0eb311dfb817cbc8af90a8e13cd6e26348874d156ac3

Best,
Hassan Sajjad


On Fri, Sep 22, 2023 at 7:53 PM Tom Honermann <tom_at_[hidden]> wrote:

> On 9/22/23 3:49 AM, Hassan Sajjad via SG15 wrote:
>
> Hi.
>
> I think you have plenty of rigor here to justify a numbered ISO paper to
>> drive an agenda item for the Tooling Study Group (SG-15).
>
>
> Thank you. I would like to present my paper
> https://htmlpreview.github.io/?https://github.com/HassanSajjad-302/iso-papers/blob/main/generated/my-paper.html
> .
>
> Some additional things that I think would be helpful to address in the
> paper:
>
> 1. Modern build systems manage multiple concurrently running jobs. How
> would concurrency be handled in your model? Would all compilations occur
> within the (single?) build system process? If so, how would you expect the
> increased memory and CPU utilization to affect administrative resource
> limitations imposed by POSIX ulimits or Windows job restrictions? What
> would the effect be on the Linux OOM process killer? How would the build
> system handle a compiler crash that occurs during the build? How would an
> administrator be expected to diagnose a hung or exceptionally long running
> compilation job when they can't rely on process listing to spot individual
> compiler processes.
> 2. Would you require the compiler shared library to be able to perform
> a compilation without use of global state (so that multiple compilations
> can be performed concurrently)? All C++ compilers that I'm aware of use
> global state today and cannot handle multiple compilations within the same
> process.
> 3. Coverity and similar tools that rely on functionality like that
> provided by Bear <https://github.com/rizsotto/Bear> would not work
> with your model without some protocol that enables them to interact with
> the build system as jobs start and end.
>
> Tom.
>
>
> I have emailed Nevin Liber for the number following the procedures
> guidelines from the following link,
>
> https://isocpp.org/std/standing-documents/sd-7-mailing-procedures-and-how-to-write-papers
>
> Though I expect finding some dedicated attention from the author of the
>> papers you cite would be a productive use of time beforehand, so at least
>> you two are on the same line of thinking before starting a larger group
>> discussion. But it's worth submitting a paper to drive an interactive
>> discussion even if you cannot reasonably arrange that sidebar discussion.
>
>
> Sorry, I could not think of any papers to cite.
>
> I would be very grateful if anyone is interested in co-authoring the next
> revisions of the paper.
>
> Best,
> Hassan Sajjad
>
> On Tue, Sep 12, 2023 at 7:00 PM Bret Brown <mail_at_[hidden]> wrote:
>
>> I think you have plenty of rigor here to justify a numbered ISO paper to
>> drive an agenda item for the Tooling Study Group (SG-15). Though I expect
>> finding some dedicated attention from the author of the papers you cite
>> would be a productive use of time beforehand, so at least you two are on
>> the same line of thinking before starting a larger group discussion. But
>> it's worth submitting a paper to drive an interactive discussion even if
>> you cannot reasonably arrange that sidebar discussion.
>>
>> As to the compiler categorizing flags as local or not, I'm open to the
>> idea I suppose. Though I suspect there are a lot of strange edge cases that
>> would justify a feature in which authored information is respected about
>> which preprocessor definitions are explicitly local and which ones
>> explicitly are not.
>>
>> But I agree with you that in most cases, nobody should want to maintain
>> detailed build system configurations about how to categorize each flag.
>> Though I expect that normally what will happen is people won't consider
>> this at all and use whatever defaults the build system supports, which I
>> expect to be non-local flags since that's the more conservative option with
>> respect to accurate parsing.
>>
>> Bret
>>
>> On Mon, Sep 11, 2023, 10:02 Hassan Sajjad <hassan.sajjad069_at_[hidden]>
>> wrote:
>>
>>> Hi.
>>>
>>> I emailed this https://lists.isocpp.org/sg15/2023/09/2040.php a few
>>> days ago.
>>> I am looking forward to your review.
>>>
>>> Best,
>>> Hassan Sajjad
>>>
>>> On Tue, Sep 5, 2023, 09:46 Hassan Sajjad <hassan.sajjad069_at_[hidden]>
>>> wrote:
>>>
>>>> Hi.
>>>>
>>>> Please share your thoughts on this.
>>>>
>>>> Best,
>>>> Hassan Sajjad
>>>>
>>>>
>>>> On Sun, Sep 3, 2023, 04:57 Hassan Sajjad <hassan.sajjad069_at_[hidden]>
>>>> wrote:
>>>>
>>>>> Hi.
>>>>>
>>>>> First of all, great job on your writeups. They are very well
>>>>>> considered. I appreciate how much effort you're putting into communicating
>>>>>> your ideas and developing better C++ tools.
>>>>>>
>>>>>
>>>>> Thank you. This compliment means a lot. It would be an immense
>>>>> pleasure if I or my project could be helpful in improving C++ tooling.
>>>>>
>>>>> I think your design could work
>>>>>>
>>>>>
>>>>> This was so good to hear :). Thank you again.
>>>>>
>>>>>
>>>>>> though I expect the node identities of the imported entities need to
>>>>>> be tweaked to better model a combination of the file and the flags used to
>>>>>> parse the file. The build system would ideally know when to just reuse a
>>>>>> particular parse of a particular entity, especially when optimizing for
>>>>>> build speed as you are.
>>>>>>
>>>>>
>>>>> With the new modifications my build system now covers 50% of the use
>>>>> case you have described in your email. Please see the link
>>>>> https://github.com/HassanSajjad-302/Example12/blob/main/hmake.cpp.
>>>>>
>>>>> This might appear big, but without comments, I think this is quite
>>>>> concise for what it is achieving. This covers the scenarios where, in a
>>>>> mono-repo style project, a dependency builds with different flags than the
>>>>> consumer, e.g. when the dependency supports only C++20 but the project
>>>>> is using C++23, or when a dependency has to be built by a different
>>>>> compiler for performance reasons.
>>>>>
>>>>> However, this is a manual approach, and we cannot detect whether the
>>>>> ifc files from a prebuilt target are compatible for use in a target we
>>>>> are building.
>>>>>
>>>>> This paper
>>>>> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2581r2.pdf
>>>>> provides good guidance on how this can be achieved in the build system.
>>>>>
>>>>> It is mentioned in the paper
>>>>>
>>>>> Those categories, however, represent higher-level semantics. It is not
>>>>>> the case that the build system can introspect the command line used to
>>>>>> produce a BMI on another project and decide which arguments fall on which
>>>>>> of the categories. They need to be authored by the engineer maintaining the
>>>>>> build system.
>>>>>
>>>>>
>>>>> This paper proposes that build systems should offer a mechanism to
>>>>>> identify which of the options used in a translation unit are a Local
>>>>>> Preprocessor Argument.
>>>>>
>>>>>
>>>>> It is also mentioned
>>>>>
>>>>> The scope of compatibility for consuming an existing built module
>>>>>> interface file is defined by the union of the Basic Toolchain Configuration
>>>>>> Arguments and the BMI-Sensitive arguments. And it should explicitly exclude
>>>>>> Local Preprocessor Arguments. Those arguments are the ones that should be
>>>>>> used to define the compatibility identifier.
>>>>>
>>>>>
>>>>> The build system should take the compiler invocation for a translation
>>>>>> unit, remove the Local Preprocessor Arguments and the reference to the
>>>>>> specific translation unit and invoke the compiler in that mode in order to
>>>>>> obtain the compatibility identifier for BMIs produced and consumed by that
>>>>>> translation unit.
>>>>>
>>>>>
>>>>> I was thinking that, instead of changing the build system to have an
>>>>> API that lets the user specify the flags of the different categories,
>>>>> this job could also be given to the compiler. So the compiler, when
>>>>> invoked with a specific option and the full compile command excluding
>>>>> input files, outputs:
>>>>> 1) Concatenation of Basic Toolchain Configuration Arguments and
>>>>> BMI-Sensitive Arguments.
>>>>> 2) Compiler Definitions
>>>>> 3) Include Directories
>>>>> 4) Other Arguments
>>>>> 5) Compatibility Identifier
>>>>>
>>>>> Following the example mentioned in the paper, I broke Local
>>>>> Preprocessor Arguments into 2 and 3. Can there be more besides these? Also,
>>>>> maybe 5 can be left out and 1 can be used instead.
>>>>>
>>>>> Relieving the build-system of the responsibility for categorizing
>>>>> different flags keeps the user-facing API of the build-system more
>>>>> concise, as the user specifies the compiler flags in 1 category instead
>>>>> of 4 different categories.
>>>>>
>>>>> It seems that there is acceptance regarding the dynamic loading of the
>>>>> compiler shared-library and the mentioned API. Would it be appropriate to
>>>>> prepare a formal proposal for consideration?
>>>>>
>>>>> Best,
>>>>> Hassan Sajjad
>>>>>
>>>>> On Sat, Aug 26, 2023 at 3:48 AM Bret Brown <mail_at_[hidden]>
>>>>> wrote:
>>>>>
>>>>>> Hi Hassan,
>>>>>>
>>>>>> First of all, great job on your writeups. They are very well
>>>>>> considered. I appreciate how much effort you're putting into communicating
>>>>>> your ideas and developing better C++ tools.
>>>>>>
>>>>>> I have one concern with the last design you shared with us. As far as
>>>>>> I understand currently, we expect that both imported header units and
>>>>>> public module interfaces will need to be parsed multiple times per build
>>>>>> graph. This has been discussed in the ISO Tooling Study Group a bit
>>>>>> already, so it's not a new idea. In short, it's expected that in typical
>>>>>> scenarios, the build system would need to model multiple parses of a
>>>>>> particular importable entity. This is because, unfortunately, some
>>>>>> compilation options of the importing translation unit will need to be used
>>>>>> to parse the imported translation unit.
>>>>>>
>>>>>> I think your design could work, though I expect the node identities
>>>>>> of the imported entities need to be tweaked to better model a combination
>>>>>> of the file and the flags used to parse the file. The build system would
>>>>>> ideally know when to just reuse a particular parse of a particular entity,
>>>>>> especially when optimizing for build speed as you are.
>>>>>>
>>>>>> Daniel is hinting at this challenge upthread in his reference to
>>>>>> preprocessor state. You can read more about that in his ISO papers that
>>>>>> have been published in the last year, including
>>>>>> https://wg21.link/p2898. Daniel also elaborated on this with
>>>>>> diagrams in his C++Now talk this spring. You can find it here:
>>>>>> https://youtu.be/_LGR0U5Opdg?si=H92b0eyGQd8Vsr0v. Note that Daniel
>>>>>> said upthread that his concerns are satisfied by some new ideas that were
>>>>>> discussed in Varna. But the need for build systems to model multiple parses
>>>>>> per imported interface remains given current understanding.
>>>>>>
>>>>>> I suspect certain codebases that are carefully governed might have
>>>>>> exactly one parse per imported unit, but a build system that wants to
>>>>>> support things like existing package management ecosystems, among other
>>>>>> examples, would need to consider a more complicated approach.
>>>>>>
>>>>>> Bret Brown
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 24, 2023, 23:53 Hassan Sajjad via SG15 <
>>>>>> sg15_at_[hidden]> wrote:
>>>>>>
>>>>>>> Thanks for reaching out. I have gone through the thread and I am
>>>>>>>> still a little confused probably because the approach is quite
>>>>>>>> different
>>>>>>>> from what we have been used to.
>>>>>>>>
>>>>>>>
>>>>>>> Thank you so much for commenting. Yes, it is a quite different
>>>>>>> approach.
>>>>>>>
>>>>>>> It seems that there are two aspects that you are proposing: a) the
>>>>>>>> way the build system describes the build rules; b) how we can
>>>>>>>> efficiently translate the build rules into action. IIUC in some
>>>>>>>> languages (duck typing predominantly) they use the language itself
>>>>>>>> to
>>>>>>>> describe the rules, too.
>>>>>>>>
>>>>>>>
>>>>>>> I just wanted to let you know that I am not commenting on what
>>>>>>> language the build-system should use. I am just proposing a new approach
>>>>>>> with the potential of a good speed-up. I am only partially aware of the
>>>>>>> other build-systems design, specifically the design around C++20 modules /
>>>>>>> header-units. Because this is a different approach, non-trivial changes
>>>>>>> might be needed in other build-systems to support it.
>>>>>>>
>>>>>>> The second part is more interesting to me. AFAICT your approach
>>>>>>>> inverts the build graph /somehow/. Is the aim to provide a "symbol
>>>>>>>> registry" which upon of the symbol to compile the relevant
>>>>>>>> subgraph? I'd
>>>>>>>> appreciate if you could describe more verbosely (with examples
>>>>>>>> maybe)
>>>>>>>> how the process works.
>>>>>>>>
>>>>>>>
>>>>>>> My pleasure.
>>>>>>>
>>>>>>> Please see the attached document. In it, I analyze how I plan to add
>>>>>>> support for this in my build-system. I also provide an example. Based on
>>>>>>> the analysis, I am confident in my ability to implement this within one
>>>>>>> month if it's approved and a compiler with an API becomes available. While
>>>>>>> I currently see no issues, I acknowledge the possibility of limitations and
>>>>>>> potential errors. I am currently awaiting feedback from other stakeholders
>>>>>>> before proceeding.
>>>>>>>
>>>>>>> A slight correction in my above email. I mentioned that my
>>>>>>> build-system now is limitation-free with the adoption of the new consensus.
>>>>>>> Well, it is for a clean build. But there is a small bug for rebuild which
>>>>>>> will be fixed soon.
>>>>>>>
>>>>>>> Best,
>>>>>>> Hassan Sajjad
>>>>>>>
>>>>>>> On Tue, Aug 22, 2023 at 8:49 PM Vassil Vassilev <
>>>>>>> v.g.vassilev_at_[hidden]> wrote:
>>>>>>>
>>>>>>>> Hi Hassan,
>>>>>>>>
>>>>>>>> Thanks for reaching out. I have gone through the thread and I am
>>>>>>>> still a little confused probably because the approach is quite
>>>>>>>> different
>>>>>>>> from what we have been used to.
>>>>>>>>
>>>>>>>> It seems that there are two aspects that you are proposing: a)
>>>>>>>> the
>>>>>>>> way the build system describes the build rules; b) how we can
>>>>>>>> efficiently translate the build rules into action. IIUC in some
>>>>>>>> languages (duck typing predominantly) they use the language itself
>>>>>>>> to
>>>>>>>> describe the rules, too.
>>>>>>>>
>>>>>>>> The second part is more interesting to me. AFAICT your approach
>>>>>>>> inverts the build graph /somehow/. Is the aim to provide a "symbol
>>>>>>>> registry" which upon of the symbol to compile the relevant
>>>>>>>> subgraph? I'd
>>>>>>>> appreciate if you could describe more verbosely (with examples
>>>>>>>> maybe)
>>>>>>>> how the process works.
>>>>>>>>
>>>>>>>> Best, Vassil
>>>>>>>>
>>>>>>>> On 7/29/23 3:28 PM, Hassan Sajjad via SG15 wrote:
>>>>>>>> > Hi.
>>>>>>>> >
>>>>>>>> > I would like to showcase my build-system HMake
>>>>>>>> > https://github.com/HassanSajjad-302/HMake.
>>>>>>>> >
>>>>>>>> > It has C++20 modules and header-units support. It also supports
>>>>>>>> > drop-in header-files to header-units replacement.
>>>>>>>> >
>>>>>>>> > With it, with MSVC I compiled an SFML example with C++20
>>>>>>>> header-units.
>>>>>>>> > https://github.com/HassanSajjad-302/SFML
>>>>>>>> >
>>>>>>>> > HMake however has a flaw and there is no easy solution for it. To
>>>>>>>> fix
>>>>>>>> > this, I would like to propose a new way to compile source files.
>>>>>>>> >
>>>>>>>> https://github.com/HassanSajjad-302/HMake/wiki/HMake-Flaw-And-Its-Possible-Fixes
>>>>>>>> >
>>>>>>>> > I am very confident that the adoption of this will result
>>>>>>>> in flawless
>>>>>>>> > module and header-unit support in HMake which will translate to a
>>>>>>>> very
>>>>>>>> > good user experience while converting their code base to C++20
>>>>>>>> modules.
>>>>>>>> >
>>>>>>>> > Please share your thoughts.
>>>>>>>> >
>>>>>>>> > Best,
>>>>>>>> > Hassan Sajjad
>>>>>>>> >
>>>>>>>> > _______________________________________________
>>>>>>>> > SG15 mailing list
>>>>>>>> > SG15_at_[hidden]
>>>>>>>> > https://lists.isocpp.org/mailman/listinfo.cgi/sg15
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>> SG15 mailing list
>>>>>>> SG15_at_[hidden]
>>>>>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg15
>>>>>>>
>>>>>>
> _______________________________________________
> SG15 mailing list
> SG15_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg15
>
>

Received on 2023-09-24 04:07:12