Date: Mon, 18 Jan 2021 18:59:09 +0200
Hi,
One has to look specifically at the assembly and machine instructions
generated. For algorithms that iterate over string data terminated by a
sentinel and look at every value, one can drop the separate bounds or range
check at the assembly level, so those loops need only a single conditional,
which should be much faster for comparisons. At least, that is how I would
write it in assembly.
For contiguous memory where there is no sentinel terminating the string or
characters, parallelizing the boundary check with the comparison or other
statements would reduce the clock cycles or instructions required. The
question is how to eliminate bounds and range checking, so as to remove
unnecessary if statements and emit less assembly.
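
To make the two cases above concrete, here is a minimal sketch of the loop shapes I mean. The names count_sentinel and count_bounded are hypothetical and only for illustration. It shows the source-level difference in the loop condition and nothing more; which one is actually faster on a given CPU and compiler would have to be measured.

```cpp
#include <cassert>
#include <cstddef>

// Sentinel-terminated: the only loop condition is a test of the element
// itself against '\0'. No separate length is carried around.
std::size_t count_sentinel(const char* s, char wanted) {
    assert(s != nullptr);
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (*s == wanted) ++n;
    }
    return n;
}

// Length-bounded: the loop condition is an index check (i < len) that is
// separate from the comparison of the element.
std::size_t count_bounded(const char* data, std::size_t len, char wanted) {
    assert(data != nullptr);
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == wanted) ++n;
    }
    return n;
}
```

Note that the bounded version can also handle data containing embedded '\0' bytes, which the sentinel version cannot.
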
The only way I see that happening in the future is if a processor can
execute two instructions simultaneously, involving different registers.
Imagine writing the assembly, and how you would reduce the assembly code and
instructions, since that is what it all ends up as. If we can't improve it
at that level, with the CPU instruction set, they won't be able to squeeze
out more performance more intelligently.
I guess the other way is to have the compiler identify the pattern and
rewrite the code to a different, specific machine instruction sequence if
possible. The CPU could do that at a small scale, as it already does for
some things, but the compiler would be better. We won't know unless we try a
few new things and see whether hardware vendors like Arm think it's
possible. It's a good thing to look at, as the compiler could be updated
with new language support to get more performance from their chips.
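
As a rough illustration of handing a recognizable pattern to an implementation that can pick its own instructions, here is a sketch comparing a hand-written byte search with the same search expressed through std::memchr. The function names find_manual and find_library are hypothetical. Whether the library routine or the compiler actually uses wider instructions is implementation dependent, so treat this as an assumption to verify by reading the generated assembly, not as a result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hand-written search: one element and one bounds check per iteration at
// the source level. Returns len when the byte is not found.
std::size_t find_manual(const char* data, std::size_t len, char wanted) {
    assert(data != nullptr);
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == wanted) return i;
    }
    return len;
}

// The same search via std::memchr, leaving the choice of instructions to
// the standard library. Returns len when the byte is not found.
std::size_t find_library(const char* data, std::size_t len, char wanted) {
    assert(data != nullptr);
    const void* hit = std::memchr(data, wanted, len);
    if (hit == nullptr) return len;
    return static_cast<std::size_t>(static_cast<const char*>(hit) - data);
}
```

Both functions return the same index for the same input, so they can be swapped while comparing the generated code.
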
Obviously there are a lot of compiler optimizations, but unless we go to an
LLVM list or someone can give more direction, we have no idea how much
further we can push things or how creative our solutions can be. I think
LLVM and Arm would need to give their perspective on how much more juice
they feel it would allow them to squeeze out of the processors.
I know this is not the simplest of things; there are a few layers to it.
If you have contacts in these domains, I would be glad to chat with them to
get more detail at other levels.
Kind Regards,
Wesley Oliver
On Mon, 18 Jan 2021, 18:21 Thiago Macieira via Std-Proposals, <
std-proposals_at_[hidden]> wrote:
> On Monday, 18 January 2021 00:12:18 PST Wesley Oliver via Std-Proposals
> wrote:
> > Hi,
> >
> > I would like to look at how to achieve the same performance that C++
> > would be capable of achieving with characters that are '\0' terminated
> > physically at their last data position. Because data that is null
> > terminated doesn't require range checking.
>
> That's an incorrect statement. Just because the bounds are implicit does
> not mean range checking is optional. Anywhere where range checking should
> have been done, it still needs to be done. The only difference is that you
> must calculate the range before doing the checking.
>
> As a consequence of having to iterate to find the boundary, some
> operations become different.
>
> But it's not true that implicit termination or termination by sentinel
> makes things faster. In fact, from experience, it's quite the opposite. So
> you'll need to prove your hypothesis with data if it is the motivation
> factor for this.
>
> > So my question is how could we improve things, such that the typical
> > conditional bounds checking statements for int arrays or similar could be
> > reduced or written in a slightly different form,
> > such that we can achieve the same performance as null terminated data.
>
> Please consider that "achieve the same performance as null-terminated
> data" currently means "reducing performance". It's not what you want.
>
> > For this to happen, it would require an improved compiler and also
> > hypothetically investigating new cpu wiring or logic, that in 2025 years
> > time would give us a massive performance boost, as we would have figured
> > out how to write code that has many performance knocks in a better way,
> > by reduction or whatever that technique is, to reduce the number of
> > instructions required.
>
> Investigating a new CPU is out of scope. Moreover, I don't think you've
> investigated CPUs sufficiently, since they do run superscalar and
> pipelined, meaning they will run a few instructions ahead. For example,
> you said:
>
> > So the idea I have from the above would be conditional statements that
> > could be parallelized, without changing the logic of the program. So
> > think of both
> > numArray[i] == numArray2[j]
> > i < maxlen
>
> They ALREADY are parallelised today. Like I said, I don't think you've
> done enough investigation of current CPUs before making this proposal. If
> you want to talk about how to improve CPUs, aside from locating the right
> people to talk to, you need to come with hard data showing where they can
> do better and where time is spent in overhead. Similarly, if you want to
> get compilers to improve, show some hard numbers on how they are not using
> the existing CPU capabilities to the full extent.
>
> From experience talking to CPU architects (and I have!), they have enough
> areas to currently address that can improve code by more than 1%. So your
> barrier to get their attention is to show that they could get at least
> that much.
>
> > int match(const char* str) {
> >   // String literals are already '\0' terminated.
> >   const char* matchme = "matchme";
> >
> >   int countMatch = 0;
> >
> >   for (const char* ch = str; *ch != '\0'; ++ch)
> >   {
> >     // could make this an inline function.
> >
> >     const char* chs = ch;  // candidate match starts at the current position
> >     const char* mech = matchme;
> >     while (true) {  // sure that by now compilers optimize this, could
> >                     // just say loop
> >       if (*mech == '\0') {
> >         // Reached the end of the pattern: successful match.
> >         countMatch++;
> >         break;
> >       }
> >       if (*chs != *mech) {
> >         // Mismatch, or end of str: leave the inner loop and move on to
> >         // the next start position without counting.
> >         break;
> >       }
> >       ++mech; ++chs;
> >     }
> >   }
> >   return countMatch;
> > }
>
> One other thing: if you're going to post code for which performance can be
> improved, you have to start with the state of the art. Your code above for
> substring matching is not the most optimal today.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Software Architect - Intel DPG Cloud Engineering
Received on 2021-01-18 10:59:23