Date: Tue, 2 Jul 2024 18:08:05 +0000
Yeah,
If you want to pass Qt strings through std::formatter, then a per-parameter allocation is going to be necessary; that will not improve.
But if std::string is what gets used, then why not skip the QByteArray altogether?
It feels like at least that copy could be saved by implementing the transcoding directly, but I would have to look at the code to be sure.
But yes, I've heard rumors about charN_t support for formatters. I'm not sure it is going anywhere any time soon, because the feeling I got is that there's no clear idea of what direction it should take or what it should look like.
UTF-8 seems to be on more solid ground than the rest. If I had to bet, I would say it will come out before anything else, but even that looks uncertain at this point.
I don't think you would want to wait until 2030 to implement the feature.
If you are implementing a custom qFormat as an alternative to std::format, I would advise against basing it on std::format; you can do everything with zero copies and without touching the heap.
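The heap-free claim can be illustrated with a minimal sketch under two assumptions: the output is UTF-16 (as in QString), and the capacity is fixed at compile time. FixedUtf16Sink and its member functions are hypothetical names and not Qt or standard API. Each argument is written directly into storage held inside the object, for example on the stack. An append that would overflow reports failure instead of allocating.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical fixed-capacity UTF-16 output buffer: appending never allocates.
template <std::size_t N>
class FixedUtf16Sink {
public:
    // Latin-1 maps 1:1 onto UTF-16 code units, so widen each byte in place.
    bool appendLatin1(std::string_view latin1) {
        if (latin1.size() > N - len_) return false;  // would overflow
        for (unsigned char c : latin1) buf_[len_++] = c;
        return true;
    }

    // Write the decimal digits of v directly into the buffer, no temporary.
    bool appendUnsigned(unsigned long long v) {
        std::size_t digits = 1;
        for (unsigned long long t = v; t >= 10; t /= 10) ++digits;
        if (digits > N - len_) return false;
        // Fill from the last digit backwards.
        for (std::size_t i = digits; i-- > 0; v /= 10)
            buf_[len_ + i] = static_cast<char16_t>(u'0' + v % 10);
        len_ += digits;
        return true;
    }

    std::u16string_view view() const { return {buf_.data(), len_}; }

private:
    std::array<char16_t, N> buf_{};
    std::size_t len_ = 0;
};
```

A real qFormat would also need format-string parsing and overflow policy decisions. The point here is only that each argument can be encoded once, straight into its final location.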
-----Original Message-----
From: Thiago Macieira <thiago_at_macieira.org>
Sent: Tuesday, July 2, 2024 17:02
To: sg16_at_lists.isocpp.org; Tiago Freire <tmiguelf_at_hotmail.com>
Subject: Re: [isocpp-sg16] std::format and charN_t
On Tuesday 2 July 2024 13:39:38 CEST Tiago Freire wrote:
> Ok, that seems more in line with how I thought it was working.
> But then again, if you are over-allocating and shrinking on a
> per-parameter basis, it's not really pre-allocation. Not sure if that
> is what the OP had in mind; if he was worried about the cost of
> transcoding after formatting (and wanting pre-allocation), the cost of
> that is going to be relatively low compared to everything else.
Ivan and I work together in Qt (though not for the same company). I'm actually the one who asked him to post our concerns to this ML.
We are worried about the cost of transcoding and the cost of memcpying data.
In this particular case, the question was about allocating an additional buffer and memcpying data out of it onto the destination string. Right now, there's no way to avoid this extra cost while transcoding, so we won't try. Therefore, when formatting a QLatin1StringView or QString onto a std::string, we will have to:
1) allocate a QByteArray of the maximum size (which is 2x the size of the Latin-1 string or 3x the code unit count of the UTF-16 one)
2) transcode onto it
3) shrink it to size
4) allocate a std::string of the correct size
5) copy onto it
6) deallocate the QByteArray
7) use std::formatter<std::string>, which memcpys it to the destination std::string
(Steps 1 to 3 happen inside existing Qt functions)
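Steps 1 to 3 can be sketched with the standard library, using std::string as a hypothetical stand-in for QByteArray; this is not Qt's actual code. Latin-1 bytes below 0x80 stay one byte and the rest become two, hence the 2x worst-case allocation. The same over-allocate-then-shrink pattern with a 3x bound applies to UTF-16 input.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Sketch of steps 1-3: worst-case allocation, transcode, shrink.
std::string latin1ToUtf8(std::string_view latin1) {
    std::string buf;
    buf.resize(2 * latin1.size());  // 1) allocate for the worst case (2x)
    std::size_t n = 0;
    for (unsigned char c : latin1) {  // 2) transcode into the buffer
        if (c < 0x80) {
            buf[n++] = static_cast<char>(c);
        } else {
            buf[n++] = static_cast<char>(0xC0 | (c >> 6));
            buf[n++] = static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    buf.resize(n);        // 3) shrink to the real size...
    buf.shrink_to_fit();  //    ...and (non-bindingly) ask to release the slack
    return buf;
}
```

Steps 4 to 7 then copy these bytes into a fresh std::string and once more into the formatter's output. Those two extra copies are the overhead under discussion.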
The question about the cost of transcoding was in relation to a possible workaround or solution to the above. The Standard may provide a std::u16string formatter onto a std::string, which would eliminate all of the above and replace it with the vendor's implementation. However, how good is that implementation's converter? Of the three major Standard Library implementations, only one has a vested interest in UTF-16. And because Qt has been using UTF-16 since 2001, it has very highly optimised converters that we would like to reuse.
Finally, when formatting a QLatin1StringView or a QUtf8StringView onto a QString, I will insist that qFormat not do the double allocation and double memcpy.
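A single-allocation, single-pass append can be sketched with standard types. The sketch assumes std::u16string as a stand-in for QString and a std::string_view holding UTF-8 as a stand-in for QUtf8StringView. appendUtf8 is a hypothetical name, not Qt API. Each UTF-8 byte yields at most one UTF-16 code unit, so the input length is a safe upper bound. The destination grows at most once, the text is decoded directly into it, and the size is trimmed afterwards. As a simplification, each invalid byte is replaced with U+FFFD.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Append UTF-8 text to a UTF-16 string with no intermediate buffer.
void appendUtf8(std::u16string& dst, std::string_view in) {
    std::size_t n = dst.size();
    dst.resize(n + in.size());  // worst case: one code unit per input byte
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = in[i];
        char32_t cp;
        std::size_t len;
        char32_t minCp;  // smallest code point allowed for this length
        if (b < 0x80) { cp = b; len = 1; minCp = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; minCp = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; minCp = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; minCp = 0x10000; }
        else { dst[n++] = 0xFFFD; ++i; continue; }  // invalid lead byte

        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char c = in[i + k];
            if ((c & 0xC0) != 0x80) ok = false;  // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject truncated, overlong, out-of-range and surrogate sequences.
        if (!ok || cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            dst[n++] = 0xFFFD;
            ++i;
            continue;
        }
        if (cp < 0x10000) {
            dst[n++] = static_cast<char16_t>(cp);
        } else {  // encode as a surrogate pair
            cp -= 0x10000;
            dst[n++] = static_cast<char16_t>(0xD800 + (cp >> 10));
            dst[n++] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    dst.resize(n);  // trim to the code units actually written
}
```

The Latin-1 case is simpler still: the output size is exactly the input size, so one resize plus a widening loop suffices.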
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Principal Engineer - Intel DCAI Platform & System Engineering
Received on 2024-07-02 18:08:10