Date: Thu, 29 Jul 2021 00:51:44 +0000
Does it do two passes to decide whether it's going to write straight through or not? I.e., if I have some valid UTF-8 followed by some bogus UTF-8, will it transcode the valid bits and then pass the invalid bits right through?
From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Tom Honermann via SG16
Sent: Tuesday, July 13, 2021 10:41 PM
To: sg16_at_[hidden]
Cc: Tom Honermann <tom_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
Subject: Re: [SG16] std::print usage experience
On 7/12/21 5:33 PM, Victor Zverovich via SG16 wrote:
Thanks, Corentin, that's a great find. This is indeed very close to what is proposed and a nice addition to the implementation experience in {fmt} and Rust.
There are some notable differences relative to what we've been discussing.
1. The LLVM implementation has a wide contract; there is no UB if the input is not valid UTF-8.
2. If the provided text is not valid UTF-8, then the raw input is written through the file descriptor rather than being transcoded to UTF-16 with substitution characters and then written directly to the console.
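To make point 2 concrete, here is a minimal sketch of an all-or-nothing policy: the whole buffer is validated and transcoded first, and the raw bytes are used only if that fails. This is one reading of the behaviour described above, not a copy of the LLVM code; `utf8_to_utf16_strict`, `Output`, and `choose_output` are hypothetical names, and the actual console or file-descriptor write is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: transcode the entire input to UTF-16, or return
// std::nullopt if any part of it is not well-formed UTF-8.
std::optional<std::u16string> utf8_to_utf16_strict(std::string_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0, min = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else return std::nullopt;                      // invalid lead byte
        if (in.size() - i < len) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;
        if (cp >= 0x10000) {  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
        i += len;
    }
    return out;
}

// Hypothetical result type: either UTF-16 for the console, or the raw bytes
// to be written through the file descriptor unchanged.
struct Output {
    bool transcoded;
    std::u16string wide;
    std::string raw;
};

// All-or-nothing: a single invalid byte sends the whole buffer down the raw path.
Output choose_output(std::string_view text) {
    if (auto wide = utf8_to_utf16_strict(text))
        return {true, *wide, {}};
    return {false, {}, std::string(text)};
}
```

Under this policy, valid text followed by a bogus byte is never partially transcoded; the alternative discussed for std::print would instead replace the ill-formed parts with substitution characters and still write UTF-16 to the console.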
Tangent: the method LLVM is using to determine if a file descriptor corresponds to a console <https://github.com/llvm-mirror/llvm/blob/master/lib/Support/raw_ostream.cpp#L589-L593> looks wrong to me and, I think, will result in attempts to write text to the console when it was actually directed elsewhere. Such writes likely fail with the result that the original input ends up getting written to the file descriptor anyway. We previously discussed how to detect output directed to a console here <https://lists.isocpp.org/sg16/2021/01/2008.php> (and some day we'll have email archives that aren't embarrassing. I hope).
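For illustration, a minimal sketch of one commonly used console check follows. It is not necessarily the approach from the linked thread, and `is_console` is a hypothetical name. The assumption is that on Windows `GetConsoleMode` succeeds only for handles that really refer to a console, so redirected output takes the ordinary write path; the POSIX branch uses `isatty` only so the sketch builds elsewhere.

```cpp
#ifdef _WIN32
#include <io.h>
#include <windows.h>
#else
#include <unistd.h>
#endif

// Hypothetical helper: returns true only if fd refers to an interactive
// console (Windows) or terminal (POSIX).
bool is_console(int fd) {
#ifdef _WIN32
    // Map the CRT file descriptor to the underlying OS handle.
    HANDLE handle = reinterpret_cast<HANDLE>(_get_osfhandle(fd));
    if (handle == INVALID_HANDLE_VALUE)
        return false;
    DWORD mode = 0;
    // Assumption: this call fails for files, pipes, and other non-console handles.
    return GetConsoleMode(handle, &mode) != 0;
#else
    return isatty(fd) == 1;
#endif
}
```

A caller would attempt the UTF-16 console write only when `is_console(fd)` is true and otherwise write the bytes through the descriptor as-is.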
Tom.
Cheers,
Victor
On Mon, Jul 12, 2021 at 1:59 PM Corentin via SG16 <sg16_at_[hidden]> wrote:
Hello,
Three years ago, LLVM implemented exactly what is proposed for std::print:
https://github.com/llvm-mirror/llvm/blob/master/lib/Support/raw_ostream.cpp#L638-L685
I thought this might be of interest.
Regards,
Corentin
-- SG16 mailing list SG16_at_[hidden] https://lists.isocpp.org/mailman/listinfo.cgi/sg16
Received on 2021-07-28 19:51:50