Re: [SG16] Reinterpreting pointers of character types

From: Peter Brett <pbrett_at_[hidden]>
Date: Thu, 28 Jan 2021 18:22:30 +0000
I think the big problem here is trying to make it a template.

Make it named. It’s literally not possible to use this correctly in generic code.

                      Peter

From: Corentin <corentin.jabot_at_[hidden]>
Sent: 28 January 2021 18:21
To: Peter Brett <pbrett_at_cadence.com>
Cc: sg16_at_[hidden]; Aaron Ballman <aaron_at_[hidden]>
Subject: Re: [SG16] Reinterpreting pointers of character types

On Thu, Jan 28, 2021 at 7:08 PM Peter Brett <pbrett_at_[hidden]> wrote:
Hi Corentin,

At the moment, this just looks like a weird reinterpret_cast in the example you provide. I think it would be good to use a name that makes it clear that:


  1. The function has a precondition of a specific encoding of text behind the pointer/span
  2. The function is hazardous to call

In my view, Rust’s API is ideal:


  * There is a from_utf8_unchecked() function that is explicitly unsafe and clearly states what it is doing in its name: https://doc.rust-lang.org/std/str/fn.from_utf8_unchecked.html
  * There is also a from_utf8() function that is safe, but returns a Result<&str, Utf8Error> to indicate where UTF-8 validation failed

This is good design because the functions have names that clearly indicate the encoding precondition and whether or not checking is performed. The safer version of the function also has a shorter name, so it shows up first in autocompletion and documentation. People are guided to use the safe/checked option first.

I agree that the name is bad. I think we want "cast" in the name though - or something of that nature.

I strongly oppose adding the unchecked cast function to the standard library without simultaneously adding the corresponding checked cast.

This is supposed to be a super super low level function.
I would like to see checking functions in the standard, for example bool is_valid_utf8(range);
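
To make the idea of a checking function concrete, here is a rough sketch of what such a validator might look like. This is only an illustration under assumptions: the name is_valid_utf8 comes from the message above, but the std::string_view parameter and the behaviour (rejecting overlong forms, surrogates, and values above U+10FFFF, following the Unicode well-formed byte sequence table) are choices made for this sketch, not a proposed standard interface.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not a standard library function.
// Returns true if `bytes` is a well-formed UTF-8 sequence.
bool is_valid_utf8(std::string_view bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) {  // single-byte (ASCII) code point
            ++i;
            continue;
        }
        std::size_t len = 0;
        // Allowed range for the second byte; narrowed below for the
        // lead bytes that would otherwise admit overlong forms,
        // surrogates, or code points above U+10FFFF.
        unsigned char lo = 0x80, hi = 0xBF;
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;  // reject overlong 3-byte forms
            if (b0 == 0xED) hi = 0x9F;  // reject UTF-16 surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;  // reject overlong 4-byte forms
            if (b0 == 0xF4) hi = 0x8F;  // reject values above U+10FFFF
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (n - i < len) return false;  // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}
```

The validator has to read every byte of the input, which is the cost being weighed in this thread. A checked conversion could plausibly be layered on top of a function like this, with the unchecked cast as the lowest layer.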

However, checking for validity is an expensive operation.
Is there a scenario in which we want to check utfN validity but can afford a copy?

Either way, I'd like to see these things layered on top of one another.

But yeah... suggestions for a better name? cast_string_unchecked?


                           Peter

From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Corentin via SG16
Sent: 28 January 2021 17:52
To: SG16 <sg16_at_[hidden]>; Aaron Ballman <aaron_at_[hidden]>
Cc: Corentin <corentin.jabot_at_[hidden]>
Subject: [SG16] Reinterpreting pointers of character types

Hello,

In which I propose
reinterpret_string_cast, a function to cast between char* and char8_t* without UB.

This ô terrible paper is intended as a discussion starter so that we can refine the design with the help of our good friends in CWG :)

Have a great day,

Corentin

Received on 2021-01-28 12:22:52