SG16

Subject: Re: Reinterpreting pointers of character types
From: Corentin (corentin.jabot_at_[hidden])
Date: 2021-01-28 12:20:46


On Thu, Jan 28, 2021 at 7:08 PM Peter Brett <pbrett_at_[hidden]> wrote:

> Hi Corentin,
>
>
>
> At the moment, this just looks like a weird reinterpret_cast in the
> example you provide. I think it would be good to use a name that makes it
> clear that:
>
>
>
> 1. The function has a precondition of *a specific encoding* of text
> behind the pointer/span
> 2. The function is *hazardous to call*
>
>
>
> In my view, Rust’s API is ideal:
>
>
>
> - There is a *from_utf8_unchecked()* function that is explicitly
> unsafe and clearly states what it is doing in its name:
> https://doc.rust-lang.org/std/str/fn.from_utf8_unchecked.html
> - There is also a *from_utf8()* function that is safe, but returns a *Result<&str,
> Utf8Error>* to indicate where UTF-8 validation failed
>
>
>
> This is good design because the functions have names that clearly indicate
> the encoding precondition and whether or not checking is performed. The
> safer version of the function also has a shorter name, so it shows up
> first in autocompletion and documentation. People are guided to use the
> safe/checked option first.
>

I agree that the name is bad. I think we want "cast" in the name though -
or something of that nature.

>
>
> I strongly oppose adding the unchecked cast function to the standard
> library without simultaneously adding the corresponding checked cast.
>

This is supposed to be a super super low level function.
I would like to see checking functions in the standard, for example
bool is_valid_utf8(range);

However, checking for validity is an expensive operation.
Is there a scenario in which we want to check utfN validity but can
afford a copy?

Either way, I'd like to see these things layered on top of one another.
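
To make the layering concrete, here is a rough sketch, not a proposed design. The names is_valid_utf8 and cast_string_unchecked come from this thread; utf8_view and cast_string_checked are hypothetical names made up for illustration. utf8_view stands in for a char8_t-based view so that the sketch avoids the very char* to char8_t* conversion whose undefined behavior the paper is trying to address. The validator follows the well-formed byte sequence table from the Unicode Standard.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for "bytes known to be UTF-8". A real design
// would presumably hand back a char8_t-based view instead.
struct utf8_view {
    const char* data;
    std::size_t size;
};

// Layer 1: validation only. Rejects overlong forms, surrogates,
// code points above U+10FFFF, and truncated sequences.
inline bool is_valid_utf8(std::string_view bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) { ++i; continue; }  // ASCII
        std::size_t len = 0;                // total sequence length
        unsigned char lo = 0x80, hi = 0xBF; // allowed range of byte 2
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;      // no overlong 3-byte forms
            if (b0 == 0xED) hi = 0x9F;      // no surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;      // no overlong 4-byte forms
            if (b0 == 0xF4) hi = 0x8F;      // nothing above U+10FFFF
        } else {
            return false;                   // 0x80..0xC1 or 0xF5..0xFF
        }
        if (n - i < len) return false;      // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}

// Layer 2: the low-level conversion. Precondition: bytes is valid UTF-8.
// The assert only fires in debug builds; with NDEBUG this is free.
inline utf8_view cast_string_unchecked(std::string_view bytes) {
    assert(is_valid_utf8(bytes));
    return utf8_view{bytes.data(), bytes.size()};
}

// Layer 3: the checked conversion, built from the two layers above.
// No copy is made; the cost is one pass over the input.
inline std::optional<utf8_view> cast_string_checked(std::string_view bytes) {
    if (!is_valid_utf8(bytes)) return std::nullopt;
    return cast_string_unchecked(bytes);
}
```

With this split, callers that already know their encoding pay nothing, and everyone else gets the checked version without a copy.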

But yeah... suggestions for a better name? cast_string_unchecked?

>
>
> Peter
>
>
>
> *From:* SG16 <sg16-bounces_at_[hidden]> *On Behalf Of *Corentin via
> SG16
> *Sent:* 28 January 2021 17:52
> *To:* SG16 <sg16_at_[hidden]>; Aaron Ballman <aaron_at_[hidden]>
> *Cc:* Corentin <corentin.jabot_at_[hidden]>
> *Subject:* [SG16] Reinterpreting pointers of character types
>
>
>
> Hello,
>
>
>
> In which I propose
>
> reinterpret_string_cast, a function to cast between char* and char8_t*
> without UB.
>
>
>
> This ô terrible paper is intended as a discussion starter so that we can
> refine the design with the help of our good friends in CWG :)
>
>
>
> Have a great day,
>
>
>
> Corentin
>



SG16 list run by sg16-owner@lists.isocpp.org