Date: Thu, 28 Jan 2021 19:37:07 +0100
On Thu, Jan 28, 2021 at 7:22 PM Peter Brett <pbrett_at_[hidden]> wrote:
> I think the big problem here is trying to make it a template.
>
>
>
> Make it named. It’s literally not possible to use this correctly in
> generic code.
>
The question then is: do we want to solve the issue for wchar_t?
Having the name of the encoding in the function kind of precludes
that, since sizeof(wchar_t) is platform dependent.
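For illustration, a minimal C++ sketch of why a single encoding name does not fit wchar_t. The helper name likely_wchar_encoding is hypothetical, and the size-to-encoding mapping it encodes is a common platform convention (16-bit wchar_t holding UTF-16 on Windows, 32-bit wchar_t holding UTF-32 on most Unix-like systems), not something the C++ standard guarantees.

```cpp
#include <string_view>

// Hypothetical helper, for illustration only: guess the wide-character
// encoding from sizeof(wchar_t). This follows common platform practice;
// the standard does not tie the size of wchar_t to any encoding.
constexpr std::string_view likely_wchar_encoding() noexcept {
    if constexpr (sizeof(wchar_t) == 2) {
        return "UTF-16";   // e.g. Windows
    } else if constexpr (sizeof(wchar_t) == 4) {
        return "UTF-32";   // e.g. most Unix-like systems
    } else {
        return "unknown";
    }
}
```

A function with the encoding baked into its name (say, a hypothetical cast_as_utf16_unchecked) would therefore be correct for wchar_t on one platform and wrong on another.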
>
>
> Peter
>
>
>
> *From:* Corentin <corentin.jabot_at_[hidden]>
> *Sent:* 28 January 2021 18:21
> *To:* Peter Brett <pbrett_at_[hidden]>
> *Cc:* sg16_at_[hidden]; Aaron Ballman <aaron_at_[hidden]>
> *Subject:* Re: [SG16] Reinterpreting pointers of character types
>
> On Thu, Jan 28, 2021 at 7:08 PM Peter Brett <pbrett_at_[hidden]> wrote:
>
> Hi Corentin,
>
>
>
> At the moment, this just looks like a weird reinterpret_cast in the
> example you provide. I think it would be good to use a name that makes it
> clear that:
>
>
>
> 1. The function has a precondition of *a specific encoding* of text
> behind the pointer/span
> 2. The function is *hazardous to call*
>
>
>
> In my view, Rust’s API is ideal:
>
>
>
> - There is a *from_utf8_unchecked()* function that is explicitly
> unsafe and clearly states what it is doing in its name:
> https://doc.rust-lang.org/std/str/fn.from_utf8_unchecked.html
> - There is also a *from_utf8()* function that is safe, but returns a *Result<&str,
> Utf8Error>* to indicate where UTF-8 validation failed
>
>
>
> This is good design because the functions have names that clearly indicate
> the encoding precondition and whether or not checking is performed. The
> safer version of the function also has a shorter name, so it shows up
> first in autocompletion and documentation. People are guided to use the
> safe/checked option first.
>
>
>
> I agree that the name is bad. I think we want "cast" in the name, though,
> or something of that nature.
>
>
>
> I strongly oppose adding the unchecked cast function to the standard
> library without simultaneously adding the corresponding checked cast.
>
>
>
> This is supposed to be a super, super low-level function.
>
> I would like to see checking functions in the standard, for example bool
> is_valid_utf8(range);
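One possible shape for the is_valid_utf8(range) mentioned above, as a sketch only (is_valid_utf8 is not a standard function). It accepts any range of byte-sized code units (char, unsigned char or char8_t) and checks the well-formed byte sequences listed in Table 3-7 of the Unicode Standard, which excludes overlong forms, surrogates and values above U+10FFFF.

```cpp
#include <iterator>
#include <type_traits>

// Hypothetical is_valid_utf8 over any range of byte-sized code units.
// Each lead byte determines the allowed range of the second byte and
// how many further continuation bytes (0x80..0xBF) must follow.
template <class Range>
constexpr bool is_valid_utf8(const Range& r) {
    using std::begin;
    using std::end;
    using unit = std::remove_cv_t<std::remove_reference_t<decltype(*begin(r))>>;
    static_assert(sizeof(unit) == 1, "expects byte-sized code units");

    auto it = begin(r);
    const auto last = end(r);
    // Next code unit as a value in [0, 255], or -1 at the end of the range.
    auto next = [&]() -> int {
        if (it == last) return -1;
        return static_cast<unsigned char>(*it++);
    };
    auto in = [](int b, int lo, int hi) { return b >= lo && b <= hi; };

    for (int b0 = next(); b0 != -1; b0 = next()) {
        if (b0 <= 0x7F) continue;                  // ASCII
        int lo = 0x80, hi = 0xBF;                  // allowed second byte
        int trail = 0;                             // bytes after the second
        if (in(b0, 0xC2, 0xDF))                    { trail = 0; }
        else if (b0 == 0xE0)                       { lo = 0xA0; trail = 1; }
        else if (in(b0, 0xE1, 0xEC) || in(b0, 0xEE, 0xEF)) { trail = 1; }
        else if (b0 == 0xED)                       { hi = 0x9F; trail = 1; }  // no surrogates
        else if (b0 == 0xF0)                       { lo = 0x90; trail = 2; }
        else if (in(b0, 0xF1, 0xF3))               { trail = 2; }
        else if (b0 == 0xF4)                       { hi = 0x8F; trail = 2; }  // <= U+10FFFF
        else return false;                         // C0, C1, F5..FF, stray continuation
        if (!in(next(), lo, hi)) return false;
        for (int k = 0; k < trail; ++k)
            if (!in(next(), 0x80, 0xBF)) return false;
    }
    return true;
}
```

Passing a string literal directly also checks its terminating null, which is itself valid UTF-8, so the result is unaffected.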
>
>
>
> However, checking for validity is an expensive operation.
>
> Is there a scenario in which we want to check utfN validity but can
> afford a copy?
>
>
>
> Either way, I'd like to see these things layered on top of one another.
>
>
>
> But yeah... suggestions for a better name? cast_string_unchecked?
>
>
>
>
>
> Peter
>
>
>
> *From:* SG16 <sg16-bounces_at_[hidden]> *On Behalf Of *Corentin via
> SG16
> *Sent:* 28 January 2021 17:52
> *To:* SG16 <sg16_at_[hidden]>; Aaron Ballman <aaron_at_[hidden]>
> *Cc:* Corentin <corentin.jabot_at_[hidden]>
> *Subject:* [SG16] Reinterpreting pointers of character types
>
> Hello,
>
>
>
> In which I propose
>
> reinterpret_string_cast, a function to cast between char* and char8_t*
> without UB.
>
>
>
> This ô terrible paper is intended as a discussion starter so that we can
> refine the design with the help of our good friends in CWG :)
>
>
>
> Have a great day,
>
>
>
> Corentin
>
Received on 2021-01-28 12:37:20