SG16

Subject: Re: Reinterpreting pointers of character types
From: Peter Brett (pbrett_at_[hidden])
Date: 2021-01-28 12:08:31


Hi Corentin,

At the moment, this just looks like a weird reinterpret_cast in the example you provide. I think it would be good to use a name that makes it clear that:


  1. The function has a precondition of a specific encoding of text behind the pointer/span
  2. The function is hazardous to call

In my view, Rust’s API is ideal:


  * There is a from_utf8_unchecked() function that is explicitly unsafe and clearly states what it is doing in its name: https://doc.rust-lang.org/std/str/fn.from_utf8_unchecked.html
  * There is also a from_utf8() function that is safe, but returns a Result<&str, Utf8Error> to indicate where UTF-8 validation failed

This is good design because the function names clearly indicate both the encoding precondition and whether checking is performed. The safer version of the function also has the shorter name, so it shows up first in autocompletion and documentation. People are guided to use the safe/checked option first.

I strongly oppose adding the unchecked cast function to the standard library without simultaneously adding the corresponding checked cast.

                           Peter

From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Corentin via SG16
Sent: 28 January 2021 17:52
To: SG16 <sg16_at_[hidden]>; Aaron Ballman <aaron_at_[hidden]>
Cc: Corentin <corentin.jabot_at_[hidden]>
Subject: [SG16] Reinterpreting pointers of character types

Hello,

In which I propose reinterpret_string_cast, a function to cast between char* and char8_t* without UB.

This ô terrible paper is intended as a discussion starter so that we can refine the design with the help of our good friends in CWG :)

Have a great day,

Corentin



SG16 list run by sg16-owner@lists.isocpp.org