DOCUMENT NUMBER? Adding string-slicing to C++

Draft Proposal,

This version:
TBA
Issue Tracking:
GitHub
Editor:
rhidiandewit@gmail.com

Abstract

Parsing in C++ has improved a lot over the years with the introduction of std::basic_string::contains(), std::basic_string::starts_with() and std::basic_string::ends_with().
One thing still missing from this list of additions is string slicing, where the user selects part of a string with a start index and an end index, as opposed to a start index and a count.

1. Table of Contents

2. Changelog

2.1. R5

2.2. R4

2.3. R3

2.4. R2

2.5. R1

3. Motivation and Scope

Parsing and string manipulation in C++ used to be very cumbersome, with seemingly basic and trivial methods missing from std::basic_string. C++20 and C++23 resolved some of these issues by adding the utility functions listed above. I believe we can make string manipulation in C++ even better by adding more such utility functions to std::basic_string. One feature I always miss, and which is present in other programming languages such as Python, is string slicing. Python’s string slicing is graceful and easy to use, but C++ does not support that syntax.
Instead, I propose to add several functions to std::basic_string to emulate string-slicing.
The functions I propose to add to std::basic_string are the following:
namespace std
{
  /* 1. */ constexpr basic_string_view  basic_string::operator[](size_t start, size_t end) const noexcept;
  /* 2. */ constexpr basic_string_view  basic_string::slice(size_t start, size_t end) const;
  /* 3. */ constexpr basic_string_view  basic_string::first(size_t count) const noexcept;
  /* 4. */ constexpr basic_string_view  basic_string::last(size_t count) const noexcept;
}
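For comparison, here is a sketch of how the half-open range [start, end) has to be spelled with today's count-based API. The helper name slice_today is hypothetical and used only for illustration; it is not part of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the proposal): what a user writes today
// to select the half-open range [start, end) from a string.
inline std::string_view slice_today(const std::string& s,
                                    std::size_t start, std::size_t end) {
    // Debug-only precondition check; the caller must pass a valid range.
    assert(start <= end && end <= s.size());
    // substr() takes (position, count), so the count has to be computed by hand.
    return std::string_view(s).substr(start, end - start);
}
```

With the proposed members, the manual `end - start` arithmetic would no longer be needed at the call site. Note that the returned view refers to the caller's string, so that string must outlive the view.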

4. Impact on the Standard

Since these are only trivial functions requiring no major changes to the language or changes to existing API, the impact of this proposal on the standard is minimal.
These functions can already be implemented in the current version of C++23 without any extra changes.
Implementation will be left up to the vendor of course, but since these are trivial functions, we can provide a "template" implementation.

5. Design Decisions

These new utility functions could return either a std::basic_string or a std::basic_string_view.
It is best for these functions to return std::basic_string_view since:
  1. These functions will most often be used to find something in a string, often not requiring a new dynamic allocation to be made.

  2. std::basic_string::contains(), std::basic_string::starts_with() and std::basic_string::ends_with() all take a std::basic_string_view as a parameter. Therefore, the return value of the proposed functions matching up with these is a benefit.

  3. If the user wants a std::basic_string instead of a std::basic_string_view, they can always construct a std::basic_string.
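The trade-off described in the three points above can be sketched as follows. The free functions slice_view, aliases_source and to_owned are hypothetical stand-ins written for this illustration; they are not the proposed interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a proposed member, written as a free function.
// Returns a non-owning view of [start, end); nothing is allocated or copied.
inline std::string_view slice_view(const std::string& s,
                                   std::size_t start, std::size_t end) {
    assert(start <= end && end <= s.size());
    return std::string_view(s.data() + start, end - start);
}

// Checks that a view points into the buffer of `s`, i.e. that it aliases
// the original characters rather than a copy of them.
inline bool aliases_source(const std::string& s, std::string_view v) {
    return v.data() >= s.data() && v.data() + v.size() <= s.data() + s.size();
}

// Point 3: an owning copy is one explicit construction away.
inline std::string to_owned(std::string_view v) {
    return std::string(v);
}
```

Because the view aliases the source string, it is only valid while that string is alive and unmodified; callers who need an independent copy construct a std::basic_string explicitly.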

C++23 introduced operator[] with any number of subscripts (see cppreference).
Using this technique, we can closely mimic what Python does with its string slicing. This does raise the question of whether std::basic_string::slice() is still useful, but in my opinion it can serve as the safe variant of the unsafe operator[](size_t start, size_t end), just as .at(size_t pos) is the safe variant of the unsafe operator[](size_t pos).
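As an illustration of the multi-subscript technique, the sketch below defines a two-argument operator[] on a hypothetical wrapper class, sliceable_string, which exists only for this example (the proposal adds the member to std::basic_string itself). The two-subscript overload is guarded by the __cpp_multidimensional_subscript feature-test macro, so the snippet still compiles in pre-C++23 modes, where that overload is simply absent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical wrapper type, for illustration only.
class sliceable_string {
public:
    explicit sliceable_string(std::string s) : data_(std::move(s)) {}

    // Unchecked single-index access, like std::string::operator[] today.
    char operator[](std::size_t pos) const noexcept { return data_[pos]; }

    // Checked counterpart, mirroring std::string::at().
    char at(std::size_t pos) const { return data_.at(pos); }

#if defined(__cpp_multidimensional_subscript)
    // Two-subscript overload (requires C++23): unchecked half-open
    // range [start, end). The assert is a debug aid, not a guarantee.
    std::string_view operator[](std::size_t start, std::size_t end) const noexcept {
        assert(start <= end && end <= data_.size());
        return std::string_view(data_.data() + start, end - start);
    }
#endif

private:
    std::string data_;
};
```

In C++23 mode, `s[6, 11]` on a sliceable_string holding "hello world" selects the characters at indices 6 through 10, which is close to Python's `s[6:11]`.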

6. Technical Specifications

  1. std::basic_string::operator[] takes 2 parameters: size_t start and size_t end and returns a std::basic_string_view.

    • start is the starting index (inclusive) of where to start the slice.

      • There is no guarantee of safety when start >= size().

    • end is the ending index (exclusive) of where to end the slice.

      • There is no guarantee of safety when end > size().

      • There is no guarantee of safety when end < start.

  2. std::basic_string::slice() takes 2 parameters: size_t start and size_t end and returns a std::basic_string_view.

    • start is the starting index (inclusive) of where to start the slice.

      • std::out_of_range is thrown when start >= size().

    • end is the ending index (exclusive) of where to end the slice.

      • If end > size(), then end will be set to size().

      • If end < start, then end will be set to start.

  3. std::basic_string::first() takes 1 parameter: size_t count and returns a std::basic_string_view.

    • count is the number of characters to be included (counting from index 0) in the slice.

      • If count >= size(), then count will be set to size().

  4. std::basic_string::last() takes 1 parameter: size_t count and returns a std::basic_string_view.

    • count is the number of characters to be included (counting back from the last index) in the slice.

      • If count >= size(), then count will be set to size().

These functions are easy to implement, although the details depend on each vendor's implementation of std::basic_string. I have provided unit tests and sample implementations here.
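To make the rules in this section concrete, here is a minimal sketch of the four operations as free functions over std::string_view. It follows the behaviour listed above as written, but the namespace sketch, the name slice_unchecked (standing in for the two-argument operator[]) and the free-function form are assumptions made for this example; this is not the author's sample implementation or any vendor's.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

namespace sketch {  // hypothetical namespace, for illustration only

// 1. Stand-in for operator[](start, end): no safety guarantees.
//    The assert is only a debug aid for the stated preconditions.
constexpr std::string_view slice_unchecked(std::string_view s,
                                           std::size_t start,
                                           std::size_t end) noexcept {
    assert(start <= end && end <= s.size());
    return std::string_view(s.data() + start, end - start);
}

// 2. slice(): throws when start >= size(), clamps end into [start, size()].
constexpr std::string_view slice(std::string_view s,
                                 std::size_t start, std::size_t end) {
    if (start >= s.size()) {
        throw std::out_of_range("slice: start >= size()");
    }
    end = std::min(end, s.size());  // end > size()  -> size()
    end = std::max(end, start);     // end < start   -> start (empty slice)
    return std::string_view(s.data() + start, end - start);
}

// 3. first(): the leading `count` characters, clamped to size().
constexpr std::string_view first(std::string_view s, std::size_t count) noexcept {
    count = std::min(count, s.size());
    return std::string_view(s.data(), count);
}

// 4. last(): the trailing `count` characters, clamped to size().
constexpr std::string_view last(std::string_view s, std::size_t count) noexcept {
    count = std::min(count, s.size());
    return std::string_view(s.data() + (s.size() - count), count);
}

}  // namespace sketch
```

One consequence of the rule "std::out_of_range is thrown when start >= size()" is that slice() on an empty string always throws, even for slice(0, 0); the sketch reproduces that behaviour rather than changing it.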

7. Proposed Wording

7.1. Addition to <string>

Add the following to 23.4.3.1 basic.string.general:
// [...]
namespace std {
  // [...]
	
  // [string.ops], string operations
  // [...]
  constexpr bool contains(const charT* x) const;
	
  constexpr basic_string_view<charT, traits> operator[](size_t start, size_t end) const noexcept;
  constexpr basic_string_view<charT, traits> slice(size_t start, size_t end) const;
  constexpr basic_string_view<charT, traits> first(size_t count) const noexcept;
  constexpr basic_string_view<charT, traits> last(size_t count) const noexcept;
}

7.2. std::basic_string::operator[]

Add the following subclause to 23.4.3.8 string.ops:

7.3. std::basic_string::slice

Add the following subclause to 23.4.3.8 string.ops:

7.4. std::basic_string::first

Add the following subclause to 23.4.3.8 string.ops:

7.5. std::basic_string::last

Add the following subclause to 23.4.3.8 string.ops:

8. Acknowledgements

The author thanks Zhihao Yuan, Nathaniel Rupprecht, and many others for their suggestions on this proposal.