DOCUMENT NUMBER? Adding string-slicing to C++

Draft Proposal,

This version:
TBA
Issue Tracking:
GitHub
Editor:
rhidiandewit@gmail.com

Abstract

Parsing in C++ has improved considerably over the years with the introduction of std::basic_string::contains(), std::basic_string::starts_with() and std::basic_string::ends_with().
One thing missing from this list of additions is string slicing, where the user selects part of a string with a start index and an end index, as opposed to a start index and a count.

1. Table of Contents

2. Changelog

2.1. R6

2.2. R5

2.3. R4

2.4. R3

2.5. R2

2.6. R1

3. Motivation and Scope

Parsing and string manipulation in C++ used to be very cumbersome, with seemingly basic and trivial methods missing from std::basic_string. C++20 and C++23 resolved some of these issues by adding the utility functions listed above.
I believe we can make string manipulation in C++ even better by adding more such utility functions to std::basic_string. One feature I consistently miss, and which other programming languages such as Python provide, is string slicing. Python's string slicing is elegant and easy to use, but C++ does not support that syntax.
Instead, I propose adding a subview() member function to std::basic_string and std::basic_string_view to emulate string slicing.

I think .substr() can be cumbersome to use, especially when the desired substring is most easily described by a start index and an end index rather than a start index and a count.
.subview() will be functionally identical to .substr(). Its main purpose is to add variety to the API for programmers who prefer (start, end) over (start, count).
Why would we want this? Consider the following example:
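#include <iostream>
#include <string>

int main() {
    const std::string google{ "www.google.com" };

    // Subview (proposed): the end index is passed directly; prints "google"
    std::cout << google.subview(google.find('.') + 1, google.find_last_of('.')) << "\n";

    // Substr (existing): the count must be computed from start; prints "google"
    const std::size_t start = google.find('.') + 1;
    std::cout << google.substr(start, google.find_last_of('.') - start) << "\n";
}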

Both snippets do the same thing, but .substr() requires calculating the length, which in turn means storing start so it can be used in that calculation (strictly speaking, storing it is not necessary, but not doing so leads to code duplication). .subview(), on the other hand, takes the same start and the result of .find_last_of() directly, with no length calculation.

At the end of the day, this is more about adding to the richness of the API and making the API nicer for some programmers than it is about replacing .substr() or creating entirely new functionality.

4. Impact on the Standard

Since these are trivial functions that require no changes to the core language or to existing APIs, the impact of this proposal on the standard is minimal.
Their behavior can already be implemented in C++23 without any other changes.

Implementation is, of course, left to the vendors, but since these functions are trivial, we can provide a reference implementation.
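As a rough illustration of such a reference implementation, the sketch below follows the rules listed in section 5 (Technical Specifications) for std::basic_string. Because programs may not add members to std::basic_string, it is written as a free function template in a hypothetical namespace slicing; both the namespace and the free-function form are assumptions of this sketch, not part of the proposed wording.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical namespace for this sketch. Programs may not add members to
// std::basic_string, so subview() is written as a free function template.
namespace slicing {

// Returns a copy of the characters of str in the half-open range [start, end),
// applying the rules listed in section 5 (Technical Specifications).
template <class CharT, class Traits, class Alloc>
std::basic_string<CharT, Traits, Alloc>
subview(const std::basic_string<CharT, Traits, Alloc>& str,
        std::size_t start, std::size_t end) {
    // start must refer to an existing character.
    if (start >= str.size()) {
        throw std::out_of_range("subview: start >= size()");
    }
    // Clamp end so that start <= end <= size().
    if (end > str.size()) {
        end = str.size();
    }
    if (end < start) {
        end = start;
    }
    // Convert (start, end) into the (start, count) form that substr() expects.
    return str.substr(start, end - start);
}

}  // namespace slicing
```

For example, slicing::subview(std::string{"www.google.com"}, 4, 10) returns "google", the same result as the motivating example. Delegating to substr() keeps the sketch short and makes the (start, end) to (start, count) conversion explicit.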

5. Technical Specifications

  1. std::basic_string::subview() takes two parameters, size_t start and size_t end, and returns a std::basic_string.

    • start is the starting index (inclusive) of where to start the slice.

      • std::out_of_range is thrown when start >= size().

    • end is the ending index (exclusive) of where to end the slice.

      • If end > size(), end is set to size().

      • If end < start, end is set to start.

  2. std::basic_string_view::subview() takes two parameters, size_t start and size_t end, and returns a std::basic_string_view.

    • start is the starting index (inclusive) of where to start the slice.

      • std::out_of_range is thrown when start >= size().

    • end is the ending index (exclusive) of where to end the slice.

      • If end > size(), end is set to size().

      • If end < start, end is set to start.
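The rules above can be sketched for std::basic_string_view as follows. This is a free function in a hypothetical namespace slicing standing in for the proposed member; it returns a view into the original characters rather than a copy, and it is constexpr like the rest of std::basic_string_view's interface.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical namespace for this sketch: a free-function stand-in for the
// proposed std::basic_string_view::subview() member.
namespace slicing {

// Returns a view of [start, end) into sv; no characters are copied.
template <class CharT, class Traits>
constexpr std::basic_string_view<CharT, Traits>
subview(std::basic_string_view<CharT, Traits> sv,
        std::size_t start, std::size_t end) {
    if (start >= sv.size()) {
        // Rule: std::out_of_range is thrown when start >= size().
        throw std::out_of_range("subview: start >= size()");
    }
    if (end > sv.size()) {
        end = sv.size();  // Rule: an end past the view is clamped to size().
    }
    if (end < start) {
        end = start;      // Rule: an inverted range produces an empty view.
    }
    return sv.substr(start, end - start);
}

}  // namespace slicing
```

With std::string_view sv{"www.google.com"}, slicing::subview(sv, 4, 2) is an empty view because end is raised to start. Note that under these rules start == size() throws, whereas substr() accepts pos == size() and returns an empty result, throwing only when pos > size().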

These functions are easy to implement, although the details depend on each vendor's implementation of std::basic_string and std::basic_string_view. I have provided unit tests and sample implementations here.

6. Proposed Wording

6.1. Addition to <string>

Add the following to 23.4.3.1 [basic.string.general]:
// [...]
namespace std {
  // [...]

  // [string.ops], string operations
  // [...]
  constexpr bool contains(const charT* x) const;

  constexpr basic_string subview(size_type start, size_type end) const;
}

6.2. std::basic_string::subview

Add the following subclause to 23.4.3.8 [string.ops]:

6.3. Addition to <string_view>

Add the following to 23.3.3.1 [string.view.template.general]:
namespace std {
  // [...]

  // [string.view.ops], string operations
  // [...]
  constexpr basic_string_view substr(size_type pos = 0,
                                     size_type n = npos) const;       // freestanding-deleted

  constexpr basic_string_view subview(size_type start, size_type end) const;
}

6.4. std::basic_string_view::subview

Add the following subclause to 23.3.3.8 [string.view.ops]:

7. Acknowledgements

The author thanks Zhihao Yuan, Nathaniel Rupprecht, and many others for their suggestions on this proposal.