Date: Sat, 2 Sep 2023 04:56:03 +0000
Abstract
This paper proposes to add iterator-based versions of the find family of functions for basic_string and basic_string_view, to align them with the iterator-based interface of the rest of the C++ standard library. The current use of indices and npos is inconsistent with that interface and causes confusion and inefficiency.
Motivation
The C++ standard library extensively uses iterators as a generic way to access and manipulate elements in a range. The iterator abstraction was brought to C++ by Alexander Stepanov, who designed the STL, which was later incorporated into the standard library. However, the design of std::string predates the adoption of the STL, and thus its find family of member functions does not use iterators, but rather indices and a special value npos to indicate the position of elements or substrings. This is unfortunate, as it creates a discrepancy between the interface of std::string and that of other standard containers and algorithms.
The use of indices and npos has several drawbacks:
* It is inconsistent with the iterator-based interface of the standard library, which makes std::string less compatible with generic algorithms and utilities.
* It confuses users: npos is a static data member that newcomers to C++ can easily overlook, which raises the learning cost. Conceptually, npos is a magic value.
* It is inefficient, as it requires extra arithmetic operations to convert between indices and iterators, or to check for npos. For example, one might need to write something like this:
if (auto pos = s.find(c); pos != std::string::npos) {
    auto it = s.begin() + pos;
    // do something with it
}
This involves an addition operation, which could be avoided if s.find(c) returned an iterator directly.
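For comparison, the single-character case can already be written with the generic std::find algorithm, which returns an iterator directly. The following is a minimal sketch that places the two styles side by side; the helper names find_via_index and find_via_algorithm are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Index-based lookup converted to an iterator, as in the snippet above.
// `find_via_index` is a hypothetical name used only for illustration.
inline std::string::const_iterator find_via_index(const std::string& s, char c) {
    if (auto pos = s.find(c); pos != std::string::npos) {
        return s.begin() + static_cast<std::string::difference_type>(pos);
    }
    return s.end();  // map npos to the end iterator
}

// The generic algorithm yields an iterator without the index round trip.
// `find_via_algorithm` is likewise a hypothetical name.
inline std::string::const_iterator find_via_algorithm(const std::string& s, char c) {
    return std::find(s.begin(), s.end(), c);
}

// Small self-check: both helpers agree for a found and a missing character.
inline void check_equivalence() {
    const std::string s = "Hello world";
    assert(find_via_index(s, 'w') == find_via_algorithm(s, 'w'));
    assert(find_via_index(s, 'z') == find_via_algorithm(s, 'z'));
}
```

The generic algorithm only covers the single-element case; substring searches and the find_first_of family still go through the index-based members, which is the gap this proposal addresses.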
Proposal
I propose to add a new set of find functions for basic_string and basic_string_view, which take and return iterators instead of indices. These functions will have the same name as the existing ones, but with an “x” prefix. For example:
template<class CharT, class Traits, class Allocator>
constexpr basic_string<CharT, Traits, Allocator>::const_iterator
basic_string<CharT, Traits, Allocator>::xfind(CharT ch, basic_string<CharT, Traits, Allocator>::const_iterator first = {}) const noexcept;
template<class CharT, class Traits>
constexpr basic_string_view<CharT, Traits>::const_iterator
basic_string_view<CharT, Traits>::xfind(CharT ch, basic_string_view<CharT, Traits>::const_iterator first = {}) const noexcept;
These functions will behave like the existing ones, with two differences: they return an iterator to the first element of the found substring, or the end iterator if nothing is found; and they take an iterator, rather than an index, to specify where the search starts. For example:
std::string s {"Hello world"};
auto it = s.xfind("world"); // returns an iterator to 'w'
auto it2 = s.xfind("foo"); // returns s.end()
auto it3 = s.xfind("ll", s.begin() + 2); // returns an iterator to 'l'
auto it4 = s.xfind("ll", s.begin() + 3); // returns s.end()
This will make the interface of basic_string and basic_string_view more consistent with the rest of the standard library, and avoid the confusion and inefficiency caused by indices and npos. I also encourage users to migrate to the new iterator-based functions.
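The intended semantics can be approximated today with a free function built on the existing index-based find. This is only a sketch under stated assumptions, not the proposed wording: the name xfind_sketch is hypothetical, it works on std::string_view only, and the starting iterator is handled with an overload rather than the default argument shown in the declarations above.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical free-function approximation of the proposed member xfind.
// `first` must be an iterator into `s` (or a copy of the same view).
constexpr std::string_view::const_iterator
xfind_sketch(std::string_view s, std::string_view needle,
             std::string_view::const_iterator first) noexcept {
    // Convert the starting iterator to an index for the existing find.
    const auto offset =
        static_cast<std::string_view::size_type>(first - s.begin());
    const auto pos = s.find(needle, offset);
    // Map npos to the end iterator, any other index to an iterator.
    return pos == std::string_view::npos
               ? s.end()
               : s.begin() + static_cast<std::string_view::difference_type>(pos);
}

// Overload that searches from the beginning of the view.
constexpr std::string_view::const_iterator
xfind_sketch(std::string_view s, std::string_view needle) noexcept {
    return xfind_sketch(s, needle, s.begin());
}

// Small self-check mirroring the four examples above.
inline void check_xfind_sketch() {
    constexpr std::string_view s = "Hello world";
    assert(xfind_sketch(s, "world") - s.begin() == 6);
    assert(xfind_sketch(s, "foo") == s.end());
    assert(xfind_sketch(s, "ll", s.begin() + 2) - s.begin() == 2);
    assert(xfind_sketch(s, "ll", s.begin() + 3) == s.end());
}
```

A member function would not need the iterator-to-index round trip that this sketch performs internally; the sketch only illustrates the observable behaviour.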
Design Decisions
In the initial design, I used the "i" prefix to name the new functions. However, Paul Fee pointed out that Boost already uses the same prefix for some string algorithms that are case insensitive, such as "icontains" and "ifind_first". This could cause confusion and name clashes for users who use both Boost and the standard library. Therefore, I changed the prefix to "x", which conveys the idea of extending the functionality of the existing find functions, rather than implying a different behavior.
In addition, I originally proposed to deprecate the index-based find functions, and encourage users to migrate to the new iterator-based ones. However, Jonathan Wakely and Sebastian Wittmeier argued that this would leave too little time for users to adapt to the change, and that deprecating widely used functions should be done with more care and discussion. Therefore, I removed the idea from this proposal.
Impact on existing code
This proposal is a pure library extension. However, it does add new member function names to basic_string and basic_string_view, which might collide with user-defined names, for example in classes derived from basic_string.
References
* [1] Alexander Stepanov, “STL and Its Design Principles”, Talk presented at Adobe Systems Inc., 2002.
* [2] Bjarne Stroustrup, “The Design and Evolution of C++”, Addison-Wesley, 1994.
Received on 2023-09-02 04:56:09