Date: Mon, 1 Jan 2024 19:30:53 -0500
The existing functions are woefully inadequate for performing real-world
case matching, case conversion, or case identification operations, and
the simple change you suggest won't help with that. Case classification
is quite complicated for many languages.
Consider that Unicode 15.1 defines 2544 characters (code points) that
have the Lowercase property
<https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[:Lowercase=Yes:]>,
but only 1951 that have the Uppercase property
<https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[:Uppercase=Yes:]>.
Case mapping is not a one-to-one operation. Further, case mapping is
locale-dependent. The uppercase form of letter i (U+0069 LATIN SMALL
LETTER I) is I (U+0049 LATIN CAPITAL LETTER I) for some languages, but İ
(U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE) for others. Likewise, ß
(U+00DF LATIN SMALL LETTER SHARP S) has historically had an uppercase
form of SS (two code points, each U+0053 LATIN CAPITAL LETTER S), but ẞ
(U+1E9E LATIN CAPITAL LETTER SHARP S) is also recognized as an uppercase
form.
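To make the shape of the problem concrete, here is a rough sketch of
what an uppercase mapping has to look like once those two examples are
taken into account. The function name to_upper_sketch and its language
tag parameter are hypothetical, not a proposed or existing interface,
and the table covers only the cases mentioned above plus ASCII. Treating
"tr" and "az" (Turkish and Azerbaijani) as the languages that map i to
U+0130 is my assumption about which languages are meant.

```cpp
#include <string>

// Hypothetical, illustrative helper: NOT a real Unicode case mapping.
// It returns a string because one code point may map to several.
std::u32string to_upper_sketch(char32_t cp, const std::string& lang) {
    if (cp == U'i') {
        // Assumption: Turkish and Azerbaijani use dotted capital I.
        if (lang == "tr" || lang == "az") {
            return U"\u0130";  // LATIN CAPITAL LETTER I WITH DOT ABOVE
        }
        return U"I";
    }
    if (cp == U'\u00DF') {
        // The traditional mapping for sharp s: two code points.
        return U"SS";
    }
    if (cp >= U'a' && cp <= U'z') {
        // Remaining ASCII lowercase letters: simple one-to-one mapping.
        return std::u32string(1, static_cast<char32_t>(cp - 0x20));
    }
    // Everything else is returned unchanged in this sketch.
    return std::u32string(1, cp);
}
```

Note that neither a char-to-char nor a wchar_t-to-wchar_t signature can
express the ß case, and neither can express the i case without some
notion of language being passed in or picked up from global state.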
Whether the existing functions are useful for any particular operation
depends on what you are trying to do. Though the existing functions have
locale-dependent (really character-set-dependent) behavior, they are
effectively only useful for asking whether an ASCII character is a
lowercase or uppercase ASCII character. Note that these functions don't
work with variable-length character encodings like UTF-8, since they
classify one code unit at a time and never see a whole character.
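A small sketch of the UTF-8 limitation, assuming the default "C" locale
that is in effect at program startup. The helper name count_lower_bytes
is made up for illustration. It applies std::islower to each byte of a
string, converting through unsigned char as the C functions require.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts the bytes that std::islower accepts.
// Assumes the global C locale has not been changed from "C".
std::size_t count_lower_bytes(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        // Passing a negative plain char would be undefined behavior,
        // hence the conversion to unsigned char before the call.
        if (std::islower(c)) {
            ++n;
        }
    }
    return n;
}
```

Under that assumption, "abc" yields 3, but the UTF-8 encoding of ß (the
two bytes 0xC3 0x9F) yields 0: the function only ever sees individual
code units, so a lowercase letter encoded as more than one byte is never
recognized.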
Rather than investing effort into minimal improvements to the existing
functions, we should focus on suitable replacements that actually
suffice to enable real-world case operations.
Tom.
On 12/23/23 10:29 AM, Wim Leflere via Std-Proposals wrote:
> Would there be interest in having more idiomatic versions of the
> character check functions (from cctype
> <https://en.cppreference.com/w/cpp/header/cctype> & cwctype
> <https://en.cppreference.com/w/cpp/header/cwctype>)?
>
> Example
> Replace
> "int std::islower(int ch)" and "int std::iswlower(std::wint_t ch)"
> With
> "bool std::is_lower(char ch)" and "bool std::is_lower(wchar_t ch)"
>
> Using overloads allows use in a more generic way.
> The implementation could call the C function (in the correct way).
>
> Wim
>
>
Received on 2024-01-02 00:30:56