Date: Wed, 25 Jun 2025 16:18:45 +0000
On 24/06/2025 18:01, Martin Uecker wrote:
> On Tuesday, 24.06.2025 at 16:31 +0000, Niall Douglas via Liaison wrote:
>> On 24/06/2025 13:56, Niall Douglas wrote:
>>
>>> This is what I am asking now - would WG14 like me to write a paper exploring the overheads of UTF-8 compatible variable length prefixing of variable length octet arrays? I think that for short strings, the overhead would be exactly nil, but it will rise as a percentage of total as strings get longer before shrinking again.
>>
>> Seeing as I am unemployed, I went ahead and wrote up that paper. A first draft is attached.
>>
>> See what you all make of it.
>
>
> I would fear that anything more complicated than a size_t is inviting
> bugs and might prevent this from becoming a type that can easily
> be used also by other languages.

All major programming languages can call into C.

If you want the length of one of these arrays, just call the C library
function, which will tell you.

The encoding that the paper proposes is no more complicated than the
UTF-8 encoding. If UTF-8 can be implemented bug-free across multiple
languages, so can this.

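
To give a feel for the complexity involved, here is a rough sketch of a length prefix that reuses UTF-8's lead-byte and continuation-byte bit layout. This is an assumption for illustration only: the paper's actual layout is in the attachment and is not reproduced in this message, and the names `prefix_encode` and `prefix_decode` are hypothetical, not proposed library functions. The sketch caps lengths at 21 bits and does not reject overlong forms or the surrogate range, which a strictly UTF-8-valid prefix would have to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: write len as a 1..4 byte UTF-8-style prefix.
   Returns the number of bytes written, or 0 if len needs more than 21 bits. */
size_t prefix_encode(uint32_t len, unsigned char out[4])
{
    if (len < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)len;
        return 1;
    }
    if (len < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (len >> 6));
        out[1] = (unsigned char)(0x80 | (len & 0x3F));
        return 2;
    }
    if (len < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (len >> 12));
        out[1] = (unsigned char)(0x80 | ((len >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (len & 0x3F));
        return 3;
    }
    if (len < 0x200000) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (len >> 18));
        out[1] = (unsigned char)(0x80 | ((len >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((len >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (len & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical: read a prefix from in[0..avail).
   Returns the number of prefix bytes consumed, or 0 on malformed or truncated input. */
size_t prefix_decode(const unsigned char *in, size_t avail, uint32_t *len)
{
    if (avail == 0)
        return 0;

    unsigned char lead = in[0];
    size_t n;
    uint32_t v;

    if (lead < 0x80)                { n = 1; v = lead; }
    else if ((lead & 0xE0) == 0xC0) { n = 2; v = lead & 0x1Fu; }
    else if ((lead & 0xF0) == 0xE0) { n = 3; v = lead & 0x0Fu; }
    else if ((lead & 0xF8) == 0xF0) { n = 4; v = lead & 0x07u; }
    else
        return 0;                   /* stray continuation byte or invalid lead */

    if (avail < n)
        return 0;                   /* prefix is truncated */

    for (size_t i = 1; i < n; i++) {
        if ((in[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        v = (v << 6) | (in[i] & 0x3Fu);
    }
    *len = v;
    return n;
}
```

Under this layout, any array shorter than 128 octets pays a single prefix byte, and a caller in another language only needs to bind the decode function rather than reimplement it.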
Niall
Received on 2025-06-25 16:18:49