sg16: Re: [SG16-Unicode] Comments on D1629R1 Standard Text Encoding

From: Thiago Macieira <thiago_at_[hidden]>
Date: Sun, 18 Aug 2019 23:33:18 -0700

On Sunday, 18 August 2019 12:47:27 PDT Henri Sivonen wrote:
> On Sun, Aug 18, 2019, 19:07 Thiago Macieira <thiago_at_[hidden]> wrote:
> > On Saturday, 17 August 2019 12:25:57 PDT Henri Sivonen wrote:
> > > To the extent other programming languages that have encoding
> > > conversion in their standard library, such as Java, focus on
> > > contiguous buffers rather than iteration, it's worthwhile to study if
> > > application developers really feel that something important is
> > > missing.
> >
> > We were just discussing URLs in the cpplang Slack and that reminded me:
> > in 10 years there has been exactly one case where I've needed to decode a
> > non-contiguous byte range, and that's when parsing a URL.
>
> Can you elaborate on this? Per spec, URL parsing doesn't invoke a decoder
> but an encoder:
> https://url.spec.whatwg.org/#query-state

You need to decode before you can encode again. Just try normalising the
following URL/IRI excerpt:
%C3%C3%A9%A9
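
To make the point concrete, here is a minimal C++ sketch of that step. It is a simplification under stated assumptions, not a full RFC 3987 normaliser: the names hex_value, utf8_length and normalize_percent_utf8 are hypothetical, escapes for ASCII bytes are left escaped, and the extra exclusions an IRI processor would apply are ignored. It collects each run of %XX escapes into bytes, decodes those bytes as UTF-8, and re-escapes only the bytes that do not form a well-formed sequence.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: numeric value of a hex digit, or -1 if c is not one.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: length of the well-formed multi-byte UTF-8 sequence
// starting at b[i], or 0 if there is none (ASCII bytes also return 0 so that
// escapes such as %20 stay escaped).
static std::size_t utf8_length(const std::vector<std::uint8_t>& b, std::size_t i) {
    const std::uint8_t lead = b[i];
    std::size_t len = 0;
    std::uint8_t lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    if (lead >= 0xC2 && lead <= 0xDF) {
        len = 2;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        len = 3;
        if (lead == 0xE0) lo = 0xA0;    // reject overlong forms
        if (lead == 0xED) hi = 0x9F;    // reject surrogates
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        len = 4;
        if (lead == 0xF0) lo = 0x90;    // reject overlong forms
        if (lead == 0xF4) hi = 0x8F;    // reject values above U+10FFFF
    } else {
        return 0;
    }
    if (i + len > b.size()) return 0;   // truncated sequence
    if (b[i + 1] < lo || b[i + 1] > hi) return 0;
    for (std::size_t k = 2; k < len; ++k) {
        if (b[i + k] < 0x80 || b[i + k] > 0xBF) return 0;
    }
    return len;
}

// Hypothetical function: turn escapes that form valid UTF-8 into raw UTF-8
// bytes and keep every other escape as an upper-case %XX.
std::string normalize_percent_utf8(std::string_view in) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        // Step 1: collect a maximal run of consecutive %XX escapes as bytes.
        std::vector<std::uint8_t> run;
        while (i + 2 < in.size() && in[i] == '%' &&
               hex_value(in[i + 1]) >= 0 && hex_value(in[i + 2]) >= 0) {
            run.push_back(static_cast<std::uint8_t>(
                hex_value(in[i + 1]) * 16 + hex_value(in[i + 2])));
            i += 3;
        }
        if (run.empty()) {              // ordinary character, copy through
            out.push_back(in[i++]);
            continue;
        }
        // Step 2: decode the run; only then can we decide what to re-encode.
        for (std::size_t j = 0; j < run.size();) {
            const std::size_t len = utf8_length(run, j);
            if (len == 0) {             // stray byte: keep it escaped
                out += '%';
                out += digits[run[j] >> 4];
                out += digits[run[j] & 0x0F];
                ++j;
            } else {                    // valid sequence: emit raw UTF-8
                for (std::size_t k = 0; k < len; ++k)
                    out.push_back(static_cast<char>(run[j + k]));
                j += len;
            }
        }
    }
    return out;
}
```

With the excerpt above, normalize_percent_utf8("%C3%C3%A9%A9") keeps the outer %C3 and %A9 escaped and turns only the inner %C3%A9 into a raw U+00E9. The code cannot know which escapes pair up without running the bytes through a UTF-8 decoder first.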

WHATWG is not normative. Please use RFC 3986 and 3987.

> Here's the corresponding code in Firefox using the span-oriented API that I
> linked to in my previous email:
> https://searchfox.org/mozilla-central/source/netwerk/base/nsStandardURL.cpp#138
> The span-oriented API works well here in my opinion, even though this case
> is a more advanced use of the API that implements a custom replacement
> behavior.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

Received on 2019-08-19 08:33:21