Re: [SG16-Unicode] Draft of paper to update the unicode standard reference

From: Zach Laine <whatwasthataddress_at_[hidden]>
Date: Thu, 17 May 2018 01:17:29 -0500
Yeah, I meant to mention in the call today that the linked doc (14651)
refers to UCS-{2,4} instead of UTF. It seems some work is needed on the
Unicode end.

Zach

On Wed, May 16, 2018 at 9:30 PM, Steve Downey <sdowney_at_[hidden]> wrote:

> The ISO documents all seem, at this point, to be somewhat lagging derived
> documents from the master doc produced by the consortium as the Unicode
> Standard.
> There's probably some bureaucratic requirement driving that, where
> unicode.org isn't accredited.
> The C++ standard is citing IETF RFCs, so we ought to be able to cite
> unicode.org. If we also need to cite ISO, then we should pull those cites
> from the Unicode Standard.
> We should never be in a position of choosing between following the standard
> and doing the right thing if we can help it.
>
>
>
> On Wed, May 16, 2018, 12:47 Zach Laine <whatwasthataddress_at_[hidden]>
> wrote:
>
>> Ah, thanks! The other link looked paywalled (and was anyways in French).
>>
>> Zach
>>
>>
>> On Wed, May 16, 2018 at 11:13 AM, Martinho Fernandes <rmf_at_[hidden]> wrote:
>>
>>> Why settle for a draft when it's publicly available for free? It's here:
>>> http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html.
>>>
>>> Note that it does not define normalization, and instead it refers to
>>> UAX#15.
>>>
>>> On 16.05.18 17:55, Zach Laine wrote:
>>>
>>> Alisdair, 14651 sounds like it contains normalization and collation.
>>> Is that right? Also, any idea where a draft might be found? I'd hate to
>>> pay just to have a peek at something I don't actually have a use for.
>>>
>>> Zach
>>>
>>> On Wed, May 16, 2018 at 8:29 AM, Alisdair Meredith <alisdairm_at_[hidden]>
>>> wrote:
>>>
>>>> As far as I can tell, ISO 14651 covers the algorithms, or at least a
>>>> start on them:
>>>> https://www.iso.org/standard/68309.html
>>>>
>>>>
>>>> For C++20 I believe ISO 10646 is all that is needed, as the standard
>>>> uses Unicode directly only for definitions of character sets. Once SG-16
>>>> creates a proposal for deeper support, it will probably want these
>>>> additional references though.
>>>>
>>>> AlisdairM
>>>>
>>>>
>>>> On May 16, 2018, at 08:32, Martinho Fernandes <rmf_at_[hidden]> wrote:
>>>>
>>>> ISO 30112 doesn't seem to be enough in the long run either. Correct me
>>>> if I'm wrong (I don't have access to the document), but from the abstract
>>>> it sounds like this just specifies description formats; no algorithms and
>>>> no data, just ways to specify them.
>>>>
>>>> It doesn't cover the ground in https://unicode.org/reports/tr15/,
>>>> https://unicode.org/reports/tr29/, https://unicode.org/reports/tr14/,
>>>> https://unicode.org/reports/tr9/, https://unicode.org/reports/tr50/,
>>>> ... (roughly in order of importance). I don't know if there are ISO
>>>> standards specifying the same aspects and staying in sync. I don't think
>>>> there are any; the Unicode FAQ doesn't mention any ISO standard other than
>>>> ISO 10646 (https://www.unicode.org/faq/unicode_iso.html). If there
>>>> are, let's use them; if there aren't, I think it'd be preferable to just
>>>> have one single reference to the Unicode specification than to have several
>>>> references to standards that may or may not get updated in lockstep and may
>>>> or may not reflect the current state of the Unicode Standard.
>>>>
>>>>
>>>> FWIW I only mentioned annexes because they're easier to link to than
>>>> the core specification, even though there are some algorithms formally
>>>> defined within it that are also not covered in ISO 10646 nor ISO 30112.
>>>> Also note that a reference to a specific Unicode version encompasses "an
>>>> edition of the core specification, *The Unicode Standard*, together
>>>> with the Code Charts, Unicode Standard Annexes and the Unicode Character
>>>> Database" (from https://www.unicode.org/standard/standard.html)
>>>>
>>>> On 16.05.18 14:10, keld_at_[hidden] wrote:
>>>>
>>>> If you want more than just character sets, you should refer to ISO 30112,
>>>> which Unicode has tried to copy.
>>>>
>>>> 30112 is much more shaped to the POSIX/C/C++ model - not just UCS.
>>>>
>>>> Best regards
>>>> keld
>>>>
>>>>
>>>> On Fri, May 04, 2018 at 11:59:58PM +0200, R. Martinho Fernandes wrote:
>>>>
>>>> Can you explain why? For now the ISO reference is enough, but in the future we will need the Unicode Standard reference because ISO 10646 is only the character set.
>>>>
>>>> On May 4, 2018 11:57:08 PM GMT+02:00, keld_at_[hidden] wrote:
>>>>
>>>> I would like that we use the reference to ISO 10646 instead of the
>>>> Unicode Inc. reference.
>>>> I have advocated that for quite a long time now.
>>>>
>>>> Best regards
>>>> keld
>>>>
>>>> On Fri, May 04, 2018 at 09:43:22PM +0000, Steve Downey wrote:
>>>>
>>>> I've been told that some people believe there's a policy that ISO
>>>> Standards must cite other ISO Standards where those are available, which
>>>> is why we're citing the ISO copies of Unicode and ECMAScript. I can't
>>>> find an actual policy on this, though.
>>>> I'm willing to put in the Unicode.org preferred reference, with a
>>>> fallback to the ISO reference. My only fear is that too many choices
>>>> will lead to paralysis.
>>>>
>>>> On Fri, May 4, 2018 at 4:44 PM JF Bastien <cxx_at_[hidden]> wrote:
>>>>
>>>>
>>>> The Unicode standard has guidance on how to cite it:
>>>> http://www.unicode.org/versions/index.html#Citations
>>>>
>>>> It would be useful to link to this guidance (and follow it).
>>>>
>>>> On Fri, May 4, 2018 at 1:10 PM, Steve Downey <sdowney_at_[hidden]> wrote:
>>>>
>>>> https://github.com/steve-downey/sg16/blob/d10250/papers/D1025R0.md
>>>>
>>>> There are some formatting issues I will clean up, in particular changing
>>>> the links to not raw links, and moving the links down to a bibliography
>>>> section.
>>>>
>>>> Also adding a title at the top.
>>>>
>>>> _______________________________________________
>>>> Unicode mailing list
>>>> Unicode_at_[hidden]
>>>> http://www.open-std.org/mailman/listinfo/unicode
>>>>
>>>>
>>>> --
>>>> Martinho
>>>
>>> --
>>> Martinho
>>>
>>>
>>
>

Received on 2018-05-17 08:17:31