<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 26 Mar 2020 at 16:46, Tom Honermann &lt;<a href="mailto:tom@honermann.net">tom@honermann.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <div>On 3/25/20 3:41 PM, Steven R. Loomis
      via SG16 wrote:<br>
    </div>
    <blockquote type="cite">
      
      <br>
      <div>
        <blockquote type="cite">
          <div>On Mar 24, 2020, at 8:42 AM, Corentin
            &lt;<a href="mailto:corentin.jabot@gmail.com" target="_blank">corentin.jabot@gmail.com</a>&gt;
            wrote:</div>
          <div>
            <div dir="ltr">
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Tue, 24 Mar 2020 at
                  15:42, Steven R. Loomis &lt;<a href="mailto:srl295@gmail.com" target="_blank">srl295@gmail.com</a>&gt;
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                  <div>Corentin,
                    <div> Please see some of the work done in
                      ICU on encodings.  </div>
                    <div><br>
                    </div>
                    <div>In particular, IANA does not specify
                      the actual mapping. So we have found the IANA
                      names insufficient to distinguish two actual
                      encodings; Shift_JIS is an example.  Comment and
                      data file:</div>
                    <div><a href="https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/convrtrs.txt#L93" target="_blank">https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/convrtrs.txt#L93</a></div>
                    <div><br>
                    </div>
                    <div>So while IANA names are widely used
                      from a spec point of view, in practice there are
                      many, many challenges with their use in
                      implementation.</div>
                  </div>
                </blockquote>
                <div><br>
                </div>
                <div>This proposal is solely about names, not
                  encoding conversion facilities.</div>
              </div>
            </div>
          </div>
        </blockquote>
        <br>
      </div>
      <div>I understand, but that is exactly how we get into
        compatibility problems today. I mentioned Shift_JIS &lt;<a href="https://en.wikipedia.org/wiki/Shift_JIS#Multiple_versions" target="_blank">https://en.wikipedia.org/wiki/Shift_JIS#Multiple_versions</a>&gt;:
        one standard name, incompatible implementations. There are many
        other issues visible from the mapping table, where an
        IANA name alone is not sufficient.</div>
    </blockquote>
    It seems that Big5 suffers a similar issue.  If my research is
    correct, IANA recognizes Big5 and Big5-HKSCS, but the Big5 variant
    in the Encoding Standard is a merged version of them that is not a
    superset of either.<br>
    <blockquote type="cite">
      <div><br>
      </div>
      <div>Giving only names without specifying encoding conversion is
        less than helpful, indeed harmful.</div>
      <div>We know there are incompatibilities. Why give a false sense
        of security for something that’s clearly underspecified?</div>
    </blockquote>
    <p>The initial motivation for this feature was to allow a C++
      implementation to communicate to a program the encoding used to
      encode character and string literals, the encoding used by the
      system, and the locale dependent encoding  used by the C and C++
      standard libraries.  This goal can&#39;t be accomplished without some
      encoding name or identifier.  One of the goals was to enable this
      identifier to be used in order to select a (compatible) encoding
      when interoperating with iconv, ICU, Windows APIs, etc...</p></div></blockquote><div><br></div><div>+1 </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
    <p>It sounds like your perspective is that such goals should be
      accomplished in some other way.  For example, by having the
      implementation provide a codec rather than an identifier; ideally
      a codec that could be used in interaction with iconv, ICU, etc...
      (though this would clearly require enhancements to those code
      bases).</p></div></blockquote><div>Not being tied to an encoder is key to this proposal, both because we are trying to solve the black-box problem that the C functions have and because this is intended to be low-cost and freestanding.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
    <p>Do you have other suggestions for how to think about this?<br>
    </p>
    <blockquote type="cite">
      <div><br>
      </div>
      <div>At this point in history, I would recommend using the WHATWG
        names and behaviors exactly. Anything further requires a
        specific repository of mappings and behaviors.  Perhaps there
        could be a namespaced use, such as “icu:ibm-1251_P100-1995” or
        “15897:ISO-8859-1”, which precisely specifies one table.</div></blockquote></div></blockquote><div>An implementation could return icu:ibm-1251_P100-1995.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><blockquote type="cite">
    </blockquote>
    WHATWG specifies a more limited set of encodings than IANA does. 
    I&#39;m not sure how to square this comment with your later one stating
    that the IANA mappings are insufficient.  If IANA is insufficient,
    what is it about the WHATWG standard that would make it sufficient?<br></div></blockquote><div> </div><div>In particular, it doesn&#39;t begin to cover the set of encodings supported by compilers and systems.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><blockquote type="cite"><div>
      </div>
      <blockquote type="cite">
        <div>On Mar 24, 2020, at 8:02 AM, keld--- via
          SG16 &lt;<a href="mailto:sg16@lists.isocpp.org" target="_blank">sg16@lists.isocpp.org</a>&gt;
          wrote:</div>
        <br>
        <div><span style="float:none;display:inline">ISO 15897 provides actual mappings to ISO
            10646 in POSIX-compatible charmap format.</span><br>
          <span style="float:none;display:inline">Names are compatible with IANA, built from some
            of the same sources.</span><br>
          <span style="float:none;display:inline">Unicode Inc. wanted to reinvent the wheel.</span><br>
        </div>
      </blockquote>
      <div><br>
      </div>
      <div>Hi, Keld. Actually, this work is based on IBM mapping tables
        and the customer need to explicitly specify character encoding
        mappings.  We have critical customer data that would be damaged
        if we only used IANA mappings. The mappings needed aren’t in the
        15897 registry.</div>
    </blockquote>
    <p>Thanks, I think this is useful information that the IANA registry
      is insufficient in practice for known use cases.</p>
    <p>Tom.<br>
    </p>
    <blockquote type="cite">
      <div>However, the CDRA/CCSID and ICU converter tables are widely
        implemented. For one thing, POSIX charmaps supported the
        substitution controls and multi-way fallback behavior that was
        needed for some tables. That’s my recollection.  Also, many
        converters are better specified as algorithms than tables.</div>
      <div><br>
      </div>
      <div>
        <div style="color:rgb(0,0,0)">--<br>
          Steven R. Loomis | @srl295 | <a href="http://git.io/srl295" target="_blank">git.io/srl295</a></div>
        <div style="color:rgb(0,0,0)"><br>
        </div>
      </div>
      <br>
    </blockquote>
    <p><br>
    </p>
  </div>

</blockquote></div></div>

