std-proposals

Re: [std-proposals] Dedicated website with AI that has processed all papers

From: Frederick Virchanza Gotham <cauldwell.thomas_at_[hidden]>
Date: Thu, 17 Jul 2025 16:34:39 +0100

I want to make sure I have everyone's names right before I release
Version 1 of the Paper Kernel C++ in a week or two's time.

I spent many hours working on all your names, in particular
consolidating multiple forms into one, e.g. "B. Stroustrup" and
"Bjarne Stroustrup" are the same person. In my own case, "TPK Healy"
and "Thomas P. K. Healy" are the same person. For German names, I have
preferred the umlaut instead of putting an 'e' after the vowel, except
where I've seen a very consistent contrary usage, such as Andrew
Koenig. Shout out to the Spanish folks who gave me a bit of work to
do. The Dutch too. Even with the help of AI, I still spent many hours
mapping alternative forms of names to primary forms of names.

Here are all the names in a header file in the main program:

    https://github.com/healytpk/paperkernelcxx/blob/main/main_program/AUTO_GENERATED_names.hpp

There are two arrays, the first is the primary names, so for example
"Bjarne Stroustrup" is a primary name. The second array is the
alternative names, and this is where "B. Stroustrup" gets mapped to a
primary. "TPK Healy" gets mapped to "Thomas P. K. Healy". "Kyle
Kloepper" gets mapped to "Kyle Klopper (with an umlaut over the 'o')".
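
To make the two-array idea concrete, here is a minimal sketch of how such
a mapping could be laid out and queried. It is not copied from the header
file: the names primary_names, AlternativeName, alternative_names and
to_primary are hypothetical, and the real AUTO_GENERATED_names.hpp may be
organised quite differently.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical layout; the real AUTO_GENERATED_names.hpp may differ.
// First array: one entry per person, holding the primary form of the name.
inline constexpr std::array<std::string_view, 3> primary_names{
    "Bjarne Stroustrup",
    "Thomas P. K. Healy",
    "Andrew Koenig",
};

// Second array: each alternative spelling points at a primary by index.
struct AlternativeName {
    std::string_view alternative;  // spelling as found by the crawler
    std::size_t primary_index;     // index into primary_names
};

inline constexpr std::array<AlternativeName, 2> alternative_names{{
    {"B. Stroustrup", 0},
    {"TPK Healy", 1},
}};

// Return the primary form of a name, or the input unchanged when no
// alternative entry matches (for example, when it is already a primary).
constexpr std::string_view to_primary(std::string_view name)
{
    for (const AlternativeName& alt : alternative_names)
        if (alt.alternative == name)
            return primary_names[alt.primary_index];
    return name;
}
```

A linear scan is used here only to keep the sketch short; with thousands
of names, a sorted array with binary search or a hash map would likely be
a better fit.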

Please take a quick look in the header file to make sure I have your
name correct before I release Version 1 in a week or two. If you see a
misspelling of your name in the array of alternatives, it's because
your name is misspelled on the 'open-std.org' website. Every single
name you see in that header file was copied verbatim from the
"open-std.org" website by a crawler program -- so no human error here
on my part. I had to manually correct a few of these misspellings
(even after asking AI to pick out the more obvious ones).

If you've submitted a paper and you don't see your name, please tell
me. Or if your name has been merged with a different person's, for example if
your name is "Thomas Smith", but sometimes "T. Smith" gets mapped to
"Timothy Smith", please tell me. I had to do a little investigating in
places, for example 'D. Walker' could be Daniel Walker or Daryle
Walker. Sometimes all you have to work with is a surname, e.g.
"Garland" could be either Jeff Garland or Michael Garland.

If you want your name in Kanji or Arabic in parentheses beside your
Roman-script name, please ask me. All builds of Paper Kernel C++ will
support full Unicode for all architectures and operating systems.

If you want your name changed, e.g. you got married, you became a
Sikh, you did the Hajj, you transitioned, you graduated from witness
protection, personal reasons, no reason, then please ask me. I don't
seek an explanation for names, just tell me the primary and I'll
change it in the header file. Or if you want diacritics added, e.g. an
acute accent over the 'e', then please ask me.

Please let me know if someone in the list has died. For example, I've
tagged "(1982 -2024)" onto Ed's name.

If you scroll through the main program's code at the moment you'll see
lots of bad stuff . . . I needed to get stuff working quickly so some
functions still have rubbish like "return *new std::string(. . .);"
which of course is a memory leak. I'll be tying up all those loose
ends in the coming week or two. It has all come together very nicely
though. About 9000 papers are in RAM all at once, so it takes up a gig
of RAM, but it's lightning fast for loading and searching. For those
who want to conserve RAM, I might give the option of keeping them on
disk.
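
For anyone curious what tidying up a "return *new std::string" function
might look like, here is a rough sketch of two common leak-free
alternatives. The function names make_label and default_label are made up
for illustration and do not appear in the Paper Kernel C++ source.

```cpp
#include <string>

// The leaky pattern dereferences a heap allocation that nobody ever
// deletes, so every call leaks one std::string.

// Alternative 1 (hypothetical function): return by value. The caller owns
// the result, and copy elision or a move makes this cheap.
std::string make_label(const std::string& author, int paper_count)
{
    return author + " (" + std::to_string(paper_count) + " papers)";
}

// Alternative 2 (hypothetical function): when a reference really must be
// returned, refer to a function-local static, which is constructed once
// on first use and destroyed at program exit.
const std::string& default_label()
{
    static const std::string label = "unknown author";
    return label;
}
```

Returning by value is usually the simpler of the two; the static version
only makes sense when every caller should see the same object.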

Received on 2025-07-17 15:34:53