std-proposals

Re: [std-proposals] Dedicated website with AI that has processed all papers

From: Frederick Virchanza Gotham <cauldwell.thomas_at_[hidden]>
Date: Fri, 9 May 2025 13:04:30 +0100
On Fri, May 9, 2025 at 10:34 AM Sebastian Wittmeier wrote:
>
> They normally won't let you use their domain.
>
> They have an API access, which you can do from your server, either the answer being interpreted by your server code or directly presented to the user. It would run on your domain.
>
> It is quite inexpensive per request. Depending on the model, less than a cent or a few cents.
>
> You can also create some kind of plugins for the public ChatGPT instead, but that is much less powerful and I would not suggest it for that use.


At the end of this post I've written a C++ program to get all the
papers and revision numbers. There have been 6545 papers submitted if
you include all the revisions.
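
To make the distinction between "papers" and "revisions" concrete, here is a minimal sketch that counts both from a blob of text. It uses the same P-number pattern as the program at the end of this post; the names PaperCounts and CountPaperCodes are hypothetical and only for illustration.

```cpp
#include <cstddef>  // std::size_t
#include <regex>    // std::regex, std::sregex_iterator
#include <set>      // std::set
#include <string>   // std::string, std::stoul
#include <utility>  // std::pair

struct PaperCounts {
    std::size_t revisions;  // unique (number, revision) pairs
    std::size_t papers;     // unique paper numbers, ignoring the revision
};

// Hypothetical helper: scan text (for example one index page) for codes
// of the form PnnnnRn and count them both ways.
PaperCounts CountPaperCodes(std::string const &text)
{
    static std::regex const pattern(R"(P(\d{4})R(\d+))");

    std::set<std::pair<unsigned, unsigned>> revisions;
    std::set<unsigned> numbers;

    std::sregex_iterator const last;
    for ( std::sregex_iterator it(text.cbegin(), text.cend(), pattern); it != last; ++it )
    {
        unsigned const num = static_cast<unsigned>(std::stoul((*it)[1].str()));
        unsigned const rev = static_cast<unsigned>(std::stoul((*it)[2].str()));

        revisions.insert({ num, rev });  // duplicates are dropped by the set
        numbers.insert(num);
    }

    return PaperCounts{ revisions.size(), numbers.size() };
}
```

The 6545 figure corresponds to the 'revisions' count here; the 'papers' count would be smaller because every revision of one paper shares its number.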

Here's quite a simple test to see how well ChatGPT knows all the
papers. I asked it:

    "Of all the C++ papers from P0001R0 up to P3672R0, how many of
these papers mention the word 'zip'?"

It came back with one paper that had 'zip' in its title, but at the end it said:

    "For a thorough analysis, one would need to review each paper
individually or utilize a search tool that indexes the content of
these documents."

So I retorted with:

    "Can you please read through each individual paper for me to confirm?"

but then it just kept coming back with suggestions for how to
automate the search. The bottom line is that the free ChatGPT refuses
to read through the 6545 papers.

So I asked ChatGPT:
    "I want to set up a website that has ChatGPT that has been trained
on the 6545 papers that have been submitted. I want people around the
world to be able to use it, say 10 people every hour. What kind of
money would this cost me?", and it came back with:

    Component            Cost Range (Monthly)
    -------------------------------------------
    Hosting & DB         $30–$70
    LLM API (GPT-4o)     $30–$50
    Miscellaneous        $10–$20
    Total                ~$70–$140/month

Next I asked: "What if I wanted to enable it to be used by hundreds if
not thousands of people every day? What would that cost me?", and it
came back with a max total cost of $600 per month. So it's not
absolutely crazy money. These would be small numbers to the people
that make charitable contributions.

Of course I can use the below program to download all the papers to my
webspace, and then use some really advanced text-searching software,
but an AI would be much more versatile. I mean I want to be able to
ask it questions like, "Of all the 6545 papers submitted so far, pick
out the ones that suggest adding a new CV-qualifier to the
language". You wouldn't be able to make this query with an advanced
search tool as you'd get way too many false positives no matter what
way you try to word it.
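
For comparison, the plain text-search baseline is easy to sketch. The code below assumes the papers have already been downloaded and converted to plain text (many are PDFs, so that step is not shown), and the names MentionsWord and PapersMentioning are hypothetical. It can answer the 'zip' question, but not the CV-qualifier one, which is the point above.

```cpp
#include <algorithm>  // std::transform
#include <cctype>     // std::tolower, std::isalnum
#include <cstddef>    // std::size_t
#include <map>        // std::map
#include <string>     // std::string
#include <vector>     // std::vector

static std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if 'word' occurs in 'text' as a whole word, ignoring case.
bool MentionsWord(std::string const &text, std::string const &word)
{
    if ( word.empty() ) return false;

    std::string const haystack = ToLower(text), needle = ToLower(word);
    auto const is_word_char = [](char c) { return 0 != std::isalnum(static_cast<unsigned char>(c)); };

    for ( std::size_t pos = haystack.find(needle);
          pos != std::string::npos;
          pos = haystack.find(needle, pos + 1u) )
    {
        std::size_t const after = pos + needle.size();
        bool const left_ok  = (0u == pos) || !is_word_char(haystack[pos - 1u]);
        bool const right_ok = (after == haystack.size()) || !is_word_char(haystack[after]);
        if ( left_ok && right_ok ) return true;
    }

    return false;
}

// 'texts' maps a paper code such as "P0001R0" to its plain-text contents.
std::vector<std::string> PapersMentioning(std::map<std::string, std::string> const &texts,
                                          std::string const &word)
{
    std::vector<std::string> hits;
    for ( auto const &[name, text] : texts )
        if ( MentionsWord(text, word) ) hits.push_back(name);
    return hits;
}
```

The whole-word check keeps 'unzipped' from counting as a mention of 'zip', but no amount of keyword tuning tells you whether a paper actually proposes something.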

I would also train the AI on all the posts to this mailing list (and
older mailing lists) from 1990 to 2025, but I would give the contents
of the papers a higher priority than the contents of the posts.

[ start C++ program ]

#include <cstddef> // size_t
#include <iostream> // cout, cerr
#include <regex> // regex, smatch
#include <set> // set
#include <stdexcept> // runtime_error
#include <string> // string, to_string
#include <curl/curl.h> // CURL, curl_easy_init
#include "Auto.h" // The 'Auto' macro

using std::cout, std::endl, std::string, std::set, std::size_t;

struct Paper {

    unsigned num, rev;

    bool operator<(Paper const other) const noexcept
    {
        return (num < other.num) || ( (num == other.num) && (rev < other.rev) );
    }

    char const *str(void) const noexcept
    {
        static thread_local char s[] = "PxxxxRxx";

        s[1] = '0' + num / 1000u % 10u;
        s[2] = '0' + num / 100u % 10u;
        s[3] = '0' + num / 10u % 10u;
        s[4] = '0' + num / 1u % 10u;

        if ( rev < 10u )
        {
            s[6] = '0' + rev;
            s[7] = '\0';
        }
        else
        {
            s[6] = '0' + rev / 10u % 10u;
            s[7] = '0' + rev / 1u % 10u;
            s[8] = '\0';
        }

        return s;
    }
};

std::ostream &operator<<(std::ostream &os, Paper const paper)
{
    return os << paper.str();
}

size_t WriteCallback(void *const contents, size_t const size,
                     size_t const nmemb, void *const userp) noexcept
{
    try
    {
        string *const data = static_cast<string*>(userp);
        data->append( static_cast<char*>(contents), size * nmemb );
        return size * nmemb;
    }
    catch(...)
    {
        return 0u;
    }
}

set<Paper> papers;

void ExtractPaperCodes(string const &content)
{
    std::regex pattern(R"(P(\d{4})R(\d+))");
    std::smatch match;
    string::const_iterator search_start( content.cbegin() );

    while ( std::regex_search(search_start, content.cend(), match, pattern) )
    {
        unsigned const num = static_cast<unsigned>(std::stoul(match[1].str())),
                       rev = static_cast<unsigned>(std::stoul(match[2].str()));

        papers.insert( Paper{ num, rev } );

        search_start = match.suffix().first;
    }
}

void FetchPaperCodesForYear(string const &year_url)
{
    CURL *const curl = curl_easy_init();
    if ( nullptr == curl ) throw std::runtime_error("Failed to initialise CURL library");
    Auto( curl_easy_cleanup(curl) );

    CURLcode res;
    string read_buffer;

    curl_easy_setopt(curl, CURLOPT_URL, year_url.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback );
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &read_buffer );
    res = curl_easy_perform(curl);

    if ( CURLE_OK != res ) throw std::runtime_error("CURL request failed for " + year_url + ": " + curl_easy_strerror(res));

    ExtractPaperCodes(read_buffer);
}

auto main(void) -> int
{
    cout << "Fetching paper codes for year: " << std::flush;

    for ( unsigned year = 1989u; year <= 2025u; ++year )
    {
        string const year_url =
            "https://www.open-std.org/jtc1/sc22/wg21/docs/papers/" + std::to_string(year) + "/";
        cout << (1989u==year ? "" : ", ") << year << std::flush;
        FetchPaperCodesForYear(year_url);
    }

    for ( Paper const &paper : papers ) cout << endl << paper;

    cout << "\n\nTotal unique papers found: " << papers.size() << endl;
}

Received on 2025-05-09 12:04:45